[UBUNTU 22.04] PCHID per Port Toleration - IBM z17 Enablement

Bug #2119650 reported by bugproxy
Affects | Status | Importance | Assigned to | Milestone
Ubuntu on IBM z Systems | Fix Committed | High | Skipper Bug Screeners |
linux (Ubuntu) | Fix Committed | Medium | Unassigned |
Jammy | Won't Fix | Undecided | Massimiliano Pellizzer |
Noble | Fix Released | Undecided | Unassigned |
Plucky | Fix Released | Undecided | Unassigned |
Questing | Fix Committed | Medium | Unassigned |

Bug Description

[ Impact ]

Improve how the s390 kernel discovers and organizes PCI devices, making it more robust, predictable, and compatible with modern hardware and virtualization scenarios.

Previously, PCI functions were grouped based on firmware-provided order and physical channel IDs (PCHIDs), leading to unstable device numbering and incorrect grouping in complex setups.

The new implementation explicitly sorts PCI functions by Requester ID (RID) and uses Topology IDs (TIDs) to group multi-function devices, ensuring deterministic and future-proof bus/domain creation. It fixes SR-IOV behavior by properly grouping Physical Functions (PFs) and Virtual Functions (VFs), allowing PFs initially in standby to form shared domains and treating isolated VFs without a parent PF as standalone devices.
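
The sort-then-group idea described above can be sketched as follows. This is a minimal illustration, not the kernel implementation: the `PciFunction` record, the `group_functions` helper, and all sample values are hypothetical and chosen only to show that the result no longer depends on the order in which firmware lists the functions, nor on their PCHIDs.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class PciFunction:
    fid: int    # function ID (hypothetical sample value)
    rid: int    # Requester ID: bus number in the high byte, devfn in the low byte
    tid: int    # topology ID shared by the functions of one multi-function device
    pchid: int  # physical channel ID, deliberately unused for grouping


def group_functions(functions):
    """Sort by (TID, RID), then group by TID; the input order is irrelevant."""
    ordered = sorted(functions, key=lambda f: (f.tid, f.rid))
    return {
        tid: [f.fid for f in members]
        for tid, members in groupby(ordered, key=lambda f: f.tid)
    }


# Illustrative values only: two functions share a TID although their PCHIDs differ.
sample = [
    PciFunction(fid=0x11, rid=0x0001, tid=0x7, pchid=0x1C1),
    PciFunction(fid=0x10, rid=0x0000, tid=0x7, pchid=0x1C0),
    PciFunction(fid=0x20, rid=0x0000, tid=0x9, pchid=0x200),
]
groups = group_functions(sample)
```

Feeding the same functions in a different order yields the same groups, which is the deterministic behavior the description refers to.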

Parent PF detection has been refactored for consistency, and error handling has been improved to prevent leaks and crashes when adding devices dynamically.

[ Fix ]

Backport to Jammy the following commits:
52c79e636a58da s390/pci: make better use of zpci_dbg() levels
0467cdde8c4320 s390/pci: Sort PCI functions prior to creating virtual busses
126034faaac5f3 s390/pci: Use topology ID for multi-function devices
fd1ae23b495b3a PCI: Prefer 'unsigned int' over bare 'unsigned'
c3df83e01a96ca PCI: Clean up pci_scan_slot()
fbed59ed8781d7 PCI: Split out next_ari_fn() from next_fn()
db360b1ea7faef PCI: Move jailhouse's isolated function handling to pci_scan_slot()
189c6c33ff421d PCI: Extend isolated function probing to s390
960ac362648780 s390/pci: allow zPCI zbus without a function zero
45e5f0c017e0d0 s390/pci: clean up left over special treatment for function zero
25f39d3dcb48bb s390/pci: Ignore RID for isolated VFs
48796104c864cf s390/pci: Fix leak of struct zpci_dev when zpci_add_device() fails
dc287e4c9149ab s390/pci: Fix SR-IOV for PFs initially in standby
05793884a1f305 s390/pci: Pull search for parent PF out of zpci_iov_setup_virtfn()
2844ddbd540fc8 s390/pci: Fix handling of isolated VFs
8691abd3afaadd s390/pci: Fix zpci_bus_is_isolated_vf() for non-VFs
42420c50c68f3e s390/pci: Fix missing check for zpci_create_device() error return

[ Test Case ]

Testing must be performed on an IBM z17 with Network Express adapters in direct mode.

Begin by attaching at least one adapter with two PFs, each mapped to a separate port, and confirm that the kernel groups them correctly into distinct domains without relying on PCHID ordering.
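
When checking domain assignment, it can help to group the observed PCI addresses by domain. The sketch below assumes the usual Linux `domain:bus:device.function` address format (as used for the entry names under /sys/bus/pci/devices); the `group_by_domain` helper and the sample addresses are hypothetical and not taken from a real z17 system.

```python
import re
from collections import defaultdict

# Linux PCI address: 4 hex digits domain, 2 hex bus, 2 hex device, 1 digit function.
PCI_ADDR = re.compile(
    r"^([0-9a-fA-F]{4}):([0-9a-fA-F]{2}):([0-9a-fA-F]{2})\.([0-7])$"
)


def group_by_domain(addresses):
    """Map each PCI domain number to the (bus, device, function) tuples in it."""
    result = defaultdict(list)
    for addr in addresses:
        match = PCI_ADDR.match(addr)
        if match is None:
            raise ValueError(f"not a PCI address: {addr!r}")
        domain, bus, dev, fn = (int(part, 16) for part in match.groups())
        result[domain].append((bus, dev, fn))
    return dict(result)


# Hypothetical addresses for illustration.
sample_addresses = ["0000:00:00.0", "0001:00:00.0", "0001:00:00.2"]
by_domain = group_by_domain(sample_addresses)
```

Comparing `by_domain` before and after a reboot or hotplug cycle is one way to spot unstable numbering.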

Next, enable SR-IOV on each PF and verify that child VFs are discovered, grouped with the correct PFs, and functional. Test with PFs initially in standby to ensure that shared domains are created dynamically. Introduce isolated VFs without a visible PF and confirm they enumerate as standalone devices.

Finally, perform hotplug and removal of both PFs and VFs, checking that zpci_dev structures are cleaned up correctly without leaks or crashes.

[ Regression Potential ]

The patchset affects the s390 PCI subsystem, in particular:
- device enumeration
- function grouping
- SR-IOV handling
An issue in this code may introduce problems such as incorrect grouping of PFs and VFs, unstable bus numbering, or failure to associate VFs with their parent PFs.

[ Other Info ]

The patchset has already been tested by IBM using the following PPA:
- https://launchpad.net/~mpellizzer/+archive/ubuntu/ibm-z

---

Description:

PCHID per port toleration is required by the new IBM z17 machine.
This has already been included in Ubuntu 25.04
(see LP#2095480 : PCHID per Port Toleration).

This toleration item is also needed in Noble and Jammy in order to support new hardware in both LTS releases.

bugproxy (bugproxy)
tags: added: architecture-s39064 bugnameltc-214652 severity-high targetmilestone-inin---
Changed in ubuntu:
assignee: nobody → Skipper Bug Screeners (skipper-screen-team)
affects: ubuntu → linux (Ubuntu)
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2025-08-06 10:01 EDT-------
This item consists of two parts:

Part 1 (commits in order of application):

0467cdde8c4320bbfdb31a8cff1277b202f677fc s390/pci: Sort PCI functions prior to creating virtual busses
126034faaac5f356822c4a9bebfa75664da11056 s390/pci: Use topology ID for multi-function devices
25f39d3dcb48bbc824a77d16b3d977f0f3713cfe s390/pci: Ignore RID for isolated VFs
48796104c864cf4dafa80bd8c2ce88f9c92a65ea s390/pci: Fix leak of struct zpci_dev when zpci_add_device() fails

Part 2 (fixing two issues with the PCHID per port multi-function detection):

dc287e4c9149ab54a5003b4d4da007818b5fda3d s390/pci: Fix SR-IOV for PFs initially in standby
05793884a1f30509e477de9da233ab73584b1c8c s390/pci: Pull search for parent PF out of zpci_iov_setup_virtfn()
2844ddbd540fc84d7571cca65d6c43088e4d6952 s390/pci: Fix handling of isolated VFs

Frank Heimes (fheimes)
Changed in ubuntu-z-systems:
assignee: nobody → Skipper Bug Screeners (skipper-screen-team)
Changed in linux (Ubuntu):
assignee: Skipper Bug Screeners (skipper-screen-team) → nobody
importance: Undecided → Medium
Changed in ubuntu-z-systems:
importance: Undecided → High
Massimiliano Pellizzer (mpellizzer) wrote :
Changed in linux (Ubuntu Questing):
status: New → Fix Committed
Changed in linux (Ubuntu Plucky):
status: New → Fix Released
Changed in linux (Ubuntu Noble):
status: New → Fix Released
Changed in linux (Ubuntu Jammy):
status: New → Triaged
assignee: nobody → Massimiliano Pellizzer (mpellizzer)
tags: added: kernel-daily-bug
Massimiliano Pellizzer (mpellizzer) wrote :

Hello Boris o/

In order to backport the requested commits to Jammy I had to backport the following commits too:
- 52c79e636a58 (s390/pci: make better use of zpci_dbg() levels), as a dependency of 0467cdde8c43
- fd1ae23b495b (PCI: Prefer 'unsigned int' over bare 'unsigned'), as a dependency of c3df83e01a96c
- c3df83e01a96c (PCI: Clean up pci_scan_slot()), as a dependency of 25f39d3dcb48
- fbed59ed8781d (PCI: Split out next_ari_fn() from next_fn()), as a dependency of 25f39d3dcb48
- db360b1ea7fae (PCI: Move jailhouse's isolated function handling to pci_scan_slot()), as a dependency of 25f39d3dcb48
- 189c6c33ff421 (PCI: Extend isolated function probing to s390), as a dependency of 25f39d3dcb48
- 960ac36264878 (s390/pci: allow zPCI zbus without a function zero), as a dependency of 25f39d3dcb48
- 45e5f0c017e0d (s390/pci: clean up left over special treatment for function zero), as a followup of 960ac36264878
- 8691abd3afaa (s390/pci: Fix zpci_bus_is_isolated_vf() for non-VFs), as a followup of 2844ddbd540f
- 42420c50c68f (s390/pci: Fix missing check for zpci_create_device() error return), as a followup of 2844ddbd540f
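
The dependency and follow-up relations listed above can be turned into one valid application order with a topological sort. This is only a sketch: the commit IDs are abbreviated to 12 characters, the `apply_order` helper is hypothetical, and the resulting order is one of several valid ones, not necessarily the order used in the submitted patchset.

```python
from graphlib import TopologicalSorter

# commit -> commits that must be applied before it (taken from the list above)
DEPENDS_ON = {
    "0467cdde8c43": {"52c79e636a58"},
    "c3df83e01a96": {"fd1ae23b495b"},
    "25f39d3dcb48": {
        "c3df83e01a96",
        "fbed59ed8781",
        "db360b1ea7fa",
        "189c6c33ff42",
        "960ac3626487",
    },
    "45e5f0c017e0": {"960ac3626487"},  # follow-up
    "8691abd3afaa": {"2844ddbd540f"},  # follow-up
    "42420c50c68f": {"2844ddbd540f"},  # follow-up
}


def apply_order(depends_on):
    """Return the commits in an order where every prerequisite comes first."""
    return list(TopologicalSorter(depends_on).static_order())


order = apply_order(DEPENDS_ON)
```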

I will attach the set of patches in order for you to review them, before sending them to our mailing list.

Moreover, I am building a Jammy kernel with all the patches required on top in the following PPA:
- https://launchpad.net/~mpellizzer/+archive/ubuntu/ibm-z
Can you please test it? Can you also provide us a detailed test for the patchset so that we can mention it in the SRU cover letter?
Thanks

Massimiliano Pellizzer (mpellizzer) wrote :
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2025-08-12 11:12 EDT-------
Hi Massimiliano, thanks for your work. I have asked our PCI specialist Niklas to have a look at the backport.

Frank Heimes (fheimes)
Changed in ubuntu-z-systems:
status: New → In Progress
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2025-08-13 06:25 EDT-------
I think we basically need the backports from the IO2201 feature and its fixes. That should be the following master BZ and feature template. This was originally a feature, but with z17 it has become required for tolerating the change to PCHIDs per port for distributions that already had the SR-IOV support for the Cloud Network Adapter.

Summary:     PCI: Extend isolated function probing to s390
Description: This feature adds the ability for a Linux running in LPAR to use
             the Physical Function (PF) associated with the second port of
             a ConnectX-5/6 card independent of that of the first port. Prior
             to this feature, if the secondary PF was attached to an LPAR
             without also attaching the primary PF, no PCI function would be
             visible in Linux. The function would instead remain in a hidden
             and disabled state. Conversely it has always been possible to use
             the primary PF without attaching the secondary PF. This existing
             behavior is confusing and needlessly restricts flexible usage of
             these powerful network cards. The new behavior of allowing
             independent usage is also in accordance with the PCI SR-IOV
             specification which explicitly defines the dependencies between
             PCI functions in the Dependency Link field which is still honored
             and for ConnectX-5/6 cards allows this independent usage.
Upstream-ID: c3df83e01a96ca569d261bcdffa2fb858b1012fa
             fbed59ed8781d7eecd7f45cde0188cf24eeb5c38
             db360b1ea7faef290471bc1b2a7463b96fd20a07
             189c6c33ff421def040b904fb14ef76c5bf5af4c
             960ac362648780469b2f5584bb8cff540444d119
Problem-ID:  IO2201
Date:        2022-12-09
Author:      Niklas Schnelle <email address hidden>
Component:   kernel

That seems to match this part of the list:
- c3df83e01a96c (PCI: Clean up pci_scan_slot()), as a dependency of 25f39d3dcb48
- fbed59ed8781d (PCI: Split out next_ari_fn() from next_fn()), as a dependency of 25f39d3dcb48
- db360b1ea7fae (PCI: Move jailhouse's isolated function handling to pci_scan_slot()), as a dependency of 25f39d3dcb48
- 189c6c33ff421 (PCI: Extend isolated function probing to s390), as a dependency of 25f39d3dcb48
- 960ac36264878 (s390/pci: allow zPCI zbus without a function zero), as a dependency of 25f39d3dcb48

The rest look like common code dependencies and fixes on top. So yes I think that list makes sense. With that I'll go and test the PPA and report back.
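
The behavior change described in the IO2201 template can be illustrated with a simplified model of scanning one PCI slot. This is a sketch under stated assumptions, not the kernel's pci_scan_slot(): the `scan_slot` helper and its parameters are hypothetical, and it ignores details such as the multi-function header bit and ARI.

```python
def scan_slot(present, isolated_functions):
    """Return the function numbers found in one slot.

    present: set of function numbers (0-7) that respond to probing.
    isolated_functions: True if the platform may expose functions
    without function 0 (the case this feature extends to s390).
    """
    found = []
    for fn in range(8):
        if fn in present:
            found.append(fn)
        elif fn == 0 and not isolated_functions:
            # Conventional scan: a missing function 0 ends the scan,
            # so any higher-numbered function stays invisible.
            break
    return found


# Only the secondary function (function 1) is attached.
hidden = scan_slot({1}, isolated_functions=False)
visible = scan_slot({1}, isolated_functions=True)
```

In this model the secondary function is found only when isolated function probing is allowed, matching the "no PCI function would be visible" case in the description.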

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2025-08-14 05:14 EDT-------
I gave the Linux kernel in the PPA a try and it works as designed.
As for the test procedure: this being a z17 toleration patch, it really requires
a z17 and also Network Express cards in direct mode. I tested with
two PFs associated with the two ports of one card, which on z17
are now two separate PCHIDs, while RoCE on older machines had one PCHID
for the two ports. I also tested enabling SR-IOV with a few child
VFs.

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2025-08-18 06:53 EDT-------
Thanks Niklas, for your successful verification on Jammy.

tags: added: targetmilestone-inin2204 verification-done verification-done-jammy
removed: targetmilestone-inin---
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2025-08-26 11:28 EDT-------
*** Bug 214718 has been marked as a duplicate of this bug. ***

description: updated
Massimiliano Pellizzer (mpellizzer) wrote :

Thanks for the verification.
Patchset sent to KTML:
https://lists.ubuntu.com/archives/kernel-team/2025-August/162561.html

Changed in linux (Ubuntu Jammy):
status: Triaged → In Progress
Massimiliano Pellizzer (mpellizzer) wrote :

Hello Boris and Niklas
The patchset I sent to KTML was rejected during the SRU review because it modifies code outside the s390x subsystem, which is considered too risky given that the goal of the patchset is hardware enablement for a new mainframe.

If you still want to have this feature on the Jammy 5.15 kernel, we can try two things:
1. Upstream the patchset to linux-5.15.y stable tree (I can try to forward it)
2. Reduce the complexity of the patchset and try a new submission on our KTML

Note also that the jammy-hwe kernel (6.8) already includes all the requested commits.

Changed in linux (Ubuntu Jammy):
status: In Progress → Won't Fix
status: Won't Fix → Deferred
Changed in linux (Ubuntu Jammy):
status: Deferred → Won't Fix
Changed in ubuntu-z-systems:
status: In Progress → Fix Committed