I/O performance regression on NVMes under same bridge (dual-port NVMe)
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| linux (Ubuntu) | In Progress | Undecided | Massimiliano Pellizzer | |
| Oracular | Won't Fix | Undecided | Unassigned | |
| Plucky | Fix Released | Medium | Massimiliano Pellizzer | |
| Questing | In Progress | Undecided | Massimiliano Pellizzer | |
Bug Description
[ Impact ]
iommu/vt-d: Optimize iotlb_sync_map for non-caching/non-RWBF modes
The iotlb_sync_map iommu ops allows drivers to perform necessary cache
flushes when new mappings are established. For the Intel iommu driver,
this callback specifically serves two purposes:
- To flush caches when a second-stage page table is attached to a device
whose iommu is operating in caching mode (CAP_REG.CM==1).
- To explicitly flush internal write buffers to ensure updates to memory-
resident remapping structures are visible to hardware (CAP_REG.RWBF==1).
However, in scenarios where neither caching mode nor the RWBF flag is
active, the cache_tag_flush_range_np() helper, which is called in the
iotlb_sync_map path, effectively becomes a no-op.
Despite being a no-op, cache_tag_flush_range_np() still iterates
through all cache tags of the IOMMUs attached to the domain, protected
by a spinlock. This unnecessary execution path introduces overhead,
leading to a measurable I/O performance regression. On systems with NVMes
under the same bridge, performance was observed to drop from approximately
~6150 MiB/s down to ~4985 MiB/s.
Introduce a flag in the dmar_domain structure. This flag will only be set
when iotlb_sync_map is required (i.e., when CM or RWBF is set). The
cache_tag_flush_range_np() helper is only called when this flag is
set. This flag, once set, is immutable, given that there won't be mixed
configurations in real-world scenarios where some IOMMUs in a system
operate in caching mode while others do not. Theoretically, the
immutability of this flag does not impact functionality.
[ Fix ]
Backport the following commits:
- 12724ce3fe1a iommu/vt-d: Optimize iotlb_sync_map for non-caching/non-RWBF modes
- b9434ba97c44 iommu/vt-d: Split intel_iommu_
- b33125296b50 iommu/vt-d: Create unique domain ops for each stage
- 0fa6f0893466 iommu/vt-d: Split intel_iommu_
- 85cfaacc9937 iommu/vt-d: Split paging_
- cee686775f9c iommu/vt-d: Make iotlb_sync_map a static property of dmar_domain
to Plucky.
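As an informal sanity check (not part of the SRU template), the presence of these backports in a kernel git tree can be confirmed by searching the commit messages; the command below is illustrative and assumes it is run from a checkout of the target Ubuntu kernel source (the pattern matches the two subjects above that mention iotlb_sync_map):
$ git log --oneline --grep="iotlb_sync_map" -- drivers/iommu/intel/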
[ Test Plan ]
Run fio against two NVMes under the same PCI bridge (dual-port NVMe):
$ sudo fio --readwrite=
Verify that the throughput reached with the two NVMes under the same bridge matches the throughput that would have been reached if the two NVMes were not under the same bridge.
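The fio invocation above is truncated in this report; the sketch below is a plausible reconstruction, not the original command. The 16 jobs and the two device names are taken from the test case further down, while the workload type and all remaining parameters (ioengine, block size, queue depth, runtime) are illustrative assumptions:
$ sudo fio --name=same-bridge-test --readwrite=read \
      --filename=/dev/nvme1n1:/dev/nvme10n1 \
      --ioengine=libaio --direct=1 --bs=256k --iodepth=32 \
      --numjobs=16 --runtime=60 --time_based --group_reporting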
[ Regression Potential ]
This fix affects the Intel IOMMU (VT-d) driver.
An issue with this fix may introduce problems such as
incorrect omission of required IOTLB cache or write buffer flushes
when attaching devices to a domain.
This could result in memory remapping structures not being visible
to hardware in configurations that actually require synchronization.
As a consequence, devices performing DMA may exhibit data corruption,
access violations, or inconsistent behavior due to stale or incomplete
translations being used by the hardware.
---
[Description]
A performance regression has been reported when running fio against two NVMe devices under the same PCI bridge (dual-port NVMe).
The issue was initially reported against the 6.11-hwe kernel for Noble.
The performance regression was introduced in the 6.10 upstream kernel and is still present in 6.16 (built at commit e540341508ce2f6).
Bisection pointed to commit 129dab6e1286 ("iommu/vt-d: Use cache_tag_flush_range_np() in iotlb_sync_map()").
In our tests we observe ~6150 MiB/s when the NVMe devices are on different bridges and ~4985 MiB/s when they are under the same bridge.
Before the offending commit we observe ~6150 MiB/s, regardless of NVMe device placement.
[Test Case]
We can reproduce the issue on GCP on a Z3 metal instance type (z3-highmem-
You need to have 2 NVMe devices under the same bridge, e.g.:
# nvme list -v
...
Device SN MN FR TxPort Address Slot Subsystem Namespaces
-------- -------
nvme0 nvme_card-pd nvme_card-pd (null) pcie 0000:05:00.1 nvme-subsys0 nvme0n1
nvme1 3DE4D285C21A7C001.0 nvme_card 00000000 pcie 0000:3d:00.0 nvme-subsys1 nvme1n1
nvme10 3DE4D285C21A7C001.1 nvme_card 00000000 pcie 0000:3d:00.1 nvme-subsys10 nvme10n1
nvme11 3DE4D285C2027C000.0 nvme_card 00000000 pcie 0000:3e:00.0 nvme-subsys11 nvme11n1
nvme12 3DE4D285C2027C000.1 nvme_card 00000000 pcie 0000:3e:00.1 nvme-subsys12 nvme12n1
nvme2 3DE4D285C2368C001.0 nvme_card 00000000 pcie 0000:b7:00.0 nvme-subsys2 nvme2n1
nvme3 3DE4D285C22A74001.0 nvme_card 00000000 pcie 0000:86:00.0 nvme-subsys3 nvme3n1
nvme4 3DE4D285C22A74001.1 nvme_card 00000000 pcie 0000:86:00.1 nvme-subsys4 nvme4n1
nvme5 3DE4D285C2368C001.1 nvme_card 00000000 pcie 0000:b7:00.1 nvme-subsys5 nvme5n1
nvme6 3DE4D285C21274000.0 nvme_card 00000000 pcie 0000:87:00.0 nvme-subsys6 nvme6n1
nvme7 3DE4D285C21094000.0 nvme_card 00000000 pcie 0000:b8:00.0 nvme-subsys7 nvme7n1
nvme8 3DE4D285C21274000.1 nvme_card 00000000 pcie 0000:87:00.1 nvme-subsys8 nvme8n1
nvme9 3DE4D285C21094000.1 nvme_card 00000000 pcie 0000:b8:00.1 nvme-subsys9 nvme9n1
...
For the output above, drives nvme1n1 and nvme10n1 are under the same bridge, and judging by the serial numbers it appears to be a dual-port NVMe.
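Beyond inspecting serial numbers, the shared bridge can be confirmed from the PCI topology. The commands below are a generic example (device names taken from the listing above); the reported paths will differ per machine:
# readlink -f /sys/class/nvme/nvme1/device
# readlink -f /sys/class/nvme/nvme10/device
# lspci -tv
If both readlink outputs share the same parent bridge in the resolved path, and lspci -tv shows the two functions under the same downstream port, the devices are under the same bridge.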
- Under the same bridge
Run fio against nvme1n1 and nvme10n1, observe ~4897 MiB/s after a short initial spike at ~6150 MiB/s.
# sudo fio --readwrite=
...
Jobs: 16 (f=16): [r(16)]
...
- Under different bridges
Run fio against nvme1n1 and nvme11n1, observe ~6150 MiB/s.
# sudo fio --readwrite=
...
Jobs: 16 (f=16): [r(16)]
...
** So far, we haven't been able to reproduce it on another machine, but we suspect it will be reproducible on any machine with a dual-port NVMe.
[Other]
Spreadsheet [2] contains profiling data for different kernel versions, showing a consistent performance difference between them.
Offending commit: https:/
The issue has been reported upstream [3].
[1] https:/
[2] https:/
[3] https:/
CVE References
- 2025-38003
- 2025-38004
- 2025-38029
- 2025-38031
- 2025-38032
- 2025-38033
- 2025-38034
- 2025-38035
- 2025-38036
- 2025-38037
- 2025-38038
- 2025-38039
- 2025-38040
- 2025-38041
- 2025-38042
- 2025-38043
- 2025-38044
- 2025-38045
- 2025-38047
- 2025-38048
- 2025-38050
- 2025-38051
- 2025-38052
- 2025-38053
- 2025-38054
- 2025-38055
- 2025-38057
- 2025-38058
- 2025-38059
- 2025-38060
- 2025-38061
- 2025-38062
- 2025-38063
- 2025-38064
- 2025-38065
- 2025-38066
- 2025-38067
- 2025-38068
- 2025-38069
- 2025-38070
- 2025-38071
- 2025-38072
- 2025-38073
- 2025-38074
- 2025-38075
- 2025-38076
- 2025-38077
- 2025-38078
- 2025-38079
- 2025-38080
- 2025-38081
- 2025-38082
- 2025-38088
- 2025-38091
- 2025-38092
- 2025-38100
- 2025-38101
- 2025-38102
- 2025-38103
- 2025-38105
- 2025-38106
- 2025-38107
- 2025-38108
- 2025-38109
- 2025-38110
- 2025-38111
- 2025-38112
- 2025-38113
- 2025-38114
- 2025-38115
- 2025-38116
- 2025-38117
- 2025-38118
- 2025-38119
- 2025-38120
- 2025-38122
- 2025-38123
- 2025-38124
- 2025-38125
- 2025-38126
- 2025-38127
- 2025-38128
- 2025-38129
- 2025-38130
- 2025-38131
- 2025-38132
- 2025-38134
- 2025-38135
- 2025-38136
- 2025-38137
- 2025-38138
- 2025-38139
- 2025-38140
- 2025-38141
- 2025-38142
- 2025-38143
- 2025-38145
- 2025-38146
- 2025-38147
- 2025-38148
- 2025-38149
- 2025-38151
- 2025-38153
- 2025-38154
- 2025-38155
- 2025-38156
- 2025-38157
- 2025-38158
- 2025-38159
- 2025-38160
- 2025-38161
- 2025-38162
- 2025-38163
- 2025-38164
- 2025-38165
- 2025-38166
- 2025-38167
- 2025-38168
- 2025-38169
- 2025-38170
- 2025-38172
- 2025-38173
- 2025-38174
- 2025-38175
- 2025-38176
- 2025-38265
- 2025-38267
- 2025-38268
- 2025-38269
- 2025-38270
- 2025-38272
- 2025-38274
- 2025-38275
- 2025-38277
- 2025-38278
- 2025-38279
- 2025-38280
- 2025-38281
- 2025-38282
- 2025-38283
- 2025-38284
- 2025-38285
- 2025-38286
- 2025-38287
- 2025-38288
- 2025-38289
- 2025-38290
- 2025-38291
- 2025-38292
- 2025-38293
- 2025-38294
- 2025-38295
- 2025-38296
- 2025-38297
- 2025-38298
- 2025-38299
- 2025-38300
- 2025-38301
- 2025-38302
- 2025-38303
- 2025-38304
- 2025-38305
- 2025-38306
- 2025-38307
- 2025-38310
- 2025-38311
- 2025-38312
- 2025-38313
- 2025-38314
- 2025-38315
- 2025-38316
- 2025-38317
- 2025-38318
- 2025-38319
- 2025-38350
- 2025-38352
- 2025-38414
- 2025-38415
tags: | added: kernel-daily-bug |
Changed in linux (Ubuntu Questing): | |
assignee: | nobody → Massimiliano Pellizzer (mpellizzer) |
Changed in linux (Ubuntu Plucky): | |
assignee: | nobody → Massimiliano Pellizzer (mpellizzer) |
status: | New → Confirmed |
Changed in linux (Ubuntu Questing): | |
status: | New → Confirmed |
Changed in linux (Ubuntu Plucky): | |
status: | Confirmed → In Progress |
Changed in linux (Ubuntu Questing): | |
status: | Confirmed → In Progress |
Changed in linux (Ubuntu Plucky): | |
importance: | Undecided → Medium |
status: | In Progress → Fix Committed |
tags: | added: verification-done-noble-linux-nvidia-6.14 removed: verification-needed-noble-linux-nvidia-6.14 |
tags: | added: verification-done-noble-linux-azure-nvidia-6.14 removed: verification-needed-noble-linux-azure-nvidia-6.14 |
Ubuntu 24.10 (Oracular Oriole) has reached end of life, so this bug will not be fixed for that specific release.