Emerald Rapids cannot be used as Sapphire Rapids on Ubuntu due to TSX features

Bug #2106791 reported by DUFOUR Olivier
This bug affects 5 people
Affects            Status     Importance  Assigned to  Milestone
libvirt (Ubuntu)   Confirmed  Undecided   Hector CAO
  Noble            Confirmed  Undecided   Hector CAO
  Oracular         Won't Fix  Undecided   Hector CAO
  Plucky           Confirmed  Undecided   Hector CAO
  Questing         Confirmed  Undecided   Hector CAO

Bug Description

[ Impact ]

* libvirt cannot detect all features (e.g. vmx-* features) on recent Intel CPUs.

* The reason is that it tries to read MSR content. Unless the msr module is
  loaded (or built in), those reads do not deliver the right results, so
  libvirt mis-detects the features a CPU has and therefore fails to use them.

* Load the module at boot and when libvirt is installed to overcome that.

[ Test Plan ]

On a recent Intel CPU (Granite Rapids, Sierra Forest):

$ apt install --yes qemu-system-x86 libvirt-daemon-system libvirt-clients
# The msr module should be loaded
$ lsmod | grep msr
# libvirt should now detect the CPU features better
$ virsh capabilities

Without the fix, some of the vmx-* features are missing from the feature list and libvirt cannot output the right CPU model name. Be aware that capabilities can be cached,
so to compare results with and without the module loaded you need to clean the cache.
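
For scripting the "is the module there" check from the test plan, a minimal sketch follows. It assumes the usual /proc/modules layout (module name as the first field on each line); the function names are made up for illustration. Note that a built-in msr driver would not show up in /proc/modules (or lsmod), so the sketch also offers a check for the device node itself.

```python
from pathlib import Path


def module_listed(name: str, proc_modules: str) -> bool:
    """Return True if `name` is the first field of a line in /proc/modules text."""
    for line in proc_modules.splitlines():
        fields = line.split()
        if fields and fields[0] == name:
            return True
    return False


def msr_device_present(cpu: int = 0) -> bool:
    """Return True if the msr device node for `cpu` exists.

    This also covers a kernel with the driver built in, which lsmod would not list.
    """
    return Path(f"/dev/cpu/{cpu}/msr").exists()
```

Typical use would be `module_listed("msr", Path("/proc/modules").read_text())` before and after the package is installed.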

[ Where problems could occur ]

The fix only loads the msr module at boot and on install of libvirt.
Doing so is a no-op apart from a small amount of memory usage.
No issue is expected elsewhere. The only risk would be another program
that behaves differently depending on whether the module is loaded, but we are not aware of any such program. Anyone who insists on not having it loaded can block it in the module configuration.

[ Other Info ]

On recent CPUs, some x86 features (vmx-*) are reported through MSRs (model-specific registers) instead of the traditional CPUID instruction.
The MSRs are exposed to userspace via the device /dev/cpu/*/msr, which is managed by the msr kernel module.
As of now, this kernel module is not loaded by default on Ubuntu. As a consequence, libvirt cannot access the MSRs and is unable to detect some of the CPU features.


description: updated
summary: - Support for Emerald Rapids is missing, related to TSX
+ Emerald Rapids cannot be recognised as Sapphire Rapids due to TSX features
summary: - Emerald Rapids cannot be recognised as Sapphire Rapids due to TSX features
+ Emerald Rapids cannot be used as Sapphire Rapids on Ubuntu due to TSX features
description: updated
Renan Rodrigo (rr)
tags: added: server-triage-discuss
Christian Ehrhardt (paelzer) wrote :

Hey hector, I subscribed you as this is all related to the topic you already work on.

John Chittum (jchittum)
tags: removed: server-triage-discuss
Hector CAO (hectorcao) wrote :

The reason virsh detects the Broadwell-noTSX-IBRS model in Jammy instead of SapphireRapids-noTSX is that libvirt cannot detect the taa-no feature: the msr kernel module is not loaded and, as a consequence, the /dev/cpu/*/msr files are not present.

The taa-no feature is reported in bit 8 (counting from 0) of the MSR at 0x10A,
and libvirt needs the msr kernel module to read it.
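
To check the bit by hand, here is a minimal sketch, not what libvirt itself does. It assumes the msr driver's usual interface (an 8-byte read from /dev/cpu/N/msr at a file offset equal to the register index) and takes taa-no as bit 8 counting from 0, which is where TAA_NO sits in the register Intel calls IA32_ARCH_CAPABILITIES. The function names are made up for illustration.

```python
import os
import struct

IA32_ARCH_CAPABILITIES = 0x10A  # MSR index mentioned above; the name is Intel's
TAA_NO_BIT = 8                  # bit position, counting from 0


def read_msr(register: int, cpu: int = 0) -> int:
    """Read one 64-bit MSR through the msr character device.

    Needs root and a loaded (or built-in) msr driver; raises OSError otherwise.
    """
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        # The driver interprets the file offset as the MSR index.
        data = os.pread(fd, 8, register)
    finally:
        os.close(fd)
    return struct.unpack("<Q", data)[0]


def has_taa_no(arch_capabilities: int) -> bool:
    """Return True if the TAA_NO bit is set in an IA32_ARCH_CAPABILITIES value."""
    return bool((arch_capabilities >> TAA_NO_BIT) & 1)
```

Run as root with the module loaded, `has_taa_no(read_msr(IA32_ARCH_CAPABILITIES))` gives the answer for CPU 0. Without the module the open fails, which is the situation described above.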

Reproduce instructions:
---

# Boot the machine (make sure tsx=on is not on the kernel command line; by default TSX is disabled)
$ virsh capabilities
  --> Detected model : Broadwell-noTSX-IBRS
$ sudo modprobe msr
$ sudo systemctl restart libvirtd
$ virsh capabilities
  --> Detected model : SapphireRapids-noTSX
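
To compare the before/after results without reading the XML by eye, a small sketch like the following could pull the host model and feature names out of the `virsh capabilities` output. It assumes the usual layout with the model and feature elements under host/cpu; the function name is made up for illustration.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple


def host_cpu_summary(capabilities_xml: str) -> Tuple[Optional[str], List[str]]:
    """Return (model, feature names) from the host CPU section of capabilities XML."""
    root = ET.fromstring(capabilities_xml)
    cpu = root.find("./host/cpu")
    if cpu is None:
        return None, []
    model = cpu.findtext("model")
    # Keep only <feature> elements that actually carry a name attribute.
    features = [f.get("name") for f in cpu.findall("feature") if f.get("name")]
    return model, features
```

The XML text can come from `subprocess.run(["virsh", "capabilities"], capture_output=True, text=True, check=True).stdout`, captured once before and once after `modprobe msr` plus the libvirtd restart.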

description: updated
Hector CAO (hectorcao)
description: updated
Hector CAO (hectorcao)
Changed in libvirt (Ubuntu):
assignee: nobody → Hector CAO (hectorcao)
Christian Ehrhardt (paelzer) wrote :

This is a great analysis and insight,
which in turn means we probably should configure the system to load it.

As you work on a whole set of fixes around the same topic this probably needs to go together with the right types added and so on - you know it better already :-)

But for the module we might want to use:
- https://www.freedesktop.org/software/systemd/man/latest/modules-load.d.html
- only on the two x86 arches as that is where msr applies
- something like https://sources.debian.org/src/cups-filters/1.28.17-6/debian/rules/?hl=64#L64
- but today better into the non-user dir of /usr/lib/modules-load.d/
- and for the before-boot maybe add a modprobe to postinst

This should affect Debian as well (I can't check their kernel config at the moment to see whether it is a module). Fixing it with them has the benefit of avoiding conflicts and a more complex merge later, so a PR should probably land in https://salsa.debian.org/libvirt-team/libvirt first. It would then be picked up by questing and SRUed once the other "detect my type" cases are ready as well.

Until then anyone affected has a workaround with the above already.

Hector CAO (hectorcao) wrote :

In Debian Sid:

debian@localhost:~$ uname -r
6.12.29-amd64

debian@localhost:~$ cat /etc/os-release
PRETTY_NAME="Debian GNU/Linux 13 (trixie)"
NAME="Debian GNU/Linux"
VERSION_ID="13"
VERSION="13 (trixie)"
VERSION_CODENAME=trixie
DEBIAN_VERSION_FULL=13.0
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"

The module MSR is not built-in:

CONFIG_X86_MSR=m

Changed in libvirt (Ubuntu Plucky):
assignee: nobody → Hector CAO (hectorcao)
Changed in libvirt (Ubuntu Noble):
assignee: nobody → Hector CAO (hectorcao)
Changed in libvirt (Ubuntu Oracular):
assignee: nobody → Hector CAO (hectorcao)
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in libvirt (Ubuntu Noble):
status: New → Confirmed
Changed in libvirt (Ubuntu Oracular):
status: New → Confirmed
Changed in libvirt (Ubuntu Plucky):
status: New → Confirmed
Changed in libvirt (Ubuntu):
status: New → Confirmed
Hector CAO (hectorcao)
description: updated
Hector CAO (hectorcao) wrote :

@Christian : all the MPs are updated to enable the MSR module load, please take a look.

A bit of context for others: most of the time, libvirt cannot output the right host CPU model name, for various reasons. One is that some features are disabled by the kernel for security reasons; another is that libvirt is unable to detect some features via MSRs.

This fix only attempts to fix the second part (MSR read).

Christian Ehrhardt (paelzer) wrote :

Oracular will be EOL before this can land; for the rest, this LGTM.
I've updated the SRU template to be more acceptable; I hope it is good now.
In -devel I'd like to have the Debian submission done and referenced, then we can go on.

description: updated
Changed in libvirt (Ubuntu Oracular):
status: Confirmed → Won't Fix
This report contains Public information  
Everyone can see this information.
