Hello, we had an issue with the CPUID changing on a guest, which I have listed below.
My questions are:
How are CPUIDs generated for the guest machines? I have read a post from 2014 that said the CPUID was in the .vmx file. I had a look but it was not there, so I assume that has changed over the years.
I checked other 2016 servers running on the same host and found each VM has a different CPUID. Are unique values assigned to each guest during provisioning? Shouldn't the CPUIDs be the same for machines running on the same host?
And my most important question: How can the CPUID change and stay changed after the server restarts on another host?
Thanks for the help.
Hi MC1024,
The CPUID masks the CPU features that are visible to the VM's OS. This can change dynamically if you change the EVC mode and power on VMs. Changes in HW version will also release new CPU features to a guest OS (up to the EVC limit).
Now rather than changing the CPUID, which will require you to set the cpuid.1.edx and cpuid.1.eax values, I would double-check that the licensing really is tied to the features the CPU supports. It would be normal to have licensing tied to the UUID of a machine, but tying it to a specific set of instructions seems a little brutal.
If you need to proceed then take a look at this post: Manipulating Guest CPUID
Kind regards.
Hello @ThompsG
Thanks for the reply.
So if EVC mode hasn't been changed on the cluster, vMotion has been performed successfully in the past month, and the 3 hosts have the same CPU and model (BL460c G9), what is the trigger? A powered-on vMotion in isolation isn't enough to cause a change in CPUID, as I've observed this happening without issue, but perhaps a cold vMotion is enough to trigger it? I will have to test in the lab.
We are working with the manufacturer to get a long-term fix for this, but in short it collects multiple parameters such as CPUID, BIOS serial, physical RAM and MAC address to generate a machine key that is provided to the manufacturer for activation. A very brutal process that doesn't work well at all in a virtual environment. The manufacturer says in their spec sheet that they support VMware, so this looks like a bug, but I'd like to avoid triggering a license invalidation while the bug is resolved.
Agree - all things being similar, it should not have changed. I'm also assuming that there hasn't been an upgrade to VM HW either?
Look forward to your testing in the LAB.
Just to post an update. I did some lab testing of the following scenarios:
Windows 7 and Server 2016 standard:
In all scenarios the CPUID did not change.
In the release notes for U1 there is mention of EVC, but I'm not sure that bug applies to our situation.
Hi,
I think the CPUID will not change unless you power cycle the VM.
You can vmotion a VM to another CPU type and the guest still thinks it is on the old CPU.
Only after the guest is rebooted does it recognize that the CPU has changed.
--
Wil
Yes - you are right. Thinking about it, it’s only a full power cycle that can/could change the CPUID on a VM.
Hi All
As I said in my lab test post, I have rebooted machines, moved machines and re-registered machines, and yet the CPUID hasn't changed. The only thing I think might have done it is the U1 patch.
Hi MC1024,
Agreed, that might have been a possibility. Have you tried modifying the edx and eax values? I know this is not perfect and pretty scary to be honest, so perhaps try on a clone or snapshot?
Kind regards.
Hi ThompsG
This is the thread from 2014 I found referencing those values, but they don't exist in the .vmx of my machine. Do they exist somewhere else? How do I modify the values?
Found this interesting comment in the 6.7U1 release notes (VMware vCenter Server 6.7 Update 1 Release Notes):
Maybe this was set then unset with this change?
The thing is, we've not reverted a host to an earlier version of ESXi. I did a search on what IBRS, STIBP and IBPB mean and it's all related to the Spectre/Meltdown stuff. I know the VMware servers had mitigation applied when it first hit, but I am now wondering if a BIOS or firmware update was done in the last few weeks. I'll talk to the team responsible and see if a firmware update was applied. Then it could be something as simple as a Spectre/Meltdown mitigation.
Hi,
Just running the Spectre/Meltdown update will expose those CPUID values to the guest OS.
An actual firmware update is not required to see those changes. Just having hosts at different patch levels could trigger "CPUID changed" warnings.
--
Wil
Hi MC1024,
Okay, looking a little deeper, and that was a curve ball we didn't need.
Firstly, before proceeding, let me say all the standard disclaimers apply from this part on. Please do thorough testing and double check the work. Once happy, read on.
Looking at your CPUID numbers we had 0FABFBFF000306F0 (before HA event) and 0F8BFBFF000306F0 (after). Between the two we can see the first 8 characters (edx) have changed, or rather the 3rd one has changed from an "A" to an "8".
We convert the first 8 hexadecimal characters (0FABFBFF) to binary and we get the following: 0000 1111 1010 1011 1111 1011 1111 1111
Then we take the after CPUID and do the same (0F8BFBFF): 0000 1111 1000 1011 1111 1011 1111 1111
Counting from the right and starting at 0, we see that bit 21 has changed from a "1" to a "0" - this is the bit that sets the feature flag for Debug Store (CPUID - Wikipedia). I haven't done a lot of research on this, so I'm unsure what setting enables or disables this CPU feature.
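The comparison above can also be done programmatically. This is a minimal sketch, assuming the 16-character value is the leaf 1 EDX register followed by EAX (as the post describes); the function names are made up for illustration and the flag table is only a small subset of the documented EDX feature bits.

```python
# Small subset of CPUID leaf 1 EDX feature bits (bit index -> name).
EDX_FLAGS = {19: "CLFSH", 21: "DS (Debug Store)", 22: "ACPI", 23: "MMX"}


def split_cpuid(value: str) -> tuple[int, int]:
    """Split the 16-hex-digit string into (edx, eax), assuming EDX comes first."""
    if len(value) != 16:
        raise ValueError("expected 16 hexadecimal characters")
    return int(value[:8], 16), int(value[8:], 16)


def nibbles(register: int) -> str:
    """Render a 32-bit register as binary in groups of four, highest bit first."""
    bits = f"{register:032b}"
    return " ".join(bits[i:i + 4] for i in range(0, 32, 4))


def changed_bits(before: int, after: int) -> list[tuple[int, int, int]]:
    """Return (bit index, old value, new value) for every bit that differs."""
    diff = before ^ after
    return [
        (bit, (before >> bit) & 1, (after >> bit) & 1)
        for bit in range(32)
        if (diff >> bit) & 1
    ]


before_edx, before_eax = split_cpuid("0FABFBFF000306F0")
after_edx, after_eax = split_cpuid("0F8BFBFF000306F0")

print(nibbles(before_edx))
print(nibbles(after_edx))
print("EAX unchanged:", before_eax == after_eax)
for bit, old, new in changed_bits(before_edx, after_edx):
    name = EDX_FLAGS.get(bit, "not in this table")
    print(f"EDX bit {bit}: {old} -> {new} ({name})")
```

Bit indices here start at 0 on the right, so the single difference is reported as EDX bit 21.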
To mask this feature as being enabled we can do the following (please make sure you have backups or test against another VM):
At this point if you click in the blank area beside "edx" you will get what is currently set on the VM in the section below the masking:
The Legend button tells you what all these mean, but as I have the Debug Store feature you can see bit 21 is set to "H", which means the guest can see this feature and for a successful vMotion it must match. Your settings might be different as it appears at least one of your hosts is not presenting this feature.
Copy the "Final Mask" and then modify it to look like this - could also have used "H", which might be a safer option:
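As a rough sketch of that kind of edit, the helper below changes the character for a single bit in a mask string. It assumes the mask is written as 32 characters in colon-separated groups of four with bit 31 on the left; the function name set_mask_bit is hypothetical, and the flag character "1" is only a placeholder assumption, so check the Legend for the value you actually want before applying anything to a VM.

```python
def set_mask_bit(mask: str, bit: int, flag: str) -> str:
    """Return a copy of the mask with the character for one bit replaced.

    Assumes 8 colon-separated groups of 4 characters, bit 31 first.
    """
    groups = mask.split(":")
    if len(groups) != 8 or any(len(group) != 4 for group in groups):
        raise ValueError("expected 8 colon-separated groups of 4 characters")
    if not 0 <= bit <= 31:
        raise ValueError("bit must be between 0 and 31")
    if len(flag) != 1:
        raise ValueError("flag must be a single character")
    chars = list("".join(groups))
    chars[31 - bit] = flag  # bit 31 is the leftmost character
    flat = "".join(chars)
    return ":".join(flat[i:i + 4] for i in range(0, 32, 4))


# Placeholder starting mask; in practice start from the copied "Final Mask".
starting_mask = ":".join(["----"] * 8)
new_mask = set_mask_bit(starting_mask, 21, "1")  # "1" is an assumed flag value
print(f'cpuid.1.edx = "{new_mask}"')
```

Starting from the copied "Final Mask" rather than a blank one keeps every other bit as the VM already has it, which is the point of copying it first.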
On my VM, I disabled it and then did a couple of vMotions, which seemed to work without issues; however, your mileage may vary, so test, test and test.
Kind regards.