VMware Cloud Community
actyler1001
Enthusiast

Poor CPU performance after ESXi 6.5 patch

Hi everyone, I just had a strange experience with poor CPU performance and thought I would post about it. I currently run two DL360e Gen8 servers in my home lab and upgraded from ESXi 6.5 build 6765664 to the latest build, 8935087. WOW, CPU performance went through the floor. As soon as I "vMotioned" a couple of VMs to the patched host, the CPU spiked to 100% and stayed there. I moved the VMs back to my unpatched host and they work great. My guess is that this has something to do with Spectre and Meltdown mitigation, but I am not sure yet. I ended up reverting my patched host to build 6765664 and things are normal again.

CPU in host that gave me problems: Intel(R) Xeon(R) CPU E5-2450L 0 @ 1.80GHz

It would be a shame to never patch my hosts again, though. Has anyone else run into this? Is there a way to patch a host but disable the mitigation so that CPU performance is preserved?

3 Replies
bluefirestorm
Champion

Short version answer:

My guess is that the mitigation against Spectre variant 4 (Speculative Store Bypass, SSB) caused it. Turning off SSBD is about as good as not applying 6.5U2b (build 8935087) at all.

Slightly longer answer:

If you want to apply 6.5U2b (so that you get the Spectre variant 1 and 2 patches and other ESXi-specific fixes), you could try to disable SSBD by masking bit 31 of the EDX register of CPUID leaf 7. Add the line

cpuid.7.edx = "0---:----:----:----:----:----:----:----"

to the /etc/vmware/config file of the affected host.
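As a rough illustration of how that mask string is laid out, here is a minimal Python sketch. It assumes the layout implied by the post: 32 characters, bit 31 first, grouped in fours with colons, where "0" forces a bit to zero and "-" leaves it untouched. The function name cpuid_mask is made up for this example and is not a VMware tool.

```python
def cpuid_mask(zero_bits):
    """Build a 32-character mask string, bit 31 first, in colon-separated groups of four.

    Bits listed in zero_bits become "0" (forced to zero); all others become "-"
    (left as the host reports them). Hypothetical helper, for illustration only.
    """
    chars = ["0" if bit in zero_bits else "-" for bit in range(31, -1, -1)]
    groups = ["".join(chars[i:i + 4]) for i in range(0, 32, 4)]
    return ":".join(groups)


# Mask only bit 31 (SSBD), as in the config line quoted above.
print('cpuid.7.edx = "%s"' % cpuid_mask({31}))
```

Counting positions this way makes it easier to check that the single "0" really sits in the bit 31 slot before editing the host's config file.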

Longer version answer:

I am assuming the affected host also has the latest Intel microcode either from VMware or from HPE in the form of the BIOS/EFI update.

You could move to a different patch level: 6.5U2 (8294253) or 6.5U1g (7967591). 6.5U1g has the ESXi component to patch Spectre variants 1 and 2. I haven't come across the Spectre variant 1 and 2 or Meltdown patches causing the CPU of guest VMs to spike to 100% and stay there.

In case the patches against Spectre variants 1 and 2 also cause grief, you can disable the mitigation against both variants in different ways.

At the ESXi host level, you could set bits 26 and 27 of the EDX register of CPUID leaf 7 to zero (the line below also keeps bit 31 masked).

cpuid.7.edx = "0---:00--:----:----:----:----:----:----"
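To see what such a mask does to a register value, the following Python sketch applies a mask string to a 32-bit EDX value. It handles only the three characters "0", "1", and "-" and rejects anything else; VMware masks accept other characters that are not modeled here. The function name apply_mask and the sample value 0x8c000000 (bits 31, 27, and 26 set) are assumptions made for the example.

```python
def apply_mask(edx, mask):
    """Apply a mask string (bit 31 first) to a 32-bit EDX value.

    "0" clears the bit, "1" sets it, "-" keeps the incoming value.
    Simplified sketch: other mask characters raise ValueError.
    """
    bits = mask.replace(":", "")
    if len(bits) != 32:
        raise ValueError("mask must describe exactly 32 bits")
    for position, char in enumerate(bits):
        bit = 31 - position  # leftmost character is bit 31
        if char == "0":
            edx &= ~(1 << bit)
        elif char == "1":
            edx |= 1 << bit
        elif char != "-":
            raise ValueError("unsupported mask character: %r" % char)
    return edx & 0xFFFFFFFF


# Hypothetical host value with bits 31, 27 and 26 all set.
host_edx = 0x8C000000
print(hex(apply_mask(host_edx, "0---:----:----:----:----:----:----:----")))
print(hex(apply_mask(host_edx, "0---:00--:----:----:----:----:----:----")))
```

With the first mask only bit 31 is cleared; with the second, bits 31, 27, and 26 are all cleared.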

Mitigation against Meltdown (aka Spectre variant 3) is purely a guest OS level patch (i.e. it does not require an ESXi patch). The performance hit from the Meltdown patch only affects certain workloads that are network or disk I/O intensive, and it is usually reported as higher CPU usage compared to pre-patch levels rather than a spike to 100%. To reduce this performance hit, the CPU needs to have the INVPCID instruction (Haswell CPUs and newer) and the OS needs to be capable of making use of INVPCID. In your case, the E5-2450L is a Sandy Bridge CPU, which does not have the INVPCID instruction.

You could disable Spectre variant 1 and variant 2 and Meltdown at the Windows OS level by setting registry values. You can look at this Microsoft article.

https://support.microsoft.com/en-us/help/4072698/windows-server-guidance-to-protect-against-the-spec...

I am not aware of methods to disable Spectre variant 1 and 2 and Meltdown at the Linux OS level.

actyler1001
Enthusiast

bluefirestorm, thanks for your reply. Yes, we both agree that it has got to be related to Spectre and Meltdown patching. A couple of follow-up questions for you...

1. The ( cpuid.7.edx = "0---:----:----:----:----:----:----:----" ) fix that you mention. You say that I could "try" this if I wanted to continue to apply VMware bug fixes and patches in the future without the performance hit of the Spectre and Meltdown mitigation. Why use the word "try"? Is this not officially supported by VMware, or does it sometimes not work as expected depending on the hardware platform?

2. You said "I am assuming the affected host also has the latest Intel microcode either from VMware or from HPE in the form of the BIOS/EFI update."

I do regularly update firmware and BIOS on these Gen8 servers by using the ProLiant Support Pack bootable ISO. However, HPE hasn't released an updated BIOS build for quite a long time. I do see that HPE made a microcode update available in "VIB" form, presumably for installation directly on ESXi. I am hesitant to apply this now after my experience with performance.

I also assumed that the VMware patch would include all the necessary microcode updates and this wouldn't need to be applied?

3. You said "You could disable Spectre variant 1 and variant 2 and Meltdown at the Windows OS level by setting registry values"

I am wondering if this would have any effect. The experience I just went through seemed to affect both Windows 7 (fully patched) and 2008 R2 (partially patched) VMs. These were just the two that I noticed were performing terribly after patching the VMware host. Nothing in the VMs changed; only the host was patched. Are you saying that if Spectre and Meltdown mitigation was intentionally disabled in the guest VM, the host patches wouldn't have had the same performance impact?

I did run this tool <link below> on the Windows 7 VM while it was running on the patched host. It only reported that Meltdown mitigation was in place, and no amount of pressing the disable mitigation button seemed to help. I also powered the VM off and on after the changes.

GRC | InSpectre 

bluefirestorm
Champion

I am not an employee of VMware.

To be clear I suspect it is the Spectre variant 4 (Speculative Store Bypass) that has caused the slowdown.

https://portal.msrc.microsoft.com/en-US/security-guidance/advisory/ADV180012

There are four major Spectre variants. Spectre variant 3, aka Meltdown or rogue data cache load, does not require an Intel CPU microcode update. There is also Spectre variant 3a.

Microcode updates usually come in the form of a BIOS/EFI firmware update for the physical machine. VMware has provided microcode updates for certain CPUs (Sandy Bridge and newer) through ESXi VIBs. Microsoft also provided microcode updates for Haswell and newer CPUs. If I am not mistaken, microcode can also be delivered through Linux OS updates. Microcode delivery through Microsoft Windows and Linux OS updates allows machines that no longer receive BIOS/EFI updates from the system vendor to be updated. Apple also provided microcode updates through macOS updates.

One way to check for the Spectre microcode updates is to look in the vmware.log of any VM for the value of the EDX register of CPUID leaf 7. I don't have a machine that has the microcode update for Spectre variant 4 yet. This is from the vmware.log of a VM running under Fusion 8.5.10 on a MacBook Pro.

vmx| I125: hostCPUID level 00000007, 0: 0x00000000 0x000027ab 0x00000000 0x0c000000

It has the microcode updates for Spectre variants 1 and 2 but not the variant 4 microcode update. That is why bit 31 of the EDX register is 0 while bits 26 and 27 are 1.
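The same check can be sketched in Python. This example assumes the log line has the shape quoted above, with the four values in EAX, EBX, ECX, EDX order. The feature names attached to bits 26, 27, and 31 follow Intel's CPUID definitions rather than anything stated in this thread, and the function name decode_edx is made up for the example.

```python
import re

# Bit positions in CPUID leaf 7 EDX; the names are taken from Intel's CPUID
# documentation, not from the vmware.log itself.
FEATURE_BITS = {26: "IBRS/IBPB", 27: "STIBP", 31: "SSBD"}

LEAF7_PATTERN = re.compile(
    r"hostCPUID level 00000007, 0:((?:\s+0x[0-9a-fA-F]{8}){4})"
)


def decode_edx(log_line):
    """Return which mitigation-related bits are set in the EDX value of a leaf 7 line."""
    match = LEAF7_PATTERN.search(log_line)
    if match is None:
        raise ValueError("not a hostCPUID leaf 7 line")
    eax, ebx, ecx, edx = (int(value, 16) for value in match.group(1).split())
    return {name: bool((edx >> bit) & 1) for bit, name in FEATURE_BITS.items()}


line = "vmx| I125: hostCPUID level 00000007, 0: 0x00000000 0x000027ab 0x00000000 0x0c000000"
print(decode_edx(line))
```

For the quoted line this reports bits 26 and 27 as set and bit 31 as clear, which matches the reading given above.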

If you are not familiar with the VMware ESXi patches with regards to Spectre variants 1 and 2, I'd suggest you look at this KB.

https://kb.vmware.com/kb/52085

Sorry, I am not clicking on the link you provided and I won't try those tools. Microsoft has provided similar tools in the form of PowerShell scripts to check Spectre and Meltdown status. They also provide information on how to disable the mitigations at the Windows OS level.

https://support.microsoft.com/en-us/help/4073119/protect-against-speculative-execution-side-channel-...
