VMware Cloud Community
exxoid
Contributor

Want to change VMware Cluster EVC mode from Merom to Haswell, any issues with this?

Hello,

We have a 3 host VMware cluster (vCenter 6.0 and hosts running ESXi 6.0). The cluster has EVC enabled so we can migrate between the hosts and we use DRS/HA. The VMs running are mostly Windows 2008 R2 - Windows 2016 with a couple Linux machines. We run standard Microsoft services like AD, Exchange, SQL, etc.

The current EVC mode is Intel Merom Generation and has never been changed, even though we have gone through a few host refresh cycles since the cluster was set up.

If I edit the EVC settings, it lets me raise the mode all the way up to Intel Haswell Generation.

The 3 hosts are running:

  • Intel Xeon Gold 6134
  • Intel Xeon E5-2667 v4
  • Intel Xeon E5-2680 v3

We have never done this before and we don't have a lab where we can test it, so I am trying to get some reassurance about what will happen when I go ahead and do it.

I did quite a bit of research and my understanding is that running VMs will not recognize the new feature bits until they are powered off and powered back on again. Is this correct? Is there a way to check which CPU feature bits the OS sees before/after? Will a utility like CPU-Z show this?

Has anyone run into issues when they raise the EVC in terms of stability? Is there anything I should be aware of?

Lastly, if we raise the EVC mode but don't power cycle our VMs, will we still be able to migrate those VMs between hosts? I plan to raise the EVC mode, but I won't be able to power cycle all the VMs right away, so I want to make sure I won't lose migration capabilities for those VMs.

Are there any big gains to be had going from Intel Merom to Intel Haswell? It seems like quite a technology jump from one to the other...

I'm just concerned about things I cannot test and verify in a lab - I appreciate any feedback from anyone who has gone through this exercise.

Thanks

6 Replies
a_p_
Leadership

You are correct; powered-on VMs will continue to run with their currently presented Merom CPU features until they are power cycled.

To be honest, I never really checked which CPU features are available in which CPU generation. Anyway, I have also never experienced any issues after raising the EVC level. If you are interested in what exactly changes, you may check whether Intel provides documents for this.

If you are planning to upgrade your hosts' firmware - due to the microcode patches for Meltdown/Spectre - it might be a good idea to combine this with the EVC level change, because the VMs need to be power cycled after patching the microcode anyway.


André

exxoid
Contributor

Oh, I didn't know that. When we apply the latest BIOS patches (Dell R7xx servers) to take care of the Meltdown/Spectre issue, I know that we need to restart the hypervisors to perform the BIOS update.

But in addition to that, we also need to power cycle the VMs running in the cluster? We couldn't just migrate them off the host being firmware patched and then migrate them back?

bluefirestorm
Champion

Is there a way to check what CPU feature bits the OS sees before/after? Will a utility like CPU-Z show this?

CPU-Z can be quite limited in what it shows. In the vmware.log file there is a section of "Capability Found" lines.

An example looks like this:

vmx| I125: Capability Found: cpuid.PCLMULQDQ = 0x1

That section is quite similar to the "flags" line in the output of cat /proc/cpuinfo on a Linux machine.

There is also a full dump of the EAX, EBX, ECX, and EDX register values returned by the CPUID instruction for the different leaf inputs.
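
If you want to script the before/after comparison, a minimal sketch along these lines could work in Python (untested; the log file names are placeholders for copies of vmware.log saved before the EVC change and after the power cycle):

import re

CAP_RE = re.compile(r"Capability Found: (\S+) = (\S+)")

def capabilities(path):
    # Collect {capability_name: value} from the "Capability Found" lines.
    caps = {}
    with open(path, errors="replace") as log:
        for line in log:
            match = CAP_RE.search(line)
            if match:
                caps[match.group(1)] = match.group(2)
    return caps

before = capabilities("vmware-before.log")  # placeholder file name
after = capabilities("vmware-after.log")    # placeholder file name

# Print every capability that changed value, appeared, or disappeared.
for name in sorted(set(before) | set(after)):
    if before.get(name) != after.get(name):
        print(f"{name}: {before.get(name, '-')} -> {after.get(name, '-')}")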

Any big gains to be had going from Intel Merom to Intel Haswell - seems like quite a technology jump to go from one to the other...?

There will certainly be gains, but it is doubtful whether end users will really notice the difference.

From Westmere, the AES-NI instruction set and the PCLMULQDQ instruction make encryption/decryption faster. So for the encrypted channels used for AD authentication and Exchange messaging, the server CPUs should become more efficient and faster at such tasks. If there is a practice of encrypting the email messages themselves, the server CPUs will also handle that faster.

From Ivy Bridge, there should be virtual interrupt delivery in a VM, so that the VM will not have to do a VM exit (thus saving expensive CPU cycles) when an interrupt occurs (such as from a virtual NIC).

From Haswell, AVX2 and INVPCID. INVPCID is important in the wake of the Meltdown patch: the instruction is required for the PCID performance mitigation in the Get-SpeculationControlSettings PowerShell output to show up as TRUE on Windows versions that support it. If I am not mistaken, the Linux kernel patches for Meltdown also rely on the INVPCID instruction.

I think 1GB pages were also introduced with Haswell, so guest OSes that can take advantage of that feature can potentially run more efficiently.
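
As a quick guest-side sanity check after the power cycle, a Linux VM could look for the marker flags in its own /proc/cpuinfo. A minimal Python sketch; the flag-to-generation mapping below is just my summary of the features discussed here, not an official VMware or Intel list:

# Generation markers as discussed above (my own summary, not official).
EXPECTED = {
    "aes": "Westmere: AES-NI",
    "pclmulqdq": "Westmere: carry-less multiply",
    "avx2": "Haswell: AVX2",
    "invpcid": "Haswell: INVPCID",
    "pdpe1gb": "1GB pages",
}

flags = set()
with open("/proc/cpuinfo") as cpuinfo:
    for line in cpuinfo:
        if line.startswith("flags"):
            flags = set(line.split(":", 1)[1].split())
            break

for flag, meaning in EXPECTED.items():
    state = "present" if flag in flags else "MISSING"
    print(f"{flag:10} {state:8} ({meaning})")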

Just make sure that the virtual hardware compatibility is also up to the level ESXi 6.0 can support (version 11). Lower hardware compatibility versions can mask certain CPU capabilities, and version 11 is required for INVPCID to be exposed to the VMs.

exxoid
Contributor

Appreciate the detailed response, and noted to make sure the VM hardware is version 11. I suspect it goes without saying that VMware Tools should also be up to date? I'll give CPU-Z a shot; it would be good to have a relatively easy method to check the CPU features exposed pre/post change, to verify VM by VM that they are seeing the latest changes.

bluefirestorm
Champion

I don't think VMware Tools has any direct relation to EVC levels, but updating it should not hurt, as there have been security fixes to the SVGA driver and fixes to the vmxnet3 driver over the last year. You might want to go through the release notes of the VMware Tools versions newer than the one you have, so you get a clearer picture of which fix(es) you are getting.

https://docs.vmware.com/en/VMware-Tools/10.2/rn/vmware-tools-1025-release-notes.html

A better way to see the before and after of the EVC change is to save the vmware.log files of the VMs you are interested in. For the Linux VMs, you could also use /proc/cpuinfo as the basis for the before/after comparison. While the CPU-Z utility is good at what it does and provides, once you see what is inside the vmware.log you will realize how little it presents in its "Instructions" section. If I am not mistaken, the "Instructions" section of CPU-Z is a small subset interpretation of CPUID leaf 1 and leaf 7 of Intel CPUs.

| vmx| I125: hostCPUID level 00000001, 0: 0x00040661 0x02100800 0x7ffafbbf 0xbfebfbff

| vmx| I125: hostCPUID level 00000007, 0: 0x00000000 0x000027ab 0x00000000 0x0c000000

For an ESXi vmware.log there should also be a guestCPUID section. For a Fusion/Workstation vmware.log there is a guest vs. host CPUID section (comparing the virtual CPU against the host CPU) which is a line-by-line comparison.
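
For reference, a few well-known feature bits can be decoded straight from those raw register dumps. A small Python sketch; the bit positions are the documented Intel CPUID ones, and it assumes the four logged values are in EAX, EBX, ECX, EDX order:

# Well-known Intel CPUID feature bits (small subset).
LEAF1_ECX = {1: "pclmulqdq", 12: "fma", 19: "sse4_1", 20: "sse4_2",
             25: "aes", 28: "avx"}
LEAF7_EBX = {0: "fsgsbase", 3: "bmi1", 5: "avx2", 7: "smep",
             8: "bmi2", 10: "invpcid"}

def decode(value, bit_names):
    # Return the names of the known feature bits set in a register value.
    return [name for bit, name in sorted(bit_names.items())
            if value & (1 << bit)]

# Values copied from the log lines quoted above.
print(decode(0x7ffafbbf, LEAF1_ECX))  # leaf 1 ECX: includes aes, avx
print(decode(0x000027ab, LEAF7_EBX))  # leaf 7 EBX: includes avx2, invpcid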

This is the output of the "Instructions" section of CPU-Z from a Windows 10 VM on a Crystal Well (Haswell) MacBook Pro running Fusion 8.5.10:

MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, EM64T, AES, AVX, AVX2, FMA3

And this is the flags section of /proc/cpuinfo from a Linux VM on the same macOS Fusion host. The instruction-set flags (mmx, sse through sse4_2, aes, avx, avx2, fma) are the same ones reported by CPU-Z; EM64T just means it has support for 64-bit (the lm flag below).

flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts nopl xtopology tsc_reliable nonstop_tsc aperfmperf eagerfpu pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm ida arat epb invpcid_single pln pts dtherm spec_ctrl stibp retpoline kaiser fsgsbase tsc_adjust bmi1 avx2 smep bmi2 invpcid xsaveopt

Almost all the flags in /proc/cpuinfo have a one-to-one equivalent among the Capability Found lines in the vmware.log. As for the Spectre microcode features, I think the spec_ctrl flag in the Linux VM's cpuinfo covers IBPB and IBRS, while STIBP has its own flag.
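
If you save a copy of /proc/cpuinfo before the EVC change and again after the power cycle, diffing the gained/lost flags takes only a few lines of Python (the file names are placeholders):

def flags(path):
    # Return the set of CPU flags from a saved copy of /proc/cpuinfo.
    with open(path) as cpuinfo:
        for line in cpuinfo:
            if line.startswith("flags"):
                return set(line.split(":", 1)[1].split())
    return set()

before = flags("cpuinfo-before.txt")  # placeholder file name
after = flags("cpuinfo-after.txt")    # placeholder file name
print("gained:", " ".join(sorted(after - before)))
print("lost:  ", " ".join(sorted(before - after)))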

a_p_
Leadership

We couldn't just migrate them off the host being firmware patched and then migrate them back?

Unfortunately not. A VM power cycle is required so that the new CPU features are presented to the VM. It's basically the same as raising the EVC level.

See e.g. Step 3 "For each virtual machine, enable Hypervisor-Assisted Guest mitigation via the following steps:" at https://kb.vmware.com/s/article/52085

André
