andreaspa
Hot Shot

Intel CPU bug - VMware fix on the way?


I've read on forums, mailing lists and The Register that there seems to be a severe hardware bug in Intel CPUs:

'Kernel memory leaking' Intel processor design flaw forces Linux, Windows redesign • The Register

There are Linux patches in the works, and Microsoft will release patches during January's Patch Tuesday. Is ESXi vulnerable, and if so, when can we expect a patch for this? Since it's a critical issue, it will require lots of patching and planning - any heads up would be appreciated!

76 Replies
larsupilami
Contributor

I am also working on this issue and trying to secure our vSphere installations.

During this process I noticed that apparently not all CPUs get microcode updates through the VMware patches mentioned in: https://kb.vmware.com/s/article/52085

https://kb.vmware.com/s/article/52206

I applied these updates to old HPE servers (DL380p Gen 7 and Gen 8) with X5650 or E5-2640 CPUs (launched in 2012 or even earlier). The result: no Hypervisor-Assisted Guest Mitigation. I checked this with the script provided by William Lam: https://www.virtuallyghetto.com/2018/01/verify-hypervisor-assisted-guest-mitigation-spectre-patches-...

and the Microsoft way with the SpeculationControl script inside the Guest OS (with vSphere Hardware Version >9 and after a cold boot of the system): Speculation Control Validation PowerShell Script
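
For anyone scripting a similar check, here is a minimal Python sketch of the decision both scripts come down to. It assumes, based on my reading of KB 52085, that a host with working microcode exposes at least one of the CPUID features cpuid.IBRS, cpuid.IBPB or cpuid.STIBP to the VM. The function name and the way the feature list is collected are hypothetical; this is not William Lam's script.

```python
# Feature names as I understand them from VMware KB 52085 (assumption):
# at least one must be visible to the VM for Hypervisor-Assisted Guest
# Mitigation to be in effect.
SPECULATION_CONTROL_FEATURES = frozenset(
    {"cpuid.IBRS", "cpuid.IBPB", "cpuid.STIBP"}
)


def has_guest_mitigation(exposed_features):
    """Return True if any speculation-control feature is exposed to the VM."""
    return bool(SPECULATION_CONTROL_FEATURES & set(exposed_features))


if __name__ == "__main__":
    # Hypothetical inputs: per-VM feature lists collected by other means.
    print(has_guest_mitigation({"cpuid.SSE42", "cpuid.IBRS"}))
    print(has_guest_mitigation({"cpuid.SSE42"}))
```

On the HPE hosts described above, such a check would come back False for every VM, which matches what both scripts reported.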

Of course I followed the described update procedure very closely and tried all the tricks I could imagine or find on the web (rebooting the ESXi host, powering the host off and booting it again, changing the cluster EVC mode, removing the host from the EVC cluster, and even reinstalling the host from scratch).

So it looks like the VMware patch is not updating the microcode on all Intel CPUs. This would also fit the Intel statement that they will release microcode updates for processors of the last 5 years within this week. But I could not find a list of processors for which Intel has already provided CPU microcode updates.

To double-check the finding I installed the updates (with the exact same procedure) on another platform with Lenovo blades using E5-2660 v4. Like the HPE systems, these systems have no CPU microcode/BIOS updates available so far.

But here everything is fine. Hypervisor-Assisted Guest Mitigation is enabled and verified by both scripts mentioned above, just by installing the latest vSphere updates as described in https://kb.vmware.com/s/article/52085

So it would be really interesting to know which CPU models actually get their microcode updated by the VMware patch.

If there is something or some trick I am missing, I would be glad to hear about it.

pbraren
Hot Shot

VMware has also published a helpful overview article, KB 52245, which explains the issue and links to many other KB articles for various VMware products. Last updated on 1/12/2018:

VMware Response to Speculative Execution security issues, CVE-2017-5753, CVE-2017-5715, CVE-2017-575...

https://kb.vmware.com/s/article/52245

Excerpts:

Introduction

On January 3, 2018, it became public that CPU data cache timing can be abused by software to efficiently leak information out of mis-speculated CPU execution, leading to (at worst) arbitrary virtual memory read vulnerabilities across local security boundaries in various contexts. Three variants have been recently discovered by Google Project Zero and other security researchers; these can affect many modern processors, including certain processors by Intel, AMD and ARM:

  • Variant 1: bounds check bypass (CVE-2017-5753) – a.k.a. Spectre
  • Variant 2: branch target injection (CVE-2017-5715) – a.k.a. Spectre
  • Variant 3: rogue data cache load (CVE-2017-5754) – a.k.a. Meltdown

Operating systems (OS), virtual machines, virtual appliances, hypervisors, server firmware, and CPU microcode must all be patched or upgraded for effective mitigation of these known variants.

VMware hypervisors do not require the new speculative-execution control mechanism to achieve this class of mitigation and therefore these types of updates can be installed on any currently supported processor. No significant performance degradation is expected for VMware’s hypervisor-specific mitigations.
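
The three variants listed in the excerpt can be kept in a small lookup table, for example when sorting advisories and patches by CVE. A minimal Python sketch; the table and helper names are hypothetical, the contents come straight from the list above.

```python
# CVE -> (variant number, technical name, common name), as listed in the excerpt.
VARIANTS = {
    "CVE-2017-5753": (1, "bounds check bypass", "Spectre"),
    "CVE-2017-5715": (2, "branch target injection", "Spectre"),
    "CVE-2017-5754": (3, "rogue data cache load", "Meltdown"),
}


def describe(cve):
    """Return a one-line description for a known CVE, or raise KeyError."""
    number, technical_name, common_name = VARIANTS[cve]
    return f"Variant {number}: {technical_name} ({common_name})"


if __name__ == "__main__":
    for cve in sorted(VARIANTS):
        print(cve, "->", describe(cve))
```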

TinkerTry.com
JakubD
Enthusiast

And now VMware has recalled the latest patches, because of issues with microcode upgrades:

https://kb.vmware.com/s/article/52345

andreaspa
Hot Shot

Read that as well. We need to know a couple of things from VMware:

* What happens if we don't do the workaround? I haven't seen anything about what problems this can cause.

* Is installing the microcode/BIOS update from the server vendor another workaround, or are their updates faulty as well?

Can anyone from VMware please enlighten us?

dalo
Hot Shot

According to Intel, the issue is "higher system reboots":

Intel Security Issue Update: Addressing Reboot Issues

andreaspa
Hot Shot

Ah, yes. That makes it so much clearer. I wonder what "higher system reboots" actually means.

Will the host spontaneously reboot, or will it PSOD?

larsupilami
Contributor

The issue is not related to the hosts.

It is related to the guest OS, so you could see more frequent guest OS reboots. Whatever a reboot is.

In addition I wanted to share the following article:

The Curious Case of the Intel Microcode Part #2 - It Gets Better — Then Worse - vNinja.net

Good article and some details about the status of current CPU microcode updates. For all people with older CPUs:

No Sandy Bridge or older microcode updates so far. So be patient if you are still using such CPUs.

Petter_Lindgren
Contributor

We've already deployed the patches. Thankfully, our servers don't have the affected CPUs (Broadwell and Haswell).

What a huge mess!

larsupilami
Contributor

For those who installed the now-removed VMware patches, make sure to check

https://www.virtuallyghetto.com/2018/01/automating-intel-sighting-remediation-using-powercli-ssh-not...

A very nice and easy way of disabling the faulty microcode update for your guest VMs.

Unfortunately I had to use it for one cluster, but thanks to William Lam and his great script this was very easy to handle.

nponeccop
Contributor

Just to clarify, they removed all January patches for ESXi, not only the latest one with microcode. Fortunately I decided to wait and didn't install them.

As for "more frequent reboots": I updated my laptop BIOS and now Windows has "more frequent BSODs". But from what other people said here, I understand that for ESXi it means guest crashes; the host is not affected.

andreaspa
Hot Shot

The issue is not related to the hosts.

It is related to the guest OS. So you could see higher Guest OS reboots. What ever a reboot is.

I'd love to see some confirmation or information from VMware about this. It would make the scope of mitigation much clearer, as well as how we should prioritize things in day-to-day operations.

This whole issue has been a mess; anyone who just sat still and didn't patch has made the right choice so far.

VCPShane
Enthusiast

'removed all January patches': I know they have stated this in relation to VMSA-2018-0004.

Is there an official statement for the other advisory, VMSA-2018-0002?

larsupilami
Contributor

The article that describes the faulty update, https://kb.vmware.com/s/article/52345, says:

For ESXi hosts that have not yet applied one of the following patches ESXi650-201801402-BG, ESXi600-201801402-BG, or ESXi550-201801401-BG, VMware recommends not doing so at this time. It is recommended to apply the patches listed in VMSA-2018-0002 instead.

nponeccop
Contributor

Yes, there are official statements in the same link: VMware Knowledge Base

For ESXi hosts that have not yet applied one of the following patches ESXi650-201801402-BG, ESXi600-201801402-BG, or ESXi550-201801401-BG, VMware recommends not doing so at this time. It is recommended to apply the patches listed in VMSA-2018-0002 instead.

For servers using unaffected processors which have applied either the VMSA-2018-0002 or ESXi patches ESXi650-201801402-BG, ESXi600-201801402-BG or, ESXi550-201801401-BG, no action is required.
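
The two quoted paragraphs amount to a small decision rule, which can be sketched in Python as below. The function name and return strings are hypothetical. The third branch (affected CPU with a withdrawn bulletin already applied) is not spelled out in the quote; it is inferred from the workaround discussed earlier in this thread, so check KB 52345 itself before relying on it.

```python
# Bulletins that KB 52345, as quoted above, recommends not applying for now.
WITHDRAWN_BULLETINS = frozenset({
    "ESXi650-201801402-BG",
    "ESXi600-201801402-BG",
    "ESXi550-201801401-BG",
})


def recommended_action(applied_bulletins, cpu_affected):
    """Map a host's state to the recommendation quoted from KB 52345."""
    has_withdrawn = bool(WITHDRAWN_BULLETINS & set(applied_bulletins))
    if not has_withdrawn:
        # Quoted: do not apply the withdrawn bulletins; use VMSA-2018-0002.
        return "hold off; apply VMSA-2018-0002 patches instead"
    if not cpu_affected:
        # Quoted: unaffected processors need no action.
        return "no action required"
    # Inferred from the thread, not from the quoted text.
    return "see KB 52345 workaround"


if __name__ == "__main__":
    print(recommended_action([], cpu_affected=True))
    print(recommended_action(["ESXi650-201801402-BG"], cpu_affected=False))
    print(recommended_action(["ESXi650-201801402-BG"], cpu_affected=True))
```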

VCPShane
Enthusiast

Thanks, I guess that means that CVE-2017-5753 is not addressed yet for ESXi 5.5\6\6.5 then.

colinsmith
Contributor

I really wonder why so many are so concerned once they think the issue through.

IF two threads hit one core in a hyperthreaded CPU at the same time

IF one is a hacked thread

IF no context switches happen

IF the hacked thread can dump the L1 cache data

IF that dump contains useful data and is transferred

IF it's accessed and makes it through all other security measures and firewalls

IF it's decoded

IF it's useful

THEN it's a problem

Lots of IFs and random happenings.

So ESXi can stop scheduling on hyperthreads.

If it doesn't schedule for hyperthreading, that's maybe a 30% decrease in performance in the best/worst case.

My guess is they'll code for selective hyperthreading: IF a VM is multicore then it'll let it hyperthread and let the OS deal with it, but not allow cross-VM hyperthreading unless Intel fixes the microcode.

Dave_the_Wave
Hot Shot

For older CPUs that Intel has no intention of providing microcode updates for, is there any performance hit with the patched VIBs?

Can the patches in late builds of ESXi be "disabled"?

I'd rather let backups protect everything, and leave ESXi running without band-aids.
