VMware Cloud Community
MattPietrek
Contributor

VM suspended on Haswell, can't resume on Sandybridge, despite Westmere-level CPUID masking in place

Hey jmattson (or similar brilliant VMware guru):

We have a VM that's failing to resume, despite my current understanding that it should. ESXi build is 2068190 on both hosts.

VM was started on a Haswell:

2015-03-26T07:02:43.172Z| vmx| I120: FeatureCompat: No EVC masks.

2015-03-26T07:02:43.181Z| vmx| I120: hostCPUID vendor: GenuineIntel

2015-03-26T07:02:43.181Z| vmx| I120: hostCPUID family: 0x6 model: 0x3f stepping: 0x2

2015-03-26T07:02:43.181Z| vmx| I120: hostCPUID codename: Haswell EP/EN/EX

2015-03-26T07:02:43.181Z| vmx| I120: hostCPUID name: Intel(R) Xeon(R) CPU E5-2690 v3 @ 2.60GHz

With these CPUID masks in the VMX (they should match exactly what the UI reports as the "Westmere" mask):

2015-03-26T07:02:43.363Z| vmx| I120: DICT               cpuid.1.eax = 00000000000000100000011001010001

2015-03-26T07:02:43.363Z| vmx| I120: DICT               cpuid.1.ecx = 00000010100110001110001000111111

2015-03-26T07:02:43.363Z| vmx| I120: DICT               cpuid.1.edx = 10001111111010111111101111111111

2015-03-26T07:02:43.363Z| vmx| I120: DICT        cpuid.80000001.ecx = 00000000000000000000000000000001

2015-03-26T07:02:43.363Z| vmx| I120: DICT        cpuid.80000001.edx = 00101000000100000000100000000000

2015-03-26T07:02:43.363Z| vmx| I120: DICT               cpuid.d.eax = 00000000000000000000000000000000

2015-03-26T07:02:43.363Z| vmx| I120: DICT               cpuid.d.ecx = 00000000000000000000000000000000

2015-03-26T07:02:43.363Z| vmx| I120: DICT               cpuid.d.edx = 00000000000000000000000000000000

2015-03-26T07:02:43.363Z| vmx| I120: DICT checkpoint.disableCpuCheck = true

We then try to resume the VM on a Sandybridge (same CPUID masking in effect; note that the log below reports this host as Westmere EP):

2015-03-26T08:14:47.044Z| vmx| I120: FeatureCompat: No EVC masks.

2015-03-26T08:14:47.044Z| vmx| I120: hostCPUID vendor: GenuineIntel

2015-03-26T08:14:47.044Z| vmx| I120: hostCPUID family: 0x6 model: 0x2c stepping: 0x2

2015-03-26T08:14:47.044Z| vmx| I120: hostCPUID codename: Westmere EP

2015-03-26T08:14:47.044Z| vmx| I120: hostCPUID name: Intel(R) Xeon(R) CPU           X5650  @ 2.67GHz

And it fails to run:

2015-03-26T08:14:47.619Z| vmx| I120: FeatureCompat: No VM masks.

2015-03-26T08:14:47.619Z| vmx| I120: MonPmc: ctrBase 0xc1 selBase 0x186/1 PGC 1/1 SMM 1 drain 1 flush 0

2015-03-26T08:14:47.619Z| vmx| I120+ MonPmc:   gen counters num: 4 width 48 write width 32

2015-03-26T08:14:47.619Z| vmx| I120+ MonPmc:   fix counters num: 3 width 48

2015-03-26T08:14:47.619Z| vmx| I120+ MonPmc:   unavailable counters: 0x600000000

2015-03-26T08:14:47.620Z| vmx| I120: CPT: Restoring checkpoint /vmfs/volumes/21fbd13f-cf16dd67/session-295fb5c4.vmss

2015-03-26T08:14:47.624Z| vmx| I120: DUMPER: Restoring checkpoint version 8.

2015-03-26T08:14:47.635Z| vmx| I120: guestCpuFeatures = 0x2000fd

2015-03-26T08:14:47.635Z| vmx| I120: Msg_Question:

2015-03-26T08:14:47.635Z| vmx| I120: [msg.checkpoint.cpucheck.fail.feature] The features supported by the processors in this machine are different from the features supported by the processors in the machine on which the virtual machine state was saved.

2015-03-26T08:14:47.635Z| vmx| I120: [msg.checkpoint.cpucheck.fail.hard] Resume on a machine with similar processors.

2015-03-26T08:14:47.635Z| vmx| I120: [msg.checkpoint.restore.cpufail] An error occurred while restoring the CPU state from file "/vmfs/volumes/21fbd13f-cf16dd67/session-295fb5c4.vmss".

2015-03-26T08:14:47.635Z| vmx| I120: [msg.checkpoint.resume.softError] Your virtual machine did not resume because of a correctable error. Preserve the suspended state and correct the error, or discard the suspended state.

Based on my understanding (Westmere is a subset of both Sandybridge and Haswell), it should have worked.

Thanks,

Matt


Accepted Solutions
admin
Immortal

MattPietrek wrote:

Thanks Jim - this was incredibly helpful. Exactly the details I need to make informed choices about how to proceed.

A follow up question, if I may: When a VM fails to resume because of CPU compat, is there some way of knowing exactly what features the .vmss requires that the host can't satisfy?

If you enable debugging for the VM, you should see something like the following in the vmware.log file:

2015-03-27T16:04:57.196-07:00| vmx| I120: FeatureCompat: Capabilities:

2015-03-27T16:04:57.196-07:00| vmx| I120: Capability Found: vpmc.fixctr.1 = 0x1

2015-03-27T16:04:57.196-07:00| vmx| I120: Capability Found: vpmc.genctr.6 = 0x1

2015-03-27T16:04:57.196-07:00| vmx| I120: Capability Found: cpuid.Intel = 0x1

2015-03-27T16:04:57.196-07:00| vmx| I120: Capability Found: cpuid.RDRAND = 0x1

2015-03-27T16:04:57.196-07:00| vmx| I120: Capability Found: cpuid.PDPE1GB = 0x1

2015-03-27T16:04:57.196-07:00| vmx| I120: Capability Found: cpuid.XSAVE = 0x1

2015-03-27T16:04:57.196-07:00| vmx| I120: Capability Found: vpmc.fixctr.0 = 0x1

2015-03-27T16:04:57.196-07:00| vmx| I120: Capability Found: cpuid.LM = 0x1

2015-03-27T16:04:57.196-07:00| vmx| I120: Capability Found: vpmc.genctr.7 = 0x1

2015-03-27T16:04:57.196-07:00| vmx| I120: Capability Found: cpuid.ENFSTRG = 0x1

2015-03-27T16:04:57.196-07:00| vmx| I120: Capability Found: cpuid.MWAIT = 0x1

2015-03-27T16:04:57.196-07:00| vmx| I120: Capability Found: cpuid.MOVBE = 0x1

2015-03-27T16:04:57.196-07:00| vmx| I120: Capability Found: vpmc.numGenCtrs = 0x8

2015-03-27T16:04:57.196-07:00| vmx| I120: Capability Found: vpmc.fixedWidth = 0x30

2015-03-27T16:04:57.196-07:00| vmx| I120: Capability Found: vpmc.genctr.4 = 0x1

2015-03-27T16:04:57.196-07:00| vmx| I120: Capability Found: cpuid.VMX = 0x1

2015-03-27T16:04:57.196-07:00| vmx| I120: Capability Found: cpuid.ABM = 0x1

2015-03-27T16:04:57.196-07:00| vmx| I120: Capability Found: vpmc.genWidth = 0x30

2015-03-27T16:04:57.196-07:00| vmx| I120: Capability Found: vpmc.version = 0x3

2015-03-27T16:04:57.196-07:00| vmx| I120: Capability Found: cpuid.XCR0_MASTER_YMM_H = 0x1

2015-03-27T16:04:57.196-07:00| vmx| I120: Capability Found: vpmc.fixctr.2 = 0x1

2015-03-27T16:04:57.196-07:00| vmx| I120: Capability Found: vpmc.genctr.5 = 0x1

2015-03-27T16:04:57.196-07:00| vmx| I120: Capability Found: misc.cpuidFaulting = 0x1

2015-03-27T16:04:57.196-07:00| vmx| I120: Capability Found: cpuid.PCID = 0x1

2015-03-27T16:04:57.196-07:00| vmx| I120: Capability Found: cpuid.FMA = 0x1

2015-03-27T16:04:57.196-07:00| vmx| I120: Capability Found: cpuid.AVX2 = 0x1

2015-03-27T16:04:57.196-07:00| vmx| I120: Capability Found: cpuid.XSAVEOPT = 0x1

2015-03-27T16:04:57.196-07:00| vmx| I120: Capability Found: vpmc.microarchitecture.haswell = 0x1

2015-03-27T16:04:57.196-07:00| vmx| I120: Capability Found: cpuid.RTM = 0x1

2015-03-27T16:04:57.196-07:00| vmx| I120: Capability Found: vpmc.genctr.2 = 0x1

2015-03-27T16:04:57.196-07:00| vmx| I120: Capability Found: cpuid.SSSE3 = 0x1

2015-03-27T16:04:57.196-07:00| vmx| I120: Capability Found: cpuid.SSE3 = 0x1

2015-03-27T16:04:57.196-07:00| vmx| I120: Capability Found: cpuid.NX = 0x1

2015-03-27T16:04:57.196-07:00| vmx| I120: Capability Found: cpuid.SSE41 = 0x1

2015-03-27T16:04:57.196-07:00| vmx| I120: Capability Found: cpuid.BMI2 = 0x1

2015-03-27T16:04:57.196-07:00| vmx| I120: Capability Found: cpuid.AES = 0x1

2015-03-27T16:04:57.196-07:00| vmx| I120: Capability Found: cpuid.PCLMULQDQ = 0x1

2015-03-27T16:04:57.196-07:00| vmx| I120: Capability Found: cpuid.SS = 0x1

2015-03-27T16:04:57.196-07:00| vmx| I120: Capability Found: cpuid.POPCNT = 0x1

2015-03-27T16:04:57.196-07:00| vmx| I120: Capability Found: cpuid.AVX = 0x1

2015-03-27T16:04:57.196-07:00| vmx| I120: Capability Found: vpmc.genctr.3 = 0x1

2015-03-27T16:04:57.196-07:00| vmx| I120: Capability Found: vt.realmode = 0x1

2015-03-27T16:04:57.196-07:00| vmx| I120: Capability Found: cpuid.F16C = 0x1

2015-03-27T16:04:57.196-07:00| vmx| I120: Capability Found: cpuid.FSGSBASE = 0x1

2015-03-27T16:04:57.196-07:00| vmx| I120: Capability Found: cpuid.DS = 0x1

2015-03-27T16:04:57.196-07:00| vmx| I120: Capability Found: vpmc.genctr.0 = 0x1

2015-03-27T16:04:57.196-07:00| vmx| I120: Capability Found: cpuid.RDTSCP = 0x1

2015-03-27T16:04:57.196-07:00| vmx| I120: Capability Found: cpuid.LAHF64 = 0x1

2015-03-27T16:04:57.196-07:00| vmx| I120: Capability Found: hv.capable = 0x1

2015-03-27T16:04:57.196-07:00| vmx| I120: Capability Found: cpuid.CMPXCHG16B = 0x1

2015-03-27T16:04:57.196-07:00| vmx| I120: Capability Found: cpuid.INVPCID = 0x1

2015-03-27T16:04:57.196-07:00| vmx| I120: Capability Found: cpuid.SMEP = 0x1

2015-03-27T16:04:57.196-07:00| vmx| I120: Capability Found: vpmc.numFixedCtrs = 0x3

2015-03-27T16:04:57.196-07:00| vmx| I120: Capability Found: vpmc.genctr.1 = 0x1

2015-03-27T16:04:57.196-07:00| vmx| I120: Capability Found: cpuid.SSE42 = 0x1

2015-03-27T16:04:57.196-07:00| vmx| I120: Capability Found: cpuid.BMI1 = 0x1

2015-03-27T16:04:57.196-07:00| vmx| I120: Capability Found: cpuid.HLE = 0x1

2015-03-27T16:04:57.196-07:00| vmx| I120: Capability Found: cpuid.XCR0_MASTER_SSE = 0x1

2015-03-27T16:04:57.196-07:00| vmx| I120: FeatureCompat: Requirements:

2015-03-27T16:04:57.196-07:00| vmx| I120: VM Features Required: cpuid.SSE3 - Bool:Min:1

2015-03-27T16:04:57.196-07:00| vmx| I120: VM Features Required: cpuid.PCLMULQDQ - Bool:Min:1

2015-03-27T16:04:57.196-07:00| vmx| I120: VM Features Required: cpuid.SSSE3 - Bool:Min:1

2015-03-27T16:04:57.196-07:00| vmx| I120: VM Features Required: cpuid.FMA - Bool:Min:1

2015-03-27T16:04:57.196-07:00| vmx| I120: VM Features Required: cpuid.CMPXCHG16B - Bool:Min:1

2015-03-27T16:04:57.196-07:00| vmx| I120: VM Features Required: cpuid.PCID - Bool:Min:1

2015-03-27T16:04:57.196-07:00| vmx| I120: VM Features Required: cpuid.SSE41 - Bool:Min:1

2015-03-27T16:04:57.196-07:00| vmx| I120: VM Features Required: cpuid.SSE42 - Bool:Min:1

2015-03-27T16:04:57.196-07:00| vmx| I120: VM Features Required: cpuid.MOVBE - Bool:Min:1

2015-03-27T16:04:57.196-07:00| vmx| I120: VM Features Required: cpuid.POPCNT - Bool:Min:1

2015-03-27T16:04:57.196-07:00| vmx| I120: VM Features Required: cpuid.AES - Bool:Min:1

2015-03-27T16:04:57.196-07:00| vmx| I120: VM Features Required: cpuid.XSAVE - Bool:Min:1

2015-03-27T16:04:57.196-07:00| vmx| I120: VM Features Required: cpuid.AVX - Bool:Min:1

2015-03-27T16:04:57.196-07:00| vmx| I120: VM Features Required: cpuid.F16C - Bool:Min:1

2015-03-27T16:04:57.196-07:00| vmx| I120: VM Features Required: cpuid.RDRAND - Bool:Min:1

2015-03-27T16:04:57.196-07:00| vmx| I120: VM Features Required: cpuid.DS - Bool:Min:1

2015-03-27T16:04:57.196-07:00| vmx| I120: VM Features Required: cpuid.SS - Bool:Min:1

2015-03-27T16:04:57.196-07:00| vmx| I120: VM Features Required: cpuid.FSGSBASE - Bool:Min:1

2015-03-27T16:04:57.196-07:00| vmx| I120: VM Features Required: cpuid.BMI1 - Bool:Min:1

2015-03-27T16:04:57.196-07:00| vmx| I120: VM Features Required: cpuid.AVX2 - Bool:Min:1

2015-03-27T16:04:57.196-07:00| vmx| I120: VM Features Required: cpuid.SMEP - Bool:Min:1

2015-03-27T16:04:57.196-07:00| vmx| I120: VM Features Required: cpuid.BMI2 - Bool:Min:1

2015-03-27T16:04:57.196-07:00| vmx| I120: VM Features Required: cpuid.ENFSTRG - Bool:Min:1

2015-03-27T16:04:57.196-07:00| vmx| I120: VM Features Required: cpuid.INVPCID - Bool:Min:1

2015-03-27T16:04:57.196-07:00| vmx| I120: VM Features Required: cpuid.XCR0_MASTER_SSE - Bool:Min:1

2015-03-27T16:04:57.196-07:00| vmx| I120: VM Features Required: cpuid.XCR0_MASTER_YMM_H - Bool:Min:1

2015-03-27T16:04:57.196-07:00| vmx| I120: VM Features Required: cpuid.XSAVEOPT - Bool:Min:1

2015-03-27T16:04:57.196-07:00| vmx| I120: VM Features Required: cpuid.LAHF64 - Bool:Min:1

2015-03-27T16:04:57.196-07:00| vmx| I120: VM Features Required: cpuid.ABM - Bool:Min:1

2015-03-27T16:04:57.196-07:00| vmx| I120: VM Features Required: cpuid.NX - Bool:Min:1

2015-03-27T16:04:57.196-07:00| vmx| I120: VM Features Required: cpuid.PDPE1GB - Bool:Min:1

2015-03-27T16:04:57.196-07:00| vmx| I120: VM Features Required: cpuid.RDTSCP - Bool:Min:1

2015-03-27T16:04:57.196-07:00| vmx| I120: VM Features Required: cpuid.LM - Bool:Min:1

2015-03-27T16:04:57.196-07:00| vmx| I120: VM Features Required: cpuid.Intel - Bool:Min:1

2015-03-27T16:04:57.196-07:00| vmx| I120: VM Features Required: misc.cpuidFaulting - Bool:Min:1

Anything in the requirements list that's not in the capabilities list is a problem.
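The comparison described above can be automated. The following is a minimal sketch, assuming the log line formats shown in the excerpt ("Capability Found: NAME = 0xN" and "VM Features Required: NAME - Bool:Min:1"). The function name `unmet_requirements` and the `SAMPLE` text are hypothetical: the sample reuses lines from the excerpt but trims the capabilities so that one requirement is deliberately unmet.

```python
import re

# Patterns modelled on the vmware.log lines quoted in this thread.
CAP_RE = re.compile(r"Capability Found: (\S+) = (0x[0-9a-fA-F]+)")
REQ_RE = re.compile(r"VM Features Required: (\S+) - Bool:Min:1")


def unmet_requirements(log_text):
    """Return required feature names with no matching nonzero capability."""
    caps = {}
    required = []
    for line in log_text.splitlines():
        m = CAP_RE.search(line)
        if m:
            caps[m.group(1)] = int(m.group(2), 16)
            continue
        m = REQ_RE.search(line)
        if m:
            required.append(m.group(1))
    # Bool:Min:1 is treated as "the capability must be present and >= 1".
    return [name for name in required if caps.get(name, 0) < 1]


# Hypothetical, trimmed sample: AVX2 is required but not listed as a capability.
SAMPLE = """\
2015-03-27T16:04:57.196-07:00| vmx| I120: FeatureCompat: Capabilities:
2015-03-27T16:04:57.196-07:00| vmx| I120: Capability Found: cpuid.SSE3 = 0x1
2015-03-27T16:04:57.196-07:00| vmx| I120: Capability Found: cpuid.SSSE3 = 0x1
2015-03-27T16:04:57.196-07:00| vmx| I120: FeatureCompat: Requirements:
2015-03-27T16:04:57.196-07:00| vmx| I120: VM Features Required: cpuid.SSE3 - Bool:Min:1
2015-03-27T16:04:57.196-07:00| vmx| I120: VM Features Required: cpuid.AVX2 - Bool:Min:1
2015-03-27T16:04:57.196-07:00| vmx| I120: VM Features Required: cpuid.SSSE3 - Bool:Min:1
"""

print(unmet_requirements(SAMPLE))
```

To use it on a real log, pass the contents of the failing VM's vmware.log instead of `SAMPLE`. Only the Bool:Min:1 requirement form seen in the excerpt is handled; other requirement types, if the log contains any, would need their own rules.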

Here's the mask for this iteration:

2015-04-02T19:38:20.846Z| vmx| I120: DICT               cpuid.0.eax = 00000000000000000000000000001011

2015-04-02T19:38:20.846Z| vmx| I120: DICT               cpuid.1.eax = 00000000000000100000011001010001

2015-04-02T19:38:20.846Z| vmx| I120: DICT               cpuid.1.ecx = 0000001010011000001000100011-011

2015-04-02T19:38:20.846Z| vmx| I120: DICT               cpuid.1.edx = -0001111111010111111101111111111

2015-04-02T19:38:20.846Z| vmx| I120: DICT        cpuid.80000001.ecx = 00000000000000000000000000000001

2015-04-02T19:38:20.846Z| vmx| I120: DICT        cpuid.80000001.edx = 00101000000100000000100000000000

2015-04-02T19:38:20.846Z| vmx| I120: DICT               cpuid.d.eax = 00000000000000000000000000000000

2015-04-02T19:38:20.846Z| vmx| I120: DICT               cpuid.d.ecx = 00000000000000000000000000000000

2015-04-02T19:38:20.846Z| vmx| I120: DICT               cpuid.d.edx = 00000000000000000000000000000000

2015-04-02T19:38:20.846Z| vmx| I120: DICT      cpuid.80000002.0.eax = 0110:0101:0111:0100:0110:1110:0100:1001

2015-04-02T19:38:20.846Z| vmx| I120: DICT      cpuid.80000002.0.ebx = 0010:1001:0101:0010:0010:1000:0110:1100

2015-04-02T19:38:20.846Z| vmx| I120: DICT      cpuid.80000002.0.ecx = 0110:1111:0110:0101:0101:1000:0010:0000

2015-04-02T19:38:20.846Z| vmx| I120: DICT      cpuid.80000002.0.edx = 0010:1001:0101:0010:0010:1000:0110:1110

2015-04-02T19:38:20.846Z| vmx| I120: DICT      cpuid.80000003.0.eax = 0101:0101:0101:0000:0100:0011:0010:0000

2015-04-02T19:38:20.846Z| vmx| I120: DICT      cpuid.80000003.0.ebx = 0010:0000:0010:0000:0010:0000:0010:0000

2015-04-02T19:38:20.846Z| vmx| I120: DICT      cpuid.80000003.0.ecx = 0010:0000:0010:0000:0010:0000:0010:0000

2015-04-02T19:38:20.846Z| vmx| I120: DICT      cpuid.80000003.0.edx = 0101:1000:0010:0000:0010:0000:0010:0000

2015-04-02T19:38:20.846Z| vmx| I120: DICT      cpuid.80000004.0.eax = 0011:0000:0011:0101:0011:0110:0011:0101

2015-04-02T19:38:20.846Z| vmx| I120: DICT      cpuid.80000004.0.ebx = 0010:0000:0100:0000:0010:0000:0010:0000

2015-04-02T19:38:20.846Z| vmx| I120: DICT      cpuid.80000004.0.ecx = 0011:0111:0011:0110:0010:1110:0011:0010

2015-04-02T19:38:20.846Z| vmx| I120: DICT      cpuid.80000004.0.edx = 0000:0000:0111:1010:0100:1000:0100:0111

I promise to experiment with the approach you suggested, only masking off post-Westmere and using '-' for everything else. But for now, just need to understand what's happening with our current masking.

Thanks!

Matt

I suspect that you need to mask off leaf 7, as I mentioned towards the end of my last posting.

11 Replies
admin
Immortal

The Westmere masks that you refer to define host capabilities, not guest capabilities.  In general, host capabilities are a superset of guest capabilities.

If you replace all of the 1's in your masks with -'s, I think you'll be okay.
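As a minimal sketch of that substitution, assuming the mask is the plain 32-character string from the .vmx file (the helper name `relax_forced_ones` is hypothetical):

```python
def relax_forced_ones(mask):
    """Turn every forced-on bit ('1') into '-' and leave '0' bits untouched."""
    return mask.replace("1", "-")


original = "00000010100110001110001000111111"  # cpuid.1.ecx from the first post
relaxed = relax_forced_ones(original)
print('cpuid.1.ecx = "%s"' % relaxed)
```

The zeros stay in place, so features that were forced off remain off; only the bits that were forced on are released.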

MattPietrek
Contributor

Hey Jim,

I'm a little confused by your reply, and perhaps I'm misunderstanding something fundamental.

Here's exactly what we're putting in the .vmx file:

cpuid.1.eax = "00000000000000100000011001010001"

cpuid.1.ecx = "00000010100110001110001000111111"

cpuid.1.edx = "10001111111010111111101111111111"

cpuid.80000001.ecx = "00000000000000000000000000000001"

cpuid.80000001.edx = "00101000000100000000100000000000"

cpuid.d.eax = "00000000000000000000000000000000"

cpuid.d.ecx = "00000000000000000000000000000000"

cpuid.d.edx = "00000000000000000000000000000000"

Now, in a prior thread (https://communities.vmware.com/thread/503236), you replied this:

> There is a big difference between the cpuid options and the cpuidMask options.  The cpuid options are used to modify guest requirements and the cpuidMask options are used to modify host capabilities.  They are not interchangeable.  That, I believe, is the crux of your problem.

If that's correct, then I think we are specifying the guest requirements. And, empirically, by altering these masks in the specified manner, we've been able to suspend/resume/migrate across all Westmere/Sandybridge hosts. If we don't specify the masks this way, suspend/resume/migrate doesn't work.

Matt

admin
Immortal

MattPietrek wrote:

Hey Jim,

I'm a little confused by your reply, and perhaps I'm misunderstanding something fundamental.

Here's exactly what we're putting in the .vmx file:

cpuid.1.eax = "00000000000000100000011001010001"

cpuid.1.ecx = "00000010100110001110001000111111"

cpuid.1.edx = "10001111111010111111101111111111"

cpuid.80000001.ecx = "00000000000000000000000000000001"

cpuid.80000001.edx = "00101000000100000000100000000000"

cpuid.d.eax = "00000000000000000000000000000000"

cpuid.d.ecx = "00000000000000000000000000000000"

cpuid.d.edx = "00000000000000000000000000000000"

Now, in a prior thread (https://communities.vmware.com/thread/503236), you replied this:

> There is a big difference between the cpuid options and the cpuidMask options.  The cpuid options are used to modify guest requirements and the cpuidMask options are used to modify host capabilities.  They are not interchangeable.  That, I believe, is the crux of your problem.

If that's correct, then I think we are specifying the guest requirements. And, empirically, by altering these masks in the specified manner, we've been able to suspend/resume/migrate across all Westmere/Sandybridge hosts. If we don't specify the masks this way, suspend/resume/migrate doesn't work.

Yes, you are, in fact, specifying guest requirements with the cpuid options.  However, the guest requirements that you are specifying cannot be met, because you are requiring features that we have never virtualized.

A physical Haswell CPU implements a set of features, H.  A physical Westmere CPU implements a different set of features, W.  Using cpuidMask options, you can make it look like your Haswell CPU implements W rather than H.

A VM started on a Haswell CPU implements only a subset of the available Haswell features, H'.  A VM started on a Westmere CPU implements only a subset of the available Westmere features, W'.  (A VM started on a Haswell system that is masquerading as a Westmere system also implements the feature set W'.)

You want your VMs to use only the W' features, even when started on a Haswell system, so that they can be warm-migrated to a Westmere system.  However, the cpuid options you are specifying are for the host feature set W, which is a superset of the usual guest feature set W'.

The derivation of W' from W depends on several factors, including virtual hardware version, guest OS type, and cpuid options. 

cpuid.1.eax specifies the family, model, and stepping of the processor.  For this option, your specification is fine.  It forces a valid Westmere family, model, and stepping.
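To see what that option encodes, here is a small sketch that decodes the string using the standard CPUID leaf 1 EAX layout (stepping in bits 3:0, model in 7:4, family in 11:8, extended model in 19:16, extended family in 27:20). The function name `decode_signature` is made up for illustration.

```python
def decode_signature(mask):
    """Decode a fully specified cpuid.1.eax mask into (family, model, stepping)."""
    eax = int(mask.replace(":", ""), 2)
    stepping = eax & 0xF
    model = (eax >> 4) & 0xF
    family = (eax >> 8) & 0xF
    ext_model = (eax >> 16) & 0xF
    ext_family = (eax >> 20) & 0xFF
    # The extended model only applies to families 0x6 and 0xF.
    if family in (0x6, 0xF):
        model |= ext_model << 4
    # The extended family only applies to family 0xF.
    if family == 0xF:
        family += ext_family
    return family, model, stepping


sig = decode_signature("00000000000000100000011001010001")
print("family 0x%x model 0x%x stepping 0x%x" % sig)
```

The output uses the same family/model/stepping form as the hostCPUID line in vmware.log, which makes it easy to compare the forced guest signature with the host's.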

cpuid.1.ecx is a 32-bit array indicating the availability of a variety of features.  Because of the 1's in your configuration option, you have specified that the following features are supported by the guest:

0 - SSE3

1 - PCLMULQDQ

2 - DTES64

3 - MONITOR

4 - DS-CPL

5 - VMX

9 - SSSE3

13 - CMPXCHG16B

14 - xTPR Update Control

15 - PDCM

19 - SSE4.1

20 - SSE4.2

23 - POPCNT

25 - AESNI

Of these, we do not virtualize DTES64, DS-CPL, xTPR Update Control, or PDCM.  We do virtualize MONITOR, but only by default for Mac OS X and ESXi guest OS types.  Due to the poor performance of virtualized MONITOR/MWAIT, it is not recommended to virtualize MONITOR for Linux guests, which will use it for processor scheduling.  We do virtualize VMX, but only when the option vhv.enable=TRUE is specified on an Intel host.
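The bit list above can be reproduced mechanically. This is a minimal sketch, assuming the leftmost character of the mask string is bit 31; the names and the "not virtualized" set are copied from this reply, and the helper `forced_on_bits` is hypothetical.

```python
# Bit names for CPUID.1:ECX as listed in the reply above.
NAMES = {
    0: "SSE3", 1: "PCLMULQDQ", 2: "DTES64", 3: "MONITOR", 4: "DS-CPL",
    5: "VMX", 9: "SSSE3", 13: "CMPXCHG16B", 14: "xTPR Update Control",
    15: "PDCM", 19: "SSE4.1", 20: "SSE4.2", 23: "POPCNT", 25: "AESNI",
}
# Features the reply says are never virtualized.
NOT_VIRTUALIZED = {"DTES64", "DS-CPL", "xTPR Update Control", "PDCM"}


def forced_on_bits(mask):
    """Return the bit numbers that the mask forces to 1, lowest first."""
    bits = mask.replace(":", "")
    return sorted(31 - i for i, ch in enumerate(bits) if ch == "1")


mask = "00000010100110001110001000111111"  # cpuid.1.ecx from the .vmx file
on = forced_on_bits(mask)
problem = [NAMES[b] for b in on if NAMES.get(b) in NOT_VIRTUALIZED]
print(on)
print(problem)
```

The second list shows the forced-on features that, per the reply, can never be satisfied. MONITOR and VMX are left out of that list because they are virtualized only under the conditions described above.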

Because of the 0's in your configuration option, you have specified that the remaining features enumerated in cpuid.1.ecx are not supported by the guest.  That should not cause migration issues, but may be sub-optimal.  For example, we typically virtualize x2APIC on all hosts, regardless of whether or not x2APIC is supported by the physical CPU.  A guest OS that is cognizant of x2APIC can generally achieve better performance by using the feature.

I believe that what you actually want to specify is that the virtual CPU should not implement the following post-Westmere features: PCID (bit 17), MOVBE (bit 22), TSC-Deadline (bit 24), XSAVE (bit 26), AVX (bit 28), F16C (bit 29), RDRAND (bit 30).  For the rest of the features, you should let the normal derivation take place.  This suggests the following configuration option:

cpuid.1.ecx = "-000:-0-0:-0--:--0-:----:----:----:----"

In other words, you should force only the post-Westmere features to be off, and you should not force any features to be on.
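The suggested option can be generated from the list of post-Westmere bits rather than typed by hand. A minimal sketch, assuming the colon-grouped format used above with bit 31 on the left (`build_mask` is a hypothetical helper):

```python
# Post-Westmere features in CPUID.1:ECX, as listed in the reply above.
POST_WESTMERE = {
    17: "PCID", 22: "MOVBE", 24: "TSC-Deadline", 26: "XSAVE",
    28: "AVX", 29: "F16C", 30: "RDRAND",
}


def build_mask(force_off):
    """Force the given bits to '0' and leave every other bit as '-'."""
    chars = "".join("0" if bit in force_off else "-" for bit in range(31, -1, -1))
    # Group into nibbles separated by ':' for readability.
    return ":".join(chars[i:i + 4] for i in range(0, 32, 4))


mask = build_mask(POST_WESTMERE)
print('cpuid.1.ecx = "%s"' % mask)
```

Keeping the bit numbers in a table makes it easier to review which features are being forced off when the target CPU generation changes.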

I don't believe there are any post-Westmere features in cpuid.1.edx, cpuid.80000001.ecx or cpuid.80000001.edx, so you shouldn't need those options. 

Clearing out the 'd' leaf is reasonable, since all of the options in the 'd' leaf are post-Westmere.  An alternative is to use a mask for cpuid.0.eax to reduce the maximum input value for basic CPUID information.  I believe Westmere systems had a maximum input value of 0xb, suggesting the following option:

cpuid.0.eax = "0000:0000:0000:0000:0000:0000:0000:1011"

Another problem you might be running into is leaf 7.  I believe that all of leaf 7 is post-Westmere.  Unfortunately, leaf 7 falls under the basic input values supported by Westmere, so the cpuid.0.eax trick won't work.  To mask off the leaf 7 features introduced post-Westmere, you should use:

cpuid.7.eax = "0000:0000:0000:0000:0000:0000:0000:0000"

cpuid.7.ebx = "0000:0000:0000:0000:0000:0000:0000:0000"

cpuid.7.ecx = "0000:0000:0000:0000:0000:0000:0000:0000"

cpuid.7.edx = "0000:0000:0000:0000:0000:0000:0000:0000"
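The reason the cpuid.0.eax trick covers leaf 0xd but not leaf 7 comes down to a simple comparison. A small sketch, assuming a guest that honours the maximum basic input value reported in cpuid.0.eax (both helper names are hypothetical):

```python
def max_basic_leaf(mask):
    """Interpret a fully specified cpuid.0.eax mask as an integer."""
    return int(mask.replace(":", ""), 2)


limit = max_basic_leaf("0000:0000:0000:0000:0000:0000:0000:1011")


def above_limit(leaf):
    """True if the leaf lies beyond the advertised maximum basic leaf."""
    return leaf > limit


for leaf in (0x7, 0xB, 0xD):
    print("leaf 0x%x above limit 0x%x: %s" % (leaf, limit, above_limit(leaf)))
```

Leaf 0xd falls above the 0xb limit, while leaf 7 does not, which is why leaf 7 needs its own explicit zero masks.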

I hope that this helps to clarify things.

MattPietrek
Contributor

Thanks Jim - this was incredibly helpful. Exactly the details I need to make informed choices about how to proceed.

A follow up question, if I may: When a VM fails to resume because of CPU compat, is there some way of knowing exactly what features the .vmss requires that the host can't satisfy?

The error in vmware.log shows as:

2015-04-02T19:38:21.281Z| vmx| I120: guestCpuFeatures = 0x2002fd  # -->>>>> How can I interpret these?????

2015-04-02T19:38:21.281Z| vmx| I120: Msg_Question:

2015-04-02T19:38:21.281Z| vmx| I120: [msg.checkpoint.cpucheck.fail.feature] The features supported by the processors in this machine are different from the features supported by the processors in the machine on which the virtual machine state was saved.

2015-04-02T19:38:21.281Z| vmx| I120: [msg.checkpoint.cpucheck.fail.hard] Resume on a machine with similar processors.

2015-04-02T19:38:21.281Z| vmx| I120: [msg.checkpoint.restore.cpufail] An error occurred while restoring the CPU state from file "/vmfs/volumes/7379c234-ed540caf/session-07cc2548.vmss".

2015-04-02T19:38:21.281Z| vmx| I120: [msg.checkpoint.resume.softError] Your virtual machine did not resume because of a correctable error. Preserve the suspended state and correct the error, or discard the suspended state.

2015-04-02T19:38:21.281Z| vmx| I120: ----------------------------------------

2015-04-02T19:38:21.733Z| vmx| I120: MsgQuestion: msg.checkpoint.resume.softError reply=0

2015-04-02T19:38:21.733Z| vmx| I120: Module CheckpointLate power on failed.

Earlier in the file, I see:

2015-04-02T19:38:20.865Z| vmx| I120: hostCpuFeatures = 0xc6000fd   # (Not sure if this is relevant, or how to interpret those bits, but grasping at straws).


This was suspended on Haswell, resuming on Westmere.

As for the mask in play for the above, prior to getting your reply, I had been playing with a more restrictive mask that:

Set DTES64, xTPR, PDCM to '0'

Set MWAIT to '-'

Set PBE to '-'

Here's the mask for this iteration:

2015-04-02T19:38:20.846Z| vmx| I120: DICT               cpuid.0.eax = 00000000000000000000000000001011

2015-04-02T19:38:20.846Z| vmx| I120: DICT               cpuid.1.eax = 00000000000000100000011001010001

2015-04-02T19:38:20.846Z| vmx| I120: DICT               cpuid.1.ecx = 0000001010011000001000100011-011

2015-04-02T19:38:20.846Z| vmx| I120: DICT               cpuid.1.edx = -0001111111010111111101111111111

2015-04-02T19:38:20.846Z| vmx| I120: DICT        cpuid.80000001.ecx = 00000000000000000000000000000001

2015-04-02T19:38:20.846Z| vmx| I120: DICT        cpuid.80000001.edx = 00101000000100000000100000000000

2015-04-02T19:38:20.846Z| vmx| I120: DICT               cpuid.d.eax = 00000000000000000000000000000000

2015-04-02T19:38:20.846Z| vmx| I120: DICT               cpuid.d.ecx = 00000000000000000000000000000000

2015-04-02T19:38:20.846Z| vmx| I120: DICT               cpuid.d.edx = 00000000000000000000000000000000

2015-04-02T19:38:20.846Z| vmx| I120: DICT      cpuid.80000002.0.eax = 0110:0101:0111:0100:0110:1110:0100:1001

2015-04-02T19:38:20.846Z| vmx| I120: DICT      cpuid.80000002.0.ebx = 0010:1001:0101:0010:0010:1000:0110:1100

2015-04-02T19:38:20.846Z| vmx| I120: DICT      cpuid.80000002.0.ecx = 0110:1111:0110:0101:0101:1000:0010:0000

2015-04-02T19:38:20.846Z| vmx| I120: DICT      cpuid.80000002.0.edx = 0010:1001:0101:0010:0010:1000:0110:1110

2015-04-02T19:38:20.846Z| vmx| I120: DICT      cpuid.80000003.0.eax = 0101:0101:0101:0000:0100:0011:0010:0000

2015-04-02T19:38:20.846Z| vmx| I120: DICT      cpuid.80000003.0.ebx = 0010:0000:0010:0000:0010:0000:0010:0000

2015-04-02T19:38:20.846Z| vmx| I120: DICT      cpuid.80000003.0.ecx = 0010:0000:0010:0000:0010:0000:0010:0000

2015-04-02T19:38:20.846Z| vmx| I120: DICT      cpuid.80000003.0.edx = 0101:1000:0010:0000:0010:0000:0010:0000

2015-04-02T19:38:20.846Z| vmx| I120: DICT      cpuid.80000004.0.eax = 0011:0000:0011:0101:0011:0110:0011:0101

2015-04-02T19:38:20.846Z| vmx| I120: DICT      cpuid.80000004.0.ebx = 0010:0000:0100:0000:0010:0000:0010:0000

2015-04-02T19:38:20.846Z| vmx| I120: DICT      cpuid.80000004.0.ecx = 0011:0111:0011:0110:0010:1110:0011:0010

2015-04-02T19:38:20.846Z| vmx| I120: DICT      cpuid.80000004.0.edx = 0000:0000:0111:1010:0100:1000:0100:0111
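The twelve cpuid.80000002 through cpuid.80000004 options above appear to pin the processor brand string. A minimal sketch that decodes them, assuming the standard CPUID layout for those leaves (EAX, EBX, ECX, EDX in order, four little-endian ASCII bytes each); `decode_brand` is a hypothetical helper and the values are copied from the log lines above:

```python
import struct

# Register values in leaf order: 80000002, 80000003, 80000004 (eax, ebx, ecx, edx).
REGS = [
    "0110:0101:0111:0100:0110:1110:0100:1001",
    "0010:1001:0101:0010:0010:1000:0110:1100",
    "0110:1111:0110:0101:0101:1000:0010:0000",
    "0010:1001:0101:0010:0010:1000:0110:1110",
    "0101:0101:0101:0000:0100:0011:0010:0000",
    "0010:0000:0010:0000:0010:0000:0010:0000",
    "0010:0000:0010:0000:0010:0000:0010:0000",
    "0101:1000:0010:0000:0010:0000:0010:0000",
    "0011:0000:0011:0101:0011:0110:0011:0101",
    "0010:0000:0100:0000:0010:0000:0010:0000",
    "0011:0111:0011:0110:0010:1110:0011:0010",
    "0000:0000:0111:1010:0100:1000:0100:0111",
]


def decode_brand(regs):
    """Concatenate the registers as little-endian bytes and strip the NUL."""
    raw = b"".join(struct.pack("<I", int(r.replace(":", ""), 2)) for r in regs)
    return raw.split(b"\0", 1)[0].decode("ascii")


brand = decode_brand(REGS)
print(repr(brand))
```

The decoded text can be compared with the hostCPUID name lines in vmware.log to see which CPU name the guest is being shown.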

I promise to experiment with the approach you suggested, only masking off post-Westmere and using '-' for everything else. But for now, just need to understand what's happening with our current masking.

Thanks!

Matt

admin
Immortal

MattPietrek wrote:

Thanks Jim - this was incredibly helpful. Exactly the details I need to make informed choices about how to proceed.

A follow up question, if I may: When a VM fails to resume because of CPU compat, is there some way of knowing exactly what features the .vmss requires that the host can't satisfy?

If you enable debugging for the VM, you should see something like the following in the vmware.log file:

2015-03-27T16:04:57.196-07:00| vmx| I120: FeatureCompat: Capabilities:

2015-03-27T16:04:57.196-07:00| vmx| I120: Capability Found: vpmc.fixctr.1 = 0x1

2015-03-27T16:04:57.196-07:00| vmx| I120: Capability Found: vpmc.genctr.6 = 0x1

2015-03-27T16:04:57.196-07:00| vmx| I120: Capability Found: cpuid.Intel = 0x1

2015-03-27T16:04:57.196-07:00| vmx| I120: Capability Found: cpuid.RDRAND = 0x1

2015-03-27T16:04:57.196-07:00| vmx| I120: Capability Found: cpuid.PDPE1GB = 0x1

2015-03-27T16:04:57.196-07:00| vmx| I120: Capability Found: cpuid.XSAVE = 0x1

2015-03-27T16:04:57.196-07:00| vmx| I120: Capability Found: vpmc.fixctr.0 = 0x1

2015-03-27T16:04:57.196-07:00| vmx| I120: Capability Found: cpuid.LM = 0x1

2015-03-27T16:04:57.196-07:00| vmx| I120: Capability Found: vpmc.genctr.7 = 0x1

2015-03-27T16:04:57.196-07:00| vmx| I120: Capability Found: cpuid.ENFSTRG = 0x1

2015-03-27T16:04:57.196-07:00| vmx| I120: Capability Found: cpuid.MWAIT = 0x1

2015-03-27T16:04:57.196-07:00| vmx| I120: Capability Found: cpuid.MOVBE = 0x1

2015-03-27T16:04:57.196-07:00| vmx| I120: Capability Found: vpmc.numGenCtrs = 0x8

2015-03-27T16:04:57.196-07:00| vmx| I120: Capability Found: vpmc.fixedWidth = 0x30

2015-03-27T16:04:57.196-07:00| vmx| I120: Capability Found: vpmc.genctr.4 = 0x1

2015-03-27T16:04:57.196-07:00| vmx| I120: Capability Found: cpuid.VMX = 0x1

2015-03-27T16:04:57.196-07:00| vmx| I120: Capability Found: cpuid.ABM = 0x1

2015-03-27T16:04:57.196-07:00| vmx| I120: Capability Found: vpmc.genWidth = 0x30

2015-03-27T16:04:57.196-07:00| vmx| I120: Capability Found: vpmc.version = 0x3

2015-03-27T16:04:57.196-07:00| vmx| I120: Capability Found: cpuid.XCR0_MASTER_YMM_H = 0x1

2015-03-27T16:04:57.196-07:00| vmx| I120: Capability Found: vpmc.fixctr.2 = 0x1

2015-03-27T16:04:57.196-07:00| vmx| I120: Capability Found: vpmc.genctr.5 = 0x1

2015-03-27T16:04:57.196-07:00| vmx| I120: Capability Found: misc.cpuidFaulting = 0x1

2015-03-27T16:04:57.196-07:00| vmx| I120: Capability Found: cpuid.PCID = 0x1

2015-03-27T16:04:57.196-07:00| vmx| I120: Capability Found: cpuid.FMA = 0x1

2015-03-27T16:04:57.196-07:00| vmx| I120: Capability Found: cpuid.AVX2 = 0x1

2015-03-27T16:04:57.196-07:00| vmx| I120: Capability Found: cpuid.XSAVEOPT = 0x1

2015-03-27T16:04:57.196-07:00| vmx| I120: Capability Found: vpmc.microarchitecture.haswell = 0x1

2015-03-27T16:04:57.196-07:00| vmx| I120: Capability Found: cpuid.RTM = 0x1

2015-03-27T16:04:57.196-07:00| vmx| I120: Capability Found: vpmc.genctr.2 = 0x1

2015-03-27T16:04:57.196-07:00| vmx| I120: Capability Found: cpuid.SSSE3 = 0x1

2015-03-27T16:04:57.196-07:00| vmx| I120: Capability Found: cpuid.SSE3 = 0x1

2015-03-27T16:04:57.196-07:00| vmx| I120: Capability Found: cpuid.NX = 0x1

2015-03-27T16:04:57.196-07:00| vmx| I120: Capability Found: cpuid.SSE41 = 0x1

2015-03-27T16:04:57.196-07:00| vmx| I120: Capability Found: cpuid.BMI2 = 0x1

2015-03-27T16:04:57.196-07:00| vmx| I120: Capability Found: cpuid.AES = 0x1

2015-03-27T16:04:57.196-07:00| vmx| I120: Capability Found: cpuid.PCLMULQDQ = 0x1

2015-03-27T16:04:57.196-07:00| vmx| I120: Capability Found: cpuid.SS = 0x1

2015-03-27T16:04:57.196-07:00| vmx| I120: Capability Found: cpuid.POPCNT = 0x1

2015-03-27T16:04:57.196-07:00| vmx| I120: Capability Found: cpuid.AVX = 0x1

2015-03-27T16:04:57.196-07:00| vmx| I120: Capability Found: vpmc.genctr.3 = 0x1

2015-03-27T16:04:57.196-07:00| vmx| I120: Capability Found: vt.realmode = 0x1

2015-03-27T16:04:57.196-07:00| vmx| I120: Capability Found: cpuid.F16C = 0x1

2015-03-27T16:04:57.196-07:00| vmx| I120: Capability Found: cpuid.FSGSBASE = 0x1

2015-03-27T16:04:57.196-07:00| vmx| I120: Capability Found: cpuid.DS = 0x1

2015-03-27T16:04:57.196-07:00| vmx| I120: Capability Found: vpmc.genctr.0 = 0x1

2015-03-27T16:04:57.196-07:00| vmx| I120: Capability Found: cpuid.RDTSCP = 0x1

2015-03-27T16:04:57.196-07:00| vmx| I120: Capability Found: cpuid.LAHF64 = 0x1

2015-03-27T16:04:57.196-07:00| vmx| I120: Capability Found: hv.capable = 0x1

2015-03-27T16:04:57.196-07:00| vmx| I120: Capability Found: cpuid.CMPXCHG16B = 0x1

2015-03-27T16:04:57.196-07:00| vmx| I120: Capability Found: cpuid.INVPCID = 0x1

2015-03-27T16:04:57.196-07:00| vmx| I120: Capability Found: cpuid.SMEP = 0x1

2015-03-27T16:04:57.196-07:00| vmx| I120: Capability Found: vpmc.numFixedCtrs = 0x3

2015-03-27T16:04:57.196-07:00| vmx| I120: Capability Found: vpmc.genctr.1 = 0x1

2015-03-27T16:04:57.196-07:00| vmx| I120: Capability Found: cpuid.SSE42 = 0x1

2015-03-27T16:04:57.196-07:00| vmx| I120: Capability Found: cpuid.BMI1 = 0x1

2015-03-27T16:04:57.196-07:00| vmx| I120: Capability Found: cpuid.HLE = 0x1

2015-03-27T16:04:57.196-07:00| vmx| I120: Capability Found: cpuid.XCR0_MASTER_SSE = 0x1

2015-03-27T16:04:57.196-07:00| vmx| I120: FeatureCompat: Requirements:

2015-03-27T16:04:57.196-07:00| vmx| I120: VM Features Required: cpuid.SSE3 - Bool:Min:1

2015-03-27T16:04:57.196-07:00| vmx| I120: VM Features Required: cpuid.PCLMULQDQ - Bool:Min:1

2015-03-27T16:04:57.196-07:00| vmx| I120: VM Features Required: cpuid.SSSE3 - Bool:Min:1

2015-03-27T16:04:57.196-07:00| vmx| I120: VM Features Required: cpuid.FMA - Bool:Min:1

2015-03-27T16:04:57.196-07:00| vmx| I120: VM Features Required: cpuid.CMPXCHG16B - Bool:Min:1

2015-03-27T16:04:57.196-07:00| vmx| I120: VM Features Required: cpuid.PCID - Bool:Min:1

2015-03-27T16:04:57.196-07:00| vmx| I120: VM Features Required: cpuid.SSE41 - Bool:Min:1

2015-03-27T16:04:57.196-07:00| vmx| I120: VM Features Required: cpuid.SSE42 - Bool:Min:1

2015-03-27T16:04:57.196-07:00| vmx| I120: VM Features Required: cpuid.MOVBE - Bool:Min:1

2015-03-27T16:04:57.196-07:00| vmx| I120: VM Features Required: cpuid.POPCNT - Bool:Min:1

2015-03-27T16:04:57.196-07:00| vmx| I120: VM Features Required: cpuid.AES - Bool:Min:1

2015-03-27T16:04:57.196-07:00| vmx| I120: VM Features Required: cpuid.XSAVE - Bool:Min:1

2015-03-27T16:04:57.196-07:00| vmx| I120: VM Features Required: cpuid.AVX - Bool:Min:1

2015-03-27T16:04:57.196-07:00| vmx| I120: VM Features Required: cpuid.F16C - Bool:Min:1

2015-03-27T16:04:57.196-07:00| vmx| I120: VM Features Required: cpuid.RDRAND - Bool:Min:1

2015-03-27T16:04:57.196-07:00| vmx| I120: VM Features Required: cpuid.DS - Bool:Min:1

2015-03-27T16:04:57.196-07:00| vmx| I120: VM Features Required: cpuid.SS - Bool:Min:1

2015-03-27T16:04:57.196-07:00| vmx| I120: VM Features Required: cpuid.FSGSBASE - Bool:Min:1

2015-03-27T16:04:57.196-07:00| vmx| I120: VM Features Required: cpuid.BMI1 - Bool:Min:1

2015-03-27T16:04:57.196-07:00| vmx| I120: VM Features Required: cpuid.AVX2 - Bool:Min:1

2015-03-27T16:04:57.196-07:00| vmx| I120: VM Features Required: cpuid.SMEP - Bool:Min:1

2015-03-27T16:04:57.196-07:00| vmx| I120: VM Features Required: cpuid.BMI2 - Bool:Min:1

2015-03-27T16:04:57.196-07:00| vmx| I120: VM Features Required: cpuid.ENFSTRG - Bool:Min:1

2015-03-27T16:04:57.196-07:00| vmx| I120: VM Features Required: cpuid.INVPCID - Bool:Min:1

2015-03-27T16:04:57.196-07:00| vmx| I120: VM Features Required: cpuid.XCR0_MASTER_SSE - Bool:Min:1

2015-03-27T16:04:57.196-07:00| vmx| I120: VM Features Required: cpuid.XCR0_MASTER_YMM_H - Bool:Min:1

2015-03-27T16:04:57.196-07:00| vmx| I120: VM Features Required: cpuid.XSAVEOPT - Bool:Min:1

2015-03-27T16:04:57.196-07:00| vmx| I120: VM Features Required: cpuid.LAHF64 - Bool:Min:1

2015-03-27T16:04:57.196-07:00| vmx| I120: VM Features Required: cpuid.ABM - Bool:Min:1

2015-03-27T16:04:57.196-07:00| vmx| I120: VM Features Required: cpuid.NX - Bool:Min:1

2015-03-27T16:04:57.196-07:00| vmx| I120: VM Features Required: cpuid.PDPE1GB - Bool:Min:1

2015-03-27T16:04:57.196-07:00| vmx| I120: VM Features Required: cpuid.RDTSCP - Bool:Min:1

2015-03-27T16:04:57.196-07:00| vmx| I120: VM Features Required: cpuid.LM - Bool:Min:1

2015-03-27T16:04:57.196-07:00| vmx| I120: VM Features Required: cpuid.Intel - Bool:Min:1

2015-03-27T16:04:57.196-07:00| vmx| I120: VM Features Required: misc.cpuidFaulting - Bool:Min:1

Anything in the requirements list that's not in the capabilities list is a problem.
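
For anyone who wants to do that comparison mechanically rather than by eye, here is a minimal Python sketch. It assumes the two line formats shown in the log above ("Capability Found: NAME = 0xN" and "VM Features Required: NAME - Bool:Min:N") and only understands the Bool:Min form; the function name and the sample lines are made up for illustration.

```python
import re

# Patterns for the two kinds of vmware.log lines shown above.
CAP_RE = re.compile(r"Capability Found: (\S+) = (0x[0-9a-fA-F]+)")
REQ_RE = re.compile(r"VM Features Required: (\S+) - Bool:Min:(\d+)")


def missing_features(log_lines):
    """Return required features whose host capability is absent or too low."""
    caps = {}
    reqs = {}
    for line in log_lines:
        m = CAP_RE.search(line)
        if m:
            caps[m.group(1)] = int(m.group(2), 16)
            continue
        m = REQ_RE.search(line)
        if m:
            reqs[m.group(1)] = int(m.group(2))
    # A capability that never appears in the log is treated as 0.
    return sorted(name for name, minimum in reqs.items()
                  if caps.get(name, 0) < minimum)


# Hypothetical, shortened sample in the same format as the log above.
sample = [
    "2015-03-27T16:04:57.196-07:00| vmx| I120: Capability Found: cpuid.AVX = 0x1",
    "2015-03-27T16:04:57.196-07:00| vmx| I120: Capability Found: cpuid.BMI2 = 0x1",
    "2015-03-27T16:04:57.196-07:00| vmx| I120: VM Features Required: cpuid.AVX - Bool:Min:1",
    "2015-03-27T16:04:57.196-07:00| vmx| I120: VM Features Required: cpuid.AVX2 - Bool:Min:1",
]
print(missing_features(sample))
```

To run it against a real log, pass the open file object instead of the sample list, e.g. missing_features(open("vmware.log")).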

Here's the mask for this iteration:

2015-04-02T19:38:20.846Z| vmx| I120: DICT               cpuid.0.eax = 00000000000000000000000000001011

2015-04-02T19:38:20.846Z| vmx| I120: DICT               cpuid.1.eax = 00000000000000100000011001010001

2015-04-02T19:38:20.846Z| vmx| I120: DICT               cpuid.1.ecx = 0000001010011000001000100011-011

2015-04-02T19:38:20.846Z| vmx| I120: DICT               cpuid.1.edx = -0001111111010111111101111111111

2015-04-02T19:38:20.846Z| vmx| I120: DICT        cpuid.80000001.ecx = 00000000000000000000000000000001

2015-04-02T19:38:20.846Z| vmx| I120: DICT        cpuid.80000001.edx = 00101000000100000000100000000000

2015-04-02T19:38:20.846Z| vmx| I120: DICT               cpuid.d.eax = 00000000000000000000000000000000

2015-04-02T19:38:20.846Z| vmx| I120: DICT               cpuid.d.ecx = 00000000000000000000000000000000

2015-04-02T19:38:20.846Z| vmx| I120: DICT               cpuid.d.edx = 00000000000000000000000000000000

2015-04-02T19:38:20.846Z| vmx| I120: DICT      cpuid.80000002.0.eax = 0110:0101:0111:0100:0110:1110:0100:1001

2015-04-02T19:38:20.846Z| vmx| I120: DICT      cpuid.80000002.0.ebx = 0010:1001:0101:0010:0010:1000:0110:1100

2015-04-02T19:38:20.846Z| vmx| I120: DICT      cpuid.80000002.0.ecx = 0110:1111:0110:0101:0101:1000:0010:0000

2015-04-02T19:38:20.846Z| vmx| I120: DICT      cpuid.80000002.0.edx = 0010:1001:0101:0010:0010:1000:0110:1110

2015-04-02T19:38:20.846Z| vmx| I120: DICT      cpuid.80000003.0.eax = 0101:0101:0101:0000:0100:0011:0010:0000

2015-04-02T19:38:20.846Z| vmx| I120: DICT      cpuid.80000003.0.ebx = 0010:0000:0010:0000:0010:0000:0010:0000

2015-04-02T19:38:20.846Z| vmx| I120: DICT      cpuid.80000003.0.ecx = 0010:0000:0010:0000:0010:0000:0010:0000

2015-04-02T19:38:20.846Z| vmx| I120: DICT      cpuid.80000003.0.edx = 0101:1000:0010:0000:0010:0000:0010:0000

2015-04-02T19:38:20.846Z| vmx| I120: DICT      cpuid.80000004.0.eax = 0011:0000:0011:0101:0011:0110:0011:0101

2015-04-02T19:38:20.846Z| vmx| I120: DICT      cpuid.80000004.0.ebx = 0010:0000:0100:0000:0010:0000:0010:0000

2015-04-02T19:38:20.846Z| vmx| I120: DICT      cpuid.80000004.0.ecx = 0011:0111:0011:0110:0010:1110:0011:0010

2015-04-02T19:38:20.846Z| vmx| I120: DICT      cpuid.80000004.0.edx = 0000:0000:0111:1010:0100:1000:0100:0111
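
As a sanity check on the cpuid.80000002 through cpuid.80000004 entries: those leaves hold the processor brand string, four ASCII bytes per register, least significant byte first, in EAX, EBX, ECX, EDX order. A small Python sketch (the helper names are invented here) that decodes the twelve values above:

```python
import struct

# The twelve brand-string values from the DICT lines above, in leaf order
# 80000002..80000004 and register order eax, ebx, ecx, edx.
BRAND_MASK = [
    "0110:0101:0111:0100:0110:1110:0100:1001",
    "0010:1001:0101:0010:0010:1000:0110:1100",
    "0110:1111:0110:0101:0101:1000:0010:0000",
    "0010:1001:0101:0010:0010:1000:0110:1110",
    "0101:0101:0101:0000:0100:0011:0010:0000",
    "0010:0000:0010:0000:0010:0000:0010:0000",
    "0010:0000:0010:0000:0010:0000:0010:0000",
    "0101:1000:0010:0000:0010:0000:0010:0000",
    "0011:0000:0011:0101:0011:0110:0011:0101",
    "0010:0000:0100:0000:0010:0000:0010:0000",
    "0011:0111:0011:0110:0010:1110:0011:0010",
    "0000:0000:0111:1010:0100:1000:0100:0111",
]


def decode_register(bits):
    """Turn one 32-bit binary string (colons optional) into 4 little-endian bytes."""
    value = int(bits.replace(":", ""), 2)
    return struct.pack("<I", value)


def decode_brand(registers):
    """Concatenate the registers and strip the trailing NUL padding."""
    raw = b"".join(decode_register(r) for r in registers)
    return raw.rstrip(b"\x00").decode("ascii")


print(decode_brand(BRAND_MASK))
```

The decoded string names a Xeon X5650 at 2.67GHz, so the brand string the guest sees is that of a Westmere-generation part.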

I promise to experiment with the approach you suggested: masking off only the post-Westmere features and using '-' for everything else. But for now, I just need to understand what's happening with our current masking.

Thanks!

Matt

I suspect that you need to mask off leaf 7, as I mentioned towards the end of my last posting.

admin
Immortal

Actually, the feature compatibility information should be in the log file even without enabling debugging.

MattPietrek
Contributor

> Actually, the feature compatibility information should be in the log file even without enabling debugging.

Ah... I'd seen the extra information for some VMs, but not others. And when I did have it, it helped me figure out we had a host with AES disabled in the BIOS. Perhaps the "feature compat" default settings changed between ESXi 5.5 versions?

In any event, is there some way to force "debug" output via a .VMX or host setting? Google's not much help here.

Setting it via the UI doesn't work for us because our VMs aren't persistent on a given host. Our orchestrator deploys/runs/stops/undeploys without human intervention.

Thanks again - you've been most helpful!

Matt

admin
Immortal

The feature compatibility information is new with virtual hardware version 9.  You are probably not seeing it on some VMs because they are HWv8 or older.

admin
Immortal

MattPietrek wrote:

In any event, is there some way to force "debug" output via a .VMX or host setting? Google's not much help here.

Probably not relevant to this particular issue, but this is the .vmx setting to force debug output:

vmx.buildType = "debug"
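
Since the VMs in question are deployed by an orchestrator rather than through the UI, one option is to patch the .vmx text before power-on. The following Python sketch is only an illustration: set_vmx_option is a hypothetical helper, and it assumes the usual one-entry-per-line key = "value" layout of .vmx files.

```python
import re


def set_vmx_option(vmx_text, key, value):
    """Return vmx_text with key set to value, replacing any existing entry."""
    new_line = f'{key} = "{value}"'
    # Match an existing "key = ..." line; .vmx keys are treated as
    # case-insensitive here, which is an assumption of this sketch.
    pattern = re.compile(
        rf"^[ \t]*{re.escape(key)}[ \t]*=.*$",
        re.MULTILINE | re.IGNORECASE,
    )
    if pattern.search(vmx_text):
        return pattern.sub(lambda _m: new_line, vmx_text)
    # Otherwise append, making sure the previous line is terminated.
    separator = "" if (not vmx_text or vmx_text.endswith("\n")) else "\n"
    return vmx_text + separator + new_line + "\n"


# Hypothetical minimal .vmx content, for illustration only.
original = 'displayName = "test-vm"\n'
patched = set_vmx_option(original, "vmx.buildType", "debug")
print(patched, end="")
```

Calling it a second time with the same key leaves a single entry in place rather than appending a duplicate.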

MattPietrek
Contributor

Well, this is odd. Regarding the debug output (e.g. "Capability Found:" and "VM Features Required"): after many tests on the same host, it appears I only get that output when powering on a VM from a powered-off state.

If I resume the VM from suspend, I don't get those lines. Incredibly frustrating, as that's the situation where I most need to know what compat checks ESXi is doing.

Any other insights?

Thanks again,

Matt

admin
Immortal

Resuming from suspend is where the debug build type should help.  It should tell you the requirements in the checkpoint file, at least.
