Hi All,
I'm running 4.1U3. I have a cluster with 8 hosts in it. Six of them have E7-2860 CPUs and two of them have E5-2680v2 CPUs (new). vCenter will allow me to enable EVC. However, after it is enabled and I power on a few VMs on one of the new hosts, it tells me "The cluster cannot be configured with the selected Enhanced vMotion Compatibility mode; CPU features disabled by that mode may currently be in use by powered-on or suspended virtual machines in the cluster."
Also, when I try to migrate any of those VMs from the new hosts to one of the E7-2860 hosts, it won't pass validation, saying "Host CPU is incompatible with the virtual machine's requirements at CPUID level 0x1 register 'ecx'."
The only thing I can think of is the E5-2680v2 isn't supported on 4.1 and therefore EVC is not working properly. Any thoughts on this?
matt948 wrote:
The only thing I can think of is the E5-2680v2 isn't supported on 4.1 and therefore EVC is not working properly. Any thoughts on this?
That's correct. However, you can probably work around this by manually editing the EVC masks in /etc/vmware/config on each of the new systems to force cpuid.1.ecx bits 29 and 30 to 0.
Thanks jmattson,
The original value was
cpuidMask.val.1.ecx = 0x0298e23f = 0000 0010 1001 1000 1110 0010 0011 1111
and I changed to
0x0298e233 = 0000 0010 1001 1000 1110 0010 0011 0011
Is that what you were intending?
This should take effect without a reboot, right? I found that when I rebooted, the value was reset.
Actually, the initial value you've provided should already have bits 29 and 30 masked out. It may not be possible to mask out these bits under ESX 4.1. What are the complete contents of your /etc/vmware/config file?
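For anyone checking the arithmetic here: bit 0 is the rightmost binary digit, so bits 29 and 30 live in the top hex digit of the value, not near the low end. A minimal Python sketch of the check follows; the helper name `set_bits` is made up for illustration and is not part of any VMware tool.

```python
def set_bits(value):
    """Return the bit positions (0 = least significant) that are 1 in value."""
    return [i for i in range(32) if (value >> i) & 1]

original = 0x0298e23f  # value quoted as the original cpuidMask.val.1.ecx
edited = 0x0298e233    # value after the manual edit

# Are bits 29 and 30 set in the original value?
print(29 in set_bits(original), 30 in set_bits(original))  # False False

# Which bits differ between the two values, i.e. what the edit actually cleared?
print(set_bits(original ^ edited))  # [2, 3]
```

So the edit cleared bits 2 and 3, while bits 29 and 30 were already 0 in the original value.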
~ # cat /etc/vmware/config
.encoding = "UTF-8"
libdir = "/usr/lib/vmware"
authd.proxy.vim = "vmware-hostd:hostd-vmdb"
authd.proxy.nfc = "vmware-hostd:ha-nfc"
authd.proxy.nfcssl = "vmware-hostd:ha-nfcssl"
vmauthd.logEnabled = "FALSE"
log.vmauthdFileName = "/var/log/vmware/authd.log"
authd.fullpath = "/sbin/authd"
authd.soapServer = "TRUE"
vmauthd.server.alwaysProxy = "TRUE"
vmx.fullpath = "/bin/vmx"
authd.proxy.vpxa-nfc = "vmware-vpxa:vpxa-nfc"
authd.proxy.vpxa-nfcssl = "vmware-vpxa:vpxa-nfcssl"
cpuidMask.val.1.eax = "0x00020651"
cpuidMask.mode.1.eax = "clobber"
cpuidMask.val.1.ecx = "0x0298e233"
cpuidMask.mode.1.ecx = "mask"
cpuidMask.val.1.edx = "0x8febfbff"
cpuidMask.mode.1.edx = "mask"
cpuidMask.val.80000001.ecx = "0x00000001"
cpuidMask.mode.80000001.ecx = "mask"
cpuidMask.val.80000001.edx = "0x28100800"
cpuidMask.mode.80000001.edx = "mask"
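To compare these entries between hosts, or before and after vCenter rewrites them, it can help to pull just the cpuidMask lines out of the file. The following is a rough Python sketch meant to be run on a workstation against a copy of the file; the function name `read_cpuid_masks` and the regular expression are my own assumptions based only on the lines shown above, not a documented file format.

```python
import re

# Matches lines such as: cpuidMask.val.1.ecx = "0x0298e233"
LINE = re.compile(
    r'^cpuidMask\.(val|mode)\.([0-9a-fA-F]+)\.(e[abcd]x)\s*=\s*"([^"]*)"'
)

def read_cpuid_masks(text):
    """Collect cpuidMask entries as {(leaf, register): {"val": ..., "mode": ...}}."""
    masks = {}
    for line in text.splitlines():
        m = LINE.match(line.strip())
        if m:
            kind, leaf, reg, value = m.groups()
            masks.setdefault((leaf, reg), {})[kind] = value
    return masks

sample = '''vmx.fullpath = "/bin/vmx"
cpuidMask.val.1.eax = "0x00020651"
cpuidMask.mode.1.eax = "clobber"
cpuidMask.val.1.ecx = "0x0298e233"
cpuidMask.mode.1.ecx = "mask"
'''

for key, entry in sorted(read_cpuid_masks(sample).items()):
    print(key, entry.get("mode"), entry.get("val"))
```

Running it on copies of the file from two hosts and comparing the resulting dictionaries shows at a glance which registers differ.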
Try changing:
cpuidMask.val.1.ecx = "0x0298e233"
cpuidMask.mode.1.ecx = "mask"
to:
cpuidMask.val.1.ecx = "0x02982203"
cpuidMask.mode.1.ecx = "clobber"
If it doesn't survive a reboot, it may be that VC is forcing these values to the Westmere EVC defaults.
Still getting the same error on migration. It looks like the values are getting reset when the host is taken out of maintenance mode in the cluster, so I changed them after that and still no dice.
I appreciate your help. I'll play with it a bit tomorrow and test it outside of the cluster. We're going to start moving to 5.5 in a month, so this won't be a concern then; I just need a bridge fix until that time. It might mean making a separate cluster out of these two hosts for the time being.
Try adding this to /etc/vmware/config:
cpuid.1.ecx = "-00-:----:----:----:----:----:----:----"
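To read a mask string like this one, here is a small Python sketch. It assumes the leftmost character corresponds to bit 31 and the rightmost to bit 0, which matches the earlier suggestion in this thread to force bits 29 and 30 to 0; `parse_mask` is an illustrative name, not a VMware function.

```python
def parse_mask(mask):
    """Map bit position -> mask character, assuming the string starts at bit 31."""
    chars = mask.replace(":", "")
    if len(chars) != 32:
        raise ValueError("expected 32 bit positions")
    return {31 - i: c for i, c in enumerate(chars)}

mask = "-00-:----:----:----:----:----:----:----"
forced_zero = sorted(bit for bit, c in parse_mask(mask).items() if c == "0")
print(forced_zero)  # [29, 30]
```

Under that assumption, the line asks for bits 29 and 30 of CPUID leaf 1 ECX to be forced to 0 and leaves every other bit alone.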
I tried adding that line, but there was no change.
I have found that if I change
cpuidMask.val.1.eax = "0x00020651" (the value set by EVC)
to
cpuidMask.val.1.eax = "0x000106a4"
vMotion will validate.
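For context, leaf 1 EAX is the processor signature (family, model, stepping), not a feature bitmap. A short Python sketch that decodes the two values using the standard Intel field layout follows; `decode_signature` is just an illustrative helper name.

```python
def decode_signature(eax):
    """Split a CPUID leaf 1 EAX value into (family, model, stepping)."""
    stepping = eax & 0xF
    model = (eax >> 4) & 0xF
    family = (eax >> 8) & 0xF
    ext_model = (eax >> 16) & 0xF
    ext_family = (eax >> 20) & 0xFF
    # The extended model field only applies to families 0x6 and 0xF.
    if family in (0x6, 0xF):
        model |= ext_model << 4
    # The extended family field only applies to family 0xF.
    if family == 0xF:
        family += ext_family
    return family, model, stepping

for value in (0x00020651, 0x000106a4):
    family, model, stepping = decode_signature(value)
    print(f"{value:#010x}: family {family:#x}, model {model:#x}, stepping {stepping}")
```

Both values decode to family 6. The EVC-set value is model 0x25 stepping 1 and the replacement is model 0x1a stepping 4, so the edit changes which processor model the guest is told it is running on.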
I figured this out by creating a new cluster and setting the EVC level to Nehalem. I put one of the new hosts in this cluster and observed what changed in /etc/vmware/config. I noticed that the following values were set in the new cluster:
cpuidMask.val.1.eax = "0x000106a4"
cpuidMask.val.1.ecx = "0x0098e23d"
So I put that host back into the original cluster and set both of those values, and vMotion worked, both from the new host to the other hosts and from the original hosts to the new host.
Then I tried setting just the eax value and leaving the ecx value at the cluster default, and that also worked. So I'm not sure why vCenter is reporting an issue with the ecx register when it's eax that was the problem.
At this point I think my options are to create a new cluster with the new hosts until we upgrade to 5.5, or to run with the modified eax value for now. The problem with running with the changed value is that vCenter resets it if the host goes into maintenance mode or is restarted.
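As a side note on the two ecx values quoted in this thread, here is a hedged Python sketch that names the bits by which they differ. The feature names are a small subset of Intel's published CPUID leaf 1 ECX layout; `ECX_FEATURES` and `describe_difference` are illustrative names only.

```python
# Subset of CPUID leaf 1 ECX feature bits (Intel's published layout).
ECX_FEATURES = {1: "PCLMULQDQ", 25: "AES", 28: "AVX", 29: "F16C", 30: "RDRAND"}

def describe_difference(a, b):
    """Name the ECX bits that are set in a but not in b."""
    only_in_a = a & ~b
    return [ECX_FEATURES.get(i, f"bit {i}") for i in range(32) if (only_in_a >> i) & 1]

westmere_ecx = 0x0298e23f  # value quoted earlier for the original cluster
nehalem_ecx = 0x0098e23d   # value seen in the Nehalem test cluster

print(describe_difference(westmere_ecx, nehalem_ecx))  # ['PCLMULQDQ', 'AES']
print(describe_difference(nehalem_ecx, westmere_ecx))  # []
```

The two values differ only in bits 1 and 25, and neither has bit 29 or bit 30 set, which fits the observation that changing ecx alone made no difference.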
Hi
Did you use vCenter 5 or VC4? Is there any way to add an ESX 4.1 host with Ivy Bridge to an existing VC4 cluster with Westmere CPUs?
Thanks
Fausto
We were running vCenter 5.0 but the host was running 4.1.
I was able to use an Ivy Bridge host in a Westmere EVC cluster using the workaround detailed above, but I wasn't comfortable running it in a production environment, so I chose not to.
I tried your solution, but on VC4 EVC cannot be enabled. Maybe it can be done in the VC database?
Do you have VMs running on the cluster? You won't be able to enable EVC if there are any running VMs that are using instructions that would be masked by EVC. You have to shut down those VMs first.
If you don't have any running VMs then I don't know why it won't allow you to enable EVC.