VMware Cloud Community
AndreasHammargr
Contributor

Not able to vMotion due to CPU mismatch

Hello!

I have two identical machines, both IBM x3650 M3 with two Intel Xeon X5650 CPUs. They have been in a cluster (without EVC mode, though) for at least a couple of months. Neither DRS nor HA is active.

Migrations between the two hosts worked fine just a week ago, but now something has changed. I get this message when attempting a migration:

Host CPU is incompatible with the virtual machine's requirements at CPUID level 0x1 register 'ecx'.

Host bits: 0000:0010:1001:1010:0010:0010:0000:0011

Required: x000:0x0x:10x1:1xx0:xx10:xx1x:xxxx:xx01

Mismatch detected for these features:

* General incompatibilities
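
For anyone decoding this message by hand, here is a minimal Python sketch that lines up the "Host bits" string against the "Required" mask and lists the bit positions that conflict. It assumes that `x` in the mask means "don't care" and that the leftmost character is bit 31; the helper name `mismatched_bits` is made up for this illustration and is not part of any VMware tool.

```python
# Hedged sketch: compare the "Host bits" string with the "Required" mask
# from the vMotion error. Assumes 'x' means "don't care" and that the
# leftmost character is bit 31. The function name is hypothetical.
def mismatched_bits(host_bits, required):
    host = host_bits.replace(":", "")
    req = required.replace(":", "")
    if len(host) != 32 or len(req) != 32:
        raise ValueError("expected 32 bits in each string")
    # Walk both strings left to right; index 0 corresponds to bit 31.
    return [31 - i
            for i, (h, r) in enumerate(zip(host, req))
            if r.lower() != "x" and h != r]

HOST = "0000:0010:1001:1010:0010:0010:0000:0011"
REQUIRED = "x000:0x0x:10x1:1xx0:xx10:xx1x:xxxx:xx01"

print(mismatched_bits(HOST, REQUIRED))  # [25, 1]
```

In Intel's documented layout of CPUID leaf 1 register ECX, bit 25 is AESNI and bit 1 is PCLMULQDQ, so under these assumptions the destination host exposes both features while the virtual machine requires them to be absent.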

It seems weird that these machines are not compatible, as they are identical and were bought at the same time, and as stated above, vMotion worked a few days ago. The only difference between them is that one has 96 GB of RAM and the other has 48 GB.

I found a KB article that referred to a problem in vCenter 4.0 where you had to reset the CPUID mask to default while the machine was powered off, but this didn't help me.

By the way, I am running vCenter 4.1u1.

Does anyone have any ideas?

I would appreciate a quick answer if possible!

thanks,

Andreas

7 Replies
AndreTheGiant
Immortal

Welcome to the community.

Have you made any changes on one host, such as a BIOS update?

You can check in the cluster properties whether EVC is enabled.

Andre

Andrew | http://about.me/amauro | http://vinfrastructure.it/ | @Andrea_Mauro
AndreasHammargr
Contributor

I checked BIOS versions just in case, and found that they are indeed the same. No changes made to one host.

EVC is disabled on the cluster.

I did, however, find a difference between the machines: both are running ESXi 4.1.0, but one is release 260247 and the other is 348481, which I suspect are 4.1 and 4.1u1 respectively.

But that shouldn't cause this problem, should it? It would make it impossible to raise the level of a cluster during production.

DCjay
Enthusiast

Hello,

Have you verified that the hosts' chipset supports Intel VT, and that it is enabled in the BIOS of both ESX servers?

Also ensure hyperthreading is either enabled on both or disabled on both.

If that is the case, try to cold migrate the guest to another ESX host and power on the guest there; if that works, vMotion the guest back to the original host.

If no luck, try this article.

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=101129...

Jay

AndreasHammargr
Contributor

I ran the CPUID tool from VMware on each host, and the results are almost identical. I'm not sure I used it right; it is probably meant to run on the physical machine rather than inside a VM. For instance, it states that hyperthreading is disabled, although I have verified on the Configure\Processor tab that it is enabled on both hosts, but of course this isn't relayed to the virtual machine.

Looking at this however, the logical CPU reports different values for ID1ECX.

Host 1:

ID1ECX: 0x82982203

ID1EDX: 0x0febfbff

ID81ECX: 0x00000001

ID81EDX: 0x28100000

Host 2:

ID1ECX: 0x80982201

ID1EDX: 0x0febfbff

ID81ECX: 0x00000001

ID81EDX: 0x28100000
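
To see exactly which bits differ between the two ID1ECX values, a short XOR in Python is enough. This is only an illustrative sketch: the function name `differing_bits` is hypothetical, and the feature table names just the two bits relevant here, following Intel's documented layout for CPUID leaf 1 ECX.

```python
# Hedged sketch: XOR the two reported ID1ECX values to find differing bits.
HOST1_ID1ECX = 0x82982203
HOST2_ID1ECX = 0x80982201

# Only the two bits of interest are named (per Intel's CPUID leaf 1 ECX layout).
ECX_FEATURES = {1: "PCLMULQDQ", 25: "AESNI"}

def differing_bits(a, b):
    """Return the bit positions (0 = least significant) where a and b differ."""
    diff = a ^ b
    return [bit for bit in range(32) if (diff >> bit) & 1]

for bit in differing_bits(HOST1_ID1ECX, HOST2_ID1ECX):
    print(bit, ECX_FEATURES.get(bit, "not named in this sketch"))
# 1 PCLMULQDQ
# 25 AESNI
```

Both values also have bit 31 set, which is the bit reserved for signalling a hypervisor and is expected when the tool runs inside a VM.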

And this is what my error message tells me: register ecx. The question is, how can these two identical machines differ? And why did they start to differ just a few days ago, maybe a week ago? One host has an uptime of 5 days, the other 51 days. Five days ago I installed more memory in that host. I have now imported a few VMs onto the host with more memory and want to move over the ones on the first host so I can upgrade that one too, but this is where it fails.

Regarding the Intel VT part, is there a way to check this in vCenter? I am pretty sure it is turned on.

I did try to cold migrate a machine to another host and vMotion it back, but it didn't work. Same error.

AndreasHammargr
Contributor

After even more searching I found this article:

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=102978...

It describes exactly the problem I have, but it doesn't really tell me what is wrong. My guess is that one of the hosts has a faulty BIOS setting somewhere, but I'm pretty sure I haven't set it (I didn't even enter the BIOS setup when I upgraded the memory).

I'll just have to quickly cold-migrate the machines over to the other host I guess.

AndreasHammargr
Contributor

This is what I did:

Shut down the VMs on one machine, cold-migrated them to the other, and booted them up. All VMs were now running on one host.

I then moved the host that didn't have any VMs out of the cluster and into a new cluster with EVC enabled. This worked fine.

I was now able to hot-migrate all VMs to that host, in the new cluster.

Once the other host was free of VMs, I tried to move it into the new cluster, but it wouldn't let me. It seemed to have the wrong CPU instructions.

Rebooted it, same problem.

I then upgraded it to 4.1u1 (it was on plain 4.1 before), and after that I could move it into the cluster. All is now OK!

I hope my experience can help others!

/Andreas

benny_hauk
Enthusiast

We had the same issue with IBM hardware. We finally discovered that IBM shipped newer servers with Westmere's AES instructions enabled but shipped older servers with AES disabled (anything shipping with less than 1.10 UEFI). AES isn't something you can enable or disable from the BIOS, so it's not apparent what the incompatibility is. It turned out that something about the older blades made vCenter think they were more like Nehalem than Westmere, and the reason is that the older ones don't have AES (a Westmere feature) enabled.

For a more complete description of the fix (disabling AES on the new server and links describing how to enable it on older ones) see this:

http://communities.vmware.com/thread/311821

Benny

Benny Hauk Systems Admin, VCP3/VCP4 LifeWay Christian Resources