VMware Cloud Community
mcallistera
Contributor

HOWTO: VMotion from Dell 2850 to Dell 2950 (Woodcrest)

We bought some new Dell 2950s with Woodcrest (Xeon 5160, or 51xx-series) processors to replace our "old" Dell 2850s. We wanted to VMotion all our VMs from the old gear to the new gear as we replaced each server. VirtualCenter dashed our hopes: we quickly ran into "CPU of destination host is incompatible" errors when attempting the VMotion.

As most people know, VMotion between CPUs with different instruction sets can be problematic, and there are apparently a couple of different approaches to overcoming the incompatibility.

The first approach: tell VirtualCenter not to test for CPU compatibility. This is NOT supported, and from what I've read VMware support will be upset if you open a service request after doing this. But, according to this thread, you simply put

<migrate>
  <test>
    <CpuCompatible>false</CpuCompatible>
  </test>
</migrate>

into your vpxd.cfg file on your VirtualCenter 2.0 server then restart the service. This should keep VC from performing ANY CPU compatibility checks.

Or the second approach: Change the "CPU Identification Mask" for individual virtual machines so that they do not "require" the incompatible features. This method essentially disables the checks like the first approach, but with more precision. It masks out only the checks that are incompatible instead of disabling ALL the checks.

We didn't try the first approach, but the second one worked for us.

We were able to VMotion a VM from a Dell 2850 to a Dell 2950 (both running ESX 3.0) live without any problems using the following procedure:

1) On the host with the older CPU, for each individual VM, go to "Edit Settings" > "Options" > "Advanced" > "Advanced" and change the EAX and ECX masks as follows:

EAX  ---- ---- ---- ---- ---- XXXX ---- ----

ECX  ---- ---- ---- ---- -0-- 0-0- --0- -R0-


2) VMotion the VM to the newer machine (Dell 2950).

3) Go back to the masks and click the "Reset All to Default" button to remove the masks.

Please TEST this before you do it in your production environment.
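
To see which CPUID bits a mask actually touches before testing, a small sketch along these lines can list every position that is not "-". It assumes the mask is written most-significant bit first (so the last character is bit 0). The bit names come from Intel's CPUID documentation rather than from this post, and the function name overridden_bits is made up for illustration.

```python
# Names for a few CPUID level 1 ECX bits, taken from Intel's CPUID
# documentation (not from the post); unlisted bits are shown by number only.
ECX_BIT_NAMES = {0: "SSE3", 5: "VMX", 9: "SSSE3", 13: "CMPXCHG16B", 14: "xTPR"}


def overridden_bits(mask: str) -> list[tuple[int, str]]:
    """List (bit, flag) pairs for every position that is not '-'.

    The mask is read most-significant bit first, so the last character
    is bit 0. Spaces and colons are treated as group separators.
    """
    flags = mask.replace(" ", "").replace(":", "")
    top = len(flags) - 1
    return [(top - i, f) for i, f in enumerate(flags) if f != "-"]


# The four ECX groups that carry explicit flags in the post (bits 15..0).
ecx_low16 = "-0-- 0-0- --0- -R0-"

for bit, flag in overridden_bits(ecx_low16):
    name = ECX_BIT_NAMES.get(bit, "bit %d" % bit)
    print(f"ECX[{bit:2d}] = {flag}  ({name})")
```

Under that bit-order assumption, bit 9 (SSSE3) and bit 5 (VMX) are among the bits the ECX mask forces to 0.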

We did a rolling upgrade of our four Dell 2850s running ESX 3.0 to four Dell 2950s. We used the masks above on 25 of our VMs as we did the upgrades. All the VMs worked flawlessly as we changed the masks and VMotioned them to the new hardware. We run a mix of SUSE SLES9, RH4, and Win2003 Standard on those VMs. Most were Windows.

How did we come up with those masks?

This post provided the ECX mask:

And this post gave me the idea to change the EAX mask. The EAX mask was a little tricky, as most threads on the subject don't discuss EAX, yet VC2 was saying that EAX level 1 was the source of the CPU incompatibility. To come up with that mask I looked up the default Windows OS masks in /etc/vmware/hostd/env/vmconfigoption-esx-3.0.0.xml on the ESX host. The default cpuFeatureMask for win2k3 std on level 1 is: <eax>xxxx:HHHH:HHHH:xxxx:xxxx:HHHH:xxxx:xxxx</eax>.

An X means the feature isn't used by the OS; an H means the OS is allowed to see the feature. I simply changed the mask by replacing the x's with - (use the default) and the H's with X's. Why change the H's to X's? Because I was pretty sure that my OSes weren't using any of the fancy CPU features, and I wanted to tell VirtualCenter that they weren't used. I reasoned that if VC thinks the features aren't used by the guest, then compatibility isn't an issue. VC believed my assertions, and the OSes were apparently not actually using the features, as everything was fine as we moved the VMs about.
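
The substitution described above can be sketched in a few lines of Python. This is only an illustration of the character replacement applied to the default mask quoted from the ESX host file. The names MASK_SUBSTITUTION and relax_default_mask are made up here.

```python
# Substitution described in the post: x -> '-' (keep the default), H -> 'X'.
MASK_SUBSTITUTION = str.maketrans({"x": "-", "H": "X"})


def relax_default_mask(default_mask: str) -> str:
    """Apply the x -> '-' and H -> 'X' substitution; other characters pass through."""
    return default_mask.translate(MASK_SUBSTITUTION)


# Default level 1 EAX mask for win2k3 std, as quoted from the ESX host file.
default_eax = "xxxx:HHHH:HHHH:xxxx:xxxx:HHHH:xxxx:xxxx"
print(relax_default_mask(default_eax))
```

Applied literally, the substitution marks every H group with X. Compare the output with the EAX mask listed in step 1 before using it on a VM.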

What I find strange is that the CPU mask can be changed while the VM is running. I don't know whether that is a bug, but I certainly hope this behavior stays the same in future VC releases. I'm fairly certain that the CPU features are NOT actually changed until a VM is restarted, so changing the mask really only tricked VirtualCenter into allowing the VMotion. Once the VMs were moved to the Woodcrest hosts, we reset the masks to default so that they would pick up the right CPU mask and feature set for the new chips the next time they are restarted.

Anyway, I hope this helps new 2950 owners out there; this procedure sure saved us a lot of heartache. Once we figured it out, we were able to upgrade all of our ESX boxes to the Woodcrest hardware without a single VM outage. By doing it this way we took an additional risk that a VM might crash as it was moved to "incompatible" chips, but we saved the pain of coordinating outages on 25+ production systems, all with different maintenance windows. It cut the upgrade time frame from weeks to about 16 business hours. Obviously, we tested different VMs with all our OSes and the masks above before moving production VMs, so our confidence level was high enough to proceed. You should test too.

10 Replies
TomHowarth
Leadership

Could one of the moderators score this post? This is an excellent piece of work.

Tom Howarth VCP / VCAP / vExpert
VMware Communities User Moderator
Blog: http://www.planetvm.net
Contributing author on VMware vSphere and Virtual Infrastructure Security: Securing ESX and the Virtual Environment
Contributing author on VCP VMware Certified Professional on VSphere 4 Study Guide: Exam VCP-410
jamieorth
Expert

Do you think this would work in the Dell 6850 line? I don't believe that Dell has released any Woodcrest-based 6850s, but they do have some dual-core servers... Checking the VMotion compatibility chart at http://www.dell.com/downloads/global/solutions/vmotion_compatiblity_matix.pdf it looks as if any 6850 can VMotion to any other 6850.

mcallistera
Contributor

I can't tell you for sure that it would work with other boxes, but I would think it likely.

I cannot stress enough that you are taking a risk when you engage these masks and migrate to different CPU types. We didn't experience any outage, but that doesn't guarantee it will work. I'm sure the VMware folks' advice will be to power off and then migrate.

That said, it isn't too hard to give it a try. :)

For us, the risk was low compared to the need to migrate quickly. We determined the risk was low after testing every OS we had in production.

mstahl75
Virtuoso

We have two 4-way single-core 6850s and one 2-way dual-core 6850. When doing cold migrations, when all servers were seen by VC 2.0, the masks were created automatically. I have a couple of posts with the masks that were created on one of the VMs.

We haven't migrated all the VMs off the 2-way server yet because we need to minimize downtime, but once we do we will be heavily testing migrating back and forth, with and without the masks.

MillardJK
Enthusiast

Very nice work! I have a similar situation (Dell 2850, 2950), but I'm looking to keep my existing 2850s in production. The mask you've provided is letting me VMotion, too, but I'm thinking that a reboot after applying the mask is potentially the best way to get (and keep) the guest stable in order to enable DRS among the different hosts.

——
Jim Millard
Kansas City, MO USA
jjbakker
Contributor

Any more experiences with this yet?

We have to expand the ESX environment of a customer who currently has two 2850s (with the NX/XD mask already), and there will probably be a third server (a 2950).

Is it safe to order a 2950 with dual-core CPUs? Or is it also possible to order a quad-core?

Kind regards, JJBakker
steve31783
Enthusiast

Does anyone have any experience with running this setup permanently? I'd like to keep both 2850s and 2950s in the same cluster of my environment, rather than setting up two...

c_g-hills
Enthusiast

I can confirm that this works to enable VMotion of 32-bit guests between Dell PowerEdge 1855 and 1955 (with Intel E3510 CPU) servers.

MattJoyce
Contributor

Still a useful post.

Hiding NX worked for me between an Intel 5140 dual-core and an Intel E5335 quad-core (both in 2950s).

qmcnetwork
Enthusiast

This was a very good post, but unfortunately this technique has stopped working for us. We're hoping to hot-migrate ESX 2.5.x VMs to new hardware running ESX 3.5.0 Update 1. We get the same error, that "eax level 1 is the source of the CPU incompatibility", when trying to migrate. But when we try to edit the CPU mask on the VM, it's greyed out. We've tried editing with the VM off and it's still greyed out.

We know it's not supported, but we're looking at 70+ VMs to move. Using the run-virtual.com VMotion Info tool we can see that the differences between the Intel CPUs are the NX and SSSE3 instructions.

Is anyone aware of this feature being removed, or of any workarounds?

Jonathan
