Hey guys, need some help with an upgrade. I have two Dell 2850s in one cluster and two Dell 6950s in another cluster. I upgraded both the BIOS and ESX on the 2850s the other day without incident, and everything is working fine. I did not have ANY vms on the 2850s when I upgraded to 3.5. Once I got the upgrades done and the networking set, I moved a test vm over from the 6950s and everything is working OK. Today I started on the 6950s, upgrading the BIOS and taking ESX to 3.5 from fully patched 3.0.2, with VC running 2.5 of course. Things did not go well. First off, when I did the upgrade using "Upgrade" instead of a fresh install, the kernel would not boot, something about not being able to find the correct boot partition (I followed the same procedure as with the 2850s). OK, so I tried it again - same thing, can't find the boot partition. OK, so on to a fresh install. Got that done and up. Now in one cluster I have one ESX host with 3.0.2 fully patched and one ESX host with 3.5 fully patched.
OK, now the problem: Between the two 6950s I can only VMotion vms that are powered down. When I try to VMotion any vm that is powered on, I get an error message that says the CPU of the vm has to be reconfigured. I have always had this problem with VMotion between the 2850s and the 6950s because the CPUs are of different families, so I have always had to do "cold" migrations there. I never had to do a cold migration between the two 2850s or between the two 6950s; they always VMotioned the vms "hot".
Anyone got any ideas? I am hoping I don't have to shut down each vm on the 3.0.2 6950 to get it over to the 3.5 6950. That would be a huge problem, as all of our critical servers are running on the 3.0.2 host at this moment: the 2007 Exchange, 2005 SQL, etc. Anyway, you can picture the problem and the scenario, late night, wee hours. I am sure you have all been there.
Thanks for the reply, Grasshopper, but the thread you gave did not have the answer as far as I can see. My licensing is good for VMotion, and like I said, I can migrate with the vm powered down. I noticed that someone said the licensing for VMotion was good for 3.0 but you needed to buy an upgrade when moving to 3.5. I don't think this would apply in my situation, since I have had Platinum SnS all along.
Got any other good ideas?
Does anyone know if this issue is specific to Dells or AMD processors? Maybe all of us having this problem can post server makes, processors, and BIOS revisions to try to isolate it? I will post specs tomorrow; hopefully a solution can be found. The only workaround I have been able to come up with so far is to cold migrate the VM to the 3.5 ESX server and power it on; after that it will migrate back and forth between 3.0.1 and 3.5 without problems. It looks like it has to do with VC 2.5 trying to upgrade the VM's virtual hardware prior to migrating it to the 3.5 server. When the machine is on the 3.5 host, it is able to do the upgrade before it powers on, but it seems to fail when trying to do it live before the VMotion.
I have rectified my problem, but it took a long time, into the wee hours of the night. I was on with three different Tech Support people, very good tech support people I would like to say, but in the end they couldn't rectify it. Or maybe they could have, but it would have taken many more hours. I have nothing but good things to say about VMware Support Techs. All the techs I have dealt with have been excellent.
Anyway, what I ended up doing was shutting down each vm and cold migrating it from the 6950 host with the 3.0.2 sw to the 6950 with the 3.5 sw. Now, what I did that has some relation to what others have said:
- I did upgrade the BIOS (Gene H mentioned this) on the Dell 6950 that I was trying to migrate to at the same time as I upgraded the ESX sw from 3.0.2 to 3.5. Maybe this is why the vms would not migrate at all from the 3.0.2 host to the 3.5 host. Don't know, but if I had it to do over again I would upgrade to 3.5 first, migrate my vms, upgrade the other hosts in the cluster to 3.5, and THEN upgrade my BIOS. This would be a whole lot of migrations back and forth, but maybe it would have worked.
- The CPU error, well, that goes away after the first migration, so I don't see that as anything but a fluke, a false error, or just plain annoying. It doesn't stop the cold migration or do anything to the vm, it just causes you anxiety at a time when you definitely don't need any more.
- Once I cold migrated a vm from the 3.0.2 host to the 3.5 host, it still would not VMotion "hot" back to the 3.0.2 host. I would have to shut it down to migrate.
- I did not try upgrading the VMware Tools in each vm before trying to migrate.
Now to answer vmmeup. I have two Dell 2850s in a cluster. These have Intel CPUs with 12 gig of RAM. I also have two Dell 6950s in another cluster. These have AMD CPUs with 32 gig of RAM. I have always had to cold migrate if I wanted to move a vm from a 6950 to a 2850 or vice versa. All these hosts have performed excellently.
I have read all of these forums, as you all probably have too, and some have had this migration problem and others haven't. No one seems to have the definitive answer as to what caused my problem. I was just in the DSA class, and there was one guy from a company that had a lot of hosts, and he did not mention that they had any problems with the switchover. I also talked with a contractor about doing some other work, and he also did not mention any problems with the upgrade. There is also not really that much here in the forums about it either. So I am going to assume that it has something to do with my particular setup or the way in which I did the upgrade. I suspect the fact that I did the BIOS upgrade at the same time as the upgrade to 3.5 was the culprit, but I will never know.
I do these upgrades regularly as I am a contractor for VMware, and this is what I have seen in my experience.
I have done many upgrades successfully without any issues when it comes to hot migrating VMs from 3.0.x to 3.5. In the environment that I am in now they are running Dell 6950s with the AMD 64-bit CPUs in them. When a VM is migrated from a 3.0.x host to 3.5, it does a hardware upgrade to the VM to support new features available in 3.5. This is why you will see that CPU message. It looks to me like it is failing to do the hardware upgrade before it can VMotion the machine, and this is why it is failing. I think it has something to do with the feature set of these AMD CPUs which have Virtual Assist enabled. So once a cold migration is performed, the virtual machine's hardware is upgraded and it boots with no problem. From there it can be migrated between the hosts without any issues whatsoever. You can actually see some of the additional hardware components that are added to the machine when it is on a 3.5 host. Refer to the screen shots attached. The screen shots are of the same machine. Once it has been upgraded, it will only display the options related to the host version it is on. So if you migrate it to a 3.0.x host, it no longer displays any of the new 3.5 options. So in the end, depending on your CPU features, you may have to cold migrate your VMs, but after that you should be OK.
VM while on ESX 3.5 after cold migration then power on
VM while on an ESX 3.0.1 host after a VMotion from ESX 3.5
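For anyone trying to follow the theory above, here is a rough sketch of it as a toy model in Python. This only encodes the hypothesis from this post (a live move to a 3.5 host is blocked until the VM has been cold migrated and powered on there once); it is not how VC actually decides anything, and the names `VM`, `hardware_upgraded`, `can_hot_migrate` and `cold_migrate_and_power_on` are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class VM:
    name: str
    powered_on: bool = True
    # Hypothetical flag for illustration only, not a real VC property.
    hardware_upgraded: bool = False


def can_hot_migrate(vm: VM, dst_version: str) -> bool:
    """Toy rule: a live move to a 3.5 host needs the upgrade already done."""
    if not vm.powered_on:
        return False  # a powered-off VM can only be moved cold
    if dst_version.startswith("3.5"):
        return vm.hardware_upgraded
    return True


def cold_migrate_and_power_on(vm: VM, dst_version: str) -> None:
    """Power off, move, power on; assume the upgrade happens at power on."""
    vm.powered_on = False
    if dst_version.startswith("3.5"):
        vm.hardware_upgraded = True
    vm.powered_on = True


vm = VM("test-vm")
print(can_hot_migrate(vm, "3.5"))    # blocked before the first cold migration
cold_migrate_and_power_on(vm, "3.5")
print(can_hot_migrate(vm, "3.0.1"))  # allowed afterwards under this model
```

Note that not everyone in this thread saw the second half of that behavior, so treat the model as one possible explanation rather than a rule.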
On my 6950s I upgraded the BIOS from 1.1.2 to 1.3.5. What BIOS version do you use?
Also, I noticed that my Virtual Assist was "disabled" in 1.1.2. I haven't taken time to check it after the BIOS update to 1.3.5 to see if it is "enabled", but if it changed from "disabled" to "enabled" then that would have been part of the BIOS upgrade. I think I will look at this on the one 6950 that I do not have any vms on yet. Are you turning on VA in the BIOS of all your machines with this capability? I didn't know if there were any ramifications to turning it on yesterday, and I had enough problems with the upgrade that I did not need any more. Chicken? Yup.
Where did you get those screen shots from? Are you using a third party sw to get them or did you get them from VC? I see the VMware logo up at the top left of the screenshots. If VC client, where?
You shouldn't have any negative effects from using Virtual Assist. You should see performance-related improvements from enabling it. I'm unsure of the BIOS version of these servers; they are a customer's servers and I am not on-site with them today due to the weather, but I can check on Monday. I took the screen shots using Alt+PrtSc while in the VC client. On a VM, choose "Edit Settings", click the Options tab, and you will see the screen that I posted.