wildcattdw
Contributor

ESX 3.5 - guest CPU spike after VMotion

Hey all. After upgrading several of my ESX hosts (my development cluster and my production VDI cluster) to 3.5, I have been running into a pretty serious issue. I have not seen anyone else report this, but from my testing, it seems somebody else should be seeing something similar.

After upgrading, when a machine is VMotioned between two hosts and the VMotion hits 90%, the guest CPU spikes to 100% and stays there. I have to reboot the guest to clear the issue. My VDI environment is all Unisys ES7000/one hosts and my dev environment is a mixture of HP and IBM hosts; it happens in both environments. All hosts involved were upgraded from 3.0.2, most via esxupdate.

Anyone else experiencing anything similar? Ideas?

Tim

25 Replies
kimono
Expert

The KB article says that Update 1 fixes it, but it does not:

Did anyone get confirmation that Update 1 fixes this? I am wondering what exactly "Update 1 fixes it" means. Does it change the mem value to 5, so that the change is visible to us in the VI Client, or is it something fixed behind the scenes? Or does the Update 1 upgrade fail in some way, so that it only fixes the issue on a new installation?

I need some confidence on Update 1 and the KB article before messing with production.

/kimono/

kimono
Expert

I may have answered my own question. I ran the update workaround in our test environment, and can see the MEM parameter is populated with the new value in every host, eventually.

It would be good to get clarification on why Update 1 didn't fix it when the KB article says it does.

/kimono/

kimono
Expert

The response I got from SR:

"Please note that as you have mentioned, this issue reported in the KB is supposed to be fixed for VC 2.5 U1, and that is correct but since there are multiple reasons why this issue has occurred and our engineering team is working on resolutions for all possible causes for this issue it is not completely fixed.

At present the issue is not fixed for all the scenarios, so unfortunately we need to add that parameter manually as VC 2.5. Our engineering team is currently working on same and it may be fixed in our next update/release of the product."

So the KB article's workaround still applies for some of the scenarios.

/kimono/

gdesmo
Enthusiast

Thanks for the info. That answers my question. I have moved some of my hosts to Update 1 and was wondering whether I needed to revert the change I made from the KB.

So I will leave it in place even after all of my hosts are at Update 1.

AdamSnow
Enthusiast

We have seen this issue since the day we installed 3.5. I believe we were one of the first, if not the first, to report it to VMware. Anyway, we upgraded to Update 1, and the issue is still there, just as bad. We then applied the workaround from the KB article, and everything is fine now. In fact, 3.5 has been more stable for us than 3.0.2 lately, so I think we are going to move forward with deploying it everywhere.
