Does anyone else out there use Microsoft SCOM 2007 on their ESX VM clients? If so, doesn't your VM client's CPU go to 100% and stay there forever after the VM client is VMotioned to another ESX host?
I'm sure there is some kind of incompatibility between SCOM and VMotion, because after a client is VMotioned to another ESX host, SCOM goes crazy and starts running hundreds of cscripts. VM clients without SCOM have no problem.
Just want to see if it's just our installation of SCOM or not.
We are getting the same behavior of 100% CPU usage after vmotion of a VM in an HA/DRS cluster. (We have 3 ESX 3.5 hosts in the cluster.) However, the client VM does not have MS SCOM. It does have MS Network Load Balancing enabled. Surprisingly, a test NLB VM we vmotioned did not exhibit this behavior, but when we vmotioned a production NLB VM it went to 100% CPU usage. A guest OS reboot did not return it to normal operation. It took a complete VM shutdown and restart to get it working normally again.
I would welcome any insights anyone might have into this issue. I will also open a support case about it.
Jay
We don't have NLB enabled on any of our guest OSes. We are also on ESX 3.5, and I don't recall this issue when we were on ESX 3.0.2. I'm wondering if it's a bug in ESX 3.5. And you are correct about having to actually shut down and restart the guest VMs. This is causing a problem for us because if any of our guests gets vmotioned, it starts eating up the resources on that ESX host, then DRS starts to vmotion one or more guests from that host to another host, and the problem snowballs to the point where we have to shut down all the VMs and restart them.
I think I may have found an explanation and solution for the 100% CPU usage. Please search the VMware knowledge base for article 1003638. I have implemented the workaround described there on our VC server, but I haven't tested it yet. We are doing some host upgrades, and when those are done we will be migrating a bunch of VMs. That will give us an opportunity to test. I will report back the results after our next round of migrations.
Jay
I applied the fix in KB 1003638 directly to the vpxd.cfg file, as the VI interface method was not taking. I've been able to vmotion my VMs back and forth with no more high CPU usage issues. It's kinda weird that the SCOM process was the one sucking up all the CPU.
Thanks Jay.
You're welcome, fgl.
I'm pleased to hear this worked for you. I haven't done any migrations since applying the fix to VC server. Your success gives me more confidence that the fix will work for us too. It is nice to be able to post a solution and help someone else for a change. The VMware forums have been invaluable to me!
Jay
Jay, thanks for pointing to that KB article. I've been plagued by this for the past week and couldn't put my finger on what exactly was causing the problem. The workaround described in the KB article worked great for me.
There's also an issue with SCOM 2007 and the OEM bitmap, %systemroot%\system32\OEMLOGO.BMP. If you are using SCOM and OEMLOGO.BMP exists, you will see CPU spikes, up and down. Also, it looks like it runs a full scan every day at the time of day it was installed, if I recall correctly. I didn't believe it either, but renaming OEMLOGO.BMP to OEMLOGO.bak resolved the problem. Microsoft was not very forthcoming on how a bitmap could cause this problem.
It could be that this is also an issue on physical machines, but it's not as noticeable there.
-Richpo
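For anyone who wants to script the OEMLOGO.BMP rename described above across several guests, here is a minimal sketch in Python. The function name rename_oem_logo and the choice to refuse to overwrite an existing OEMLOGO.bak are my own assumptions, not anything from Microsoft or the posts above; treat it as a starting point only.

```python
from pathlib import Path


def rename_oem_logo(system_root: str) -> bool:
    """Rename <system_root>/system32/OEMLOGO.BMP to OEMLOGO.bak.

    Hypothetical helper for illustration. Returns True if the file was
    renamed, False if there was no OEMLOGO.BMP to rename.
    """
    logo = Path(system_root) / "system32" / "OEMLOGO.BMP"
    backup = logo.with_name("OEMLOGO.bak")

    if not logo.exists():
        return False  # nothing to do on this machine

    # Assumption: keep any earlier backup rather than silently replacing it.
    if backup.exists():
        raise FileExistsError(f"{backup} already exists; not overwriting")

    logo.rename(backup)
    return True
```

On a guest you would call it with the real system root, for example rename_oem_logo(os.environ["SystemRoot"]) after an import os, from an account that is allowed to write to system32. Renaming rather than deleting keeps the change easy to undo.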