VMware Cloud Community
cajx
Contributor

vCenter reports no utilization on server, but VMs still run. Can't snapshot.

I have a strong feeling that rebooting the ESXi host will fix this, but here is what we have seen so far and are still seeing:

1. vCenter reported that it couldn't communicate with blade 1 in our IBM BladeCenter, but the VMs on it never went down.

2. There are no active errors showing in vCenter for the blade.

3. Under the Virtual Machines tab, Host CPU, Host Mem, and Guest Mem all show zero, as if there were no activity at all.

4. VMware snapshots (note: these are initiated by the N series/NetApp storage) fail with errors such as:

2009-11-20 00:15:25,830 WARN - VMware Task "CreateSnapshot_Task" for entity "server1.domain.COM" failed with the following error: The operation is not allowed in the current state.
2009-11-20 00:15:25,830 ERROR - VM "server1.domain.COM" will not be backed up since vmware snapshot create operation failed.

and, under the Tasks and Events tab: "The operation is not allowed in the current state."

I'm just documenting what we are seeing and doing. After we contact IBM support, we plan to VMotion the VMs off the weird blade and reboot it.

Our system consists of IBM N series storage (IBM-branded NetApp), an IBM BladeCenter H chassis, several HS22 blades (Type 7870), and Cisco Ethernet switches. We use NFS for the storage. We are still running a very light load on this whole system because we keep seeing random bugs like this.

Are these just typical vCenter bugs that aren't worth worrying about, the "reboot and you're fine" kind of thing you often see in the Windows world? Or is something more sinister lurking?

ShaneWendel
Enthusiast

Didn't restarting the vCenter service or rebooting the vCenter server fix this? It sounds like a problem with vCenter communicating with the host through the vpxuser account. Rebooting your vCenter server won't affect the VMs running on the individual hosts in any way.

I'm assuming you either don't have HA running or have configured the isolation response as "Leave VMs Powered On".

Shane Wendel, VCP

Shane Wendel | VCP: vSphere 4 | VCP: VI3 | http://fatalsync.wordpress.com
cajx
Contributor

We restarted the vCenter server, but no luck.

We do have HA running, but it hasn't moved anything. We also tried to VMotion a test server to the weird blade host, and it failed. So now I'm afraid to VMotion anything off it, because VMotion has hung in the past and caused problems. We're going to wait until lunch, when no users are active, and then try VMotion.

EDIT: Sorry, I missed the isolation response part. We actually have HA set to "Host Isolation Response: Shut down". So, from what I can infer, we should have expected HA to shut all those VMs down and then bring them back up on another host. But it didn't... and vCenter still sees the VMs running. Not sure what to make of it.

cajx
Contributor

I forgot to mention that we talked to IBM support. They recommended restarting the management module before rebooting the whole blade, to see if that fixes it. Since the blade isn't broken (I guess), and the management module must be how it ultimately communicates with vCenter, that seems like a logical step to take.

It should be noted that we updated the firmware on this management module and the blades very recently... I think we are completely up to date.

EDIT/UPDATE: Restarting the VMware management agents allowed us to VMotion again. Possibly coincidentally, we also changed the vSwitch gateway to a different IP (we aren't 100% sure how that setting works; our consultant directed us there, so it's time to go read up). This workaround corrected the problem on both blades that had started having problems. I suppose this means we either need to look for a VMware update that fixes this, or the old gateway IP we had really was wrong.
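For reference, here is a minimal sketch of the agent restart, assuming the commands commonly documented for ESX/ESXi 4.x: "service mgmt-vmware restart" and "service vmware-vpxa restart" on classic ESX (service console), and "/sbin/services.sh restart" on ESXi (Tech Support Mode). The Python wrapper and its restart_commands/restart_management_agents names are hypothetical, it assumes a Python 3 interpreter is available where you run it, and it only prints the commands unless dry_run=False. Check VMware's documentation for your exact build before running anything.

```python
import shlex
import subprocess

# Hypothetical mapping: agent-restart commands as commonly documented for
# ESX/ESXi 4.x. Verify against VMware's documentation for your build.
RESTART_COMMANDS = {
    # Classic ESX (service console): hostd first, then the vCenter agent (vpxa)
    "esx": ["service mgmt-vmware restart", "service vmware-vpxa restart"],
    # ESXi (Tech Support Mode): restarts all management agents at once
    "esxi": ["/sbin/services.sh restart"],
}


def restart_commands(product):
    """Return the agent-restart commands for 'esx' or 'esxi'."""
    key = product.strip().lower()
    if key not in RESTART_COMMANDS:
        raise ValueError(f"unknown product: {product!r}")
    return list(RESTART_COMMANDS[key])


def restart_management_agents(product, dry_run=True):
    """Print the commands (dry run) or execute them on the local host."""
    commands = restart_commands(product)
    for cmd in commands:
        if dry_run:
            print("would run:", cmd)
        else:
            # check=True raises on the first command that exits non-zero
            subprocess.run(shlex.split(cmd), check=True)
    return commands


if __name__ == "__main__":
    restart_management_agents("esxi")
```

On ESXi the same restart is also available from the console (DCUI) as "Restart Management Agents". Expect the host to drop out of vCenter briefly while the agents come back; running VMs should keep running.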

Link on restarting the management agents on ESX or ESXi:

cobera
Contributor

I ran into the same problem: I get this error when powering on VMs. VMotion, snapshots, VMware Tools operations, etc. also do not respond.

VMs stay online (if they are powered on). Rebooting the host will not help.

The problem is with the VC server.

vpxuser cannot perform its functions (talking to each individual host, gathering and updating values).

You can test this theory in your environment: pick a host with a VM that is powered off and try to power the VM on, first through VCS and then with the client connected directly to that host. If you can power it on through the client connected to the host (but not through VCS), reboot your VCS server.

Hope this helps.
