I have a cluster of 3 ESX 3.0.1 hosts that have HA enabled. I am not using DRS at all. What has happened twice now is that when I do a service mgmt-vmware restart on any of the hosts, all the VMs on that host power down.
The first time this happened, a VMware tech had me run this command, and running it caused all the VMs to power off. When I asked him why this happened, he suggested putting the server in maintenance mode beforehand.
The second time was when I was applying VC patch2. I went into Virtual Center, edited my cluster settings and unchecked Enable HA. VC then reported an Unconfigure HA task for each host, each of which completed successfully. While upgrading VC, one of the ESX hosts did not complete the agent install, so I did a service mgmt-vmware restart as suggested in another thread and, lo and behold, all the VMs powered down again.
When I checked the hostd logs on the ESX server, I saw no log entry indicating that HA was disabled at the time the VC task to disable it completed. The logs after that point seemed to indicate HA was still enabled.
The VMware tech came back with "Since this host was at one point in time a member of the HA cluster and it was not put into maintenance mode prior to restarting those services, it may have determined it was isolated and began to power down the vm's."
So what's the point of disabling HA if the host is still going to think it's enabled? I also posted to ask whether it was a better idea to put the server in maintenance mode rather than disable HA, and the response I received was definitely don't put the server in maintenance mode.
Here's the thread I referenced about disabling HA before applying the patch:
http://www.vmware.com/community/thread.jspa?threadID=74479&messageID=592814#592814
To me this seems like a flaw in the product. At this point it makes me nervous to do any work on the server because of how flaky HA seems.
Anyone else see this behavior or have any suggestions?