VMware Cloud Community
nathanielely
Contributor

HA continuously trying to fail over guests but failing

I have a 4-node ESX cluster with HA/DRS enabled. Although all guests appear to be functioning without problems, the VC console shows multiple event entries saying guest machines are trying to fail over to other hosts but failing. The VC event panel gives no detailed description. These failures happen at least once or twice a minute.

Does anyone have a clue where to start? I've seen posts where creating an entirely new HA cluster and migrating all the hosts to it sometimes fixes the problem, but I'd like to get a bit more info before I go down that path.

Thanks

4 Replies
gdesmo
Enthusiast

You didn't try to do a SAN rescan through the VI console with 4 Gb HBAs, did you?

http://www.vmware.com/community/thread.jspa?threadID=67309

nathanielely
Contributor

Whoa, that link scares me. I haven't really seen that behavior, but I'll pay more attention as we add new LUNs. I haven't seen VMware release the patch for that KB article yet. Have they?

BTW, I created a new cluster and split the hosts and VMs between the two clusters. Everything looks good so far. I'll let the config 'marinate' for a day or so and then start moving the other hosts back.

-N

nathanielely
Contributor

BTW, I have the EXACT config mentioned in many of the posts: DL360 G5 with dual 4 Gb Emulex cards. Everything is patched to the hilt, including the latest firmware for the servers and HBAs and the latest ESX patches.

I really cannot disable USB because iLO is our only remote access to the service console.

gdesmo
Enthusiast

The fix will be included in 3.0.2. In the meantime, make sure to rescan from the service console:

esxcfg-rescan vmhba1