I have a 4-node ESX cluster with HA/DRS enabled. Although all guests appear to be functioning without problems, there are multiple event entries in the VC console saying guest machines are trying to fail over to other hosts but failing - no detailed description is given in the VC event panel. These failures happen at least 1-2 times a minute.
Does anyone have a clue where to start? I've seen posts where creating an entirely new HA cluster and migrating all host machines to it sometimes fixes the problem, but I'd like to get a bit more info before I go down that path.
Thanks
You didn't try to do a SAN rescan through the VI console, did you? With 4 Gb HBAs?
Whoa, that link scares me. I haven't really seen that behavior, but I'll pay more attention as we add new LUNs. I didn't see that VMware has released the patch for that KB article yet. Have they?
BTW - I created a new cluster and split the hosts / VMs between them. Everything looks good so far. I'll let the config 'marinate' for a day or so and then start moving the other hosts back.
-N
BTW - I have the EXACT config mentioned in many of the posts - DL360 G5 with dual 4Gb Emulex cards. Everything is patched to the hilt - including the latest firmware for the servers / HBAs, and patches for ESX.
I really cannot disable USB because iLO is our only remote mechanism into the service console.
The fix will be included in 3.0.2. In the meantime, make sure to rescan from the service console rather than from the VI client:
esxcfg-rescan vmhba1
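
With dual HBAs, each adapter presumably needs its own rescan. Here is a rough sketch of a wrapper for the service console, not a tested procedure. The function name `rescan_all`, the `-n` dry-run switch and the second adapter name `vmhba2` are all made up for illustration. Only `esxcfg-rescan vmhba1` comes from the post above, so substitute whatever adapter names your hosts actually report.

```shell
#!/bin/sh
# Rescan each named HBA from the service console, one at a time.
# rescan_all and its -n switch are hypothetical helpers, not VMware tools.
rescan_all() {
    dry_run=0
    if [ "$1" = "-n" ]; then
        # Dry run: only print the commands that would be executed.
        dry_run=1
        shift
    fi
    for hba in "$@"; do
        if [ "$dry_run" = "1" ]; then
            echo "esxcfg-rescan $hba"
        else
            # Report a failed rescan but continue with the next adapter.
            esxcfg-rescan "$hba" || echo "rescan of $hba failed" >&2
        fi
    done
}

# Adapter names are assumptions; vmhba2 is illustrative only.
rescan_all -n vmhba1 vmhba2
```

Drop the `-n` once the printed commands look right for your host.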
