High Availability intermittent error: timeout while communicating with HA agent
I have a cluster of 4 servers running ESX 3.5, but managed by a VC running version 4.0.
I have a recurring problem where one of the servers reports the error message in the subject of this post: "timeout while communicating with HA agent".
I've checked name resolution, and all the ESX servers can resolve each other's short hostnames without any problems. I have noticed, however, that none of the servers can resolve the VC's short name (although they can ping the VC by its fully qualified domain name).
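To repeat this check the same way on every host, a small script like the one below could be used. It is a minimal sketch, assuming Python is available on the machine running it. It asks the system resolver (the same path that `/etc/hosts` and DNS lookups go through) for each name given on the command line, and it reports the IPv4 address or FAILED. The hostnames in the usage example are hypothetical placeholders.

```python
import socket
import sys


def check_resolution(names):
    """Map each hostname to its resolved IPv4 address, or None if lookup fails."""
    results = {}
    for name in names:
        try:
            # gethostbyname uses the system resolver (hosts file, then DNS)
            results[name] = socket.gethostbyname(name)
        except socket.error:
            # Covers resolution failures such as socket.gaierror
            results[name] = None
    return results


if __name__ == "__main__":
    # Pass short names and FQDNs of the ESX hosts and the VC as arguments
    names = sys.argv[1:]
    results = check_resolution(names)
    for name in names:
        print("%-30s %s" % (name, results[name] or "FAILED"))
```

For example (hypothetical names): `python check_dns.py esx01 esx02 esx03 esx04 vc01 vc01.example.com`. If every ESX host prints FAILED for the VC's short name but an address for its FQDN, that matches the behaviour described above.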
Could this be the cause? I can update my DNS easily, but because the problem is intermittent, I would appreciate it if someone could confirm that I'm on the right track. Does HA require the ESX hosts to resolve the VC's short name? If so, why aren't any of my other ESX servers reporting the error?