vCenter Disconnects from Master HA Agent

usulsuspct · 06-12-2012

After a recent vCenter migration, we are seeing the following events in our clusters' Tasks & Events:

vCenter Server is connected to a master HA agent running on host ...

vCenter Server is disconnected from a master HA agent running on host ...

These two entries always appear together, at nearly the same time, every 5 minutes. In each cluster, the host in question is the master node.

vCenter Version / Build: 5.0.0 623373

Cluster Version / Build: ESX v4.1.0 (702113) / ESXi v5.0.0 (623860)

The clusters all appear to be functioning normally, and the events are only info/warning level; however, it would be nice to get to the bottom of them. We recently migrated our vCenter instance to a new VM on a different subnet. We did go through and ensure that all of the hosts were properly disconnected and reconnected after the IP change.

I have started looking through the local logs on the master nodes in question and have found one entry that appears to align with the timing of the above events:

vpxa.log:

2012-06-12T10:04:25.374Z [FFCBFAC0 error 'SoapAdapter.HTTPService'] HTTP Transaction failed on stream TCP(local=127.0.0.1:0, peer=127.0.0.1:61618) with error N7Vmacore15SystemExceptionE(Connection reset by peer)
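To line these errors up against the connect/disconnect events, it can help to pull the timestamp, component, and peer port out of each entry. The following is a minimal Python sketch, assuming the line layout matches the sample above (UTC timestamp, then a bracketed thread ID, level, and component, then the message); the regular expression and the `parse_vpxa_line` name are hypothetical and not part of any VMware tool.

```python
import re
from datetime import datetime

# Assumed layout, based only on the sample lines in this thread:
# <ISO timestamp>Z [<hex thread id> <level> '<component>'] <message>
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3})Z "
    r"\[(?P<thread>[0-9A-Fa-f]+) (?P<level>\w+) '(?P<component>[^']+)'\] "
    r"(?P<message>.*)$"
)

# Extracts the peer address and port from "peer=127.0.0.1:61618".
PEER_RE = re.compile(r"peer=(?P<peer>[\d.]+):(?P<port>\d+)")


def parse_vpxa_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    fields = m.groupdict()
    # The trailing 'Z' marks UTC; parse the timestamp without it.
    fields["ts"] = datetime.strptime(fields["ts"], "%Y-%m-%dT%H:%M:%S.%f")
    peer = PEER_RE.search(fields["message"])
    fields["peer"] = peer.group("peer") if peer else None
    fields["peer_port"] = int(peer.group("port")) if peer else None
    return fields


sample = ("2012-06-12T10:04:25.374Z [FFCBFAC0 error 'SoapAdapter.HTTPService'] "
          "HTTP Transaction failed on stream TCP(local=127.0.0.1:0, "
          "peer=127.0.0.1:61618) with error "
          "N7Vmacore15SystemExceptionE(Connection reset by peer)")
print(parse_vpxa_line(sample))
```

Collecting the `ts` values from every matching line gives a list of times that can be compared directly with the event timestamps shown in vCenter.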

I have read through some KBs regarding issues with DNS servers, and I have confirmed that all DNS servers are reachable (the peer here is the localhost address anyway, so I'm not sure what DNS would have to do with it).

Has anyone seen this? More importantly, has anyone resolved it!?

vmroyale · 06-12-2012

Hello.

Note: Discussion successfully moved from VMware ESXi 5 to Availability: HA & FT

beckham007fifa · 06-12-2012

Which datastore did you select for HA heartbeating? Can you try changing the datastore and observing the result? Storage connectivity problems can sometimes also cause such events to be generated.

Regards, ABFS

admin · 06-12-2012

Check the HA agent logs on the hosts for errors (/var/log/vmware/fdm/fdm*log)
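A quick way to do that across several rotated files is to filter for error-level lines. This is a minimal Python 3 sketch, assuming the logs sit at the path given above (or have been copied to a workstation) and that error lines carry the same `[<thread> error '<component>']` prefix as the samples in this thread; `find_errors` and `ERROR_RE` are hypothetical names.

```python
import glob
import re

# Path pattern quoted in the post above; change it if the logs were copied elsewhere.
FDM_LOG_GLOB = "/var/log/vmware/fdm/fdm*log"

# Matches the bracketed "[<thread> error '<component>']" prefix seen in the
# sample lines in this thread (an assumption about the log layout).
ERROR_RE = re.compile(r"\[[0-9A-Fa-f]+ error '")


def find_errors(pattern=FDM_LOG_GLOB):
    """Yield (path, line_number, line) for every error-level line."""
    for path in sorted(glob.glob(pattern)):
        # errors="replace" keeps a stray non-UTF-8 byte from aborting the scan.
        with open(path, errors="replace") as fh:
            for lineno, line in enumerate(fh, start=1):
                if ERROR_RE.search(line):
                    yield path, lineno, line.rstrip("\n")


if __name__ == "__main__":
    for path, lineno, line in find_errors():
        print(f"{path}:{lineno}: {line}")
```

Run it directly, or pass another glob to `find_errors()` if the logs were extracted elsewhere.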

Elisha

usulsuspct · 06-12-2012

I chose "Select any of the cluster datastores" as the datastore heartbeating option.

It should be noted that I am seeing this environment-wide: on 5 production clusters and 2 test clusters (attached to a different vCenter instance).

usulsuspct · 06-12-2012

The log snippet I posted above is actually present in both fdm.log and vpxa.log (at least, it is the only entry that seems easily relatable to the event):

2012-06-11T07:43:28.788Z [58A70B90 error 'SoapAdapter.HTTPService'] HTTP Transaction failed on stream TCP(local=127.0.0.1:0, peer=127.0.0.1:59492) with error N7Vmacore15SystemExceptionE(Connection reset by peer)

beckham007fifa · 06-12-2012

Change the datastore as a check.

Regards, ABFS

depping · 06-15-2012

I would also suggest filing a support request if you have support on this environment.

You might also want to consider upgrading to ESXi 5.0 U1.

usulsuspct · 06-15-2012

Unless I have the build numbers wrong, I believe both of those are already at the 5.0.0 U1 level.

vCenter Version / Build: 5.0.0 623373

Cluster Version / Build: ESX v4.1.0 (702113) / ESXi v5.0.0 (623860)

http://www.vmware.com/support/vsphere5/doc/vsp_esxi50_u1_rel_notes.html

ESXi 5.0 Update 1 | 15 MAR 2012 | Build 623860

http://www.vmware.com/support/vsphere5/doc/vsp_vc50_u1_rel_notes.html

vCenter Server 5.0 Update 1 | 15 March 2012 | Build 623373

(Love the books BTW...)

sn4psh0t · 12-18-2012

Hi all,

We had exactly the same issue, caused by a firewall between the vCenter Server and the ESXi host. The default connection keep-alive timeout on our firewall was set to 300 seconds, which is why we saw this error every 5 minutes. It seems this connection (CIM on port 5989) has to stay open all the time; otherwise you will see the error every 5 minutes. Hopefully this hint will also solve your issues. Sorry for my English, I am not a native speaker...
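If this firewall explanation applies, the events should recur at roughly the firewall's idle timeout. The minimal Python sketch below checks that against a list of event times by computing the gaps between consecutive occurrences and comparing them to a candidate timeout (300 seconds here). The event times, the 15-second tolerance, and the function names are all hypothetical; a match is consistent with, not proof of, an idle-timeout drop.

```python
from datetime import datetime, timedelta


def gaps_seconds(timestamps):
    """Return the gaps, in seconds, between consecutive sorted timestamps."""
    ordered = sorted(timestamps)
    return [(b - a).total_seconds() for a, b in zip(ordered, ordered[1:])]


def matches_idle_timeout(timestamps, timeout_s=300, tolerance_s=15):
    """True if there is at least one gap and every gap is within tolerance_s of timeout_s.

    A steady ~300 s period is consistent with (not proof of) a stateful
    firewall dropping an idle connection after 300 seconds.
    """
    gaps = gaps_seconds(timestamps)
    return bool(gaps) and all(abs(g - timeout_s) <= tolerance_s for g in gaps)


# Hypothetical event times, spaced like the 5-minute pattern in this thread.
start = datetime(2012, 6, 12, 10, 4, 25)
events = [start + timedelta(seconds=300 * i + jitter)
          for i, jitter in enumerate([0, 2, -1, 3])]
print(gaps_seconds(events))          # [302.0, 297.0, 304.0]
print(matches_idle_timeout(events))  # True
```

Feeding in the real "disconnected from a master HA agent" times and trying the firewall's configured timeout as `timeout_s` is a cheap way to test the theory before touching the firewall.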

regards

sn4psh0t

depping · 12-21-2012

That is a very valid point. Nice one and thanks for adding it to this thread, and absolutely no need to apologize... most of us are not native speakers 🙂

sn4psh0t · 12-21-2012

By the way, we increased this value from 300 seconds to 6000 seconds and the error went away. A value lower than 6000 may also be adequate, but I haven't had time to experiment with it.

aakalan · 01-03-2014

I think it's a firewall issue; check this KB. Additional note: check the switch and turn off anti-DDoS protection.

http://kb.vmware.com/selfservice/microsites/search.do?cmd=displayKC&docType=kc&docTypeID=DT_KB_1_1&e...
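Before changing firewall settings, it can help to confirm that the CIM port mentioned earlier in the thread (5989) is reachable from the vCenter Server at all. A minimal Python sketch, assuming it runs on the vCenter Server (or on the same side of the firewall) and that host names are passed on the command line; `port_open` is a hypothetical helper. A successful connect only shows the port is open, not that an idle connection survives the firewall's timeout.

```python
import socket
import sys

CIM_PORT = 5989  # CIM port named earlier in this thread


def port_open(host, port=CIM_PORT, timeout_s=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout_s."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS lookup failures.
        return False


if __name__ == "__main__":
    # Usage (hypothetical host names): python check_cim.py esxi01 esxi02
    for host in sys.argv[1:]:
        state = "open" if port_open(host) else "unreachable"
        print(f"{host}:{CIM_PORT} {state}")
```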
