VMware Cloud Community
usulsuspct
Contributor

vCenter Disconnects from Master HA Agent

After a recent vCenter migration we are seeing the following events in our cluster Tasks & Events:

vCenter Server is connected to a master HA agent running on host ...

vCenter Server is disconnected from a master HA agent running on host ...

These two entries always appear together at just about the same time every 5 minutes.  The server in question for each cluster is the master node.

vCenter Version / Build: 5.0.0 623373

Cluster Version / Build: ESX v4.1.0 (702113) / ESXi v5.0.0 (623860)

The clusters all appear to be functioning normally and the events are just info / warning - however it would be nice to get to the bottom of them.  We recently migrated our vCenter instance to a new VM on a different subnet.  We did go through and ensure that all of the hosts were properly disconnected / re-connected after the IP change.

I have started looking through the local logs on the master nodes in question and have found one entry that appears to align with the timing of the above events:

vpxa.log:

2012-06-12T10:04:25.374Z [FFCBFAC0 error 'SoapAdapter.HTTPService'] HTTP Transaction failed on stream TCP(local=127.0.0.1:0, peer=127.0.0.1:61618) with error N7Vmacore15SystemExceptionE(Connection reset by peer)

I have read through some KBs regarding issues with DNS servers etc. I have confirmed that all DNS servers are reachable (this looks like the localhost address anyway... not sure what DNS would have to do with that).

Anyone see this? -- More importantly, anyone resolve this!?

12 Replies
vmroyale
Immortal

Hello.

Note: Discussion successfully moved from VMware ESXi 5 to Availability: HA & FT

Brian Atkinson | vExpert | VMTN Moderator | Author of "VCP5-DCV VMware Certified Professional-Data Center Virtualization on vSphere 5.5 Study Guide: VCP-550" | @vmroyale | http://vmroyale.com
beckham007fifa

Which datastore did you select for the HA heartbeat? Can you try changing the datastore and observe the result? Sometimes storage connectivity issues can also cause such events to be generated.

Regards, ABFS
admin
Immortal

Check the HA agent logs on the hosts for errors (/var/log/vmware/fdm/fdm*log)
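One way to do that check is to filter the logs for error-level entries. The sketch below is only an illustration: it assumes the log lines have the same shape as the vpxa.log entry quoted earlier in this thread (timestamp, then a bracket with thread id, level and component, then the message), and the function names parse_line and error_entries are made up for this example.

```python
import glob
import re
from datetime import datetime

# Assumed line shape, taken from the entry quoted in this thread:
# <timestamp> [<thread id> <level> '<component>'] <message>
LOG_LINE = re.compile(r"^(\S+) \[(\w+) (\w+) '([^']+)'\] (.*)$")


def parse_line(line):
    """Return (timestamp, level, component, message), or None if the line does not match."""
    m = LOG_LINE.match(line.strip())
    if m is None:
        return None
    ts, _thread, level, component, message = m.groups()
    try:
        when = datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%fZ")
    except ValueError:
        # Timestamp is not in the assumed format; skip the line.
        return None
    return when, level, component, message


def error_entries(lines):
    """Keep only the entries logged at 'error' level."""
    parsed = (parse_line(line) for line in lines)
    return [p for p in parsed if p is not None and p[1] == "error"]


if __name__ == "__main__":
    # Path pattern as given in the post above; adjust if your hosts differ.
    for path in glob.glob("/var/log/vmware/fdm/fdm*log"):
        with open(path, errors="replace") as fh:
            for when, level, component, message in error_entries(fh):
                print(path, when.isoformat(), component, message)
```

Run it on a copy of the logs (or on any machine with Python where the files are available); it prints one line per error entry so the timestamps can be compared with the connect/disconnect events.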

Elisha

usulsuspct
Contributor

I chose "Select any of the cluster datastores" as the datastore heartbeating option.

I should note that I am seeing this environment-wide: 5 production clusters and 2 test clusters (attached to a different vCenter instance).

usulsuspct
Contributor

The log snippet I posted above is actually present in both the fdm.log and the vpxa.log (at least it is the only one that seems easily relatable to the event):

2012-06-11T07:43:28.788Z [58A70B90 error 'SoapAdapter.HTTPService'] HTTP Transaction failed on stream TCP(local=127.0.0.1:0, peer=127.0.0.1:59492) with error N7Vmacore15SystemExceptionE(Connection reset by peer)

beckham007fifa

Change the heartbeat datastore as a test.

Regards, ABFS
depping
Leadership

I would also suggest filing a support request if you have support on this environment.

Might also want to consider upgrading to ESXi 5.0 U1

usulsuspct
Contributor

Unless I have the build numbers wrong - I believe both of those are 5.0.0 U1 level.

vCenter Version / Build: 5.0.0 623373

Cluster Version / Build: ESX v4.1.0 (702113) / ESXi v5.0.0 (623860)

http://www.vmware.com/support/vsphere5/doc/vsp_esxi50_u1_rel_notes.html

ESXi 5.0 Update 1 | 15 MAR 2012 | Build 623860

http://www.vmware.com/support/vsphere5/doc/vsp_vc50_u1_rel_notes.html

vCenter Server 5.0 Update 1 | 15 March 2012 | Build 623373

(Love the books BTW...)

sn4psh0t
Contributor

Hi all,

We had exactly the same issue, caused by a firewall between vCenter and the ESXi hosts. The default connection keep-alive timeout on the firewall was set to 300 seconds, which is why we saw this error every 5 minutes. It seems that this connection (CIM on port 5989) has to stay open all the time; otherwise you will see the error every 5 minutes. Hopefully this hint will also solve your issue. Sorry for my English, I am not a native speaker....

regards

sn4psh0t
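To check whether the events in your own environment line up with a firewall idle timeout like this, you can compare the gaps between event timestamps with the timeout value. The sketch below is only an illustration: the function names and the 15-second tolerance are assumptions made for this example, and the 300-second default is the firewall setting described in this post, not a VMware value.

```python
from datetime import datetime, timedelta
from statistics import median


def gaps_seconds(timestamps):
    """Seconds between consecutive events, after sorting them by time."""
    ordered = sorted(timestamps)
    return [(b - a).total_seconds() for a, b in zip(ordered, ordered[1:])]


def looks_like_idle_timeout(timestamps, timeout_s=300.0, tolerance_s=15.0):
    """True if the median gap between events is within tolerance of the timeout."""
    gaps = gaps_seconds(timestamps)
    if not gaps:
        return False
    return abs(median(gaps) - timeout_s) <= tolerance_s


if __name__ == "__main__":
    # Synthetic disconnect times, roughly 5 minutes apart (not real log data).
    start = datetime(2012, 6, 12, 10, 4, 25)
    events = [start + timedelta(seconds=300 * i + (i % 2)) for i in range(6)]
    print(gaps_seconds(events))
    print(looks_like_idle_timeout(events))
```

Feed it the timestamps of the "disconnected from a master HA agent" events (or of the matching log errors). A median gap close to the configured firewall timeout is a strong hint, though not proof, that the firewall is dropping the idle connection.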

depping
Leadership

sn4psh0t wrote:

Hi all,

We had exactly the same issue, caused by a firewall between vCenter and the ESXi hosts. The default connection keep-alive timeout on the firewall was set to 300 seconds, which is why we saw this error every 5 minutes. It seems that this connection (CIM on port 5989) has to stay open all the time; otherwise you will see the error every 5 minutes. Hopefully this hint will also solve your issue. Sorry for my English, I am not a native speaker....

regards

sn4psh0t

That is a very valid point. Nice one and thanks for adding it to this thread, and absolutely no need to apologize... most of us are not native speakers 🙂

sn4psh0t
Contributor

By the way, we increased this value from 300 seconds to 6000 seconds and the error went away. A value lower than 6000 may also be adequate, but I couldn't find time to experiment with it.

aakalan
Enthusiast

I think it's a firewall issue. Check the KB article linked below. Additional note: check the switch and turn off anti-DDoS protection.

http://kb.vmware.com/selfservice/microsites/search.do?cmd=displayKC&docType=kc&docTypeID=DT_KB_1_1&e...
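As a quick first test for a firewall in the path, you can check from the vCenter machine whether a TCP connection to the host opens at all. This is a minimal sketch: port 5989 is the CIM port mentioned earlier in the thread, while the function name and the host name in the usage line are placeholders. It only shows that the port can be opened; it says nothing about whether an idle connection survives the firewall's timeout.

```python
import socket


def port_is_reachable(host, port=5989, timeout_s=3.0):
    """Return True if a TCP connection to host:port can be opened within timeout_s."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


if __name__ == "__main__":
    # "esx01.example.com" is a placeholder; use one of your own hosts.
    print(port_is_reachable("esx01.example.com"))
```

If this returns False from the new vCenter subnet but True from the old one, the firewall rules are the first thing to look at.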
