Tr0llk1ng
Contributor

Failed to bind heartbeat socket - ESX 3.5 | VC 2.5

Hi,

I've had a small problem since upgrading my ESX 3.0 farm to 3.5, together with the Virtual Center upgrade to 2.5.

I get the following errors in the vpxa.log of all of my 5 ESX Servers:

<snip>

Failed to bind heartbeat socket. Using any IP.

heartbeating 10.67.8.13 ...Count 17093

Failed to bind heartbeat socket. Using any IP.

heartbeating 10.67.8.13 ...Count 17094

</snip>
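
For anyone comparing several hosts, here is a minimal Python sketch that tallies these two message types from a vpxa.log. It assumes the lines look exactly like the snippet above; the function name `summarize_vpxa_log` and the regular expression are illustrative, not part of any VMware tool.

```python
import re

# Message text and heartbeat format are taken from the log snippet above.
BIND_FAILURE = "Failed to bind heartbeat socket"
HEARTBEAT = re.compile(r"heartbeating (\d+\.\d+\.\d+\.\d+) \.\.\.Count (\d+)")


def summarize_vpxa_log(lines):
    """Return (number of bind failures, last heartbeat count per target IP)."""
    failures = 0
    last_count = {}
    for line in lines:
        if BIND_FAILURE in line:
            failures += 1
        match = HEARTBEAT.search(line)
        if match:
            last_count[match.group(1)] = int(match.group(2))
    return failures, last_count


sample = [
    "Failed to bind heartbeat socket. Using any IP.",
    "heartbeating 10.67.8.13 ...Count 17093",
    "Failed to bind heartbeat socket. Using any IP.",
    "heartbeating 10.67.8.13 ...Count 17094",
]
print(summarize_vpxa_log(sample))
```

To run it against a real file, pass an open file object instead of `sample`, since the function only iterates over lines.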

I can see all ESX servers in Virtual Center and everything else works fine, but cloning virtual machines times out with "Error Connecting to Server" (it seems Virtual Center checks whether the destination ESX host is alive). There are no other restrictions: I can VMotion, start, stop, and suspend my VMs.

I've already checked:

  • Status of port 902/UDP on the Virtual Center server: the process is listening (netstat -noa, PID matched to the process VPXD.exe). Restarting the server does not change the behaviour.

  • Status of port 902 on the ESX server firewall: everything seems fine, as the port is open.

  • Network connectivity between ESX and Virtual Center: there is basically no firewall in between, just one VLAN to another (10.67.8.13 is the correct IP of my Virtual Center server). I can ping the VC from ESX and back.

  • Restarted mgmt-vmware, hostd, and vpxd, and then the whole ESX host.

  • Uninstalled and reinstalled the VPXD agent with the upgrade package from the Virtual Center upgrade folder. This didn't change the behaviour.

If anybody can help, please do! I've been working on this for a few hours and really need to clone some templates.

Best Regards

Christoph

4 Replies
Rincey
Contributor

We are having what sounds like exactly the same problem.

However, our VC is geographically distant from the ESX cluster that we are trying to work on.

I can deploy templates in the cluster that is in the same building as the VC.

What works (in the first cluster):

VMotion, pinging, nslookup, etc.

What doesn't:

Template deploying, Storage Migration (even cold), and anything else that involves moving or creating the .vmdk files through VC. For example, I noticed I have a single VM on local storage, so I tried to move it to my SAN (the VM is powered off at the moment).

I'm about to try cp'ing the local VM files onto the SAN via an SSH session and manually editing the .vmx file.

Very frustrating.

Oh yeah, cloning worked. We had a blank W2k3 virtual machine (so essentially a template), and we cloned it while it was powered off, but we can't use "deploy from template"!

Adam_Ladd
Enthusiast

We just had the same problem with a host whose Service Console network addresses we had changed, and found this thread while googling. Then I found KB Article 1005892, followed the instructions, and it fixed it for us!

One slight difference: after removing the host from VC, the vpxa agent was automatically removed, so we didn't have to run the rpm -e command.

We are on ESX 3.5.0.110268 and VC 2.5.0.104215.

horace_ng
Contributor

One of my ESX hosts is sitting behind a firewall, and I'm seeing this error.

Inside vpxa.cfg, HostIp is configured with the firewall's public IP address, and the agent can't bind to an interface with that IP because no such interface exists on the host.

I searched around on Google and found nothing. Does anyone know what I can do?

It only affects clone/migrate/VMotion and nothing else, though.
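
To illustrate what I think is going on, here is a minimal Python sketch of the general socket behaviour, not the actual vpxa code. It tries to bind a UDP socket to a given address and falls back to all interfaces when that address is not assigned locally, which would match the "Using any IP" message. The function name `bind_heartbeat_socket` and the example address 203.0.113.1 (a documentation-only range standing in for the NAT public IP) are my own assumptions.

```python
import socket


def bind_heartbeat_socket(host_ip, port=0):
    """Bind a UDP socket to host_ip, falling back to all interfaces on failure.

    Returns (socket, used_fallback). port=0 lets the OS pick a free port.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.bind((host_ip, port))
        return sock, False
    except OSError:
        # Typically raised when host_ip is not assigned to any local
        # interface, e.g. a public address that only exists on the NAT device.
        sock.bind(("0.0.0.0", port))
        return sock, True


if __name__ == "__main__":
    for ip in ("127.0.0.1", "203.0.113.1"):
        sock, used_fallback = bind_heartbeat_socket(ip)
        print(ip, "-> fallback" if used_fallback else "-> bound")
        sock.close()
```

If this picture is right, the bind can only succeed once the host itself owns the address configured as HostIp.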

horace_ng
Contributor

Finally, I've been able to manage my ESX host behind NAT by adding ANOTHER VMkernel port with the external IP address (netmask 255.255.255.255). It then listens for the vCenter traffic and acts accordingly.

Thanks to VMware support for resolving this issue. I hope this gets more intuitive in a future patch.
