VMware Cloud Community
serenity2011101
Contributor

ESXi 4.1.0 keeps disconnecting from vSphere server

I recently added a second ESXi host to my network. I already had an ESXi 4.0 host and a vSphere server in place. The host I added is ESXi 4.1.0. Every minute or two, it disconnects from the vSphere server. If I connect directly to the host, all is fine and I don't have any issues. I used the IP address to add the host to the vSphere server, so I'm pretty sure it's not DNS related. This is the first real problem I've had, so my VMware troubleshooting skills aren't exactly the greatest. Any help is greatly appreciated.

Thanks,

Mike

A+, Network+, Security+, CEH, ECSA, CHFI, LPT, Sun Solaris Certified Network Administrator, (Expired) VCP-4.0 (hope to have it again when money grows on trees and I can afford an extortion class) =^)
21 Replies
Troy_Clavell
Immortal

  I used the IP address to add the host to the  vsphere server, so I'm pretty sure it's not DNS related

I would recommend adding the host to vCenter using its FQDN and seeing what happens.

serenity2011101
Contributor

Also, I have already tried restarting the management agent; it doesn't seem to help.

Thanks

serenity2011101
Contributor

I will certainly try that, but since my 4.0 host is added via IP, I don't understand why the 4.1 host wouldn't work as well.

Troy_Clavell
Immortal

You are using vCenter 4.1, correct? It is required for managing a 4.1 host.

serenity2011101
Contributor

Yes, I am using vCenter 4.1. Thanks

serenity2011101
Contributor

Here are the last log entries right after a disconnect.

[2011-02-04 03:02:55.501 02344 info 'App' opID=HB-host-86@70729] [VpxLRO] -- FINISH task-internal-121270 -- host-86 -- VpxdInvtHostSyncHostLRO.Synchronize --
[2011-02-04 03:03:05.682 02988 error 'App'] [HttpUtil::ExecuteRequest] Error in sending request - The requested name is valid, but no data of the requested type was found.
[2011-02-04 03:03:05.682 02988 error 'App'] [ServerAccess] Exception while invoking remote login: vim.fault.HttpFault

Troy_Clavell
Immortal

It is very important that you have proper name resolution within your environment. All ESXi hosts must be able to ping/resolve each other and vCenter by FQDN. The same applies from vCenter to the ESXi hosts.
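
The name-resolution requirement above can be spot-checked from the vCenter machine with a short script. This is only a sketch: it checks forward lookups using Python's standard `socket.gethostbyname`, and the FQDNs in the usage comment are hypothetical placeholders, not names from this thread.

```python
import socket

def check_resolution(names, resolver=socket.gethostbyname):
    """Return {name: ip-or-None} for each name; None means the lookup failed."""
    results = {}
    for name in names:
        try:
            results[name] = resolver(name)
        except OSError:  # socket.gaierror and socket.herror are OSError subclasses
            results[name] = None
    return results

# Usage (hypothetical FQDNs; substitute your own hosts and vCenter server):
#   for name, ip in check_resolution(
#           ["esxi40.example.local", "esxi41.example.local",
#            "vcenter.example.local"]).items():
#       print(name, ip or "DOES NOT RESOLVE")
```

The same check would need to be repeated from each ESXi host's point of view, since resolution has to work in both directions.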

piaroa
Expert

Have you tried restarting the host's management services with "services.sh restart"? If that doesn't help, you could try removing the host from vCenter (your VMs will keep running), restarting the vCenter Server service, and adding the host back.

I saw this happen once, and I solved it with the above steps.

If this post has been helpful/solved your issue, please mark the thread and award points as you see fit. Thanks!

bulletprooffool
Champion

The first things to do when vCenter can't connect to an ESXi/ESX host, or when the connection between them is unreliable, are:

1) Check the DNS configuration on the ESXi server and on the DNS server that ESX points to, making sure you have the appropriate entries.
2) Check the hosts files etc.: /etc/hosts, /etc/resolv.conf, /etc/sysconfig/network and /etc/vmware/esx.conf.
3) Try to disconnect and reconnect your ESXi host in your vCenter inventory (this uninstalls and reinstalls the vCenter agent), using the FQDN first and then the IP address if the FQDN didn't work.
4) Try restarting both the vCenter management agent on the ESX host and the ESX host management agent. Learn how to do this here: http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=100349...
5) If the above didn't do anything for you, it could be lost connectivity to a LUN, which can cause problems with ESX (less now than in earlier versions such as ESX 2.x). Connect to the ESXi host directly with the VI Client and perform a rescan of your storage adaptors and LUNs.
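
For the hosts-file check in the list above, a sketch like the following can confirm that each expected name maps to the expected address. It assumes the standard hosts format (an IP address followed by one or more names, with `#` starting a comment); the sample entries and addresses are made up for illustration.

```python
def parse_hosts(text):
    """Map every name in hosts-file text to its IP address (first entry wins)."""
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            mapping.setdefault(name.lower(), ip)
    return mapping

def find_mismatches(text, expected):
    """Return {name: (expected_ip, found_ip_or_None)} for entries that differ."""
    found = parse_hosts(text)
    return {
        name: (ip, found.get(name.lower()))
        for name, ip in expected.items()
        if found.get(name.lower()) != ip
    }

# Hypothetical sample data, not taken from the thread.
SAMPLE = """
127.0.0.1      localhost
192.168.1.10   vcenter.example.local vcenter   # vCenter server
192.168.1.21   esxi41.example.local esxi41
"""
```

Calling `find_mismatches(SAMPLE, {...})` with the names and addresses you expect returns an empty dict when everything lines up, and lists any missing or wrong entries otherwise.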

One day I will virtualise myself . . .

serenity2011101
Contributor

I updated DNS and made sure that all hosts could ping all other hosts (including the vCenter server). The timeout happened again about a minute later. I'm rebooting everything to go ahead and get that out of the way.

thanks

bulletprooffool
Champion

Did you check all hosts files etc?

Also, are you using AD integrated DNS - have you lost DNS records?

Have you tried connecting via IP?

Have you verified that the network is not dropping?

anything in the VC logs?

serenity2011101
Contributor

I've checked all DNS settings

I've rebooted everything

Verified there are no network problems (constant pings going from everything, to everything)

I've removed and re-added the problem host to vCenter, using the FQDN

I've restarted the management agent

Still having the same problem...

Thanks for the help

Troy_Clavell
Immortal

Then you may want to start looking at the event logs in vCenter; it could be a DB issue. Ensure your transaction logs aren't full and that there are no other space issues.

Also, check the ESXi hosts for space issues.

serenity2011101
Contributor

There is 1 VM on the new host. There is about 30TB free, so I don't think there are space issues. Here are the last few entries in the vCenter log, right after a disconnect. What does this mean?

[2011-02-04 09:37:18.002 02412 error 'App'] [VpxdHealthServiceMonitor::LoginToHealthService] Unable to login into HealthService. Error is vmodl.fault.HostCommunication
[2011-02-04 09:37:18.002 02412 error 'App'] [ServerAccess] Exception while invoking remote login: vim.fault.HttpFault
[2011-02-04 09:37:18.002 02412 error 'App'] [HttpUtil::ExecuteRequest] Error in sending request - No such host is known.
[2011-02-04 09:37:17.121 01772 info 'App' opID=3F8F59BA-00000130] [VpxLRO] -- FINISH task-internal-297 --  -- vim.AuthorizationManager.retrieveRolePermissions -- EE8683C9-ABDB-4DB8-8C0A-12DEA8EFAF79(C07C38A9-1241-49D8-A619-CFC9E1011312)
[2011-02-04 09:37:17.120 03016 info 'App' opID=3F8F59BA-00000131] [VpxLRO] -- FINISH task-internal-296 --  -- vim.AuthorizationManager.retrieveRolePermissions -- EE8683C9-ABDB-4DB8-8C0A-12DEA8EFAF79(C07C38A9-1241-49D8-A619-CFC9E1011312)
[2011-02-04 09:37:17.120 01772 info 'App' opID=3F8F59BA-00000130] [VpxLRO] -- BEGIN task-internal-297 --  -- vim.AuthorizationManager.retrieveRolePermissions -- EE8683C9-ABDB-4DB8-8C0A-12DEA8EFAF79(C07C38A9-1241-49D8-A619-CFC9E1011312)
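
When scanning a log like the excerpt above, it can help to pull out only the error lines. The sketch below assumes each entry follows the layout seen in these excerpts (timestamp, thread id, level, quoted source, then the message); the regular expression is an illustration inferred from those lines, not an official description of the vCenter log format.

```python
import re

# Layout assumed from the excerpts: [timestamp thread level 'source' ...] message
ENTRY = re.compile(
    r"^\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"(?P<thread>\d+) (?P<level>\w+) '(?P<source>[^']*)'[^\]]*\]\s*(?P<msg>.*)$"
)

def error_entries(lines):
    """Yield (timestamp, message) for every entry logged at level 'error'."""
    for line in lines:
        m = ENTRY.match(line.strip())
        if m and m.group("level") == "error":
            yield m.group("ts"), m.group("msg")

# Two entries copied from the excerpt above (the second one shortened).
sample = [
    "[2011-02-04 09:37:18.002 02412 error 'App'] [HttpUtil::ExecuteRequest] "
    "Error in sending request - No such host is known.",
    "[2011-02-04 09:37:17.121 01772 info 'App' opID=3F8F59BA-00000130] [VpxLRO] "
    "-- FINISH task-internal-297 --",
]
for ts, msg in error_entries(sample):
    print(ts, msg)
```

To run it against a real file, pass an open file object instead of the `sample` list; lines that do not match the assumed layout are simply skipped.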

Thanks for the help guys,....

Troy_Clavell
Immortal

See if the article below helps, based on this error:

[2011-02-04 09:37:18.002 02412 error 'App']  [VpxdHealthServiceMonitor::LoginToHealthService] Unable to login into  HealthService. Error is vmodl.fault.HostCommunication

http://kb.vmware.com/kb/1019082

serenity2011101
Contributor

Still cannot figure this out. There are absolutely no problems with the functionality of the host, and the vSphere client never disconnects when connected directly. Any more ideas?

willjasen
Contributor

Check Windows Firewall settings. I recently reloaded our vSphere server and kept getting disconnected. I thought it might be the management agents on the ESXi hosts, but restarting those never worked. After remembering that the old vSphere server had a lot of firewall exceptions and the new one did not, I disabled Windows Firewall and, sure enough, vSphere reconnected to the ESXi hosts. Even though the vSphere exceptions were listed in Windows Firewall on the new vSphere server, it still kept disconnecting (see the attached screenshot). Now to determine exactly which ports and services need to be open.

willjasen
Contributor

Adding a custom exception for UDP port 902 and turning Windows Firewall back on seems to correct the issue. See attached screenshot.
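
For reference, an exception like the one described above could also be created from the command line rather than the GUI. The sketch below only builds a `netsh advfirewall firewall add rule` command (available on Windows Server 2008 and later) and runs it when asked. The rule name is a hypothetical label, and it assumes an elevated prompt on the vCenter machine.

```python
import subprocess

def udp_allow_rule(port, name):
    """Build a netsh command that allows inbound UDP traffic on one port."""
    return [
        "netsh", "advfirewall", "firewall", "add", "rule",
        f"name={name}",
        "dir=in", "action=allow", "protocol=UDP", f"localport={port}",
    ]

def apply_rule(cmd):
    """Run the command; needs Windows and administrator rights."""
    return subprocess.run(cmd, check=True)

# Hypothetical rule name; UDP port 902 is the exception reported in this thread.
cmd = udp_allow_rule(902, "vCenter-UDP-902")
print(" ".join(cmd))
```

Printing the command first makes it easy to review before calling `apply_rule(cmd)`. On older Windows versions without `advfirewall`, the GUI exception described above remains the route the thread confirms.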

gregplou
Contributor

Adding the UDP Port 902 exception also worked in our environment where we were experiencing this issue. Thanks....

Best Regards, Greg