VMware Cloud Community
COS
Expert

ESX 3.5 Virtual Infrastructure Manager - Loses connection to one node

I have a 2-node ESX 3.5 cluster. For some strange reason, VirtualCenter keeps losing its connection to the second node. If I restart the "VMware VirtualCenter Server" service, the node comes back, but later the connection is lost again. I also rebooted the server, but the same thing happens again later.

Any ideas what to check?

The network looks fine, because both nodes and the VIM server can ping each other without problems. Each ESX node also has all the DNS names and IPs in its local hosts file, and they have been set up that way since day 1. Now, all of a sudden, the VIM server keeps losing its connection to the second node: the node shows "(not responding)" and the VMs running on it show "(disconnected)".

Any ideas?

Thanks

6 Replies
eeg3
Commander

If you test the connection, do you see any packet loss? Could be something as simple as a bad/loose cable.
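
A handful of pings can miss intermittent loss, so a longer run (for example, 100 or more packets) is more telling. As a rough sketch, the loss figure can be pulled out of the ping summary line like this. It assumes the summary format printed by the Linux ping utility ("... N% packet loss ..."); the function name and the sample line are made up for illustration and are not output from the poster's hosts.

```python
import re


def packet_loss_percent(ping_output):
    """Return the packet-loss percentage from a ping summary, or None if absent.

    Assumes the Linux-style summary line, e.g. "... 3% packet loss ...".
    """
    match = re.search(r"(\d+(?:\.\d+)?)% packet loss", ping_output)
    return float(match.group(1)) if match else None


# Illustrative sample line only, not real output from this thread.
sample = "100 packets transmitted, 97 received, 3% packet loss, time 99123ms"
loss = packet_loss_percent(sample)
if loss is not None and loss > 0:
    print("Packet loss detected: %s%%" % loss)
```

Any non-zero value over a long run would be worth raising with the network team, even if short ping tests look clean.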






____________

blog.eeg3.net | Useful VMware-related Links

If you found this or any other post helpful, please consider the use of the Helpful/Correct buttons to award points.

Blog: http://blog.eeg3.net
marcelo_soares
Champion

Check the basic network as eeg3 said. Also, while the host is in the disconnected state, log in to it over SSH or at the physical console and run the command "uptime". You'll see output like:

20:11:29 up 3:59, 3 users, load average: 0.10, 0.19, 0.23

If the first number of the load average is above 1.00, you may have a SAN problem on this host. Also check the /var/log/vmkernel log. You can try analyzing /var/log/vmware/hostd.log as well, because it will contain the exact explanation, although it is not easy to find.
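
If you want to repeat this check each time the host drops out, the load-average rule of thumb can be sketched as a small parser. This is only an illustration: the function names are made up, and the 1.00 threshold is simply the figure suggested above, not an official limit.

```python
import re


def load_averages(uptime_line):
    """Return the (1-min, 5-min, 15-min) load averages from an uptime line."""
    match = re.search(
        r"load averages?:\s*([\d.]+),?\s+([\d.]+),?\s+([\d.]+)", uptime_line
    )
    if match is None:
        raise ValueError("no load average found in: %r" % uptime_line)
    return tuple(float(value) for value in match.groups())


def looks_overloaded(uptime_line, threshold=1.00):
    """Apply the rule of thumb: first load-average figure above the threshold."""
    return load_averages(uptime_line)[0] > threshold


# The example line from this post.
example = "20:11:29 up 3:59, 3 users, load average: 0.10, 0.19, 0.23"
print(load_averages(example))
print(looks_overloaded(example))
```

A low first figure during a disconnect points away from host load as the cause, which leaves the vmkernel and hostd logs as the next place to look.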

Feel free to paste any log entries you think useful for troubleshooting this.

Marcelo Soares

VMWare Certified Professional 310/410

Virtualization Tech Master

Globant Argentina

Consider awarding points for "helpful" and/or "correct" answers.

COS
Expert

Unfortunately, ping is the best test I can run today. The network folks will be in on Monday. As far as I can see, the ping replies are solid, with no change in latency.

COS
Expert

The output of uptime gives me this:

16:25:03 up 256 days, 7:12, 1 user, load average: 0.04, 0.01, 0.00

I'll check the /var/log/vmware/hostd.log.

Any other logs I can sift through?

Thanks

marcelo_soares
Champion

Was this uptime output taken at a moment when the host was disconnected?

COS
Expert

You are correct. I ran the command while the host showed the "(disconnected)" state. I have also disabled DRS and HA, just in case, because I don't want a "split brain" scenario to occur.
