mdfields
Contributor

Host Connection State - All ESX servers not responding

I received alerts from each of my ESX servers stating that the Host Connection State was "not responding". The alerts all arrived at the same time. It doesn't appear that any of the VMs went down, or even the ESX servers for that matter.

Does anyone know what would cause this?

I am running ESX 3.5 and VC 2.5

Thanks

19 Replies
weinstein5
Immortal

This points to a problem with your VC networking. Your VC server simultaneously lost communication with all of your ESX hosts, so I would take a look at the VirtualCenter networking, since it is the common failure point that would cause all hosts to stop responding.

If you find this or any other answer useful please consider awarding points by marking the answer correct or helpful
Draconis
Enthusiast

Could you also be experiencing DNS issues? Have there been any recent changes? Just another thing to check, though it probably isn't that.

If you have found my answer helpful or correct, please consider awarding points.
Wimo
Hot Shot

We had that problem recently after receiving a new (and apparently bad) license file.

Side question - how were those alerts generated?

Draconis
Enthusiast

Strange. I didn't know a corrupt license file could generate false errors. There is a heartbeat between the VC and all of the ESX servers it is managing. Maybe the license itself prevented VC from talking to its servers, or required a rescan of the file. That is only a guess; I have not tested it myself. Maybe David can comment more on this. You can attempt to repair the bad license file here if you haven't done so already (). I believe that only supports server-based licenses.

Wimo
Hot Shot

I checked the license - it said I had 500 of everything. Ran it through the license checker, same. Lmtools utility said the same. Sent it to our TAM, he said it was fine. It is now in the hands of Support.

In VC, on the Licenses tab, it would show only ESX Server Standard and Consolidated Backup. No VirtualCenter, HA, DRS, VMotion, etc. At first it was fine, but then my boss tried to add a host, and that's when he got a "not enough licenses" error and everything disconnected. We put the old file back for now.

kjb007
Immortal

Without VirtualCenter, the advanced features are moot. Make sure your license file includes a PROD_VC line item; otherwise, you can't run VirtualCenter. If you are running 3.5, you can run in evaluation mode to get up and running until your license file is corrected to include a VC license.
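
As a quick way to check for that line item, here is a minimal Python sketch. It assumes the file uses FlexNet-style `INCREMENT`/`FEATURE` lines in which the second field is the feature name and the sixth is the license count; that layout, the function name `licensed_features`, and the sample text (including `EXAMPLE_FEATURE` and `EXAMPLEVENDOR`) are illustrative assumptions, not taken from a real license file. Only the `PROD_VC` name comes from the post above.

```python
def licensed_features(license_text):
    """Sum license counts per feature name from FlexNet-style lines.

    Assumed layout (not verified against a real file):
    INCREMENT <feature> <vendor> <version> <expiry> <count> ...
    """
    counts = {}
    for line in license_text.splitlines():
        parts = line.split()
        if len(parts) < 6 or parts[0] not in ("INCREMENT", "FEATURE"):
            continue  # comments, continuation lines, server lines, etc.
        try:
            count = int(parts[5])
        except ValueError:
            continue  # e.g. a non-numeric count field; skipped in this sketch
        counts[parts[1]] = counts.get(parts[1], 0) + count
    return counts


# Made-up sample text, for illustration only.
SAMPLE = """\
# not a real license file
INCREMENT PROD_VC EXAMPLEVENDOR 1.0 permanent 1 SIGN=EXAMPLE
INCREMENT EXAMPLE_FEATURE EXAMPLEVENDOR 1.0 permanent 500 SIGN=EXAMPLE
"""

features = licensed_features(SAMPLE)
if "PROD_VC" not in features:
    print("No PROD_VC line item found")
else:
    print("PROD_VC licenses:", features["PROD_VC"])
```

To try it on your own file, read the text with `open(path).read()` and pass it to `licensed_features`.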

-KjB

vExpert/VCP/VCAP vmwise.com / @vmwise -KjB
Draconis
Enthusiast

Did you install the license server on the VC server, or do you have another FlexNet license server somewhere on your network? Have you checked the license server? Maybe it isn't really the license file but the license server trying to interpret it. Just a thought. Could you double-check that the ports are open between your VC and license server (TCP 27000 and 27010 are the defaults)? In any case, I am sure the VMware support engineers will be able to narrow it down.
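
One way to run that port check from the VC server is a small Python sketch like the one below. The function names and the host name in the usage comment are hypothetical; only the port numbers come from the post above. A `False` result only says the TCP connection could not be made (refused, filtered, or timed out), not why.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def check_ports(host, ports, timeout=3.0):
    """Map each port to True (reachable) or False (not reachable)."""
    return {port: port_is_open(host, port, timeout) for port in ports}


# Hypothetical usage, run on the VC server:
# print(check_ports("license-server.example.local", [27000, 27010]))
```

The same helper could be pointed at any other TCP port you want to verify between two machines.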

skyenter
Contributor

Why don't you connect to the ESX host through the VI Client and navigate to the license feature tab? If the license server configuration is gone from the tab, try re-adding your license server. We may get a better idea from there.

Draconis
Enthusiast

Kanuj has a good idea. VMware support will be able to verify this themselves, but you can see what the license file is for if you look around and check whether something is amiss. Page 56 of the Installation Guide () will give you a general idea. Make sure you don't save anything when looking around.

Wimo
Hot Shot

At this point I'm going to let support figure it out. Meanwhile, we're not helping the poor guy who started this thread.... Unless of course, his problem really is the same as mine.

skyenter
Contributor

http://www.vmware.com/checklicense/

This tool will take a pasted license file and parse, reformat, and attempt to repair it. It will also give you statistics regarding the total number of licenses found and highlight inconsistencies that could potentially cause issues.

mdfields
Contributor

Is it the kernel network that had the issues or the VM network? So if the VC server and the ESX servers lose communication with each other, this error will come up? I have restarted the VC server several times and have never received this error.

Also, is it better to have the VC server and the ESX kernel networks on the same subnet and physical VLAN?

kjb007
Immortal

I use a management network and keep my VC and my ESX hosts on it, isolated from my VM/iSCSI/VMotion networks. When your host shows "not responding", you may still have network connectivity, but it means that the vpxa agent on the ESX host cannot communicate back to VC over port 902. Restarting the service and/or clicking disconnect/reconnect on the VC side can fix most agent issues.

-KjB

weinstein5
Immortal

The connection between VC and your ESX hosts goes through the service console, so that is the network I would take a look at. And yes, the best option is to have VC and the service console network on the same subnet and isolated.

mdfields
Contributor

I would assume that the service console connections on all three ESX servers did not experience issues at exactly the same time. Would it be safe to say that it was probably an issue with the VC server, then? The VC server is physical, so I am assuming it could be a physical problem with the NIC or switch port.

Does that make sense at all?

weinstein5
Immortal

That makes sense. That is what I was referring to when I said to look at the network: check the networking on the VC server.

skyenter
Contributor

Could be. We could look at CRC errors on the switch ports, as well as the duplex settings. Also, you can simply ping the three ESX servers from your VC server, dump the results to a file, and review it for dropped packets or unusually long response times. For example: ping -t ESXserver1 >c:\ESXserver1.txt
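
Reviewing a long ping capture by eye gets tedious, so here is a minimal Python sketch that summarizes one. It assumes the usual English-locale Windows `ping` output ("Reply from ... time=12ms TTL=64" and "Request timed out."); the function name `summarize_ping_log`, the 100 ms threshold, and the sample text are all hypothetical.

```python
import re

# Matches "time=12ms" as well as "time<1ms" in a Windows ping reply line.
TIME_RE = re.compile(r"time[=<](\d+)ms", re.IGNORECASE)


def summarize_ping_log(text, slow_ms=100):
    """Count replies, losses, and slow replies in captured ping output."""
    times = []
    lost = 0
    for raw in text.splitlines():
        line = raw.strip().lower()
        if line.startswith("request timed out"):
            lost += 1
        elif line.startswith("reply from"):
            match = TIME_RE.search(line)
            if match:
                times.append(int(match.group(1)))
            else:
                lost += 1  # e.g. "Destination host unreachable."
    return {
        "sent": len(times) + lost,
        "lost": lost,
        "slow": sum(1 for t in times if t >= slow_ms),
        "max_ms": max(times) if times else None,
    }


# Made-up sample output, for illustration only.
SAMPLE = """\
Pinging ESXserver1 [192.0.2.10] with 32 bytes of data:
Reply from 192.0.2.10: bytes=32 time=1ms TTL=64
Reply from 192.0.2.10: bytes=32 time<1ms TTL=64
Request timed out.
Reply from 192.0.2.10: bytes=32 time=250ms TTL=64
"""

print(summarize_ping_log(SAMPLE))
```

For a real capture, read the file with `open(r"c:\ESXserver1.txt", errors="replace").read()` and pass the text in.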

azn2kew
Champion

You should check the following in detail.

1. Confirm that the .lic file is valid and carries enough licenses.

2. Switch to evaluation mode to see whether VC works and the ESX hosts connect. If so, then it's a license file issue. If not, continue.

3. Double-check your networking (are the NICs pingable, and can the VC server reach the ESX hosts?).

4. Check that the firewalls allow port 902 and that you can telnet to it.

5. Check your service console configuration, DNS, and routing in detail.

6. Check your VPX agents and services, and restart them.

7. Completely reboot the ESX host and try to reconnect.

8. P2V your VC server and test it from a VM perspective!

9. Worst case, back up your database and rebuild a new server in 20 minutes max.

If you found this information useful, please consider awarding points for "Correct" or "Helpful". Thanks!!!

Regards,

Stefan Nguyen

iGeek Systems Inc.

VMware, Citrix, Microsoft Consultant

atzi
Enthusiast

Hi,

A few days ago I had a similar problem.

All of our hosts sporadically disconnected from VC. I opened an SR. Support checked our logs. They said that our hosts' service consoles were swapping and that we should increase the service console memory to 380 MB. This fixed our problem.

Hope this helps.
