VMware Cloud Community
vmkarthik
Enthusiast

ESX host gets disconnected frequently

Hi All,

I have a vSphere environment with two hosts and a vCenter setup.

Everything was fine until this morning, when my ESX host started getting disconnected frequently.

Since then I have had to reconnect it every time, and it happens every 5 minutes.

Looking for help.

Thanks

15 Replies
AndreTheGiant
Immortal

Give more info:

  • Are your ESX hosts 4.0 with the latest patches?

  • How many NICs do you have?

  • During the disconnections, can you ping both the IP and the FQDN of your ESX host?

Andre

Andrew | http://about.me/amauro | http://vinfrastructure.it/ | @Andrea_Mauro
vmkarthik
Enthusiast

No patches are applied.

It has only one physical NIC attached, and yes, I am able to ping it using both the IP and the FQDN. It just disconnects from vCenter.

NTurnbull
Expert

What is your DNS configuration? Can you resolve the short name and FQDN both ways (forward and reverse)? What's in the hosts file on the ESX box? How are you connecting the host in VC: via hostname or IP? Have you got any errors in the ESX host's log files (vmkernel etc.), and what are the errors in VC? Also, if you run a ping from both VC and ESX, does either of them drop packets when the host gets disconnected?

Oh, and the obvious question: has anything changed since yesterday?
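Neil's "both ways" DNS question can be checked from the vCenter machine or any workstation. The sketch below is a hypothetical helper, not something from the thread: it does a forward lookup and then a reverse lookup with Python's standard `socket` module, using whatever resolver that machine is configured with (DNS and its hosts file), and compares short names.

```python
import socket


def short_name(host: str) -> str:
    """Lower-cased first label of a host name, e.g. 'ESX01.lab.local' -> 'esx01'."""
    return host.split(".")[0].lower()


def check_dns(name: str) -> dict:
    """Forward-resolve `name`, reverse-resolve the resulting IP, and report
    whether the reverse name maps back to the same short name."""
    result = {"name": name, "forward": None, "reverse": None, "consistent": False}
    try:
        result["forward"] = socket.gethostbyname(name)  # name -> IPv4 address
    except OSError:
        return result  # forward lookup failed (no A record or hosts entry)
    try:
        result["reverse"] = socket.gethostbyaddr(result["forward"])[0]  # IP -> name
    except OSError:
        return result  # reverse lookup failed (no PTR record or hosts entry)
    result["consistent"] = short_name(result["reverse"]) == short_name(name)
    return result
```

For example, `check_dns("esx01.lab.local")` (a placeholder name) reports `consistent: True` only when the reverse lookup returns a name with the same first label; running it for both the short name and the FQDN of each host, and for vCenter from the ESX side, covers the cases Neil lists.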

Thanks,

Neil

bulletprooffool
Champion

We had this same issue. In our case, we were using VLANs to segregate traffic, but we had two port groups on the same VLAN, which caused loops.
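To look for the overlap bulletprooffool describes, a port group list can be checked for VLAN IDs used more than once on the same vSwitch. This is a hypothetical sketch: it works on a hand-made list of (vSwitch, port group, VLAN ID) tuples rather than querying the host, the inventory values are invented for illustration, and it only flags candidates to review.

```python
from collections import defaultdict


def shared_vlans(portgroups):
    """Group port groups by (vSwitch, VLAN ID) and keep only the groups with
    more than one member.

    `portgroups` is an iterable of (vswitch, portgroup_name, vlan_id) tuples.
    """
    by_vlan = defaultdict(list)
    for vswitch, name, vlan_id in portgroups:
        by_vlan[(vswitch, vlan_id)].append(name)
    return {key: names for key, names in by_vlan.items() if len(names) > 1}


# Hypothetical inventory, e.g. copied by hand from the vSphere Client.
inventory = [
    ("vSwitch0", "Service Console", 10),
    ("vSwitch0", "VM Network", 10),
    ("vSwitch0", "VMkernel iSCSI", 20),
]
print(shared_vlans(inventory))  # {('vSwitch0', 10): ['Service Console', 'VM Network']}
```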

One day I will virtualise myself . . .
schepp
Leadership

Had the same problem a few days ago. I fixed it by restarting the management service on the ESX host: /etc/init.d/mgmt-vmware restart

Greets

vmkarthik
Enthusiast

I have tried restarting those services, but that doesn't seem to fix it.

bluedrake
Contributor

Can you run a constant ping to the ESX server from the vCenter machine? Have you tried removing the ESX server from vCenter and then re-adding it?
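A constant ping like the one bluedrake suggests is most useful if the drop times are recorded, so they can be compared with the times vCenter reports the host as disconnected. The sketch below is a hypothetical helper, not from the thread: it calls the operating system's own `ping` command once per interval (Windows and Linux flags shown) and collapses the results into outage windows.

```python
import platform
import subprocess
import time
from datetime import datetime


def ping_once(host: str) -> bool:
    """Send one ICMP echo via the OS ping command; True if ping exits with 0."""
    if platform.system() == "Windows":
        cmd = ["ping", "-n", "1", "-w", "1000", host]  # 1 echo, 1000 ms timeout
    else:
        cmd = ["ping", "-c", "1", "-W", "1", host]     # 1 echo, 1 s timeout (Linux)
    # Note: Windows ping may exit 0 on some "unreachable" replies; treat as a rough signal.
    return subprocess.run(cmd, stdout=subprocess.DEVNULL,
                          stderr=subprocess.DEVNULL).returncode == 0


def outages(samples):
    """Collapse (timestamp, ok) samples into (start, end) outage windows.

    `end` is the first successful sample after the outage, or None if the
    host was still unreachable when sampling stopped."""
    windows, start = [], None
    for ts, ok in samples:
        if not ok and start is None:
            start = ts                   # first failed ping of a new outage
        elif ok and start is not None:
            windows.append((start, ts))  # first success after the outage
            start = None
    if start is not None:
        windows.append((start, None))
    return windows


def monitor(host: str, interval: float = 1.0, duration: float = 900.0):
    """Ping `host` every `interval` seconds for `duration` seconds."""
    samples, deadline = [], time.monotonic() + duration
    while time.monotonic() < deadline:
        samples.append((datetime.now(), ping_once(host)))
        time.sleep(interval)
    return outages(samples)
```

For example, `monitor("esx01.lab.local", duration=900)` (placeholder host name) returns the outage windows seen over 15 minutes, which should span at least a couple of the roughly 5-minute disconnect cycles. If it returns an empty list while vCenter still shows the host disconnected, that points away from basic IP connectivity.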

LiamCurtis
Enthusiast

If you SSH into the ESX host and su to root, try running more /var/log/vmkwarning and see what you find in there. I was having similar issues with a host, and it turned out it was because I had yanked a LUN away from it on my SAN.
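If the host's logs are copied to a workstation (for example with scp), lines that look storage-related can be pulled out with a short filter. This is a hypothetical sketch: the keyword list is an assumption, not an official set of ESX messages, and should be adjusted to what your logs actually contain.

```python
import re

# Assumed keywords for storage/path trouble; "scsi" also matches "iSCSI".
DEFAULT_PATTERN = re.compile(r"scsi|nmp|path|timeout|lost", re.IGNORECASE)


def matching_lines(lines, pattern=DEFAULT_PATTERN):
    """Yield (line_number, text) for each line that matches `pattern`."""
    for number, line in enumerate(lines, start=1):
        if pattern.search(line):
            yield number, line.rstrip("\n")
```

For example, open a local copy of `vmkwarning` with `open("vmkwarning", errors="replace")`, pass the file object to `matching_lines`, and print each (line number, text) pair.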

vmkarthik
Enthusiast

Hi,

This seems pretty close to my issue.

I had an iSCSI target running on StarWind software, and I changed its IP by mistake and then changed it back.

Now, even after changing the IP back to the previous one, I am not able to see the iSCSI SAN from the host or from vCenter.

The problem started right after this.

Dallas74
Contributor

vmkarthik
Enthusiast

I checked the article, but my vCenter Server IP is already there in the vpxa.cfg file.

By the way, LiamCurtis: what did you do to resolve your issue? Mine seems to be the same as yours; it started with the iSCSI SAN.

If anyone has any ideas or suggestions, please share them.
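vmkarthik checked that vpxa.cfg, the configuration file of the vCenter agent on the host, contains the vCenter Server IP. That comparison can be scripted. This sketch assumes the file is XML containing a `serverIp` element and lives at the path shown; both are assumptions to verify on your own host, and the IP in the usage note is a placeholder.

```python
import xml.etree.ElementTree as ET

# Assumed location on classic ESX 4.x; check the path on your own host.
VPXA_CFG = "/etc/opt/vmware/vpxa/vpxa.cfg"


def vcenter_ip(xml_text: str):
    """Return the text of the first <serverIp> element, or None if missing."""
    node = ET.fromstring(xml_text).find(".//serverIp")
    if node is None or not node.text:
        return None
    return node.text.strip()


def matches_expected(xml_text: str, expected_ip: str) -> bool:
    """True if the agent config points at the vCenter Server IP we expect."""
    return vcenter_ip(xml_text) == expected_ip
```

For example, read the file with `open(VPXA_CFG).read()` and pass the text to `matches_expected(..., "192.0.2.10")`, replacing the placeholder with your real vCenter Server IP.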

vmkarthik
Enthusiast

Hi LiamCurtis,

This is what I got after running the command you suggested: more /var/log/vmkwarning

Aug 20 12:47:09 localhost vmkernel: TSC: 421516960 cpu0:0)WARNING: ACPI: 1172: MCFG>: Table length 62 > 60

Aug 20 12:47:10 localhost vmkernel: 0:00:00:00.009 cpu0:4096)WARNING: NetCoalesce: 3099: invalid NET_COALESCE_HDLR_PCPU 1; numPCPUs 1; use PCPU 0.

Aug 20 12:47:12 localhost vmkernel: 0:00:00:01.590 cpu0:4096)WARNING: NetCoalesce: 2883: invalid NET_COALESCE_HDLR_PCPU 0; numPCPUs 1; use PCPU 0.

Aug 20 17:56:46 localhost vmkernel: TSC: 421508856 cpu0:0)WARNING: ACPI: 1172: MCFG>: Table length 62 > 60

Aug 20 17:56:47 localhost vmkernel: 0:00:00:00.009 cpu0:4096)WARNING: NetCoalesce: 3099: invalid NET_COALESCE_HDLR_PCPU 1; numPCPUs 1; use PCPU 0.

Aug 20 17:56:49 localhost vmkernel: 0:00:00:01.592 cpu0:4096)WARNING: NetCoalesce: 2883: invalid NET_COALESCE_HDLR_PCPU 0; numPCPUs 1; use PCPU 0.

I couldn't infer anything from it.

Any ideas?
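The vmkwarning lines above can be tallied by subsystem to see what is being reported and when. This is a minimal sketch that assumes the line format shown in this post (syslog timestamp, host name, `vmkernel:`, then `WARNING: <subsystem>:`). Other log lines may not match the regular expression. Applied to the six lines above, it counts two ACPI and four NetCoalesce warnings, clustered at 12:47 and 17:56. The uptime field in those lines (0:00:00:00.009, 0:00:00:01.590) suggests they were logged during boot rather than at the moments the host disconnected.

```python
import re
from collections import Counter

# Syslog-style prefix followed by "WARNING: <subsystem>:", as in the lines above.
LINE_RE = re.compile(
    r"^(?P<stamp>\w{3} +\d+ \d{2}:\d{2}:\d{2}) (?P<host>\S+) vmkernel: "
    r".*?WARNING: (?P<subsystem>\w+):"
)


def summarize(lines):
    """Return (Counter of warnings per subsystem, {subsystem: [timestamps]})."""
    counts, stamps = Counter(), {}
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue  # blank or differently formatted line
        subsystem = match.group("subsystem")
        counts[subsystem] += 1
        stamps.setdefault(subsystem, []).append(match.group("stamp"))
    return counts, stamps
```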

LiamCurtis
Enthusiast

I got around mine with a simple reboot. I am also on a Fibre Channel SAN, so not much help there.

I'm not sure what those errors mean. What hardware are you running?

EPL
Contributor

I've actually got several SRs open with VMware on a similar issue. It seems there's a bug in ESX 4 (IMO): when the hosts lose connectivity with an iSCSI SAN, it eventually brings the host down. I have one case open with the storage team and another with the team that handles the VirtualCenter agents on the hosts. We've been able to reproduce the problem (both by accident and deliberately), and in each case the hosts eventually become disconnected; our only solution is a hard reset. Killing the services and restarting them manually does not bring the host back.

vmkarthik
Enthusiast

Hi

My hardware specs are:

ESX host: 3.2 GHz Pentium 4 processor, 3 GB RAM, 1 NIC

The other ESX host is much the same. This is a test environment on desktop hardware, and I was using the iSCSI SAN for shared storage.
