VMware Cloud Community
MartinBergeron
Contributor

Problem with ESXi 5 or Cisco servers (hosts disconnected)

Hi, since we replaced the servers in our ESXi 5 cluster, hosts sometimes disconnect.

Server specs: 3x Cisco UCS C200 M2 (2x Xeon X5675 @ 3.06 GHz, 96 GB memory)

Sometimes, without warning, a host disconnects in vCenter.

We can still ping the host, but we cannot SSH to it or log in directly with the vSphere Client. If I try to log in locally, the console freezes as soon as I type the password and press ENTER, and the only fix is a hard restart (which loses the local logs).

While the host is disconnected, its guests show as disconnected but remain fully operational (Remote Desktop and ping still work). We can't vMotion them or change their settings because they are disconnected.

Has anyone seen this problem before? It's really annoying.

Let me know if you need more information.

Thanks and have a nice day,

Martin Bergeron

11 Replies
jimraina
Enthusiast

Hi

Have you done some basic troubleshooting, such as checking the services and their port numbers? Also, don't forget to check whether a service is being blocked by antivirus software or a firewall.

marcelo_soares
Champion

You may be facing a local disk problem. Add a vMA to your environment and use it to collect the logs from the ESX servers: http://kb.vmware.com/kb/1024122

Also, I am fairly sure a syslog server is included with the vCenter installation. Use it if vMA does not work.

I had a very similar issue that was resolved by KB http://kb.vmware.com/kb/1030265. You can try it on a test ESX host to check whether it makes the problem stop.

Marcelo Soares
MartinBergeron
Contributor

Thanks,

It really seems to be an HDD controller or HBA issue, so I tried the solution in http://kb.vmware.com/kb/1030265, but I need to wait a few days to make sure the servers don't go down again.

I'll let you know in a few days.

Thanks again!

jsuarep
Contributor

Hello Martin,
We have the same issue and cannot resolve it.
We have reviewed http://kb.vmware.com/kb/1030265, but we can't find the line ALERT: APIC: 1823: APICID 0x00000000 - ESR = 0x40 in the log.
If you have resolved it, please share the solution.
Thanks!
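Because the host's local logs are lost on a hard restart, the ALERT line has to be searched for in a copy of the logs collected off the host (for example with vMA or the vCenter syslog server mentioned earlier in the thread). A minimal Python sketch for scanning such a copy is below. The function name `find_apic_alerts` is hypothetical, and accepting any number and hex value in the line (rather than exactly `1823`, `0x00000000` and `0x40`) is an assumption, not something stated in the KB.

```python
import re
import sys
from typing import Iterable, List

# Matches the APIC error line quoted from KB 1030265. The numeric fields are
# treated as variable (any decimal/hex value); this is an assumption.
APIC_ESR_ALERT = re.compile(
    r"ALERT: APIC: \d+: APICID 0x[0-9a-fA-F]+ - ESR = 0x[0-9a-fA-F]+"
)


def find_apic_alerts(lines: Iterable[str]) -> List[str]:
    """Return the log lines (trailing newline stripped) that contain an APIC ESR alert."""
    return [line.rstrip("\n") for line in lines if APIC_ESR_ALERT.search(line)]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: pass the path of a log file copied off the host.
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for hit in find_apic_alerts(log):
            print(hit)
```

Matching with `search` rather than `match` lets the pattern be found regardless of the timestamp or CPU prefix that precedes it on the line.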
MartinBergeron
Contributor

9 days and still up... :)

Rauldelaflor
Contributor

Hi Martin,

We have the same issue. After reading your post, we applied the solution of disabling interrupt remapping.

The problem is that we don't see the ALERT message in the logs.

Is your environment still OK?

Did you find the ALERT message in the logs?

Thanks and Best Regards,

Raul de la Flor

MartinBergeron
Contributor

Hi Raul,

I've been up for 12 days now. I didn't see the ALERT because when the problem occurred, the host had to be cold rebooted and all the logs were lost.

But everything seems to be ok now.

I'll update the thread in a few days to let you know.

MartinBergeron
Contributor

Disabling interrupt remapping on the host did the trick for me.

23 days now without a disconnect ;)

Rauldelaflor
Contributor

Thanks a lot!! 15 days with no disconnections!!!

marcelo_soares
Champion

Good to hear... but bad on Cisco's and VMware's part; they really need to address this.

Marcelo Soares
CBorg64
Contributor

ER 89872: Cisco BIOS issue found. From Cisco: "I have an update on this. Today we were able to connect the dots and found that we recently fixed a problem related to interrupt remapping in VT-d. Apparently, the workaround of disabling interrupt remapping may not always solve the problem.

In the past we have seen many different manifestations of this issue: adapter disconnects and sometimes even PSODs."

Contact Cisco to obtain the necessary BIOS updates.
