Hi, since we changed the servers in our ESXi 5 cluster, we sometimes have hosts disconnecting.
Server specs: 3x Cisco UCS C200 M2 (2x Xeon X5675 @ 3.06GHz and 96 GB memory)
Sometimes, without warning, a host disconnects in vCenter.
We can still ping the host but cannot SSH to it or log in directly with the vSphere Client. If I try to log in locally, it freezes as soon as I type the password and press ENTER, and the only fix is a hard restart (which loses the local logs).
While the host is disconnected, the guests show as disconnected but are fully operational (remote desktop, ping). We can't vMotion or change settings on the guests since they are shown as (disconnected).
Has anyone already seen this problem? It's really annoying.
Let me know if you need more information.
Thanks and have a nice day,
Martin Bergeron
Hi
Have you done some basic troubleshooting, like checking the services and port numbers? Also, don't forget to check whether a service is being blocked by a virus or firewall.
____________________
Always desire to learn something useful.
You may be facing some local disk problems. Add a vMA to your environment and use it to collect the logs from the ESX servers: http://kb.vmware.com/kb/1024122
Also, I am quite sure there is a syslog server included with the vCenter installation. Use it if the vMA does not work.
I had a very similar issue that KB http://kb.vmware.com/kb/1030265 resolved. You can try it on a test ESX host to check whether it makes the problem stop.
Thanks,
It really seems to be an HDD controller or HBA issue, so I tried the solution in http://kb.vmware.com/kb/1030265, but I need to wait some days to make sure the servers don't go down again.
I'll let you know in a few days.
Thanks again!
9 days and still up...
Hi Martin,
We have the same issue. After reading your post we applied the solution of disabling interrupt remapping.
The problem is that we don't see the ALERT message in the logs.
Is your environment still OK?
Did you find the ALERT message in the logs?
Thanks and Best Regards,
Raul de la Flor
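For anyone who wants to check collected logs for the ALERT message, here is a minimal sketch in Python. It assumes the logs have already been copied off the host (for example through the vMA or a syslog server) and that the lines of interest carry the uppercase word ALERT. The function name find_alerts is hypothetical, and the script only filters lines; it does not know the exact text of the message from the KB.

```python
import re
import sys
from typing import Iterable, List

# Assumption: relevant kernel log lines are tagged with the uppercase word ALERT.
ALERT_RE = re.compile(r"\bALERT\b")


def find_alerts(lines: Iterable[str]) -> List[str]:
    """Return every line that contains an ALERT tag, without trailing newlines."""
    return [line.rstrip("\n") for line in lines if ALERT_RE.search(line)]


if __name__ == "__main__":
    # Usage: python find_alerts.py <log file> [<log file> ...]
    for path in sys.argv[1:]:
        # errors="replace" keeps the scan going if a log contains odd bytes.
        with open(path, errors="replace") as fh:
            for hit in find_alerts(fh):
                print(f"{path}: {hit}")
```

Run it against the log files gathered from the host that disconnected. If nothing is printed, the logs contain no line tagged ALERT, which may simply mean they were lost in the reboot.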
Hi Raul,
I've been up for 12 days now, and I didn't see the ALERT because when the problem occurred and the host was cold rebooted, all the logs were lost.
But everything seems to be OK now.
I'll update the thread in a few days to let you know.
Disabling interrupt remapping on the host did the trick for me.
23 days now without a disconnection.
Thanks a lot!! 15 days with no disconnections!!!
Good to hear... bad for Cisco/VMware, they really need to address this...
ER 89872: Cisco BIOS issue found. From Cisco: "I have an update on this. Today we were able to connect the dots and found that we recently fixed a problem related to interrupt remapping in the VT-d. Apparently, the work around of disabling interrupt remapping may not always solve the problem.
In the past we have seen many different manifestations of this issue, adapter disconnects and sometimes even PSODs."
Contact Cisco to obtain the necessary BIOS updates.
