We have 2 ESXi 4.1 hosts that randomly (about once a month) completely drop off the network. All guests are inaccessible and the management network is inaccessible. You cannot ping any IP address or log into anything. When you access the console, everything looks a-ok. Using the console tools, pinging out onto the network fails as well. After restarting, everything is fine until the next failure. Both boxes are failing the same way.
Both are HP ProLiant DL360p Gen8 servers.
I'm having some difficulty troubleshooting this, as VMware's documentation points to log files that I can't find on my system. I have exported the log files from the vSphere Client, but everything in them seems to be from after the last restart, so they're not much help.
Any assistance with diagnosing this (or a pointer, if you already know about this problem) would be greatly appreciated.
IIRC there were some issues with network adapter drivers. Which type of NICs do you have in the hosts?
André
We're using the stock Broadcom network adapters. Do you know how I can identify the driver version I'm currently using? HP's most recent driver update is from 5/7/2012.
You may want to check with the VMware Compatibility Guide whether version 3.129d.v40.1 (2013-03-04) is supported with the ESXi build you currently run.
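On identifying the driver version: from the ESXi console, `ethtool -i vmnic0` should print the driver name and version for that adapter (`vmnic0` is an assumed name here; `esxcfg-nics -l` lists the adapters actually present). If you want to collect this from both hosts, a minimal Python sketch for parsing that "key: value" output could look like the following. The helper name `parse_ethtool_info` is hypothetical, and the sample text is a made-up placeholder, not output from the hosts in this thread.

```python
def parse_ethtool_info(text):
    """Parse `ethtool -i <nic>` style 'key: value' lines into a dict."""
    info = {}
    for line in text.splitlines():
        # Split on the first colon only, so values such as PCI
        # addresses (which contain colons) stay intact.
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


# Hypothetical sample; the version string is a placeholder, not a real release.
sample = """driver: tg3
version: 0.0.0-example
bus-info: 0000:03:00.0
"""

info = parse_ethtool_info(sample)
print(info["driver"], info["version"])
```

The reported driver name and version are what you would then look up in the Compatibility Guide for your build.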
André
That looks like the right one. I'll give it a try over the weekend.
I just found the KB article about this issue.
VMware KB: Broadcom 5719/5720 NICs using tg3 driver become unresponsive and stop traffic in vSphere
André
Great. That makes me feel even better. Thanks for digging that out!
This helped me as well, thank you.
