NICs on multiple ESXi 4.1 Update 2 hosts are having intermittent auto-negotiation problems and coming up at 100Mb/Full. So far it has been happening on the integrated NC-Series/Broadcom 382i quad-port NICs, which use the bnx2 driver. I've tried many troubleshooting steps, including:
Set auto<>auto
Checked the switch ports for errors (none) and made sure they were set to auto with PortFast enabled
Made sure the NIC firmware is up to date
Applied the latest Broadcom driver updates
Changed patch cables
Tried hard-coding both ends (I did not see the issue then, but I only rebooted 10 times and the issue is very intermittent)
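A rough way to judge the hard-coded test: with an intermittent fault, ten clean reboots is weaker evidence than it feels. As a sketch, assuming each reboot independently has some probability p of negotiating down (p is an assumption here; the real rate is unknown), the chance of n clean reboots in a row is (1 - p)^n, and the number of reboots needed to see at least one failure with 95% confidence is ln(0.05) / ln(1 - p). The function names below are made up for illustration.

```python
import math


def prob_all_clean(p: float, n: int) -> float:
    """Chance of n consecutive clean reboots if each one fails independently with probability p."""
    return (1.0 - p) ** n


def reboots_needed(p: float, confidence: float = 0.95) -> int:
    """Smallest n such that, if the per-reboot failure rate were p,
    at least one failure would show up within n reboots with the given confidence."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p))


if __name__ == "__main__":
    for p in (0.05, 0.10, 0.20):
        print(f"p={p:.2f}: P(10 clean reboots)={prob_all_clean(p, 10):.2f}, "
              f"reboots for 95% confidence={reboots_needed(p)}")
```

For example, if the fault hit 10% of boots, ten clean reboots would still happen about 35% of the time, and roughly 29 clean reboots would be needed before ruling it out at 95% confidence.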
Not sure at this point what the issue is...
We have other clusters (that happen to be ESX/ESXi 4.1 Update 1) that have never had this issue, but there are multiple variables here.
Auto-negotiation in general can be unreliable, depending on a number of different factors. Do you have the option to hard-code this on the switch side?
We do have the option, but we would prefer auto<>auto since it is the configuration VMware recommends and it keeps things standardized. Also, we use auto on everything else (VMware and non-VMware), and we've never had issues until now.
Doh! Sorry, I somehow missed that you'd already done that. I agree that it's not preferable outside of troubleshooting. Have you tried moving it to a different switch or switchport?
We are seeing the issue on at least two hosts in the cluster so far, across multiple switch ports. All of the NIC ports that have had the issue happen to be on the integrated quad-port NIC and also happen to be plugged into the same switch module. However, the network guys ran diagnostics, checked for errors, and don't see any issues with the module or ports. Here are a couple of additional tests I'm running at the moment:
Connected one of the problem uplinks to a different switch module in the same chassis.
Running a cable directly to the switch, to make sure it is not an issue with the run from the patch panel to the switch.
I'm also trying to recreate the issue again, which is always a challenge because it is intermittent.
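To catch the fallback while repeatedly rebooting, something along these lines could flag which uplinks came up below gigabit after each boot. This is a minimal sketch, assuming the text output of `esxcfg-nics -l` has been captured (for example over SSH) and uses the column order shown in the hypothetical `SAMPLE` string (Name, PCI, Driver, Link, Speed, Duplex, MAC Address, MTU, Description); the sample rows and the helpers `parse_nics` and `negotiated_down` are made up for illustration, so check the layout against a real host first.

```python
import re
from typing import List, NamedTuple


class NicStatus(NamedTuple):
    name: str
    driver: str
    link: str
    speed_mbps: int
    duplex: str


# Hypothetical `esxcfg-nics -l` output; column order is an assumption, MACs are placeholders.
SAMPLE = """\
Name    PCI           Driver      Link Speed     Duplex MAC Address       MTU    Description
vmnic0  0000:02:00.00 bnx2        Up   1000Mbps  Full   00:00:00:00:00:01 1500   sample-description
vmnic1  0000:02:00.01 bnx2        Up   100Mbps   Full   00:00:00:00:00:02 1500   sample-description
vmnic2  0000:03:00.00 bnx2        Down 0Mbps     Half   00:00:00:00:00:03 1500   sample-description
"""

SPEED_RE = re.compile(r"^(\d+)Mbps$")


def parse_nics(text: str) -> List[NicStatus]:
    """Parse whitespace-separated `esxcfg-nics -l` style output, skipping the header row."""
    nics = []
    for line in text.splitlines()[1:]:
        fields = line.split()
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        match = SPEED_RE.match(fields[4])
        speed = int(match.group(1)) if match else 0
        nics.append(NicStatus(fields[0], fields[2], fields[3], speed, fields[5]))
    return nics


def negotiated_down(nics: List[NicStatus], expected_mbps: int = 1000) -> List[str]:
    """Names of NICs whose link is up but running below the expected speed."""
    return [n.name for n in nics if n.link == "Up" and n.speed_mbps < expected_mbps]


if __name__ == "__main__":
    print(negotiated_down(parse_nics(SAMPLE)))  # ['vmnic1']
```

Links that are down are deliberately ignored, so a pulled cable during testing is not counted as a negotiation failure.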
I've tested the third 4.1 U2 ESXi host in this cluster, which is connected to a different patch panel and a different switch chassis/blade, and can confirm that the issue is happening on this host as well. I still have a case open with VMware and am following up.