VMware Cloud Community
Jacob0B
Contributor

nic link down but nic shows status lights (esxi)

Hello all, I have a bit of a weird problem and I'm not having much luck with the search option.

Let me describe how I got to the point I am at with this:

We have about 29 VMs running on our VMware ESXi server, all of them more or less for internal testing purposes. All of these VMs use two network ports on the ESXi host, but are separated using virtual switches and VLANs. One of the network ports connects to a dumb switch and the other connects to a Cisco switch with trunking enabled on the interface.

I created a brand new VM and a new vSwitch to test bridging mode in a product we are using. I modified an older VM to connect to this new vSwitch, and then connected the new device to this new switch, our public VLAN, and our internal network. The idea was that the bridge would allow the older VM to continue to talk to the public VLAN as normal, but the bridged device would work transparently in the middle.
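For a VM to bridge traffic on a standard vSwitch, the vSwitch (or port group) normally needs promiscuous mode and forged transmits enabled. A rough sketch of that configuration, using ESXi 5.x `esxcli` syntax (the names `vSwitchBridge` and `BridgeTest` are hypothetical; on older ESX/ESXi this would be done through the vSphere Client instead):

```shell
# Hypothetical sketch (ESXi 5.x esxcli; switch/portgroup names are made up)
# Create a test vSwitch and a port group on it
esxcli network vswitch standard add --vswitch-name=vSwitchBridge
esxcli network vswitch standard portgroup add \
    --portgroup-name=BridgeTest --vswitch-name=vSwitchBridge

# A bridging VM must receive frames addressed to other MACs and send frames
# with source MACs that are not its own, so relax the security policy:
esxcli network vswitch standard policy security set \
    --vswitch-name=vSwitchBridge \
    --allow-promiscuous=true \
    --allow-forged-transmits=true \
    --allow-mac-change=true
```

Worth noting: if the bridging VM ends up joining two segments that are already connected elsewhere, it creates a switching loop, which would be consistent with the building-wide outage described here.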

Once I finished this configuration and started the bridge device, the network of our entire building ceased to function, including my connection to the vmware server. Once I realized that this had happened, I walked over to the physical piece of hardware and unplugged it entirely from the network. The rest of the building then went back to its happy network-purring self.

Unfortunately, I have USB passthrough turned on, and was unable to connect to the terminal on the ESXi server. Instead, I simply pushed the power button and let the server shut down.

Once it came back up, the NIC that was previously attached to the trunking port of the Cisco switch (vmnic0) had stopped working. Once I managed to get into the host on the other interface, I was able to issue an "esxcfg-nics -l", which shows the device as down.
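A few shell diagnostics that may help confirm what the host thinks of the link (the `esxcli network nic` subcommands assume ESXi 5.0 or later; older hosts only have the `esxcfg-*` tools):

```shell
# List all physical NICs with driver, speed, and link state
esxcfg-nics -l

# ESXi 5.0+ equivalents:
esxcli network nic list
esxcli network nic get -n vmnic0     # detailed info for one NIC

# Sometimes worth a try: cycle the link administratively
esxcli network nic down -n vmnic0
esxcli network nic up -n vmnic0
```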

On the back of the server, I can see that vmnic0 has a yellow light and a green light. The functioning vmnic1 has two green lights. I switched the network cables on the two interfaces once to see if anything would change; it did not.

Does anyone have any ideas?

Thanks,

-Jacob

26 Replies
rickardnobel
Champion

If you have already rebooted the host then this will most likely not help now. Something that might be possible would be to shut down the host, disable the NIC in the BIOS, reboot and let ESXi see this, then reboot again and re-enable it in the BIOS, just to see if it "becomes visible" again. Not entirely likely, but we are running a bit out of options here. :)

My VMware blog: www.rickardnobel.se
Jacob0B
Contributor

That's a very interesting idea. It seems like a bit of a long shot, but like you pointed out, we don't exactly have a lot of options.

I ended up not getting a chance to reboot the server after hours the other day; it was a bit of a busy week for us. I'm doing some other maintenance on that server after hours on Thursday anyway, so I'll have lots of time to fiddle with it.

I'll be sure to let you know what happens.

-Jacob

gctn
Contributor

Hello guys,

We are experiencing exactly the same issue on two servers.

ESXi 5.0, but we see the same with ESX 4.1.

Server: HP DL380 G6

NIC with the problem: Broadcom NetXtreme II BCM5709

Disabling/enabling the NIC didn't help.

We are going to open a ticket with VMware in the meantime... any other ideas?

Thanks

Giorgio

Jacob0B
Contributor

Yesterday I had lots of time to mess with the server. We basically shut down for two weeks around the Christmas holiday, and most of the people who rely on that server took off a couple days early. Basically, that gave me yesterday and today to mess with it.

That server has had slow I/O for a long time, simply because its only drives were two 1 TB drives in a mirrored RAID. Just 7200 RPM SATA, nothing special. It also used to have only 8 GB of RAM and a single quad-core Xeon. We recently upgraded the RAM to 64 GB and added a second Xeon. At the same time, we bought two new 1 TB drives, but had to hold off because installing the drives involved a full backup of the server and a reinstall. (We only have four drive slots, and before there was a system drive + the 1 TB RAID array = 3 slots used.)

For the backup process I booted into a live Ubuntu flash disk, simply because I am more comfortable in a real Linux environment than in the kind-of-Linux of ESXi. I used this opportunity to test the NIC. It was still dead.
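From a live Linux environment, the link can be checked independently of ESXi. A quick sketch (the interface name `eth0` is a guess; check `ip link` first):

```shell
# See all interfaces and their carrier state ("NO-CARRIER" = no link)
ip link show

# Per-port link status and negotiated speed (interface name is a guess)
sudo ethtool eth0          # look for "Link detected: yes/no"

# Blink the port's identify LED for 10 seconds to match port to jack
sudo ethtool -p eth0 10
```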

Once I initialized the new RAID 10 array and got ESXi reinstalled, I double-checked using the new ESXi install. Again, the NIC is still dead. I'm going to have to call it: either through coincidence or a weird overload condition, the botched bridging experiment I explained in my first post must have killed the NIC.

gctn wrote:

ESXi 5.0, but we have the same with ESX 4.1

Server: HP DL 380 G6

nic with problem Broadcom NetXtreme II BCM 5709

disabling/enabling the nic didn't help.

While the symptoms sound similar, the NIC I am describing is an "Intel Corporation 82574L Gigabit".

Ricknob has been very helpful in pointing out ways to check the nic. He may be able to help you.

Since you say you are having the same problem, I am assuming that your switch is reporting that the NIC is active, and that there are link lights on the physical hardware, but that ESXi is not detecting the link. If so, then I would definitely say you must have found a bug. For me, however, I am fairly certain I have managed to overload the NIC in some way. The only way this could still be ESXi's mistake is if the generic e1000 driver has somehow misconfigured a firmware-level setting on my NIC, which I find unlikely.

Good luck on getting your issue resolved, however.

-Jacob

gctn
Contributor

Thanks for your answer Jacob. I confirm the problem is the same, even if with a different NIC, and unfortunately it has not been solved yet.

However, I have opened tickets with both VMware and HP, and we are running dozens of tests and experiments, but for the time being there is no solution to the problem.

Thanks

Giorgio

vicenac
Contributor

I bought two Dell R730xd servers with QLogic NICs, a 10 Gb and 1 Gb combo.

They came with ESXi 6.0.0 (build number starts with 28...). I had to go to build 302..., which is ESXi 6.0.0 U1.

VMware posted a network isolation issue and recommended 6.0.0 U1a, build 307...

I upgraded one server with the offline bundle and did a fresh install on the other.

Now both have "down" NICs (especially the 1 Gb ones), while the activity lights are lit on the back of the NICs.

These servers had only the management nic configured and nothing else.

I tried different versions of drivers for the nics, swapped cables around... no luck.

I have also reverted to U1, but no change at all.

markT94
Contributor

Ran into this same issue. The server has 2 onboard NICs (0 and 1), plus 2 quad-NIC cards, stacked horizontally. My confusion was thinking that NICs 3-5 were the bottom board, and 6-9 the top (the number label 6-9 runs down the center, between the two cards; no other numbers are visible). My problem was that I was plugged in to NIC 9, but my vSwitch was configured to use NIC 5. Changed to NIC 9 at the vSwitch - all good.
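When the physical labeling is ambiguous, the port-to-vmnic mapping can be checked from the ESXi shell instead of guessing. A small sketch (ESXi 5.x+ syntax):

```shell
# Which vmnics are uplinked to which standard vSwitch
esxcli network vswitch standard list

# Watch link state while unplugging one cable at a time:
# the vmnic whose link drops is the port you just unplugged
esxcfg-nics -l
```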
