VMware Cloud Community
RobBuxton
Enthusiast

Servers lost NIC Connection

Hi All,

We've just experienced an odd issue with several VMware guest servers.

4 servers all lost network connectivity at the same time.

For the guest servers we have three NICs teamed into a single vSwitch, which is then divided into three VLAN-tagged port groups. Only one VLAN is in use at the moment; we have only recently implemented VLAN tagging.

A large file copy was under way to one of the servers on this vSwitch.

As noted, 4 servers then dropped off. They were still accessible via the console but could not communicate with the outside world.

Migrating the servers to another host resolved the issue.

Other guest servers on the same vSwitch were not affected. All vmnics are in use, so it's not a lost NIC/vmnic. One of the three vmnics now has only very low-usage servers on it. My guess is that a number of busy guests were on this vmnic and these all got disconnected, while the low-usage guests were not affected.
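To illustrate why a fault on one uplink could take out only some guests, here is a simplified model. It assumes the vSwitch uses the default "Route based on originating virtual port ID" load-balancing policy (the thread does not say which policy is configured), under which each VM's virtual port is pinned to a single active uplink. The port-ID-modulo mapping, the vmnic names and the VM names are all hypothetical; the real assignment is internal to ESX.

```python
from collections import defaultdict

# Hypothetical names for the three teamed uplinks; not taken from the thread.
UPLINKS = ["vmnic0", "vmnic1", "vmnic2"]


def assign_uplinks(vm_ports, uplinks):
    """Pin each VM (keyed by virtual port ID) to exactly one uplink.

    Simplification: port ID modulo the number of active uplinks.
    This is NOT ESX's actual internal algorithm.
    """
    mapping = defaultdict(list)
    for vm, port_id in vm_ports.items():
        mapping[uplinks[port_id % len(uplinks)]].append(vm)
    return dict(mapping)


def affected_vms(mapping, stalled_uplink):
    """VMs that lose connectivity if one uplink stops passing traffic.

    If the physical link stays up, Link Status failure detection sees no
    failure, so these VMs are not moved to another uplink.
    """
    return sorted(mapping.get(stalled_uplink, []))


if __name__ == "__main__":
    # 12 hypothetical VMs on virtual ports 0..11.
    vm_ports = {f"vm{i:02d}": i for i in range(12)}
    mapping = assign_uplinks(vm_ports, UPLINKS)
    print(affected_vms(mapping, "vmnic1"))
```

In this model a stall on `vmnic1` cuts off `vm01`, `vm04`, `vm07` and `vm10` (4 of 12), while guests pinned to the other two uplinks carry on. That pattern would be consistent with what was observed, although it does not explain the underlying cause.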

I cannot find anything suggesting that VLAN tagging should be affected by the kind of high load on a vmnic/NIC that a large file copy would cause.

Any issues I should be aware of?

The other thing we've done recently is apply the latest ESX patches, and now VMware Tools is out of date on the majority of our guests. However, it is not out of date on one of the servers that lost its NIC, so I don't think that is the cause.

jasonboche
Immortal

Yow. Sounds like a hiccup of kernel proportions. I'm fairly certain the outdated VMware Tools has little or nothing to do with this situation. Load may have triggered it, but this should be classified as a bug and not something to expect again. After all, we are dealing with VMware ESX Server, an enterprise-class virtualization product.

Did you bounce the host that had the issue?

Has it had any trouble since?

Are there any VMs still running on this host?

Do the three trunks go into the same switch or do they feed into separate switches (I hope the latter)?

Do you have beaconing enabled in the network configuration? (This would be my best guess based on the information so far; otherwise, kernel garf.)

I'd get a support call opened with VMware.

VCDX3 #34, VCDX4, VCDX5, VCAP4-DCA #14, VCAP4-DCD #35, VCAP5-DCD, VCPx4, vEXPERTx4, MCSEx3, MCSAx2, MCP, CCAx2, A+
RobBuxton
Enthusiast

No, I haven't bounced the host; a load of guests on it were unaffected. I think the problem may have been confined to just one of the three vmnics in the team.

No trouble since, but early days...

As above, yes, a number of VMs are still running on it, and these had no issues; about 4 out of 12 were affected.

The team is split over two switches.

As for beaconing, if you mean the option under the network properties for either the vSwitch or the VLAN port group, then no; that's all set to Link Status.

Yep, I will log a call; I'm just trying to see if there are any known issues I'm not aware of!

jasonboche
Immortal

Yes, beacon probing is what I was referring to.

Based on my experience of VI3 gremlins, if they happen once, they will happen again, and again, and again, usually at the strategically worst time possible. Don't count on this being a one-time fluke.

Let us know if you find anything conclusive.
