VMware Cloud Community
sel57
Enthusiast

Some VMs lose network connectivity during vMotion

I've had a long-term, intermittent problem with vSphere 5.x where the occasional VM loses network connectivity during a vMotion. There is no easy way to tell which VMs will be affected: two VMs can have the same OS and network adapter on the same host, yet one will migrate fine and the other will lose network connectivity. I've become so hyper-aware of the issue that I start pinging production VMs before I vMotion them, so I know the second they go offline. Toggling the VM's network adapter (off, then back on) will usually bring it back online.
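
The ping check described above can be scripted so that drops are timestamped rather than watched by eye. This is a minimal sketch, assuming a Linux admin workstation with the iputils `ping` (where `-W` is a timeout in seconds). The function name `watch_ping` and the address in the usage line are made up for illustration.

```shell
# Probe a VM once per second and print a timestamped line whenever its
# reachability changes. Both arguments are placeholders:
#   $1 = VM address, $2 = number of probes (default 300).
watch_ping() {
    target="$1"
    count="${2:-300}"
    last=""
    i=0
    while [ "$i" -lt "$count" ]; do
        # One echo request, one-second timeout (iputils semantics assumed).
        if ping -c 1 -W 1 "$target" >/dev/null 2>&1; then
            state="up"
        else
            state="DOWN"
        fi
        # Only report transitions, so the output stays short.
        if [ "$state" != "$last" ]; then
            echo "$(date '+%Y-%m-%dT%H:%M:%S') $target $state"
            last="$state"
        fi
        i=$((i + 1))
        sleep 1
    done
}
```

Started as `watch_ping 192.0.2.10 600` just before a migration, it should log the moment a VM drops off and the moment it returns, which can then be lined up against the vMotion task times.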

I'm running vCSA 6 and I'm in the process of upgrading my ESXi 5.1 hosts, so I have a mixed 5.1/6.0 environment. I have had this problem ever since I created this environment at version 5.0. The one thing I have not yet seen is losing network connectivity when migrating to an ESXi 6 host. I don't know if that means anything, or if it's just because I still have a majority of 5.1 hosts and just haven't seen it yet, but I still wanted to mention it. I also run standard switches and have ample ports on each.

Has anyone else experienced this and know what might be the cause? I may open a ticket with VMware for the heck of it, but thought I'd see what the community thinks first.

5 Replies
virtualg_uk
Leadership

A few things to try here:

1) Is VMware Tools installed and running on the VMs?

2) Can you check that the physical switches are configured correctly? I check that the config is correct on each switchport that the ESXi hosts are connected to. When a VM is vMotioned to a new host, ESXi selects one of the NICs on the vSwitch to pass traffic over for the VM. If this NIC is not configured correctly on the physical switch side (incorrect VLAN, speed, duplex, etc.), then the VM would not necessarily have access to the network.

3) Can you determine whether this issue occurs when using vMotion to move a VM to a particular host?


Graham | User Moderator | https://virtualg.uk
sel57
Enthusiast

Thanks for the reply grba. Much appreciated!

1) Not something I've checked every time it's happened, but I will keep an eye on it with future incidents. I'm very good at keeping the Windows VMs updated, but the Linux ones not so much, and this problem seems to happen on both operating systems.

2) I don't manage our network switches; I depend on a third party for configuration and support. Since toggling the network adapter off/on usually fixes the issue, it's clear the hosts have access to these VLANs, but I can't say for sure whether there are other misconfigurations, because I have neither the access nor the knowledge. Are there any commands I could run from my (host) side to check the integrity of the network configuration?

3) I have never been able to narrow it down to a single host. It seems to happen intermittently across the entire environment (different VMs, different hosts, etc.).

virtualg_uk
Leadership

What you might find is that one or more hosts have multiple NICs for VM traffic. If one of these is misconfigured on the physical switch, a VM that gets assigned to that NIC will lose connectivity, which could be causing the problem. Reconnecting the VM's network adapter could just be selecting another host NIC, which would explain why it solves the problem. I have seen this once before.
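
To test that theory the next time a VM drops, it may help to record which physical uplink the VM's port is using before toggling the adapter. The sketch below is meant for the ESXi shell on the destination host. `esxcli network vm list` and `esxcli network vm port list -w` are standard esxcli commands, but the helper name `vm_uplinks` is made up here, and the assumption that the world ID is the first column after two header lines should be checked against your build.

```shell
# For every running VM on this host, print its virtual port details,
# which should include the uplink the port is currently teamed to.
vm_uplinks() {
    # Assumption: "esxcli network vm list" prints a header line, a
    # separator line, then one VM per line with the world ID first.
    esxcli network vm list | awk 'NR > 2 { print $1 }' | while read -r world_id; do
        echo "== world $world_id =="
        # </dev/null keeps the inner command from eating the loop's input.
        esxcli network vm port list -w "$world_id" </dev/null
    done
}
```

If the failing VMs always turn out to be on the same vmnic, that points at that uplink's switchport rather than at the VM.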

If you have CDP enabled, hover over each NIC on your vSwitches to see whether the switchport configuration is the same on all of them.
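
For anyone without access to the physical switches, roughly the same comparison can be started from the ESXi shell. The following is a sketch under assumptions, not a complete audit: the wrapper name `show_uplink_config` is invented, `vSwitch0` is only a placeholder default, and the CDP details will only appear if the physical switch actually advertises them.

```shell
# Dump the host-side view of the uplinks for one standard vSwitch.
#   $1 = vSwitch name (placeholder default: vSwitch0)
show_uplink_config() {
    vswitch="${1:-vSwitch0}"
    # Link state, speed, duplex and driver for every physical NIC.
    esxcli network nic list
    # Uplinks and port groups attached to each standard vSwitch.
    esxcli network vswitch standard list
    # Teaming policy: load balancing mode, active and standby adapters.
    esxcli network vswitch standard policy failover get -v "$vswitch"
    # Network hints (including CDP data, when available) per physical NIC.
    vim-cmd hostsvc/net/query_networkhint
}
```

Running it on each host and diffing the output between uplinks should make a mismatched speed, duplex, or advertised VLAN stand out.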

Let us know how you get on


Graham | User Moderator | https://virtualg.uk
SilentASH
Contributor

Hello sel57,

I am also having the same problem you describe. Have you had any luck getting this fixed?

jatinjsk
Enthusiast

Please check the vmkernel.log file for log entries similar to the ones below.

You can run the command below to check whether a dvPort was disabled after the vMotion.

/var/log # cat vmkernel.log |grep dissociate

2017-11-09T12:26:40.822Z cpu24:32906)Net: 3348: dissociate dvPort 23922 from port 0x300000a

This gives you the port ID that was dissociated (in this case 0x300000a). You might get multiple entries in vmkernel.log if several VMs are failing to ping.

Next, run the command below; you will get output like the following. The last three entries clearly show that the dvPort this VM was using got disabled and disconnected.

/var/log # cat vmkernel.log |grep 0x300000a

2017-11-09T12:23:37.587Z cpu26:49280)Net: 2312: connected dfwlxplant31.eth0 eth0 to vDS, portID 0x300000a
2017-11-09T12:23:37.587Z cpu26:49280)Net: 3127: associated dvPort 23922 with portID 0x300000a
2017-11-09T12:23:37.591Z cpu26:49280)Team.etherswitch: TeamESPolicySet:5519: Port 0x300000a frp numUplinks 4 active 4(max 4) standby 0
2017-11-09T12:23:37.591Z cpu26:49280)Team.etherswitch: TeamESPolicySet:5527: Update: Port 0x300000a frp numUplinks 4 active 4(max 4) standby 0
2017-11-09T12:23:37.591Z cpu26:49280)NetPort: 1426: enabled port 0x300000a with mac 00:50:56:8d:74:e0
2017-11-09T12:26:39.280Z cpu18:49280)NetPort: 1632: disabled port 0x300000a
2017-11-09T12:26:40.822Z cpu24:32906)Net: 3348: dissociate dvPort 23922 from port 0x300000a
2017-11-09T12:26:40.822Z cpu24:32906)Net: 3354: disconnected client from port 0x300000a
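
When several VMs are affected, the two grep steps above can be chained so that every dissociated port gets its history printed in one pass. This is a rough sketch that assumes the log lines look like the samples in this post. The function names are invented for illustration, and only `grep`, `sed` and `sort` are used, which should be available in the ESXi shell.

```shell
# Print each distinct port ID that appears in a "dissociate dvPort" line.
#   $1 = path to a vmkernel log file
dissociated_ports() {
    grep 'dissociate dvPort' "$1" \
        | sed 's/.* from port \(0x[0-9a-fA-F]*\).*/\1/' \
        | sort -u
}

# Print every log line that mentions one port ID as a whole word.
#   $1 = log file, $2 = port ID such as 0x300000a
port_history() {
    grep -Fw "$2" "$1"
}

# Combine the two: one section of history per dissociated port.
#   $1 = log file (defaults to the live vmkernel.log)
report_dissociated() {
    log="${1:-/var/log/vmkernel.log}"
    for port in $(dissociated_ports "$log"); do
        echo "== $port =="
        port_history "$log" "$port"
    done
}
```

Calling `report_dissociated` with no argument reads /var/log/vmkernel.log; pass another path to inspect a copied or rotated log.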

Resolution:

     Temporary workaround: re-initiate a vMotion to another host.

     Permanent fix: upgrade the firmware and driver for the NIC. In my case the driver was elxnet, and the upgrade resolved the issue.

To check the NIC firmware and driver version:

     1. SSH to the host.

     2. Run

               # esxcli network nic get -n vmnic1

Compare the reported firmware version with the latest one available for your NIC and upgrade if it is older. This should resolve the issue.
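
Since a host usually has more than one uplink, it can be worth collecting the driver and firmware lines for every NIC in one go. The loop below is a sketch, assuming that `esxcli network nic list` prints two header lines followed by one NIC per line with the name in the first column, and that `esxcli network nic get` labels the fields with `Driver:` and `Version:`. The function name `nic_versions` is hypothetical.

```shell
# Print driver name, driver version and firmware version for each
# physical NIC on this host.
nic_versions() {
    # Skip the header and separator lines; the NIC name is column one.
    esxcli network nic list | awk 'NR > 2 { print $1 }' | while read -r nic; do
        echo "== $nic =="
        # Matches the "Driver:", "Firmware Version:" and "Version:" lines.
        esxcli network nic get -n "$nic" </dev/null | grep -E 'Driver:|Version:'
    done
}
```

Comparing this output across the 5.1 and 6.0 hosts may show whether the hosts where VMs drop share an older driver or firmware level.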