Enable SSH on the host, remote in, and at the command prompt type esxtop.
In esxtop, press the letter "n" (network view)
and you will see which VM is on which vmnic.
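If you want the same check in a scriptable form, esxtop also has a batch mode (`esxtop -b -n 1`) whose output you can grep for the VM name. The line below is only a simulated illustration of pairing a VM port with its uplink, not the exact esxtop column layout:

```shell
# On the host (interactive): esxtop, then press "n" for the network view.
# Non-interactive sketch:   esxtop -b -n 1 > sample.csv
# Simulated network-view line for illustration (not the real esxtop format):
echo "33554437 myvm.eth0 vmnic2 DvsPortset-0" | awk '{print $2, "->", $3}'
```

The second field is the VM's port/adapter name and the third is the uplink it is currently pinned to.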
sorry to reply to such an old thread, but we have the exact same problem here.
I stumbled upon your thread while I was googling for this exact problem, because we are experiencing the same issue in our VMware-environment currently.
As I did not find any similar threads:
Did you find a solution to this problem back in 2012, and if so, how did you solve it?
I also suspect the network hardware to be the problem, but of course I need proof to give to the network admins before they start analyzing on their side...
It would be great to hear from you, as we are currently in the dark about this phenomenon.
Thanks in advance!
I *just* posted a network issue we are having in 6.5U1 here: Odd Network Issues Since Migrating Environment From vSphere 6U3 > 6.5U1
Can you read my post & see if you are having similar issues? Not all VMs have a network issue or show as 'down'. I think it's mostly communication problems rather than 'down' type stuff. Although, we have had a VM become unpingable (glad this part is rare). What we noticed is that this seems to happen solely when VMs run on a vDS and not a vSS. Curious to hear your issue.
We are having a very similar issue to yours.
But we are currently using vSphere 6.0 U3, not 6.5 U1.
We regularly observe VMs that are no longer able to communicate via certain NICs.
These NICs are always the ones connected to the vDS. The NICs connected to the vSS are always fine.
The version of our vDS is still 5.5. We are currently suspecting that this could be the issue and we plan to upgrade them to 6.0, too.
If that does not solve the issue, we will try to analyze towards the pSwitches. As stated somewhere above, other people traced this problem to a faulty MAC table somewhere on the pSwitches.
But it will be hard to have enough evidence for the network admins to start their analysis...
It would be good to know how COS resolved his issue way back in 2012.
SSH to the host as root and run the commands below:
1) "net-stats -l"
2) Capture the port number of the VM from the output.
3) cd net/portsets/switchname/ports/portnumberfromstep2/ (switchname is your virtual switch name - it can be standard or DVS)
4) cat teamUplink
This will tell you which uplink the VM is currently using.
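The steps above boil down to building one node path and reading it. The port ID and portset name below are placeholders (assumptions) -- substitute the values from "net-stats -l" on your own host:

```shell
# Placeholders (assumptions) -- take the real values from "net-stats -l":
PORT_ID=50331662
PORTSET=DvsPortset-0
UPLINK_PATH="net/portsets/${PORTSET}/ports/${PORT_ID}/teamUplink"
echo "$UPLINK_PATH"
# On the host shell: cd into that directory and `cat teamUplink` as above,
# or query the node directly with vsish:  vsish -e get /$UPLINK_PATH
```

Reading that node before and after the VM loses connectivity tells you whether the problem correlates with a particular uplink.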
Thank you for your response! I certainly don't wish problems on your environment, but I'm glad to hear someone else is experiencing similar issues. It's been frustrating that I haven't been able to find *anything* on this.
I mean seriously... at the beginning of our 6U3 > 6.5U1 migration, something as SIMPLE as disconnecting a Host from the old VCSA & then connecting the Host to the new VCSA... no Host upgrades yet... no VMware Tools upgrades yet... just a new vCenter... and VMs, though pingable, had their services not working, be it web UI, domain controller services (DNS, directory services), etc. Why would connecting a Host with running VMs that are connected to the Host's vSS cause such an issue? To be fair, rebooting the VMs took care of those initial problems, but I certainly wonder why it happened.
Then, after I began upgrading Hosts and then connecting them to a vDS (ver 6.5 vDS), that's when we noticed other communication oddities as I mentioned earlier (i.e. a Cisco voice server not getting NTP info from the Cluster master, the occasional VM not pingable, or latent communication within our network, etc). Don't get it. :/
Since you're having issues on an older version of vSphere, I'd like to ask if you've heard about the issue VMware has/had with their vmxnet3 network adapter? A few years ago, when we upgraded our environment from 5.5 > 6.0, we experienced EXTREME latency on VMs running the vmxnet3 network adapter. We mainly saw this in an "app cluster" (web, app, db VM servers) when the VMs were on different Hosts. When we placed the app VMs on the same Host, the latency went away. What we ended up having to do was change the network adapter back to an E1000. I think 6U3 resolved that issue though. This article covers several issues experienced with vmxnet3, if you haven't seen it: https://vinfrastructure.it/2016/05/several-issues-vmxnet3-virtual-adapter/ . Maybe this is what you all are experiencing? Although, you did say your environment is fine when VMs/Hosts are migrated back to a vSS, so I'm not sure. Just thought I'd share it.
Anyway, thank you for responding. Maybe someone has a suggestion? I haven't received a response on my communities post yet. :/
A happy and prosperous new year to everyone!!
You're welcome. I am also struggling to find some hints about the root cause of this problem.
I can't imagine that we are affected by the VMXNET3 problem, because what we experience is the following:
Our VMs are configured like that:
1st interface is for management traffic and is connected to vSS locally on the ESXi servers.
2nd interface is for productive traffic and is connected to a vDS.
The connection-problems, which occur regularly, are ONLY experienced on the 2nd interface for productive traffic.
While the issue occurs, the affected VMs are
- not able to ping their gateway
- not able to reach other hosts in their network, EXCEPT for VMs that run on the SAME ESXi host
So any traffic that would leave the ESXi host is not returned.
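A quick guest-side check that matches these symptoms (assumption: a Linux VM; the gateway IP below is a placeholder): look at the ARP cache for the gateway. An "(incomplete)" entry means the gateway's ARP reply never came back, which fits "traffic leaving the host is not returned":

```shell
# On the affected VM:  arp -n | grep 192.168.1.1   (192.168.1.1 = your gateway, placeholder)
# Simulated output of such a failure, for illustration only:
echo "192.168.1.1   (incomplete)              eth1" | grep -c "incomplete"
```

If the entry is incomplete on the vDS-backed interface but complete on the vSS-backed one, that localizes the drop to the vDS path.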
So what we are suspecting now of course is that there must be some problem with the distributed switch we are using.
But neither VMware support nor my research on the web pointed me in this direction. There is no evidence that this can be related to a distributed switch.
VMware support told us that they suspect a problem with the MAC tables on some physical switch.
This seems to be a valid suspicion, but it does not explain why our problem never occurs on the 1st interface for management traffic.
We have not had a single incident where the 1st interface was affected. It was, in every single case, the 2nd interface.
If someone does have some idea what could be going on, please share it!
We encountered a similar issue this morning.
A virtual machine had lost its network connection.
We could only ping its own IP address; we didn't try to ping other VMs on the same ESX host.
We couldn't ping the gateway.
We tried to remove the network card and re-create a new one, but it didn't solve the issue.
Finally, the issue was solved by connecting the network card to a different port ID on the dvSwitch.
We're opening a case with VMware to understand the root cause of the issue.
I've experienced a similar problem today.
Multiple VMs lost network connection. We started troubleshooting and found that the machines could ping other VMs on the same host, but not the outside network.
The strange thing was that not all machines on that host were affected; just a few of them had this problem, and we could fix some of them by disconnecting and reconnecting the NIC inside the VM.
But 2 machines still refused to communicate, so we restarted the first host and the machines worked on that host. But when we moved them back to the original host, they stopped again. So we rebooted the second host as well, and all the problems were gone; every machine works on every host again.
I did some more intense troubleshooting on an affected machine and installed Wireshark. I could see the machine desperately sending out ARP requests, asking for the MAC of the default GW, without getting an answer. But I could also see the default GW sending ARP requests, asking my machine for its MAC address, which made absolutely no sense to me.
I triple-checked the switch configuration and everything else I could check, without finding any misconfiguration.
It just looked to me like ESXi was eating some packets on some machines, and only a restart of the host resolved the problem.
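One way to confirm the "ESXi is eating packets" theory is to capture on the host itself with pktcap-uw (assumption: ESXi 5.5 or later; the vmnic name and port ID below are placeholders, and the exact flags should be verified against `pktcap-uw -h` on your build). Capture ARP (ethertype 0x0806) at the physical uplink and at the VM's switchport, then compare:

```shell
# On the host (placeholders: vmnic2, port ID 50331662 from "net-stats -l"):
#   pktcap-uw --uplink vmnic2 --ethtype 0x0806 -o /tmp/uplink-arp.pcap
#   pktcap-uw --switchport 50331662 --ethtype 0x0806 -o /tmp/port-arp.pcap
# If the gateway's ARP reply appears in the uplink capture but never in the
# switchport capture, the host's vSwitch data path is dropping it; if it never
# reaches the uplink at all, the problem is on the physical network.
echo "compare uplink-arp.pcap and port-arp.pcap in Wireshark"
```

This gives the network admins the kind of evidence mentioned earlier: proof of whether the frame dies inside the host or out on the pSwitches.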
Hello COS - were you ever able to get a solution for this? I know it is quite old, but I'm hoping someone can shed some light on this problem. We are experiencing it now, and there doesn't seem to be much information on the web about it.
I am experiencing this problem on some new hosts that I'm hoping to migrate to. My virtual standard switch contains four active adapters. Two are using the BNX2 driver and two are using the IGN driver. The Intel adapters with the IGN driver are the ones that seem to fail every 20 days. This is happening on two different servers connected to two different physical switches.
I've never experienced this issue on our old hardware. However, with the old hardware, I have a virtual standard switch with two active adapters instead of four. One of those adapters is also using the BNX2 driver, while the other is using the ne1000 driver.
So, the major differences I see between the two sets of hardware are:
- different models of Intel NICs
- the Intel NICs on the new hardware support SR-IOV (though it is not enabled), while no other NIC does
- the virtual switch on the server without the issue contains two physical adapter ports, while the virtual switch on the server with the issue contains four
My manager wants me to try to get this going without stealing hardware from the old servers to put in the new. He's convinced I have an incorrect setting somewhere. I haven't found the "stop working after 20 days" setting yet though.