Can you define "acts like it is no longer on the network"? If you can ping to and from the VM, it's on the network. Maybe you are having a name resolution (DNS) issue?
From within the VM, I cannot ping anything but itself (localhost). There's no reply when pinging the default gateway, so name resolution for anything else on the network obviously fails as well.
Now from outside the VM: other VMs on the same subnet and the same DV Switch get no reply when pinging either that VM's DNS name or its IP address. Those other VMs are live on the network, meaning they respond to pings and can authenticate to all resources on the network.
So it's acting like it is not on the network.
Again, if I vMotion it to another host, it starts to reply to pings and *IS* on the network.
Now, for kicks, I vMotioned it back onto the previous host it was on (when it was not on the network) and it's still on the network... lol, weird, I know!
I can't seem to replicate it. We only find out when customers call in saying their VM is no longer alive.
Let me also add what we did to troubleshoot:
We disconnected the NIC, then reconnected it. Failed.
We set the VM to DHCP, then put back the static IP. Failed.
We rebooted the VM. Failed.
We removed the NIC from the VM, booted it up, removed the ghosted NIC device, then added the NIC back. Failed.
We restored the VM from a snapshot. Failed.
It seems the only thing that works is vMotioning it to another host.
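Since the outage is only noticed when customers call in, a periodic reachability check could flag a VM as soon as it stops answering. A minimal sketch, assuming it runs on a Linux or macOS box with the system `ping` binary; the function names are hypothetical, and on Windows `ping` takes `-n` instead of `-c`.

```python
import subprocess
from typing import Callable, Iterable, List


def ping_once(host: str, timeout_s: int = 2) -> bool:
    """Send one ICMP echo with the system ping; True if a reply came back."""
    # "-c 1" sends a single echo request; "-W" is the reply timeout in seconds
    # on Linux (macOS interprets -W in milliseconds, so adjust there).
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def find_unreachable(hosts: Iterable[str],
                     probe: Callable[[str], bool] = ping_once) -> List[str]:
    """Return the hosts that did not answer the probe, in input order."""
    return [h for h in hosts if not probe(h)]
```

Run from cron, something like `find_unreachable(["vm01.example.local", "10.0.0.25"])` (hypothetical names) returns the VMs to alert on; the `probe` parameter lets you swap in a different check, such as a TCP connect.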
I can also report the following:
Realtek 8168 Gigabit Ethernet
Cannot connect to the specified gateway 192.168.1.1. Failed to set it.
9/30/2012 11:48:13 AM
Lost network connectivity on virtual switch "vSwitch0". Physical NIC vmnic0 is down. Affected portgroups: "Management Network".
9/30/2012 11:48:13 AM
I disabled the virtual NIC and re-enabled it, and it came back.
Sorry, I misread your original post. We had a similar problem many moons ago, but I'm not sure it's relevant. Are you using different port groups and DRS? Are the VMs being moved around when this occurs? If so, do you have the same port groups on all ESXi hosts? What was happening with us was that occasionally a VM would be vMotioned from one host to another, but the new host did not have the correct port group/VLAN, so it would lose connectivity until we either created the necessary port group or vMotioned it back to a host that had the correct port group.
All hosts in the cluster have the appropriate port groups. They're all profiled.
It's just weird that when I vMotion the VM off the current host, say "Host10", to another host, say "Host20", the network comes back online. Then when I vMotion it from "Host20" back to the original host "Host10", the network stays online. It's like networking for the affected VM gets "hung" until it's vMotioned.
Yes, DRS is enabled and VMs move dynamically.
So if the port groups/VLANs were incorrect, the network on that VM should go offline again when I vMotion it back to the original host. But that's not what happens.
.....scratchin my head on this one.....lol
I suspect the issue is with the physical switch. Please follow the steps below (they may not resolve the issue, but they should give you an idea of how to resolve it).
1. When the issue occurs, i.e. when you are not able to ping the VM, check the physical switch MAC table and see whether the MAC of the VM's NIC is listed.
2. Also at the time of the issue, try pinging the other VMs in the same port group on the same ESX host.
3. If you are able to ping VMs within the same host and port group, then the physical switch needs to be checked.
4. If you are not able to ping the VM from within the ESX host, then re-validate the configuration.
I hope this helps sort it out.
Karthic Kumar
Sr. MTS, vRealize Network Insight
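The four steps above amount to a small decision procedure. A sketch of that logic; the function name, parameters, and return strings are hypothetical and only encode the checklist as written.

```python
from typing import Optional


def diagnose(vm_mac_in_pswitch_table: Optional[bool],
             can_ping_same_host_vms: bool) -> str:
    """Map the observations from steps 1-4 to where to look next.

    vm_mac_in_pswitch_table: result of step 1 (None if not checked yet).
    can_ping_same_host_vms: result of step 2 (same port group, same ESX host).
    """
    if not can_ping_same_host_vms:
        # Step 4: traffic fails even inside the host's virtual switch,
        # so the VM / port group configuration is suspect.
        return "re-validate VM and port group configuration"
    if vm_mac_in_pswitch_table is False:
        # Step 1 negative: the pSwitch has not learned the VM's MAC,
        # so frames from outside the host cannot be delivered to it.
        return "check physical switch: VM MAC missing from MAC table"
    # Step 3: local switching works, so look at the uplink path.
    return "check physical switch port/uplink for this host"
```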
Agree with karthickvm
Sounds like a physical switch issue to me as well. Main reason: when a VM is migrated via vMotion, one of the last steps of the migration is that the destination host sends out RARP frames on the VM's behalf so the physical switch updates its MAC table (basically, the host is telling the pSwitch that the VM's MAC address will now live on the port attached to the destination host). If the VM regains network access after migration, it seems the problem is resolved when the MAC tables are updated on the pSwitch.
I'd probably ask my network admin to take a look at the pSwitch.
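To illustrate the MAC-table argument, here is a toy model of a learning switch: it remembers the port on which it last saw each source MAC and sends unicast frames only there, so a stale entry misdelivers traffic until a frame from the VM's new location refreshes it. The port numbers and MAC are hypothetical, and real switches also age entries out on a timer, which this sketch ignores.

```python
from typing import Dict, Optional


class LearningSwitch:
    """Toy model of a physical switch's MAC address table."""

    def __init__(self) -> None:
        self.mac_table: Dict[str, int] = {}

    def receive(self, src_mac: str, in_port: int) -> None:
        # Learning: remember which port the source MAC was last seen on.
        self.mac_table[src_mac] = in_port

    def forward_port(self, dst_mac: str) -> Optional[int]:
        # Known unicast goes only to the learned port; None means unknown (flood).
        return self.mac_table.get(dst_mac)


VM_MAC = "00:50:56:aa:bb:cc"  # hypothetical VM MAC
sw = LearningSwitch()

sw.receive(VM_MAC, in_port=10)   # VM seen behind port 10
# The VM's traffic now leaves via port 20, but it has not sent anything yet:
stale = sw.forward_port(VM_MAC)  # still 10, so frames for the VM go the wrong way

sw.receive(VM_MAC, in_port=20)   # RARP or first frame arrives from the new port
fresh = sw.forward_port(VM_MAC)  # now 20, VM reachable again
```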
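For reference, the switch notification is a RARP broadcast (EtherType 0x8035, RFC 903) whose source MAC is the VM's MAC, which is what lets the pSwitch relearn the port. A sketch that only builds such a frame with `struct` and sends nothing; the payload field values are an assumption (both hardware addresses set to the VM MAC, both IPs 0.0.0.0), not a capture of what ESXi actually emits, and the MAC is hypothetical.

```python
import struct


def mac_bytes(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' to 6 raw bytes."""
    return bytes(int(part, 16) for part in mac.split(":"))


def build_rarp_frame(vm_mac: str) -> bytes:
    """Build an Ethernet II frame carrying a RARP 'request reverse' (opcode 3)."""
    mac = mac_bytes(vm_mac)
    broadcast = b"\xff" * 6
    # Ethernet header: destination, source, EtherType 0x8035 (RARP).
    eth_header = broadcast + mac + struct.pack("!H", 0x8035)
    # ARP-format payload: htype=1 (Ethernet), ptype=0x0800 (IPv4),
    # hlen=6, plen=4, opcode=3 (request reverse).
    payload = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 3)
    payload += mac + b"\x00" * 4  # sender hardware / protocol address (assumed)
    payload += mac + b"\x00" * 4  # target hardware / protocol address (assumed)
    return eth_header + payload


frame = build_rarp_frame("00:50:56:aa:bb:cc")  # hypothetical VM MAC
```

The result is 42 bytes (14-byte header plus 28-byte payload); the NIC pads it to the 60-byte Ethernet minimum when it goes on the wire.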
Other things to check:
Are all VMs affected or only some? If only some, are there any signs of MAC conflicts anywhere (like log entries on the guest, or duplicate MAC errors on the pSwitch)?
Which pNIC load-balancing policy are you using? IP hash can show similar symptoms in some cases if the pSwitches are not EtherChannel-capable.
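Why IP hash needs EtherChannel on the physical side: the uplink is chosen per source/destination IP pair, so one VM's MAC shows up on several uplinks at once. A simplified sketch; it uses a plain XOR of the two addresses modulo the number of uplinks, which is an approximation of the commonly described IP-hash selection rather than the exact ESXi algorithm, and all addresses are hypothetical.

```python
import ipaddress
from typing import Dict, List


def ip_hash_uplink(src_ip: str, dst_ip: str, uplinks: List[str]) -> str:
    """Pick an uplink from a hash of the IP pair (simplified IP hash)."""
    src = int(ipaddress.IPv4Address(src_ip))
    dst = int(ipaddress.IPv4Address(dst_ip))
    return uplinks[(src ^ dst) % len(uplinks)]


uplinks = ["vmnic0", "vmnic1"]
vm_ip = "10.0.0.25"  # hypothetical VM address
peers = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"]

# Which uplink carries this VM's traffic to each peer.
choice: Dict[str, str] = {p: ip_hash_uplink(vm_ip, p, uplinks) for p in peers}
```

Here the VM's flows alternate between vmnic0 and vmnic1. Without a port channel, the pSwitch sees the same source MAC on two ports and keeps relearning it, which can produce intermittent reachability.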
I had that exact same problem with ESX 4/4.1, and it turned out to be one of our core switches just acting wacky. We had Dell, Cisco, and VMware all working with us, and no one could figure it out; one day we decided to reboot our core switches, and the problem went away as mysteriously as it showed up. We didn't have physical wiring issues or anything. I think over time there's some sort of buildup of something that can cause this situation, but that's entirely a guess. I'd say, if you can, just reload all your physical switches.
Here's the link to my previous post:
Still working on the issue. We're making some changes on the hosts. I'll post what we did and the results when we're done.
We're running into the EXACT same issue with only one of the hosts in a five-node cluster, running ESXi 5.0 (build 469512) on a Dell PowerEdge R610. When it happens, it doesn't happen to all the guests: this morning I had two guests on there, and only one lost its network connectivity. Changing it to another network doesn't help, but as COS said, if we vMotion it to another host, the network comes back. And if we vMotion it back to the "bad" host, the network stays connected.
I haven't noticed this happening when we do any particular thing, but for this latest occurrence I had storage-migrated the VM to another datastore. The migration finished at 4:06 PM and we started getting ping failures directly after. So I vMotioned it to another host and back to the bad one, and it's happy as can be (for now).
Please let me know if you guys find anything!
I checked the CDP (Cisco Discovery Protocol) information for both NICs in the Configuration tab, under Networking, and compared the info from the "bad" host to an unaffected host in the same cluster. I found that ONE of the NICs on the bad host is in a different VLAN.
I have a ticket open for our switch guys to check it out. It would make sense that if only one NIC on the host is configured improperly, only some of the guests would be using that NIC, while the others hum along just fine on the properly configured NIC.
I'll let you know what happens, but this seems to be the smoking gun.
When we faced this issue, we shut down one of the two NICs on the hosts. Basically, when you have the NICs teamed, vSphere chooses which NIC carries a VM's network traffic based on port ID (and load, "I think").
On Host A, VM1 would be on vmnic0. Let's say you lose connectivity there and you decide to vMotion it to Host B, where VM1 comes up on vmnic1. You will likely want to blame Host A for bad connectivity.
I was able to track down this behaviour because at the time we still had console access and access to esxtop. I haven't tried with 5.0, but I imagine if you enable the console, esxtop would still be there. Anyway, the test here would be to disable vmnic0 on Host A and force everything to go through vmnic1. If things come back to life, then you have one of three things:
1. a bad network segment, with a bad access switch or a bad core switch on that segment
2. a bad cable on vmnic0
3. a bad vmnic0
Either way, in my case we had a bad core switch on one segment that affected vmnic0 on all hosts. At that point we had rebooted all of our switches, which cleared the issue, so we were not able to pinpoint exactly how it would have behaved if we had only rebooted the one bad switch. We had it replaced a few weeks later.
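The port-ID teaming behaviour and the disable-vmnic0 test can be modelled roughly. With the default "Route based on originating virtual port" policy, each virtual port is pinned to one active uplink; the modulo mapping below is a simplification (ESXi's actual assignment is not a documented plain modulo), and the port IDs are hypothetical. It shows how a VM that gets a new virtual port after vMotion can land on a different uplink, and how removing vmnic0 from the active list pushes everything onto vmnic1.

```python
from typing import Dict, List


def uplink_for_port(port_id: int, active_uplinks: List[str]) -> str:
    """Simplified port-ID teaming: pin a virtual port to one active uplink."""
    if not active_uplinks:
        raise ValueError("no active uplinks")
    return active_uplinks[port_id % len(active_uplinks)]


def placement(port_ids: Dict[str, int], active_uplinks: List[str]) -> Dict[str, str]:
    """Map each VM (by its virtual port ID) to the uplink it would use."""
    return {vm: uplink_for_port(pid, active_uplinks) for vm, pid in port_ids.items()}


# Hypothetical virtual port IDs on Host A.
ports = {"VM1": 50331660, "VM2": 50331661}

normal = placement(ports, ["vmnic0", "vmnic1"])  # VM1 on vmnic0, VM2 on vmnic1
forced = placement(ports, ["vmnic1"])            # the test: vmnic0 taken out of the team
```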
That's a good approach, alvinswim. Our switch guys wrote back and said that indeed, one of the ports on the physical switch was only configured for one of our supported VLANs. So he added the other, and so far, so good.
I'm not looking at my vCenter right now... is there a way to find out which host NIC a guest is using at any given time?
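The root cause here, a physical switch port carrying only one of the VLANs the port groups need, is easy to check for once you have each uplink's allowed-VLAN list. A sketch with hypothetical data: the VLAN IDs, host and vmnic names, and how you collect the lists (switch config, CDP info) are all assumptions.

```python
from typing import Dict, List, Set, Tuple

NicKey = Tuple[str, str]  # (host, vmnic)


def missing_vlans(required: Set[int],
                  allowed_per_nic: Dict[NicKey, Set[int]]) -> Dict[NicKey, List[int]]:
    """Return, per (host, vmnic), the port-group VLANs its switch port does not carry."""
    report: Dict[NicKey, List[int]] = {}
    for nic, allowed in allowed_per_nic.items():
        gap = sorted(required - allowed)
        if gap:
            report[nic] = gap
    return report


# Hypothetical port-group VLANs and switch-port trunk configuration.
port_group_vlans = {10, 20}
trunks = {
    ("host10", "vmnic0"): {10, 20},
    ("host10", "vmnic1"): {10},      # misconfigured: VLAN 20 not allowed
    ("host20", "vmnic0"): {10, 20},
    ("host20", "vmnic1"): {10, 20},
}
```

`missing_vlans(port_group_vlans, trunks)` flags only `("host10", "vmnic1")` as missing VLAN 20, which matches the symptom of only the guests pinned to that uplink losing connectivity.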