25 Replies Latest reply on May 13, 2019 11:34 AM by jgyles

    VM's lose network connectivity randomly

    COS Master

      We have been experiencing some VMs losing network connectivity sporadically. A VM stays online for a while, then suddenly acts like it is no longer on the network. Everything appears to be correct: the vNICs are connected and you can get to the console. But I can't ping the VM from outside the host, and the VM can't ping its default gateway.

       

      Hardware is 4 HP Gen 8 LFF servers with quad-port HP NC364T NICs, running ESXi 5 U1, clustered with vCenter Server.

       

      Has anyone experienced this?

      If I vmotion it to another host it comes back online. That's been our temporary solution.

       

      Thanks

        • 1. Re: VM's lose network connectivity randomly
          dkraut Hot Shot

          Define "acts like it is no longer on the network"?  If you can ping to/from the vm, it's on the network.  Maybe you are having a name resolution issue, DNS?

          • 2. Re: VM's lose network connectivity randomly
            COS Master

            From within the VM, I cannot ping anything but itself (localhost). There's no reply when pinging the default gateway. Name resolution for anything else on the network obviously fails for that reason.

             

            Now from outside the VM, from other VMs on the same subnet and same DV Switch, there is no reply when pinging either that VM's DNS name or its IP address. The other VMs on the same subnet and same DV Switch are live on the network, meaning they respond to pings and can authenticate to all resources on the network.

             

            So it's acting like it is not on the network.

             

            Again, if I vmotion it to another host, it starts to reply to pings on the network and *IS* on the network.

            Now for kicks, I vmotioned it back onto the host it was on before (where it was not on the network) and it's still on the network... lol, weird, I know!!!

             

            I can't seem to replicate it. We find out when customers call in saying their VM is no longer alive.

            • 3. Re: VM's lose network connectivity randomly
              COS Master

              Let me also add what we did to troubleshoot:

              We disconnected the NIC, then reconnected it. Failed.

              Set the VM to DHCP, then put back the static IP. Failed.

              We rebooted the VM. Failed.

              We removed the NIC from the VM, booted it up, removed the ghosted NIC device, then added the NIC back. Failed.

              Restored the VM from a snapshot. Failed.

               

              Seems the only thing that works is vmotioning it to another host.

              • 4. Re: VM's lose network connectivity randomly
                technobro1 Novice

                I can also report that

                 

                 

                Realtek 8168 Gigabit Ethernet

                 

                Cannot connect to the specified gateway 192.168.1.1. Failed to set it.
                error
                9/30/2012 11:48:13 AM
                localhost.localdomain

                 

                Lost network connectivity on virtual switch
                "vSwitch0". Physical NIC vmnic0 is down.
                Affected portgroups:"Management Network".
                error
                9/30/2012 11:48:13 AM
                localhost.localdomain

                 

                I disabled the virtual NIC and re-enabled it, and it came back.

                 

                Win 64b

                ESX 5.1

                • 5. Re: VM's lose network connectivity randomly
                  dkraut Hot Shot

                  Sorry, I misread your original post.  So we had a similar problem many moons ago, but I'm not sure it's relevant.  Are you using different port groups and DRS?  Are the VMs being moved around when this occurs?  If so, do you have the same port groups on all ESXi hosts?  What was happening with us was that occasionally a VM would be vMotioned from one host to another, but the new host did not have the correct port group/VLAN, so it would lose connectivity until we either created the necessary port group or vMotioned it back to a host that had the correct port group.

                  • 6. Re: VM's lose network connectivity randomly
                    COS Master

                    All hosts in the cluster have the appropriate port groups. They're all profiled.

                     

                    It's just weird that when I vmotion off of the current host, say "Host10", to another host, say "Host20", the network comes back online. Then when I vmotion it from "Host20" back to the original host "Host10", the network stays online. It's like networking for the affected VM gets "hung" until it's vmotioned.

                     

                    Yes, DRS is enabled and VMs move dynamically.

                     

                    So if the port groups/vlan were incorrect, the network on that VM should go offline when I vmotion it back to the original host. But that's not what happens.

                     

                    .....scratchin my head on this one.....lol

                    • 7. Re: VM's lose network connectivity randomly
                      karthickvm Hot Shot
                      vExpertVMware Employees

                      Hello COS,

                       

                      I suspect the issue is with the physical switch. Please follow the steps below (they may not resolve the issue, but they should give you an idea of how to resolve it).

                       

                       

                      1. When the issue occurs, i.e. when you are not able to ping the VM, check the physical switch's MAC table and see whether the MAC of the NIC is listed.

                      2. Also at the time of the issue, try pinging the other VMs in the same port group on the same ESX host.

                      3. If you are able to ping VMs within the host and port group, then you need to check the physical switch.

                      4. If you are not able to ping the VMs within the ESX host, then you need to re-validate the configuration.
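
                      The four checks above can be read as a small decision procedure. A minimal sketch in Python, assuming the two observations are gathered by hand (from the physical switch's MAC table and from pings between VMs on the same host and port group). The function name and the return strings are made up for illustration.

```python
def triage(same_host_ping_ok: bool, mac_seen_on_switch: bool) -> str:
    """Suggest where to look next, following the four checks above.

    same_host_ping_ok:  can the affected VM reach other VMs in the same
                        port group on the same ESX host? (steps 2-4)
    mac_seen_on_switch: is the NIC's MAC present in the physical
                        switch's MAC table? (step 1)
    """
    if not same_host_ping_ok:
        # Step 4: traffic fails even inside the host, so the physical
        # switch is not the first suspect.
        return "re-validate the host / port group configuration"
    if not mac_seen_on_switch:
        # Steps 1 and 3: host-internal traffic works, but the switch
        # does not know where the MAC lives.
        return "physical switch: MAC missing from the MAC table"
    # Step 3: host-internal traffic works and the MAC is known.
    return "physical switch: MAC present, keep checking the switch side"


if __name__ == "__main__":
    print(triage(same_host_ping_ok=True, mac_seen_on_switch=False))
```

                      This only encodes the order of the checks. It does not collect the observations for you.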

                       

                      I hope this helps sort it out.

                      Karthic Kumar,
                      Sr.MTS. vRealize Network Insight.
                      • 8. Re: VM's lose network connectivity randomly
                        kermic Expert

                        Agree with karthickvm

                         

                        Sounds like a physical switch issue to me as well. Main reason: when a VM is migrated via vMotion, one of the last steps of the migration is that the destination host sends out a request for the physical switch to update its MAC tables (basically the host is telling the pSwitch that the VM's MAC address will now be living on the port attached to the destination host). If you say that the VM gets network access after migration, it seems the problem is resolved when the MAC tables are updated on the pSwitch.
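
                        The effect described above can be shown with a toy model of a learning switch. This is only a sketch, assuming a plain MAC-to-port table. The class, the MAC address, and the port names are hypothetical, and a real switch has aging timers and more state than this.

```python
class LearningSwitch:
    """Toy layer-2 switch: remembers the last port each MAC was seen on."""

    def __init__(self):
        self.mac_table = {}  # MAC address -> switch port name

    def receive(self, src_mac, in_port):
        # Any frame from src_mac (including the announcement a host
        # sends after a migration) re-points the entry to in_port.
        self.mac_table[src_mac] = in_port

    def egress_port(self, dst_mac):
        # None means "unknown MAC"; a real switch would flood the frame.
        return self.mac_table.get(dst_mac)


VM_MAC = "00:50:56:aa:bb:cc"  # hypothetical VM MAC address

switch = LearningSwitch()
# Assume the table holds a stale or wrong entry for the VM.
switch.receive(VM_MAC, "port-stale")
before = switch.egress_port(VM_MAC)

# After vMotion, the destination host announces the MAC on its own port.
switch.receive(VM_MAC, "port-host20")
after = switch.egress_port(VM_MAC)

if __name__ == "__main__":
    print("before migration:", before)
    print("after migration: ", after)
```

                        In this model, frames for the VM go to the wrong port until something makes the switch re-learn the entry, which is what the post-migration announcement does.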

                         

                        I'd probably ask my network admin to take a look at pSwitch.

                         

                        Other things to check:

                        Are all VMs affected or only some? If only some, are there any signs of MAC conflicts (like log entries on Guest, duplicate MAC errors on pSwitch) anywhere?

                        Which pNIC load balancing policy are you using? IP-Hash in some cases might show similar symptoms if pSwitches are not etherchannel capable.
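
                        To see why IP-Hash depends on link aggregation on the physical side, here is a rough sketch of how an IP-hash policy can pick an uplink. It assumes the commonly described scheme of XOR-ing the source and destination IPv4 addresses and taking the result modulo the number of uplinks. The function name and the addresses are hypothetical, and the real ESXi implementation may differ in detail.

```python
import ipaddress


def ip_hash_uplink(src_ip: str, dst_ip: str, n_uplinks: int) -> int:
    """Return the index of the uplink chosen for a src/dst IP pair.

    Assumed scheme: (source XOR destination) mod number of uplinks.
    """
    src = int(ipaddress.IPv4Address(src_ip))
    dst = int(ipaddress.IPv4Address(dst_ip))
    return (src ^ dst) % n_uplinks


if __name__ == "__main__":
    vm_ip = "10.0.0.1"  # hypothetical VM address
    for peer in ("10.0.0.2", "10.0.0.3"):
        print(vm_ip, "->", peer, "uses uplink", ip_hash_uplink(vm_ip, peer, 2))
```

                        With two uplinks, the same VM ends up on different uplinks for these two peers. A pSwitch that does not treat those ports as one aggregated link then sees the VM's MAC moving between ports, which can look like intermittent connectivity loss.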

                         

                        WBR

                        Imants

                        • 9. Re: VM's lose network connectivity randomly
                          alvinswim Hot Shot

                          I had that exact same problem with ESX 4/4.1, and what it turned out to be was one of our core switches just acting wacky. We had Dell/Cisco/VMware all working with us. No one could figure it out, and one day we decided to reboot our core switches, and the problem went away as mysteriously as it showed up. We didn't have physical wiring issues or anything. I think over time there's some sort of a buildup of something that can cause this situation, but that's entirely a guess. I'd say, if you can, just reload all your physical switches.

                           

                          here's the link to my previous post:

                           

                          http://communities.vmware.com/thread/319531

                           

                           

                          good luck..

                          • 10. Re: VM's lose network connectivity randomly
                            COS Master

                            Still working on the issue. We're making some changes on the hosts. I'll post what we did and the results when we're done....

                             

                            Thanks everyone!

                            • 11. Re: VM's lose network connectivity randomly
                              OB_Juan Lurker

                              We're running into the EXACT same issue with only one of the hosts in a five-node cluster.  Running ESXi 5.0, 469512 on a Dell PowerEdge R610. When it happens, it doesn't happen to all the guests.  This morning I had two guests on there, and only one lost its network connectivity.  Changing it to another network doesn't help, but like COS said, if we VMotion it to another host, network comes back.  And if we VMotion it back to the "bad host" the network stays connected.

                               

                              I haven't noticed that this happens when we do any particular thing, but this latest occurrence, I storage migrated the VM to another datastore.  The migration finished at 4:06pm and we started getting ping failures directly after.  So I VMotioned it to another host, and back to the bad one, and it's happy as could be (for now).

                               

                              Please let me know if you guys find anything!

                              • 12. Re: VM's lose network connectivity randomly
                                OB_Juan Lurker

                                I checked the CDP (Cisco Discovery Protocol) information from both NICs in the Configuration tab, under Networking.  Compared the info from the "bad" host, to an unaffected host in the same cluster.  Found ONE of the NICs on the bad host is in a different VLAN.
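
                                The comparison described above can also be done mechanically once the per-NIC VLAN values have been noted down. A minimal sketch, assuming the CDP information was copied by hand into dictionaries. The host data, VLAN numbers, and function name are hypothetical.

```python
def vlan_mismatches(reference: dict, suspect: dict) -> dict:
    """Return {nic: (reference_vlan, suspect_vlan)} for NICs that differ.

    A NIC missing on one side shows up with None for that side.
    """
    nics = sorted(set(reference) | set(suspect))
    return {
        nic: (reference.get(nic), suspect.get(nic))
        for nic in nics
        if reference.get(nic) != suspect.get(nic)
    }


# Hypothetical values as read from the CDP info of each host's NICs.
good_host = {"vmnic0": 100, "vmnic1": 100}
bad_host = {"vmnic0": 100, "vmnic1": 200}

if __name__ == "__main__":
    for nic, (expected, found) in vlan_mismatches(good_host, bad_host).items():
        print(f"{nic}: expected VLAN {expected}, found VLAN {found}")
```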

                                 

                                I have a ticket opened for our switch guys to check it out.  It would make sense that if only one NIC is configured improperly on the host, that only some of the guests might be trying to use that NIC, while the others are humming along just fine on the properly configured NIC.
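
                                That reasoning can be sketched with a small model. It assumes, as a simplification, that each guest's virtual port is pinned to one uplink by taking the port ID modulo the number of uplinks, and that the physical switch port behind each uplink carries a fixed set of VLANs. The port IDs, VLAN numbers, and function names are hypothetical, and the real teaming logic may differ.

```python
def uplink_for_port(port_id: int, uplinks: list) -> str:
    # Simplified pinning rule (an assumption, not the documented algorithm).
    return uplinks[port_id % len(uplinks)]


def has_connectivity(port_id: int, vm_vlan: int, uplinks: list,
                     allowed_vlans: dict) -> bool:
    """True if the uplink the guest is pinned to carries the guest's VLAN."""
    return vm_vlan in allowed_vlans[uplink_for_port(port_id, uplinks)]


uplinks = ["vmnic0", "vmnic1"]
# Hypothetical switch-side config: the port behind vmnic1 lacks VLAN 20.
allowed_vlans = {"vmnic0": {10, 20}, "vmnic1": {10}}

if __name__ == "__main__":
    for port_id, vlan in [(4, 20), (5, 20), (7, 10)]:
        ok = has_connectivity(port_id, vlan, uplinks, allowed_vlans)
        print(f"port {port_id} on VLAN {vlan} via "
              f"{uplink_for_port(port_id, uplinks)}: {'ok' if ok else 'no network'}")
```

                                In this model, only guests that are both on the missing VLAN and pinned to the misconfigured NIC lose connectivity. If a guest gets a different virtual port after a migration, it may land on the good NIC, which would also fit the "move it away and back and it works" behaviour.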

                                 

                                I'll let you know what happens, but this seems to be the smoking gun.

                                • 13. Re: VM's lose network connectivity randomly
                                  alvinswim Hot Shot

                                  When we faced this issue, we shut down one of the 2 NICs on the hosts. Basically, when you have the NICs teamed, vSphere chooses which NIC to send network traffic on based on port ID and load, I think.

                                   

                                  For example:

                                   

                                  On Host A, VM1 would be on vmnic0. Let's say you lose connectivity here and you decide to vMotion to Host B, where VM1 comes up on vmnic1. You will likely want to blame that host for bad connectivity.

                                   

                                  I was able to track down this behaviour because at the time we still had console access and access to esxtop. I haven't tried with 5.0, but I imagine if you enable the console, esxtop would still be there. Anyway, the test here would be to disable vmnic0 and force everything to go to vmnic1 on Host A. If things come back to life, then you have one of three things:

                                   

                                  1. either a bad network segment with a bad access switch or a bad core switch on that segment

                                  2. bad cable on vmnic 0

                                  3. bad vmnic 0

                                   

                                  Either way, in my case we had a bad core switch on one segment that affected vmnic0 on all hosts. At that point we had rebooted all of our switches, which cleared the issue, so we were not able to pinpoint whether rebooting only the one bad switch would have fixed it. We consequently had it replaced a few weeks later.

                                  • 14. Re: VM's lose network connectivity randomly
                                    OB_Juan Lurker

                                    That's a good approach, alvinswim.  Our switch guys wrote back and said that indeed, one of the ports on the physical switch was only set to look at one of our supported VLANs.  So he added the other, and so far, so good.

                                     

                                    I'm not looking at my vCenter right now... is there a way to find out which host NIC a guest is using at a given time?
