49 Replies Latest reply on May 21, 2018 7:11 AM by petestone

    VMs intermittent loses network connectivity.

    Johan77 Novice

      Hi,

       

      We have a strange problem/bug in our new VMware cluster.

       

      Environment

      BL460 gen10 with HP FlexFabric 20Gb 2-port 650FLB Adapter

      HPE C7000 chassis

       

      vSphere 6.0 (build 6775062)

      ESX01, ESX02 and ESX03 are in chassi01

      ESX04,ESX05 and ESX are in chassi02

       

      VMs intermittently lose network connectivity.

      When this happens, the “remedy” is to migrate the affected VM to another host in the cluster.

      So far it doesn’t seem to matter whether I migrate the VM to a host in the same chassis or in the other chassis; any migration resolves the issue. (I can’t migrate it back to the same host, though.)

       

      I have around 150 VMs in this cluster, and so far I’ve had issues with 5-6 of them, completely at random.

      They could be on any of my VMhosts in the cluster.

       

      I haven’t opened a support case with VMware or HPE yet; this forum post is my first attempt at tackling the problem.

      All firmware is updated to the latest from HPE.

       

      Has anyone seen similar issues?

       

      Regards

      Johan

        • 1. Re: VMs intermittent loses network connectivity.
          hussainbte Expert

          This is an issue with VLAN availability on the host the VM is currently running on.

          Or with VLAN availability on one of the two or more NICs you are using for that port group.

           

          Please check whether the VLAN is available on all the NICs the switch uses.

          You can use CDP to discover this,

           

          or run the command below from an ESXi SSH session:

           

          vim-cmd hostsvc/net/query_networkhint

          • 2. Re: VMs intermittent loses network connectivity.
            Johan77 Novice

            Hi hussainbte,

             

            It's not a VLAN availability problem.

             

            We use an SUS (shared uplink set) on our Virtual Connect switches, and the VLAN config is verified both on the VC/server profiles and on our Juniper switches.

            A VM suddenly loses network connectivity; no vMotion has taken place when this occurs.

            As I wrote before, the remedy is to migrate the VM to another host; after 5-10 minutes it's then possible to vMotion the VM back to its original host.

             

            To me, it sounds like a CAM table somewhere that won't update its MAC addresses, or maybe a bug in the VC switches. Or maybe a GARP issue somewhere...

             

            Regards,

             

            Johan

            • 3. Re: VMs intermittent loses network connectivity.
              YushkovSergey Novice

              Hi Johan!

              I got the same problem, my setup:

              esxi 6.5 - 6765664

              HP C7000 enclosure with HP VC Flex-10/10D Module

              ProLiant BL460c Gen9 with HP FlexFabric 20Gb 2-port 650FLB Adapter

               

              I updated all hosts from the latest SPP (Service Pack for ProLiant, version 2017.10.1).

              I've opened cases with VMware and HPE, but no luck so far. Now we are trying to find the right combination of network card firmware and driver, which sounds a little weird.

               

              Can you show output from these commands?

              esxcli software profile get

              esxcli network nic get -n vmnic0

               

              What version of VC do you have?

              I have 4.61, and it looks like this could be a cause: https://support.hpe.com/hpsc/doc/public/display?docId=emr_na-a00029108en_us

               

              But to downgrade to 4.50 I have to shut down the whole enclosure and set everything up from scratch, so I hope to find another solution.

              • 4. Re: VMs intermittent loses network connectivity.
                TylerDurden77 Enthusiast

                Hi,

                 

                VC versions 4.60 and 4.61 seem to be the problem.  (We run 4.61.)

                 

                We also have a case open with HPE; they tell us to downgrade to 4.50.

                 

                But like you say, to be able to downgrade it seems we have to shut down the whole VC domain, which isn't an option for us right now.

                I find it very strange that there isn't a simple way to downgrade the VC.

                 

                Hopefully, HPE will get back to us during the day with some guidance.

                 

                Cheers

                • 5. Re: VMs intermittent loses network connectivity.
                  YushkovSergey Novice

                  It may or may not be related to 4.60/4.61.

                  I have another VMware cluster in this enclosure, it uses the same virtual distributed switch, and the same uplinks through the same virtual connect modules.

                  But the servers are ProLiant BL460c Gen8, 3 of them with "HP Flex-10 10Gb 2-port 530FLB Adapter" - and there are no issues with virtual machines on them.

                  And one has the "HP FlexFabric 20Gb 2-port 650FLB Adapter", like the BL460c Gen9 servers in my problem cluster.

                  And guess what? It also has this problem.

                   

                  So the theory about the right combination of firmware and driver may be true.

                   

                  These three combinations did not work:

                  Firmware: 11.1.183.62

                  Driver: 11.2.1149.0

                   

                  Firmware: 11.2.1263.19

                  Driver: 11.4.1205.0

                   

                  Firmware: 11.2.1263.19

                  Driver: 11.2.1149.0

                   

                  And now I'm testing (as HPE support told me to):

                  Firmware: 11.1.183.23

                  Driver: 11.1.196.3

                   

                  Can you check your firmware and driver with these commands?

                  esxcli network nic list

                  esxcli network nic get -n vmnic0
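
For anyone comparing many hosts, the firmware/driver pair can be pulled out of saved "esxcli network nic get" output with a few lines of Python. This is only a rough sketch: the sample text is illustrative, it assumes the output uses the "Firmware Version:" and "Version:" field names, and parse_nic_get, firmware_driver and REPORTED_BAD are made-up names. The pairs in REPORTED_BAD are just the three combinations listed in this post, not any kind of vendor list.

```python
# Illustrative text in the "Key: value" layout printed by
# `esxcli network nic get -n vmnic0` (assumed field names, sample values).
SAMPLE = """\
   Driver Info:
         Bus Info: 0000:37:00:0
         Driver: elxnet
         Firmware Version: 11.2.1263.19
         Version: 11.2.1149.0
   Link Status: Up
   Name: vmnic0
"""

# (firmware, driver) pairs reported in this post as not working.
REPORTED_BAD = {
    ("11.1.183.62", "11.2.1149.0"),
    ("11.2.1263.19", "11.4.1205.0"),
    ("11.2.1263.19", "11.2.1149.0"),
}


def parse_nic_get(text):
    """Return a dict of the 'Key: value' fields found in the output."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only, so values such as MAC or bus
        # addresses keep their own colons.
        key, sep, value = line.strip().partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def firmware_driver(text):
    """Return the (firmware version, driver version) pair, or None for missing fields."""
    fields = parse_nic_get(text)
    return fields.get("Firmware Version"), fields.get("Version")


pair = firmware_driver(SAMPLE)
status = "reported as bad in this thread" if pair in REPORTED_BAD else "not in the reported list"
print(pair, status)
```

A pair that is not in the set has simply not been reported here; it says nothing about whether that combination is actually good.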

                  • 6. Re: VMs intermittent loses network connectivity.
                    TylerDurden77 Enthusiast

                    Hi ,

                     

                    Like you I only see the "issues" on gen9 and gen10 servers.

                    Gen8 with HP "FlexFabric 10Gb 2-Port 534FLB Adapter" > No issues   (Have around 12 gen8 servers)

                     

                    Gen9 with HP "FlexFabric 20Gb 2-port 650FLB Adapter" > Some issues, seen VMs acting weird, packet drops etc.

                     

                    Gen10 with HP "FlexFabric 20Gb 2-port 650FLB Adapter" > Big issues, VMs randomly lose network connectivity, packet drops.

                     

                     

                     

                    Gen10

                     

                    esxcli network nic list

                    Name    PCI Device    Driver  Admin Status  Link Status  Speed  Duplex  MAC Address         MTU  Description

                    ------  ------------  ------  ------------  -----------  -----  ------  -----------------  ----  -----------------------------------------------------------

                    vmnic0  0000:37:00.0  elxnet  Up            Up           10000  Full    70:10:6f:43:84:48  1500  Emulex Corporation HP FlexFabric 20Gb 2-port 650FLB Adapter

                    vmnic1  0000:37:00.1  elxnet  Up            Up           10000  Full    70:10:6f:43:84:50  1500  Emulex Corporation HP FlexFabric 20Gb 2-port 650FLB Adapter

                    vmnic2  0000:37:00.2  elxnet  Up            Up           10000  Full    70:10:6f:43:84:49  1500  Emulex Corporation HP FlexFabric 20Gb 2-port 650FLB Adapter

                    vmnic3  0000:37:00.3  elxnet  Up            Up           10000  Full    70:10:6f:43:84:51  1500  Emulex Corporation HP FlexFabric 20Gb 2-port 650FLB Adapter

                    vmnic4  0000:37:00.4  elxnet  Up            Up           10000  Full    70:10:6f:43:84:4a  1500  Emulex Corporation HP FlexFabric 20Gb 2-port 650FLB Adapter

                    vmnic5  0000:37:00.5  elxnet  Up            Down             0  Half    70:10:6f:43:84:52  1500  Emulex Corporation HP FlexFabric 20Gb 2-port 650FLB Adapte
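
If you save this listing from several hosts, a small Python helper can flag the uplinks that are not up. This is a sketch under assumptions: it expects the ten columns shown above, with single-word values in every column except Description and NIC names starting with "vmnic"; COLUMNS, parse_nic_list and links_down are made-up names, and the sample rows are shortened copies of the listing.

```python
COLUMNS = ["name", "pci", "driver", "admin", "link",
           "speed", "duplex", "mac", "mtu", "description"]

# Shortened copy of the `esxcli network nic list` output shown above.
SAMPLE = """\
Name    PCI Device    Driver  Admin Status  Link Status  Speed  Duplex  MAC Address         MTU  Description
------  ------------  ------  ------------  -----------  -----  ------  -----------------  ----  -----------
vmnic0  0000:37:00.0  elxnet  Up            Up           10000  Full    70:10:6f:43:84:48  1500  Emulex Corporation HP FlexFabric 20Gb 2-port 650FLB Adapter

vmnic5  0000:37:00.5  elxnet  Up            Down             0  Half    70:10:6f:43:84:52  1500  Emulex Corporation HP FlexFabric 20Gb 2-port 650FLB Adapter
"""


def parse_nic_list(text):
    """Turn each vmnic row into a dict keyed by COLUMNS."""
    rows = []
    for line in text.splitlines():
        # Split on whitespace at most 9 times so the free-text
        # Description stays in one piece as the tenth field.
        parts = line.split(None, len(COLUMNS) - 1)
        # Skip blank lines, the header row and the dashed separator.
        if len(parts) != len(COLUMNS) or not parts[0].startswith("vmnic"):
            continue
        rows.append(dict(zip(COLUMNS, parts)))
    return rows


def links_down(rows):
    """Names of the NICs whose Link Status column is not 'Up'."""
    return [row["name"] for row in rows if row["link"] != "Up"]


rows = parse_nic_list(SAMPLE)
print(links_down(rows))
```

A down link on an unused port is not a fault in itself; the helper only saves reading the table by eye.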

                     

                    esxcli network nic get -n vmnic0

                       Advertised Auto Negotiation: true

                       Advertised Link Modes: 1000baseT/Full, 10000baseT/Full, 20000baseT/Full

                       Auto Negotiation: true

                       Cable Type:

                       Current Message Level: 4631

                       Driver Info:

                             Bus Info: 0000:37:00:0

                             Driver: elxnet

                             Firmware Version: 11.2.1263.19

                             Version: 11.2.1149.0

                       Link Detected: true

                       Link Status: Up

                       Name: vmnic0

                       PHYAddress: 0

                       Pause Autonegotiate: true

                       Pause RX: true

                       Pause TX: true

                       Supported Ports:

                       Supports Auto Negotiation: true

                       Supports Pause: true

                       Supports Wakeon: true

                       Transceiver: external

                       Virtual Address: 00:50:56:5f:66:dc

                       Wakeon: MagicPacket(tm)

                     

                     

                     

                     

                    Right now we have evacuated one chassis and are downgrading its VC firmware to 4.50.

                     

                    Cheers

                    Johan

                    • 7. Re: VMs intermittent loses network connectivity.
                      YushkovSergey Novice

                      Hi Johan, thank you for the information!

                      Please post your results on 4.50; I have to decide what to do next.

                       

                      I have spent one day on firmware 11.1.183.23 and driver 11.1.196.3 with no errors.

                      • 8. Re: VMs intermittent loses network connectivity.
                        msripada Expert
                        vExpert

                        Try to isolate it while the VM has lost network: open esxtop, press n for the network view, and find which NIC the VM is using. If you have multiple NICs configured for the VM port group, uncheck the VM's network connection (click OK) and check it again, which switches the NIC; you can confirm that in esxtop.

                         

                        If the VM's network then works fine, you have isolated the NIC that way.

                        If you see the same behaviour on multiple hosts, isolate the problem NIC on each host and check the configuration of the physical switch the NICs are connected to, or check whether it matches the configuration for the other NIC on which the VM works.

                         

                        Thanks,

                        MS

                        • 9. Re: VMs intermittent loses network connectivity.
                          TylerDurden77 Enthusiast

                          Hi Sergey,

                           

                          We have 4 chassis

                           

                          In each chassis we have:

                          3 gen10  servers.

                          10 gen9  servers.

                          3 gen8 servers.

                           

                          All blades except 2 are ESXi 6.0u3 hosts.

                           

                          In our case it feels like the problem escalated somehow when we took the gen10 servers into production, but we are not certain...

                           

                          To try to pinpoint the problem we have now done the following:

                           

                          In chassis 1 and 2 we have downgraded the CNA firmware/driver (your hint) on the gen10 ESXi hosts, VC firmware is still 4.61

                          In chassis 3 and 4 we have put the gen10 servers into maintenance mode. VC firmware is downgraded to 4.50

                           

                          After we downgraded the VC firmware in C3 and C4 we still had issues (random packet loss on VMs running in C3 and C4).

                          But since we downgraded the CNAs on the gen10 blades in C1 and C2 we haven't seen any issues, and our environment seems stable (only 8 hours so far, though).

                           

                          It's a very strange problem, and hard to troubleshoot because it is so intermittent.

                           

                          How are things in your environment? Still good after the downgrade?
                          Do you have any kind of load balancers?  (I wonder if our F5s could have something to do with the problem.)

                           

                          Cheers

                          Johan

                          • 10. Re: VMs intermittent loses network connectivity.
                            YushkovSergey Novice

                            Hi Johan, thank you for sharing your results with VC 4.50.

                            I have not seen any issue for 50+ hours with CNA firmware: 11.1.183.23 and driver: 11.1.196.3

                             

                            We don't have any load balancers in this configuration, and we don't have any G10 servers yet.

                             

                            During troubleshooting I have tried to simplify everything as much as possible, so right now our configuration looks like this:

                            bay 1 - HP VC Flex-10/10D Module

                            bay 2 - HP VC Flex-10/10D Module

                             

                            Two SUS, each with only one physical uplink (no LACP). Every uplink is a trunk carrying a bunch of VLANs.

                             

                            Profile attached to the ESXi server:

                            VLANs from SUS uplink 1 go to port 1, and VLANs from uplink 2 go to port 2.

                             

                            On the distributed switch we have a distributed port group called "servers", and all the problem virtual machines are attached to it. All traffic goes through one uplink, "servers01", which points to vmnic0 on every ESXi server. So no load balancing here either.

                             

                             

                             

                            Our issues started after we replaced our old Virtual Connect modules with new HP VC Flex-10/10D Modules (updated to 4.61 from the very beginning) and added new G9 servers to the enclosure (updated from the latest SPP). We did all this in one step, which is why I'm uncertain whether to blame the VC or the CNA.

                             

                            I hope that the right CNA firmware/driver will help us.

                            • 11. Re: VMs intermittent loses network connectivity.
                              TylerDurden77 Enthusiast

                              Hi Sergey,

                               

                              We are pretty sure that we have pinpointed the "bug".

                              It has nothing to do with our new gen10 blades, and it's not the VC firmware.

                              It's the CNA firmware (11.2.1263.19) from the October SPP.

                               

                              We have done a lot of testing, and we can reproduce the problem on VM hosts with the "11.2.1263.19" firmware (on both VC 4.50 and 4.61).

                               

                              We have now downgraded the CNA firmware to "11.1.183.62" and our environment is stable again.

                               

                              I find it very strange that HPE doesn't know about this problem; there must be many customers around the world who have the same issues we did.

                               

                              Cheers

                              Johan

                              • 12. Re: VMs intermittent loses network connectivity.
                                glamic26 Novice

                                Hello all,

                                 

                                Just wanted to add to the investigation here as we are seeing the same issue of VMs intermittently dropping off the network and the fix being to vMotion the VMs to another host.

                                 

                                We are running a similar setup:

                                6x BL460C Gen10 blades with 650FLB adapters

                                C7000 chassis with FlexFabric 20/40 F8 modules

                                 

                                vSphere 6.0 (build 6921384)

                                ESX01, 02 and 03 are in chassis01

                                ESX04, 05 and 06 are in chassis02

                                 

                                vDS version 6.0

                                PortGroup Settings:

                                Promiscuous Mode: Reject

                                MAC Address Changes: Accept

                                Forged Transmits: Accept

                                Load Balancing: route based on physical NIC load

                                Network Failover Detection: Beacon Probing

                                Notify Switches: Yes

                                Failback: Yes

                                dvUplink1 and dvUplink2 both Active Uplinks

                                 

                                The VCs are firmware version 4.50 (we previously downgraded this because 4.60 and 4.61 were revoked by HPE)

                                 

                                We first started seeing this issue with all 6 hosts running

                                Firmware Version: 11.1.183.62 (having to use older firmware due to an issue with recovering from a fibre cable loss on newer firmware - host unable to see paths to storage again even after fibre cable replaced until the host was rebooted)

                                Driver Version: 11.2.1149.0

                                 

                                We have since upgraded two of the hosts to firmware version 11.2.1263.19 but have had repeat issues with VMs on these hosts so this hasn't fixed the issue.

                                 

                                So to re-clarify some of the suggestions on here and cover them off:

                                VC firmware downgrade to 4.50 doesn't fix the issue

                                Firmware version 11.1.183.62 doesn't fix the issue (with driver 11.2.1149.0)

                                Firmware version 11.2.1263.19 doesn't fix the issue (with driver 11.2.1149.0)

                                 

                                I'll be logging this with VMware and HPE today.  Does anyone else have any other open cases with them that I could reference to improve our chances of finding a fix?

                                 

                                There are suggestions in other posts (with not so similar hardware setups) that the issue is likely to be with the MAC address tables on the physical switches.  Because we have Notify Switches turned on for the vDS port groups, a completed vMotion notifies the switches to update their MAC address tables, and this fixes the issue.  So possibly the physical switches are somehow losing the correct MAC address for the IP address of the VM, and the vMotion fixes this by notifying the switches of the MAC address.
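
The MAC-table theory can be illustrated with a toy model in Python. This is not how Virtual Connect or any particular switch is implemented; ToySwitch, reachable, the port names and the MAC address are all made up for illustration. It only shows why a stale table entry would black-hole a VM's traffic, and why the RARP frame an ESXi host sends after a vMotion (with Notify Switches enabled) would repair it.

```python
class ToySwitch:
    """Toy layer-2 switch: remembers the port each source MAC was last seen on."""

    def __init__(self):
        self.mac_table = {}

    def learn(self, mac, port):
        # Real switches learn from the source MAC of every frame they receive,
        # including the RARP frame sent on behalf of a VM after a vMotion.
        self.mac_table[mac] = port

    def forward(self, dst_mac):
        # Known MAC: forward out of the learned port only.
        # Unknown MAC: flood to all ports, modelled here as None.
        return self.mac_table.get(dst_mac)


def reachable(switch, mac, actual_port):
    """True if frames are flooded or sent to the port the VM is really behind."""
    out_port = switch.forward(mac)
    return out_port is None or out_port == actual_port


VM_MAC = "00:50:56:aa:bb:cc"  # made-up VM MAC address

switch = ToySwitch()
switch.learn(VM_MAC, "host1-uplink")
before = reachable(switch, VM_MAC, "host1-uplink")   # healthy state

# Hypothesised fault: the entry goes stale and points at the wrong port.
switch.mac_table[VM_MAC] = "wrong-port"
during = reachable(switch, VM_MAC, "host1-uplink")   # traffic is black-holed

# vMotion to host2: the destination host announces the VM's MAC on its uplink.
switch.learn(VM_MAC, "host2-uplink")
after = reachable(switch, VM_MAC, "host2-uplink")    # entry is correct again

print(before, during, after)
```

The model says nothing about what would make the entry go stale in the first place, which is the part nobody in this thread has pinned down.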

                                 

                                Thanks,

                                 

                                glamic26

                                • 13. Re: VMs intermittent loses network connectivity.
                                  TylerDurden77 Enthusiast

                                  Hi,
                                  Just a quick update.

                                   

                                  On our gen10 servers, we have this setup which has been stable for the last 36 hours.

                                   

                                   

                                  @glamic26

                                  You say that you have seen problems with "11.1.183.62" ? (with driver 11.2.1149.0)

                                   

                                  Cheers

                                  Johan

                                  • 14. Re: VMs intermittent loses network connectivity.
                                    quinny100 Lurker

                                    Any updates from HPE on this?

                                     

                                    I think we're experiencing the same issue.  2 C7000 enclosures, 24 blades with Virtual Connect modules and 650FLB NICs in the blades.  ESXi 6; VMs will randomly drop off the network and come back when vMotioned.

                                     

                                    Rebooting the hosts seems to make the issue go away for a while - last time we didn't see it for about 20 days after rebooting all the blades. 

                                     

                                    We are currently downgrading firmware on the NICs to see if this helps.
