9 Replies Latest reply on May 19, 2020 7:29 AM by andreir

    NSX-T 2.5.1 - no geneve tunnels

    andreir Novice

      Hi all,

       

      hitting a really puzzling issue: configured the latest NSX-T on Cisco UCS, created T0, overlay TZ, created a segment and added two VMs. VMs are able to ping the gateway on T0, can ping each other if on the same host, but cannot ping each other if on different hosts. Upon closer inspection, it appears that no tunnels are formed between the ESXi nodes.

       

      I'm able to ping between TEPs with a large MTU, so there are no underlay networking issues as far as I can see, but the tunnels are not formed... BFD shows the tunnels are down (please see the output at the bottom).

       

      Not seeing any related error messages in /var/log/vmkernel or /var/log/nsx-syslog.log on the hosts.

       

      Anything else I can check? Would be happy to provide any other output. Please help!!!

       

       

      Tested VXLAN connectivity and it looks good:

       

      [root@NSX02:~] ping ++netstack=vxlan 10.12.0.151 -s 1600 -d

      PING 10.12.0.151 (10.12.0.151): 1600 data bytes

      1608 bytes from 10.12.0.151: icmp_seq=0 ttl=64 time=0.271 ms

       

      Checked the logical switches on the hosts and they look good (the switch I'm using is called "test"):

       

      NSX-Manager> get logical-switch

      VNI     UUID                                  Name                                              Type

      71688   5cce3073-c5c9-4cf6-9cad-8db50dd06b68  OV-WEB                                            DEFAULT

      71689   8208b2cd-7d0c-407e-aacf-ee9297ef5cf2  OV-DB                                             DEFAULT

      71691   fedd3ec3-d3e4-4d02-ac4f-cd94bde02fdf  transit-bp-2a5f80db-676d-41f4-b305-1e8591266f94   TRANSIT

      71692   c9b96c71-ebff-4572-88a9-7639d2923743  transit-bp-8871e348-42da-447f-9193-70781b09730f   TRANSIT

      71690   50db354a-bf9c-483f-9637-c397e78d05b7  transit-rl-8871e348-42da-447f-9193-70781b09730f   TRANSIT

      71681   97655bd6-dd20-4746-8138-656a0c06e9b0  test                                              DEFAULT

      71687   6fa865f8-4bb6-439a-a428-a94e27e02090  OV-APP                                            DEFAULT

       

      [root@NSX02:~] nsxcli -c get logical-switch 71681 vtep-table

                                         Logical Switch VTEP Table

      -----------------------------------------------------------------------------------------------

       

                                             Host Kernel Entry

      ===============================================================================================

      Label      VTEP IP           Segment ID     Is MTEP       VTEP MAC       BFD count

      124941    10.12.0.151        10.12.0.128     False  00:50:56:67:31:cb   0

       

                                             LCP Remote Entry

      ===============================================================================================

      Label      VTEP IP           Segment ID          VTEP MAC                  DEVICE NAME

      124941    10.12.0.151        10.12.0.128     00:50:56:67:31:cb                 None

       

                                              LCP Local Entry

      ===============================================================================================

      Label      VTEP IP           Segment ID          VTEP MAC                  DEVICE NAME

      124942    10.12.0.152        10.12.0.128     00:50:56:63:b0:56                 None

       

      [root@NSX03:~] nsxcli -c get logical-switch 71681 vtep-table

                                         Logical Switch VTEP Table

      -----------------------------------------------------------------------------------------------

       

                                             Host Kernel Entry

      ===============================================================================================

      Label      VTEP IP           Segment ID     Is MTEP       VTEP MAC       BFD count

      124942    10.12.0.152        10.12.0.128     False  00:50:56:63:b0:56   0

       

                                             LCP Remote Entry

      ===============================================================================================

      Label      VTEP IP           Segment ID          VTEP MAC                  DEVICE NAME

      124942    10.12.0.152        10.12.0.128     00:50:56:63:b0:56                 None

       

                                              LCP Local Entry

      ===============================================================================================

      Label      VTEP IP           Segment ID          VTEP MAC                  DEVICE NAME

      124941    10.12.0.151        10.12.0.128     00:50:56:67:31:cb                 None

       

       

      Checked BFD sessions, tunnels down, no diagnostic....

       

      [root@NSX03:/var/log]  net-vdl2 -M bfd -s nvds

      BFD count:      3

      ===========================

      Local IP: 10.12.0.151, Remote IP: 10.12.0.153, Local State: down, Remote State: down, Local Diag: No Diagnostic, Remote Diag: No Diagnostic, minRx: 1000, isDisabled: 0, l2SpanCount: 1, l3SpanCount: 1

      Roundtrip Latency: NOT READY

      VNI List: 71687

      Routing Domain List: 8871e348-42da-447f-9193-70781b09730f

      Local IP: 10.12.0.151, Remote IP: 10.12.0.200, Local State: down, Remote State: down, Local Diag: No Diagnostic, Remote Diag: No Diagnostic, minRx: 1000, isDisabled: 0, l2SpanCount: 3, l3SpanCount: 2

      Roundtrip Latency: NOT READY

      VNI List: 71690 71691   71692

      Routing Domain List: 2a5f80db-676d-41f4-b305-1e8591266f94       8871e348-42da-447f-9193-70781b09730f

      Local IP: 10.12.0.151, Remote IP: 10.12.0.152, Local State: down, Remote State: down, Local Diag: No Diagnostic, Remote Diag: No Diagnostic, minRx: 1000, isDisabled: 0, l2SpanCount: 2, l3SpanCount: 2

      Roundtrip Latency: NOT READY

      VNI List: 71681 71688

      Routing Domain List: 2a5f80db-676d-41f4-b305-1e8591266f94       8871e348-42da-447f-9193-70781b09730f
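For anyone who wants to script this check across several hosts, here is a minimal sketch in Python that picks the down sessions out of the BFD output. It assumes the session lines keep the comma-separated "key: value" layout shown above, and that a healthy session reports its state as "up" (that string is an assumption; the output above only shows "down"). The function names are made up for illustration.

```python
# Sample session line copied from the BFD output above.
SAMPLE = (
    "Local IP: 10.12.0.151, Remote IP: 10.12.0.152, Local State: down, "
    "Remote State: down, Local Diag: No Diagnostic, Remote Diag: No Diagnostic, "
    "minRx: 1000, isDisabled: 0, l2SpanCount: 2, l3SpanCount: 2"
)


def parse_bfd_line(line):
    """Split one 'key: value, key: value' session line into a dict of strings."""
    fields = {}
    for part in line.strip().split(", "):
        key, _, value = part.partition(": ")
        fields[key] = value
    return fields


def down_sessions(text):
    """Return (local IP, remote IP) pairs whose local state is not 'up'.

    Assumption: an established session shows 'Local State: up'.
    """
    result = []
    for line in text.splitlines():
        if line.lstrip().startswith("Local IP:"):
            session = parse_bfd_line(line)
            if session.get("Local State") != "up":
                result.append((session["Local IP"], session["Remote IP"]))
    return result


if __name__ == "__main__":
    for local, remote in down_sessions(SAMPLE):
        print("tunnel down:", local, "->", remote)
```

Feeding it the full command output (one string with newlines) should work the same way, since lines that do not start with "Local IP:" are skipped.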

        • 1. Re: NSX-T 2.5.1 - no geneve tunnels
          daphnissov Guru

          From the NSX-T Manager UI, do a Traceflow using the ICMP protocol from VM1 to VM2 when they are across hosts. What is the result? Post the screenshot.

          • 2. Re: NSX-T 2.5.1 - no geneve tunnels
            andreir Novice

             

            Zooming in:

            • 3. Re: NSX-T 2.5.1 - no geneve tunnels
              daphnissov Guru

              Has some event happened to these ESXi hosts after they were initially prepared with the NSX-T bits? Asked more directly, has this *ever* worked or no? Are these nested ESXi hosts? Have you tried to reboot each of them?

              • 4. Re: NSX-T 2.5.1 - no geneve tunnels
                andreir Novice

                These are physical UCS blades, fresh install as far as I know so nothing should've happened on the hosts. The NSX-T never worked correctly after it was set up. Will try to reboot.

                • 5. Re: NSX-T 2.5.1 - no geneve tunnels
                  andreir Novice

                  Reboot did not help, unfortunately.

                   

                  I also did a packet capture while running the traceflow, and it looks like the ICMP packet is received by the destination host:

                   

                  on source (nsx02 host):

                  [root@NSX02:~] nsxcli -c start capture interface vmnic5 direction output expression dstip 10.12.0.151

                   

                  01:30:24.292172 00:50:56:63:b0:56 > 00:50:56:67:31:cb, ethertype IPv4 (0x0800), length 116: 10.12.0.152.54710 > 10.12.0.151.6081: Geneve, Flags [O], vni 0x0, proto TEB (0x6558): 00:50:56:63:b0:56 > 00:50:56:67:31:cb, ethertype IPv4 (0x0800), length 66: 10.12.0.152.49152 > 10.12.0.151.3784: BFDv1, Control, State Down, Flags: [Poll], length: 24

                  01:30:25.292190 00:50:56:63:b0:56 > 00:50:56:67:31:cb, ethertype IPv4 (0x0800), length 116: 10.12.0.152.54710 > 10.12.0.151.6081: Geneve, Flags [O], vni 0x0, proto TEB (0x6558): 00:50:56:63:b0:56 > 00:50:56:67:31:cb, ethertype IPv4 (0x0800), length 66: 10.12.0.152.49152 > 10.12.0.151.3784: BFDv1, Control, State Down, Flags: [Poll], length: 24

                  01:30:25.606948 00:50:56:63:b0:56 > 00:50:56:67:31:cb, ethertype IPv4 (0x0800), length 186: 10.12.0.152.49168 > 10.12.0.151.6081: Geneve, Flags [C], vni 0x11801, proto TEB (0x6558), options [8 bytes]: 00:50:56:b7:85:57 > 00:50:56:b7:c8:2e, ethertype IPv4 (0x0800), length 128: 10.12.67.172 > 10.12.67.10: ICMP echo request, id 0, seq 0, length 94

                  01:30:26.192260 00:50:56:63:b0:56 > 00:50:56:67:31:cb, ethertype IPv4 (0x0800), length 116: 10.12.0.152.54710 > 10.12.0.151.6081: Geneve, Flags [O], vni 0x0, proto TEB (0x6558): 00:50:56:63:b0:56 > 00:50:56:67:31:cb, ethertype IPv4 (0x0800), length 66: 10.12.0.152.49152 > 10.12.0.151.3784: BFDv1, Control, State Down, Flags: [Poll], length: 24

                  01:30:27.092236 00:50:56:63:b0:56 > 00:50:56:67:31:cb, ethertype IPv4 (0x0800), length 116: 10.12.0.152.54710 > 10.12.0.151.6081: Geneve, Flags [O], vni 0x0, proto TEB (0x6558): 00:50:5

                   

                  on destination (nsx03 host):

                  [root@NSX03:~] nsxcli -c start capture interface vmnic5 direction input expression srcip 10.12.0.152

                   

                  01:30:24.278450 00:50:56:63:b0:56 > 00:50:56:67:31:cb, ethertype IPv4 (0x0800), length 116: 10.12.0.152.54710 > 10.12.0.151.6081: Geneve, Flags [O], vni 0x0, proto TEB (0x6558): 00:50:56:63:b0:56 > 00:50:56:67:31:cb, ethertype IPv4 (0x0800), length 66: 10.12.0.152.49152 > 10.12.0.151.3784: BFDv1, Control, State Down, Flags: [Poll], length: 24

                  01:30:25.278480 00:50:56:63:b0:56 > 00:50:56:67:31:cb, ethertype IPv4 (0x0800), length 116: 10.12.0.152.54710 > 10.12.0.151.6081: Geneve, Flags [O], vni 0x0, proto TEB (0x6558): 00:50:56:63:b0:56 > 00:50:56:67:31:cb, ethertype IPv4 (0x0800), length 66: 10.12.0.152.49152 > 10.12.0.151.3784: BFDv1, Control, State Down, Flags: [Poll], length: 24

                  01:30:25.593242 00:50:56:63:b0:56 > 00:50:56:67:31:cb, ethertype IPv4 (0x0800), length 186: 10.12.0.152.49168 > 10.12.0.151.6081: Geneve, Flags [C], vni 0x11801, proto TEB (0x6558), options [8 bytes]: 00:50:56:b7:85:57 > 00:50:56:b7:c8:2e, ethertype IPv4 (0x0800), length 128: 10.12.67.172 > 10.12.67.10: ICMP echo request, id 0, seq 0, length 94

                  01:30:26.178576 00:50:56:63:b0:56 > 00:50:56:67:31:cb, ethertype IPv4 (0x0800), length 116: 10.12.0.152.54710 > 10.12.0.151.6081: Geneve, Flags [O], vni 0x0, proto TEB (0x6558): 00:50:56:63:b0:56 > 00:50:56:67:31:cb, ethertype IPv4 (0x0800), length 66: 10.12.0.152.49152 > 10.12.0.151.3784: BFDv1, Control, State Down, Flags: [Poll], length: 24

                  01:30:27.078539 00:50:56:63:b0:56 > 00:50:56:67:31:cb, ethertype IPv4 (0x0800), length 116: 10.12.0.152.54710 > 10.12.0.151.6081: Geneve, Flags [O], vni 0x0, proto TEB (0x6558): 00:50:56:63:b0:56 > 00:50:56:67:31:cb, ethertype IPv4 (0x0800), length 66: 10.12.0.152.49152 > 10.12.0.151.3784: BFDv1, Control, State Down, Flags: [Poll], length: 24

                   

                   

                  But the traceflow does not show that it was delivered on the other side?

                  I'll be happy to provide full packet captures if that might help...
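As a cross-check on the captures above, the Geneve fields can be decoded by hand. The sketch below parses the fixed 8-byte Geneve base header as laid out in RFC 8926. The sample bytes are rebuilt from the tcpdump summary of the ICMP packet (Flags [C], vni 0x11801, proto 0x6558, 8 bytes of options); they are a reconstruction for illustration, not bytes copied out of the capture. The outer overhead assumes untagged Ethernet and an IPv4 header without options.

```python
import struct

# Reconstructed from the tcpdump summary above (assumption, not raw capture data):
# version 0, option length 2 x 4 bytes, C flag set, protocol 0x6558, VNI 0x011801.
SAMPLE_HEADER = bytes([0x02, 0x40, 0x65, 0x58, 0x01, 0x18, 0x01, 0x00])


def parse_geneve(header):
    """Decode the fixed 8-byte Geneve base header (RFC 8926)."""
    first, flags, protocol = struct.unpack("!BBH", header[:4])
    return {
        "version": first >> 6,
        "opt_len_bytes": (first & 0x3F) * 4,  # length is stored in 4-byte units
        "oam": bool(flags & 0x80),            # shown as Flags [O] by tcpdump
        "critical": bool(flags & 0x40),       # shown as Flags [C] by tcpdump
        "protocol": protocol,
        "vni": int.from_bytes(header[4:7], "big"),
    }


def outer_overhead(opt_len_bytes):
    """Ethernet (14) + IPv4 (20) + UDP (8) + Geneve base (8) + Geneve options."""
    return 14 + 20 + 8 + 8 + opt_len_bytes


if __name__ == "__main__":
    fields = parse_geneve(SAMPLE_HEADER)
    print("VNI:", fields["vni"])
    print("overhead:", outer_overhead(fields["opt_len_bytes"]))
```

The decoded VNI 0x11801 is 71681 in decimal, which is the VNI of the "test" segment in the logical-switch listing, and the overhead matches the frame lengths in the capture (186 outer vs. 128 inner for the ICMP packet, 116 vs. 66 for the BFD packets without options). So the encapsulation itself looks structurally sane on the wire.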

                  • 6. Re: NSX-T 2.5.1 - no geneve tunnels
                    mbangouraAnect Novice

                    Any luck solving this? I think that we have the same problem

                     

                    Seems like Cisco VIC has a problem with decapsulating geneve packets upon arrival...

                     

                    Any hint is appreciated.

                    • 7. Re: NSX-T 2.5.1 - no geneve tunnels
                      andreir Novice

                      Unfortunately we were not successful in resolving this issue. The original problem was observed on Cisco UCS B200 M2 blades running 2.2(8i) firmware and M81KR CNA with 2.2(3b). Since those are no longer supported on ESXi 6.7, we gave up. Best guess is that the VIC driver somehow mangles the encapsulated packet.

                      • 8. Re: NSX-T 2.5.1 - no geneve tunnels
                        mbangouraAnect Novice

                        The problem is that an NSX-T transport node (TN) offloads IP checksum calculation to hardware by default (here the UCS VIC, i.e. the M81KR CNA firmware). Unfortunately, for some reason the CNA cannot calculate a correct outer IP checksum for Geneve-encapsulated packets. So Geneve packets sent from TN A to TN B do arrive on the uplink interface of TN B, but with a bad outer IP checksum (the inner IP checksum is OK), and are therefore discarded by the system.
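To make the failure mode concrete, here is a minimal Python sketch of the standard IPv4 header checksum (the ones'-complement sum described in RFC 1071). A receiver recomputes this over the outer header and drops the packet if it does not verify. The header fields below (DF flag, TTL 64, total length 172) are illustrative assumptions; only the two TEP addresses come from this thread.

```python
import socket
import struct

HEADER_FORMAT = "!BBHHHBBH4s4s"  # 20-byte IPv4 header without options


def ipv4_checksum(header):
    """Ones'-complement of the ones'-complement sum of all 16-bit words."""
    words = struct.unpack("!%dH" % (len(header) // 2), header)
    total = sum(words)
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF


def build_outer_header(src, dst, total_len):
    """Build an outer IPv4/UDP header with a correct checksum (illustrative values)."""
    fields = [0x45, 0, total_len, 0, 0x4000, 64, 17, 0,
              socket.inet_aton(src), socket.inet_aton(dst)]
    raw = struct.pack(HEADER_FORMAT, *fields)
    fields[7] = ipv4_checksum(raw)  # checksum field was zero while summing
    return struct.pack(HEADER_FORMAT, *fields)


def header_is_valid(header):
    """A header with a correct checksum sums to 0xFFFF, so this returns 0."""
    return ipv4_checksum(header) == 0


if __name__ == "__main__":
    good = build_outer_header("10.12.0.152", "10.12.0.151", 172)
    bad = good[:10] + b"\x00\x00" + good[12:]  # simulate a wrong checksum field
    print(header_is_valid(good), header_is_valid(bad))
```

A header arriving with a wrong checksum field, like the second one here, is what the receiving host appears to be seeing for every Geneve packet.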

                         

                        One can verify this by capturing incoming packets on the TN via nsxcli: start capture interface _uplink1_ direction input file xyz.pcap. After transferring xyz.pcap from /tmp/ (via WinSCP or another utility) and loading it into Wireshark, the outer IP checksums of the Geneve packets will show as incorrect (turn on Protocol prefs: Validate the IPv4 checksums...).
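If Wireshark is not at hand, the same check can be roughly scripted. The sketch below uses only the Python standard library and assumes the capture file is in the classic pcap format with Ethernet link type and untagged frames; whether nsxcli writes classic pcap or pcapng is not confirmed here, and pcapng is rejected rather than parsed. The function names are hypothetical.

```python
import struct


def ipv4_header_ok(header):
    """True if the ones'-complement sum over the whole header folds to 0xFFFF."""
    total = sum(struct.unpack("!%dH" % (len(header) // 2), header))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total == 0xFFFF


def bad_outer_checksums(pcap_bytes):
    """Return 0-based indexes of frames whose outer IPv4 header checksum fails."""
    magic = pcap_bytes[:4]
    if magic == b"\xd4\xc3\xb2\xa1":
        endian = "<"  # file written little-endian
    elif magic == b"\xa1\xb2\xc3\xd4":
        endian = ">"  # file written big-endian
    else:
        raise ValueError("not a classic pcap file (pcapng is not handled)")
    bad = []
    offset = 24  # skip the 24-byte global header
    index = 0
    while offset + 16 <= len(pcap_bytes):
        # Record header: ts_sec, ts_usec, incl_len, orig_len (4 bytes each).
        incl_len = struct.unpack(endian + "I", pcap_bytes[offset + 8:offset + 12])[0]
        frame = pcap_bytes[offset + 16:offset + 16 + incl_len]
        offset += 16 + incl_len
        this_index = index
        index += 1
        # Only untagged Ethernet frames carrying IPv4 are checked.
        if len(frame) < 34 or frame[12:14] != b"\x08\x00":
            continue
        ihl = (frame[14] & 0x0F) * 4
        if ihl < 20 or len(frame) < 14 + ihl:
            continue
        if not ipv4_header_ok(frame[14:14 + ihl]):
            bad.append(this_index)
    return bad


if __name__ == "__main__":
    with open("xyz.pcap", "rb") as handle:
        print(bad_outer_checksums(handle.read()))
```

This only makes sense on a capture taken in the input direction on the receiving host. On the sending side, packets captured before the NIC fills in an offloaded checksum commonly look wrong even when nothing is broken.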

                         

                        There is little to no chance that Cisco will fix this for the old M81KR CNA, so it has to be worked around on the ESXi side...

                         

                        Workaround: turn off IP checksum HW offloading for all NSX-T vmnics on all TNs using Cisco VICs (in this case vmnicX-Y):

                         

                        esxcli network nic software set --ipv4cso=1 -n vmnicX
                        esxcli network nic software set --ipv4cso=1 -n vmnicY
                        

                         

                        --ipv4cso=1 means the IP checksum is calculated in software; --ipv4cso=0 means it is offloaded to hardware.

                        The settings persist across reboots.

                         

                        To verify that IP checksum calculations are done in SW (vmkernel) run:

                        esxcli network nic software list
                        

                         

                        IPv4 CSO = on means IP checksum in SW.
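With many hosts and uplinks it may be handy to generate the command list rather than type it. The sketch below only builds the command strings quoted above (it does not run anything), so the output can be reviewed and pasted into an SSH session. The vmnic names in the example are placeholders and the function name is made up for illustration.

```python
def software_cso_commands(vmnics, enable=True):
    """Build the esxcli commands from this post for the given uplink names.

    enable=True emits --ipv4cso=1 (checksum in software), False emits --ipv4cso=0.
    """
    value = 1 if enable else 0
    commands = [
        "esxcli network nic software set --ipv4cso=%d -n %s" % (value, nic)
        for nic in vmnics
    ]
    # Finish with the verification command so the result can be checked.
    commands.append("esxcli network nic software list")
    return commands


if __name__ == "__main__":
    # Placeholder NIC names; substitute the uplinks actually used by NSX-T.
    for line in software_cso_commands(["vmnic4", "vmnic5"]):
        print(line)
```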

                         

                        Once software IP checksumming is enabled on the NSX-T vmnics, the Geneve tunnels should come up immediately (to verify, run "nsxdp-cli bfd sessions list").

                         

                        PS: It seems that if you are testing a nested ESXi deployment that uses vmxnet3 with DirectPath I/O enabled, the same workaround must be applied to the virtual vmxnet3 vmnics if they are bound to Cisco VICs (vmxnet3 offloads IP checksum calculations to the VIC?).

                         

                        Regarding performance concerns with SW IP checksum calculation: VM to VM throughput is similar (VMs residing on different B200 M1 blades):

                        - 9.67 Gbits/sec with DSwitch vs. 9.13 Gbits/sec with NSX-T SDN.

                        - NSX-T DR L3 routing: 8.04 Gbits/sec.

                         

                        With this workaround we have successfully tested both NSX-T 2.5 and 3.0 using:

                        - Cisco B200 M1 blades with M81KR CNA/VIC in 5108 blade chassis

                        - FI 6100 with UCSM 2.2(8i)

                        - ESXi 6.5u3

                        (edge nodes must be on a different cluster with newer servers, due to the AES-NI CPU requirement)

                         

                        IMHO newer VIC cards like VIC1200 / VIC1300 have/had similar problems with Geneve packets, because previously we were unable to run NSX-T 2.4 on C240-M4 using VIC1300 (geneve tunnels down).

                         

                        Lastly, I can confirm that NIC HW offloading of Geneve encapsulation is not a requirement for NSX-T 3.0.

                        • 9. Re: NSX-T 2.5.1 - no geneve tunnels
                          andreir Novice

                          This is great, thank you very much! I can confirm that disabling the checksum offload fixes the issue.