VMware NSX

  • 1.  Communication Problems - EDGE gateway and DLR

    Posted Aug 23, 2016 03:54 PM

    Hello community I hope everyone is fine.


    I’m a vSphere & NSX rookie (I actually have a networking and security background, so everything involving VMware is very new to me). Anyway, I have an NSX version 6.1.3 deployment and I’m experiencing a communication problem between the Edge GW and the DLR.

    I’m creating a very simple deployment: Network-A & Network-B communicate through a DLR (picture), and everything below the DLR can ping everything else, no problem there. The VMs ping each other in vDS1 and ping the VM in vDS2 and vice versa.

    The issue is with the “Edge GW”, which simply does not communicate properly with the DLR (the ping between them works only intermittently), and there is no communication from the VMs to the outside world whatsoever. FYI, I have disabled all firewalls on the Edge and the DLR.

    So I followed a set of instructions (here) (I was redirected thru this article) and a simple workaround was to reset the netcpad daemon with the following command:

    ~ # /etc/init.d/netcpad stop
    ~ # /etc/init.d/netcpad start

    and I’m getting very weird replies when I try this command:

    vcenter:~ # /etc/init.d/netcpad stop
    bash: /etc/init.d/netcpad: No such file or directory

    and then I tried (also recommended in the article above):

    vcenter:~ # net-vdr

    bash: net-vdr: command not found

    Now I’m very stuck, because I’m not sure whether I did a proper “Host Installation” (I remember that I had to force the installation; "Force-sync" I think was the option I selected) or whether something else is wrong.

    I also wonder whether I’m running the commands at the wrong prompt: I’ve noticed that the screenshots in the workaround only show ~# at the beginning, while mine says “vcenter”. Maybe it sounds stupid, but I’ve tried every workaround I could get my hands on and I’m doubting everything about this deployment. And again, I'm a rookie, so I think I can ask every stupid question I want (haha :smileylaugh:).

    Any clue or idea you can send my way would be very much appreciated.



  • 2.  RE: Communication Problems - EDGE gateway and DLR

    Posted Aug 24, 2016 06:14 AM

    During Host Preparation, EAM (ESX Agent Manager) on vCenter is used to deploy the VIB modules to the ESXi hosts, so vxlan, vsip, netcpa and vsfwd are kernel modules and user processes on the ESXi hosts. The /etc/init.d/netcpad stop and start commands, as well as net-vdr, should therefore be run on the ESXi hosts, since vCenter does not have netcpa itself.


    UWA (netcpa and vsfwd):

    http://chansblog.com/tag/uwa/

    Force-sync may be helpful for synchronizing NSX Manager with the NSX Edge, but it is meant as a troubleshooting tool:

    https://pubs.vmware.com/NSX-6/index.jsp#com.vmware.nsx.admin.doc/GUID-21FF2937-4CDF-491C-933E-8F44E21ED55E.html

    One quick check may be the NSX Communication Channel Health Check (available in NSX 6.2):

    http://www.virtually-limitless.com/nsx/nsx-6-2-communication-channel-health-check/

    General steps for NSX Installation may be:

    http://dailyhypervisor.com/vmware-nsx-for-vsphere-6-1-step-by-step-installation/

    If that order is followed, the connection between netcpad on the ESXi hosts and the controllers should be established.

    Could you send the output of this command?

    /etc/init.d/netcpad status

    For other commands this link may be helpful:

    http://www.vmwarearena.com/vmware-nsx-installation-part-7-verify-nsx-vibs-installation-from-esxi-hosts/

    If the modules are successfully installed and host preparation is complete, the ESG-DLR communication can be checked in terms of the extent of the Transport Zones and the ARP or MAC tables, but first the Control Plane should be working.



  • 3.  RE: Communication Problems - EDGE gateway and DLR

    Posted Aug 25, 2016 07:25 PM

    Hi cnrz, thank you for the time and effort.

    A couple of things regarding the troubleshooting.

    First, I was running the commands on the wrong device (barking up the wrong tree, so to speak), but like I said before, I'm a rookie and I'm allowed to make stupid mistakes (haha :smileyhappy:). Now, following some of your directions and the instructions on chanaka_ek's blog, I was able to verify the netcpa daemon. Some screenshots here:

    also

    & finally

    So everything here looks fine, it is consistent with the deployment I'm trying to accomplish.

    I took the task to graphically show you where exactly I'm having the communication issues.

    As you can see (from TOP to BOTTOM)

    The Edge GW can communicate with the VMs on the inside as well as with the DLR's 3 interfaces, and also with the outside world (Internet). So I guess there is no problem there.

    Then the DLR can ping the VMs but cannot communicate with the Edge Gw through the transport zone.

    And last but not least, the strangest behavior is with the VMs: they can ping each other (through the DLR), can ping the DLR's interfaces and even both of the Edge GW's interfaces, but cannot get through the Edge GW to the outside world.

    Any ideas or suggestions?

    Thanks in advance.



  • 4.  RE: Communication Problems - EDGE gateway and DLR

    Posted Aug 26, 2016 02:44 AM

    The unsuccessful ping could be related to the firewall on the Edge Gateway or the firewall on the DLR. Is it possible to check or disable them? (They are different from the dFW.)

    Besides the firewall, it could be a routing problem or a NAT problem, in either the northbound or the southbound direction. Is it possible to check with traceroute from the VMs to the outside? For the DLR a default gateway is mostly sufficient, while the Edge and the outside FW may need static routes towards the VMs. Again, this traffic should be allowed on the dFW (if there is no default permit rule).



  • 5.  RE: Communication Problems - EDGE gateway and DLR

    Posted Aug 26, 2016 09:40 PM

    For the VMs to communicate out through the ESG (Edge Services Gateway), you need to have the appropriate routes configured (unless you have dynamic routing configured) as well as the appropriate NAT rules (SNAT and DNAT), so that when you ping out to an external IP (say the default gateway), the response can come back to the source. Are these also in place?

    Also, as one of the others has pointed out, do you have any firewall rules / distributed firewall rules set up, etc.?

    Also what version of ESXi and NSX are you looking at here?
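
    To make the SNAT point above concrete, here is a minimal Python sketch, not NSX configuration. It models why a reply cannot come back when the upstream device has no route to the VM's source address, and how rewriting the source at the Edge changes that. All addresses are assumptions for illustration: 10.1.1.10 for a VM, 10.1.0.0/16 for the inside networks, 10.6.0.230 for the Edge uplink and 10.6.0.0/24 as the only network the upstream device knows.

```python
import ipaddress
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str


def apply_snat(packet, inside_net, translated_ip):
    """Rewrite the source address if it falls inside inside_net."""
    if ipaddress.ip_address(packet.src) in ipaddress.ip_network(inside_net):
        return replace(packet, src=translated_ip)
    return packet


def can_reply(packet, known_networks):
    """True if the upstream device has a route back to the packet's source."""
    src = ipaddress.ip_address(packet.src)
    return any(src in ipaddress.ip_network(net) for net in known_networks)


# Hypothetical values, chosen only to illustrate the idea.
upstream_knows = ["10.6.0.0/24"]              # upstream has no route to the VM subnets
ping = Packet(src="10.1.1.10", dst="10.6.0.1")

# Without SNAT the upstream device cannot route the reply back.
print(can_reply(ping, upstream_knows))

# With SNAT at the Edge, the source becomes an address the upstream device knows.
natted = apply_snat(ping, "10.1.0.0/16", "10.6.0.230")
print(natted.src, can_reply(natted, upstream_knows))
```

    The alternative to SNAT is to give the upstream device static routes for the VM subnets pointing at the Edge uplink, which in this model amounts to adding those subnets to upstream_knows.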



  • 6.  RE: Communication Problems - EDGE gateway and DLR

    Posted Aug 26, 2016 11:45 PM

    Hi chanaka_ek & cnrz, thank you both for the follow-up.

    I'm sorry I didn't give this information earlier, but I've checked the configuration regarding routes, trace routes, firewalls, etc., and to my eyes everything looks fine. Although I have to say I haven't configured any NAT features anywhere, so maybe the solution lies that way.

    But I'm going to double-check all the configurations mentioned before with you guys, hoping you may see something I am not seeing.

    EDGE GW:

    • Firewall: I didn't disable it, I just added an "any accept" traffic rule:
    • IP route table: I'm going with static routing:

    • Trace route from EDGE GW to VM:

    • Trace route & ping from EDGE GW to Perimeter Firewall:

    As you can see, the trace route fails and the ping is successful.

    DLR:


    • Firewall: "any any accept" rule.

    • IP route table: statics

    VM:

    • tracert from VM to EDGE GW (to the uplink interface of the EDGE gw):

    • tracert from VM to the perimeter firewall:

    ESXi version = 6.0.0

    NSX Version: 6.1.3 Build 2591148

    And again the only configuration I'm positive I haven't touched is the NAT. I'll take a look at it and report back.

    Regards.



  • 7.  RE: Communication Problems - EDGE gateway and DLR

    Posted Aug 27, 2016 04:54 AM

    From the routing table of the DLR, it has no default route, only a static route for 10.6.0.0. So the DLR (and therefore the VMs) can't reach anything outside beyond 10.6.0.0/24. If the VMs can also ping 10.6.0.1, and the perimeter FW has static routes for the VM subnets pointing to .230 as well as NAT, then the lack of a DLR default route towards the Edge (10.1.100.1) may be the issue.
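
    As a rough illustration of the point above, here is a minimal Python sketch of a longest-prefix-match lookup. It is not DLR output: only the 10.6.0.0/24 static route and the 10.1.100.1 next hop come from this thread, while the 10.1.100.0/24 transit network and the 8.8.8.8 test destination are assumptions.

```python
import ipaddress


def make_routes(entries):
    """Turn (prefix, next_hop) string pairs into (network, next_hop) pairs."""
    return [(ipaddress.ip_network(prefix), hop) for prefix, hop in entries]


def lookup(routes, destination):
    """Return the next hop of the most specific matching route, or None."""
    dest = ipaddress.ip_address(destination)
    matches = [(net, hop) for net, hop in routes if dest in net]
    if not matches:
        return None  # no matching route: the packet is dropped
    return max(matches, key=lambda m: m[0].prefixlen)[1]


# Routing table without a default route (transit /24 is an assumption).
dlr_routes = make_routes([
    ("10.1.100.0/24", "connected"),
    ("10.6.0.0/24", "10.1.100.1"),
])

# Same table with a default route towards the Edge added.
dlr_routes_fixed = dlr_routes + make_routes([("0.0.0.0/0", "10.1.100.1")])

print(lookup(dlr_routes, "10.6.0.1"))        # covered by the static route
print(lookup(dlr_routes, "8.8.8.8"))         # no match without a default route
print(lookup(dlr_routes_fixed, "8.8.8.8"))   # matched by 0.0.0.0/0
```

    In this sketch a destination in 10.6.0.0/24 resolves through the static route, while anything further out matches nothing until the 0.0.0.0/0 entry is added.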



  • 8.  RE: Communication Problems - EDGE gateway and DLR
    Best Answer

    Posted Aug 30, 2016 02:42 PM

    Hello again guys,

    And again, chanaka_ek & cnrz, thank you so very much for your replies. Both of you pointed me in the right direction.

    I finally got the lab up and running thanks to your pointers.

    First was the missing default route from the DLR to the Edge (thanks cnrz).

    And second was the missing source NAT (SNAT) rule on the Edge (thanks chanaka_ek).

    Adding both missing configurations resolved all the communication issues.

    From this point on, it's explore time for me with NSX capabilities.

    Thanks again.