VMware Networking Community
billdossett
Hot Shot

can't put VM on NSX segment

So, on to my latest problem - seems like I have hit every rut in the road... I have created an edge node, edge cluster, T1 gateway and several segments. However, when I try to connect a VM (either a new VM or an existing one) to the segment, I get "An error occurred during host configuration" - not very helpful. The segment appears to be healthy: admin up, appearing on my only DVS, and all hosts connected. Everything looks sweet... however, if I log in to a host, open the nsxcli and do get logical-switches, I only get

 

terrapin-esxi01.terrapin.local> get logical-switches
                  Logical Switches Summary
------------------------------------------------------------

                    Overlay Kernel Entry
============================================================
  VNI                    DVS name                 VIF num

                     Overlay LCP Entry
============================================================
  VNI              Logical Switch UUID              Name

                     VLAN Backed Entry
============================================================
          Logical Switch UUID            VLAN ID

Shouldn't that be listing something? I was reading a troubleshooting article and, though it didn't show the output, it seemed to say I should see something there - but maybe I'm not doing the right thing. When I try to create a new VM attached to the segment, I get this in the events:

 02/15/2021, 12:42:51 PM Failed to create virtual machine test on terrapin-esxi02.terrapin.local
Related events:
 02/15/2021, 12:42:51 PM	Removed test on terrapin-esxi02.terrapin.local from vSAN Datacenter
 02/15/2021, 12:42:51 PM	New MAC address (00:50:56:b3:68:90) assigned to adapter 50 33 57 bc bb ad 26 dc-47 c7 6b 11 9c 9c b4 66 for test
 02/15/2021, 12:42:51 PM	Assigned new BIOS UUID (42331430-5618-f69e-36c2-b434ed1cd283) to test on terrapin-esxi02.terrapin.local in vSAN Datacenter
 02/15/2021, 12:42:51 PM	Assign a new instance UUID (5033628d-1deb-e6c0-a81b-d0bf5a811409) to test
 02/15/2021, 12:42:51 PM	Creating test on terrapin-esxi02.terrapin.local, in vSAN Datacenter
 02/15/2021, 12:42:50 PM	Task: Create virtual machin

Seems like it is failing immediately when it tries to assign the new MAC address. I have tried this with existing VMs and with creating new VMs, both Windows and Linux. I've recreated the edge node, edge cluster, T1 gateway and segment twice now to be sure I didn't miss anything.

Bill Dossett
8 Replies
shank89
Expert

The segments will only show up in nsxcli once they have a powered-on workload attached. After it fails, does the segment show as having a port attached?

 

Do you perhaps have storage issues, if you are not able to create any VMs at all? Is vSAN healthy?

Have the hosts definitely got TEP addresses assigned? Have you tested their TEP connectivity for E/W and to the edge?
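For reference, TEP connectivity can be checked from the ESXi shell with vmkping over the overlay netstack - the vmk number and remote TEP IP below are just examples, yours will differ:

[root@esxi01:~] vmkping ++netstack=vxlan -I vmk10 -s 1572 -d 192.168.130.12

The -s 1572 -d combination sends a don't-fragment packet that only gets through if the overlay MTU (1600 or higher) is actually in place end to end, which a plain small ping won't tell you.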

Shashank Mohan

VCIX-NV 2022 | VCP-DCV2019 | CCNP Specialist

https://lab2prod.com.au
LinkedIn https://www.linkedin.com/in/shankmohan/
Twitter @ShankMohan
Author of NSX-T Logical Routing: https://link.springer.com/book/10.1007/978-1-4842-7458-3
billdossett
Hot Shot

OK, understood about the segments not showing in nsxcli without workloads... no, it does not show as having a port attached in NSX Manager - it still says 0 ports.

No storage issues - I have 13 TB of healthy all-flash vSAN available.

All hosts and the edge have TEP addresses, and I can vmkping east/west over the TEP vmk interfaces.

Like I say, it really looks like it should be working to me, especially after you confirmed that nsxcli won't show anything until a workload is connected.

Bill Dossett
billdossett
Hot Shot

I found a bit more info in the vpxd log:

2021-02-16T19:21:46.675Z info vpxd[33317] [Originator@6876 sub=vpxLro opID=kkybvrtl-67302-auto-1fxj-h5:70023659-0] [VpxLRO] -- FINISH task-3425
2021-02-16T19:21:46.675Z info vpxd[33317] [Originator@6876 sub=Default opID=kkybvrtl-67302-auto-1fxj-h5:70023659-0] [VpxLRO] -- ERROR task-3425 -- vm-57 -- vim.VirtualMachine.reconfigure: vim.fault.PlatformConfigFault:
--> Result:
--> (vim.fault.PlatformConfigFault) {
-->    faultCause = (vmodl.MethodFault) null,
-->    faultMessage = ,
-->    text = "Failed to attach vif uuid to network nsx.LogicalSwitch:5f4a660e-9b2f-4054-89a7-3fce55a47d29,Failed to send VIF RPC request"
-->    msg = "An error occurred during host configuration."
--> }

 

"Failed to send VIF RPC request"... not a lot more help to me - but does it mean anything to anyone else?

Bill Dossett
shank89
Expert

What happens if you try to manually attach the VM to a logical port? (You may still hit the same issue when mapping the VM to the PG in vCenter.) https://docs.vmware.com/en/VMware-NSX-T-Data-Center/2.3/com.vmware.nsxt.admin.doc/GUID-5AA0302A-7F9C...

 

Is this occurring on all hosts in the cluster?

Have you tried rebooting the hosts? This looks more like an ESXi issue than an NSX-T issue.

Shashank Mohan

VCIX-NV 2022 | VCP-DCV2019 | CCNP Specialist

https://lab2prod.com.au
LinkedIn https://www.linkedin.com/in/shankmohan/
Twitter @ShankMohan
Author of NSX-T Logical Routing: https://link.springer.com/book/10.1007/978-1-4842-7458-3
billdossett
Hot Shot

Yes, it occurs on all hosts in the cluster... I just performed a reboot of all of them and it's still the same.

I tried to follow the doc on manually adding a port, however it says to set the attachment type to VIF and then it wants the attachment ID, and it does not tell you what the attachment ID is 😞 The only thing I could find on the VIF ID was that once you attach the VM to a logical switch you can see the VIF ID - which is counterintuitive, seeing as I can't attach the VM to a logical switch in the first place. And while I was working on that, the NSX Manager ran out of memory and crashed again, and now it doesn't seem to restart the service any more. It was sort of working before and would at least recover - but since I rebooted, it doesn't do that anymore.

I've started from scratch rebuilding this cluster 3 times now, imaging the hosts and all to make sure it is clean, and I wind up in this same situation every time. I wonder what I might be doing wrong.

Bill Dossett
shank89
Expert

Putting it as plainly as I can: there is something fundamentally wrong in either the configuration or the hardware.

I've never seen or heard of such issues in a clean build. If this weren't your lab, I would have told you to get support involved.

Shashank Mohan

VCIX-NV 2022 | VCP-DCV2019 | CCNP Specialist

https://lab2prod.com.au
LinkedIn https://www.linkedin.com/in/shankmohan/
Twitter @ShankMohan
Author of NSX-T Logical Routing: https://link.springer.com/book/10.1007/978-1-4842-7458-3
CyberNils
Hot Shot

Could this be due to a firewall between the ESXi Transport Nodes and the NSX Managers?

Try these in ESXi:

[root@bgo-lab-esx-01:~] nsxcli -c get managers
Wed Feb 17 2021 UTC 07:37:34.496
- 10.39.128.63     Connected (NSX-RPC)
- 10.39.128.65     Connected (NSX-RPC)
- 10.39.128.64     Connected (NSX-RPC) *

[root@bgo-lab-esx-01:~] nsxcli -c get controllers
Wed Feb 17 2021 UTC 07:38:06.787
 Controller IP    Port     SSL         Status       Is Physical Master   Session State  Controller FQDN
  10.39.128.64    1235   enabled     connected             true               up        bgo-mgmt-nsxmgr-02.nolab.local
    0.0.0.0       1235   enabled      not used            false              null       bgo-mgmt-nsxmgr-03.nolab.local
  10.39.128.63    1235   enabled      not used            false              null       bgo-mgmt-nsxmgr-01.nolab.local

 



Nils Kristiansen
https://cybernils.net/
billdossett
Hot Shot

No, no firewall between the Transport Nodes and NSX Managers - just a huge amount of stupidity between my left ear and right ear.

DNS was not configured on my ESXi hosts. I've been imaging my hosts for some time now, and somehow I thought DNS was getting configured through the imaging process - it is not.

How was I able to build a full vSAN cluster without any warnings about DNS not working? Well - apparently you can.

I finally noticed in the nsx-syslog.log that ssl://nsx-manager.terrapin.local was failing to resolve, then just tried a simple ping and realized my resolv.conf was empty.
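In case anyone else hits this, DNS on an ESXi host can be checked and fixed from the shell - the DNS server IP below is just an example, substitute your own:

[root@terrapin-esxi01:~] cat /etc/resolv.conf
[root@terrapin-esxi01:~] esxcli network ip dns server list
[root@terrapin-esxi01:~] esxcli network ip dns server add --server=192.168.1.10
[root@terrapin-esxi01:~] nslookup nsx-manager.terrapin.local

Worth adding to any host imaging checklist, since (as I found out) vSAN will build happily without it.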

Several days of my life I won't get back - sorry for all the trouble.

Bill Dossett