actyler1001
Enthusiast

Controller deployment error - Unexpected inconsistency

Hello all, I've been beating my head against a wall trying to deploy NSX in my home lab and was wondering if anyone else has seen this. I am deploying NSX in a nested 3-node cluster. I have vCenter deployed and distributed virtual switches in place. I am able to vMotion and create new VMs without an issue. I've got the NSX Manager deployed and registered with vCenter. I have attempted to deploy multiple versions, from 6.2.7 up through the latest 6.3.2.

I cannot for the life of me get past the controller deployment step. I am confident this is not a network connectivity issue. All dvSwitches support an MTU of 9000, and I have deployed multiple VMs to the same network and confirmed 1600+ MTU connectivity to all vCenter and NSX Manager components. The controller deployment times out and vCenter deletes the controller VM. I have downloaded the logs from the NSX Manager and there is nothing helpful. They just show the successful OVF deployment, successful power on, timeout, then the delete event.

This is displayed on the controller boot screen. All versions have this problem, and I have completely rebuilt the lab while troubleshooting. What the heck?

[Attachment: f.jpg, screenshot of the controller boot screen]

Regards,

Adam Tyler

adam@tylerlife.us

17 Replies
bayupw
Leadership

I wonder if it is a datastore issue.

Do you have any other different datastores to test?

Is the NSX Controller in the same portgroup and network as NSX Manager and vCenter?

Maybe the error is not related, but double-check the network, IP addressing, and NTP/time too.

Bayu Wibowo | VCIX6-DCV/NV
Author of VMware NSX Cookbook http://bit.ly/NSXCookbook
https://github.com/bayupw/PowerNSX-Scripts
https://nz.linkedin.com/in/bayupw | twitter @bayupw
actyler1001
Enthusiast

Hi Bayu, I thought it was an issue with the datastore as well.  It's an iSCSI LUN presented by a home NAS4Free box.  The vCenter and NSX Manager VMs deploy to it just fine and run.  I even deleted the LUN and re-created.  I can create a Windows 7 VM, install the OS, and use it without a problem.  I don't think it is the storage array...  Unless NSX controllers have some specific requirement that other VMs don't when it comes to a datastore...

I have tried just about every combination of port group config you can think of. I think the last test did use the same port group for vCenter, NSX Manager, and the controller.

Regards,

Adam Tyler

Sreec
VMware Employee

I had the same issue in my nested lab after the main server rebooted. Controller deployment always failed with the same error. All I did was format the nested ESXi host and deploy the controller again.

Cheers,
Sree | VCIX-5X| VCAP-4X| VExpert 6x|Cisco Certified Specialist
Please KUDO helpful posts and mark the thread as solved if answered
actyler1001
Enthusiast

Sreec,

Right there with you. I just deleted all nested ESXi VMDKs and completely rebuilt them from scratch. Then I deleted the iSCSI LUN used for the nested environment and rebuilt it. It was formatted by the first host that connected.

Same issue.

So.......  now what......?

Regards,

Adam Tyler

Sreec
VMware Employee

Did you try deploying a new NSX Manager after the datastore rebuild?

Cheers,
Sree | VCIX-5X| VCAP-4X| VExpert 6x|Cisco Certified Specialist
amolnjadhav
Enthusiast

Hi Adam,

I have deployed an NSX lab environment multiple times, but I have not come across this issue. It looks very interesting.

I hope you have not cloned the NSX Manager VM? Please make sure DNS resolution is working fine.

Try deploying the controller on another datastore (maybe a local one).

Regards

Amol

Please consider marking this answer "correct" or "helpful" if you think your query have been answered correctly. Regards Amol Jadhav VCP NSXT | VCP NSXV | VCIX6-NV | VCAP-DCA | CCNA | CCNP - BSCI
actyler1001
Enthusiast

Sreec,

Yes, this is the process I just followed that ended up having the same result.

1. Delete datastore using NAS4Free interface.  (Deleted iSCSI extent and related target)

2. Delete 2 of the 3 nested ESXi host VMs.

3. Delete hard disk (VMDK) from remaining nested ESXi host VM.

** At this stage vCenter and NSX Manager have been completely deleted/purged.

4. Add VMDK back to nested ESXi host.  Install ESXi.

5. Manually create ESXi host VM 2 and 3.  Not clone.  Manual.

6. Configure new iSCSI LUN and target.

7. Install ESXi on 2 remaining hosts and configure networking.

8. Attach new iSCSI LUN to first nested host and format with latest file system.

9. Connect the LUN to the remaining hosts.

10. Deploy vCenter.

11. Create 3 node cluster.

12. Deploy NSX from scratch.

13. Register with vCenter.

14. Face SAME PROBLEM.....!

actyler1001
Enthusiast

amolnjadhav,

DNS is in place and all related "A" records created.  vCenter, hosts, and NSX manager can resolve the FQDN of each ESXi host as well as the vCenter server and NSX Manager server.  Every NSX deployment has been from an OVF template downloaded from VMware directly.  I have never cloned the NSX manager.
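For anyone who wants to repeat the DNS check described above, here is a minimal Python sketch, not a definitive tool. It asks the local resolver for the IPv4 addresses of each lab FQDN and reports any name that does not resolve. The host names in the usage comment are hypothetical placeholders, not names from this lab, and the `resolver` parameter is only there so the lookup can be swapped out.

```python
import socket
from typing import Callable, Dict, Iterable, List


def resolve_ipv4(fqdn: str) -> List[str]:
    """Return the sorted, de-duplicated IPv4 addresses the local resolver gives for fqdn."""
    infos = socket.getaddrinfo(fqdn, None, family=socket.AF_INET, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the address.
    return sorted({info[4][0] for info in infos})


def check_a_records(
    fqdns: Iterable[str],
    resolver: Callable[[str], List[str]] = resolve_ipv4,
) -> Dict[str, List[str]]:
    """Map each name to its resolved addresses; an empty list means the lookup failed."""
    results: Dict[str, List[str]] = {}
    for name in fqdns:
        try:
            results[name] = resolver(name)
        except OSError:  # socket.gaierror is a subclass of OSError
            results[name] = []
    return results


# Example usage (hypothetical host names, replace with your own):
# for name, addrs in check_a_records(["vcenter.lab.example", "nsxmgr.lab.example"]).items():
#     print(name, addrs if addrs else "NO A RECORD FOUND")
```

This only covers forward lookups from whichever machine runs it, so it would need to be run from a VM on the same port group to say anything about what the controllers would see.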

The nested ESXi hosts don't really have a local datastore. They run on a physical 2-node ESXi cluster and are stored on a different LUN of the same storage device. I created each nested ESXi VM with a 32 GB thin VMDK and DID try deploying the controller to one of these datastores, with the same error. However, any other VM can be deployed to any datastore and run without error, including the NSX Manager and vCenter server. The issue only seems to affect controllers.

Regards,

Adam Tyler

Sreec
VMware Employee

You have done the same steps I did, without any positive result. I remember I even tested an NSX Edge deployment via the same Manager instance and it booted perfectly fine; that's how I confirmed the issue is specific to controllers. What version of NSX are you running now? I tried NSX 6.3.1 after that event and everything is working smoothly. Would you mind trying a different NSX Manager version?

Cheers,
Sree | VCIX-5X| VCAP-4X| VExpert 6x|Cisco Certified Specialist
amolnjadhav
Enthusiast

Hi Adam,

You have confirmed that it's not a datastore issue, because other VMs are running fine and only the controller VM fails.

Let me search on this issue. Have you successfully deployed controllers using the same NSX Manager OVA in the past?

I have experienced an issue where ESXi hosts were unable to communicate with the controllers. I tried searching but didn't find an answer, and finally I changed the NSX Manager OVA to resolve the issue.

Regards

Amol

actyler1001
Enthusiast

Sreec,

Thanks for your reply.  I have attempted to deploy NSX 6.2.7, 6.2.8, 6.3.1, and 6.3.2.  Same result each time.

Regards,

Adam Tyler

actyler1001
Enthusiast

amolnjadhav,

This is actually my first time deploying NSX, so no, I have never successfully been through a controller deployment. I have only ever attempted it in this home lab. My VCP 6 datacenter cert is about to expire and I was debating trying to renew by going after the NSX cert. Turned out to be a non-starter, unfortunately.

I feel like I have confirmed the port groups I am using shouldn't prevent the controllers from communicating with the NSX Manager and vCenter. I deployed a couple of test VMs running other operating systems and they are able to connect to all destinations using the same port groups. It shouldn't be MTU related either; I am able to force a ping using an MTU over 8000 successfully. In addition, the controllers never appear to even boot successfully with this disk error. Really weird.
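For reference, the forced-MTU ping mentioned above comes down to simple arithmetic: the ICMP payload has to be the target MTU minus 20 bytes of IPv4 header and 8 bytes of ICMP header, with fragmentation disabled. Below is a minimal Python sketch that computes the payload size and builds the matching command line for `vmkping` on an ESXi host or `ping` on a Linux test VM. The function names and the example address are hypothetical, and the sketch only builds the argument list; it does not run anything.

```python
from typing import List

IPV4_HEADER_BYTES = 20  # IPv4 header without options
ICMP_HEADER_BYTES = 8   # ICMP echo request header


def icmp_payload_for_mtu(mtu: int) -> int:
    """Largest ICMP payload that fits in a single unfragmented IPv4 packet of size mtu."""
    overhead = IPV4_HEADER_BYTES + ICMP_HEADER_BYTES
    if mtu <= overhead:
        raise ValueError(f"MTU must be larger than {overhead} bytes")
    return mtu - overhead


def vmkping_command(target: str, mtu: int, count: int = 3) -> List[str]:
    """ESXi: -d sets the don't-fragment bit, -s sets the payload size."""
    return ["vmkping", "-d", "-s", str(icmp_payload_for_mtu(mtu)), "-c", str(count), target]


def linux_ping_command(target: str, mtu: int, count: int = 3) -> List[str]:
    """Linux iputils ping: '-M do' prohibits fragmentation, -s sets the payload size."""
    return ["ping", "-M", "do", "-s", str(icmp_payload_for_mtu(mtu)), "-c", str(count), target]


# Example usage (192.0.2.10 is a documentation address, not a real lab host):
# print(" ".join(vmkping_command("192.0.2.10", 1600)))
```

So a 1600-byte MTU test uses a 1572-byte payload, and a full 9000-byte jumbo frame test uses 8972 bytes.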

Regards,

Adam Tyler

Sreec
VMware Employee

Sorry to hear that. Just to confirm, were these deployment tests (the other NSX versions) done after formatting the nested ESXi hosts and the VMFS datastore? It took me two weeks to fix the same issue. Like I said, I ended up formatting the nested host and VMFS volumes, then deployed a new host and VMFS datastore followed by a new NSX Manager deployment. I'm very sure this error has nothing to do with the network (DNS/GW/NTP, etc.).

Cheers,
Sree | VCIX-5X| VCAP-4X| VExpert 6x|Cisco Certified Specialist
Sreec
VMware Employee

Any luck, Adam? You should certainly try NSX 6.3.3. The controller OS is Photon OS in this release. Just ensure you are running a supported vSphere platform when deploying the NSX Manager, and let me know how it goes after the controller deployment.

Cheers,
Sree | VCIX-5X| VCAP-4X| VExpert 6x|Cisco Certified Specialist
actyler1001
Enthusiast

Sreec,

I've only formatted everything and rebuilt once.  The other attempts were just a process of deleting the NSX Manager and manually removing the registration from vCenter.  Then redeploying a different version of the NSX Manager and re-registering.

Regards,

Adam Tyler

actyler1001
Enthusiast

Sreec,

After that last failed attempt and a day spent (interruptions and slow lab hardware) on rebuilding the nested VMware lab, I haven't been real motivated to try again. I didn't know about version 6.3.3. Should it work with vCenter and vSphere 6 U3?

Regards,

Adam Tyler

Sreec
VMware Employee

I can understand your frustration, but the latest version is worth trying. Yes, it is supported with U3.

VMware Product Interoperability Matrices

Cheers,
Sree | VCIX-5X| VCAP-4X| VExpert 6x|Cisco Certified Specialist