Hello All,
I am trying to deploy a workload domain via VCF in a nested environment in my lab. The workload domain creation fails at the NSX Manager deployment stage.
I checked domainmanager.log, and it seems SDDC Manager cannot reach NSX Manager after deployment.
According to the logs, SDDC Manager makes an API call to NSX Manager to fetch the component info, and the call fails because NSX has not finished initializing by that time. This is what I see in the logs:
2019-06-23T04:33:55.607+0000 INFO [816ffc22a8b1693b,5bf5] [o.a.http.impl.execchain.RetryExec,pool-2-thread-6] Retrying request to {s}->https://172.16.31.199:443
2019-06-23T04:33:58.607+0000 WARN [816ffc22a8b1693b,5bf5] [c.v.e.s.c.s.nsx.impl.NsxServiceImpl,pool-2-thread-6] NSX Manager likely down:
org.springframework.web.client.ResourceAccessException: I/O error on GET request for "https://172.16.31.199/api/1.0/appliance-management/components/component/NSX": No route to host (Host unreachable); nested exception is java.net.NoRouteToHostException: No route to host (Host unreachable)
2019-06-19 17:42:27.478 [vcf_dm,2d09c0eb966f33a3,44db31a238298c9c] [-thread-13] ERROR [ c.v.e.s.common.services.nsx.impl.NsxServiceImpl] NSX Manager did not come alive within 600 seconds
2019-06-19 17:42:27.479 [vcf_dm,2d09c0eb966f33a3,44db31a238298c9c] [-thread-13] ERROR [ c.v.v.c.f.p.action.impl.DeployNsxManagerAction] NSX Manager did not respond within 600 seconds after deploy
2019-06-19 17:42:27.493 [vcf_dm,2d09c0eb966f33a3,44db31a238298c9c] [-thread-13] ERROR [c.v.e.sddc.orchestrator.model.error.ErrorFactory] [EVDD96] DEPLOY_NSX_MANAGER_FAILED Failed to deploy NSX Manager
com.vmware.evo.sddc.orchestrator.exceptions.OrchTaskException: Failed to deploy NSX Manager
I observed that NSX Manager takes a long time to boot. Once it has finished booting, I can ping NSX Manager from SDDC Manager by hostname. By that time, however, SDDC Manager has already given up; it deletes the old NSX instance and deploys a fresh VM. I am using NSX-V for my workload domain.
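To put a number on how long NSX Manager takes to come up, something like the rough Python sketch below could be run from a machine on the same network while the deployment is in progress. It is only my own approximation of the kind of check the logs suggest, not what SDDC Manager actually runs. The URL is copied from the log excerpt above and the 600-second limit from the error message; the function names, the 10-second polling interval, and the idea that any HTTP answer (even 401) counts as "up" are assumptions of mine.

```python
import http.client
import ssl
import time
import urllib.error
import urllib.request

# URL taken from the log excerpt; change it to match your own NSX Manager.
NSX_URL = "https://172.16.31.199/api/1.0/appliance-management/components/component/NSX"


def nsx_reachable(url=NSX_URL, timeout=5):
    """Return True if a web service answers at url, even with an error status."""
    # Lab use only: skip certificate checks for a self-signed appliance certificate.
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    try:
        urllib.request.urlopen(url, timeout=timeout, context=ctx)
        return True
    except urllib.error.HTTPError:
        # Assumption: an HTTP error such as 401 still means the service is up.
        return True
    except (OSError, http.client.HTTPException):
        # Covers "No route to host", connection refused, timeouts, TLS errors.
        return False


def wait_until_ready(probe, limit=600, interval=10,
                     clock=time.monotonic, sleep=time.sleep):
    """Call probe() until it returns True or limit seconds have passed.

    Returns the elapsed seconds on success, or None if the limit was reached.
    """
    start = clock()
    while True:
        if probe():
            return clock() - start
        if clock() - start >= limit:
            return None
        sleep(interval)


# Example use (performs real network calls, so it is left commented out):
# elapsed = wait_until_ready(nsx_reachable, limit=1800)
# print("NSX Manager answered after", elapsed, "seconds")
```

If the elapsed time comes out well above 600 seconds, that would be consistent with SDDC Manager giving up before the appliance has finished initializing.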
Has anyone seen this issue before?
Update on this issue:
I performed a rolling reboot of all 4 hosts in my management domain and retried the task, and it completed without any further issues.