Hi everyone,
We recently had to recover the vCenter Server Appliance from a VMDK, as well as the Platform Services Controller, but whenever I try to stop and start the vpxd service, it fails with the following error output. Any ideas?
INFO:root:Service: vmware-vpxd, Action: start
Service: vmware-vpxd, Action: start
2018-08-07T18:30:21.431Z Running command: ['/sbin/chkconfig', u'vmware-vpxd']
2018-08-07T18:30:21.507Z Done running command
2018-08-07T18:30:21.507Z Running command: ['/sbin/service', u'vmware-vpxd', 'status']
2018-08-07T18:30:22.126Z Done running command
2018-08-07T18:30:22.126Z Running command: ['/sbin/chkconfig', '--force', u'vmware-vpxd', 'on']
2018-08-07T18:30:22.187Z Done running command
2018-08-07T18:30:22.187Z Running command: ['/sbin/service', u'vmware-vpxd', 'start']
2018-08-07T18:40:35.855Z Done running command
2018-08-07T18:40:35.855Z Invoked command: ['/sbin/service', u'vmware-vpxd', 'start']
2018-08-07T18:40:35.855Z RC = 1
Stdout = vmware-vpxd: VC SSL Certificate does not exist, it will be generated by vpxd
Waiting for the embedded database to start up: success
Executing pre-startup scripts...
vmware-vpxd: Starting vpxd by administrative request.
success
vmware-vpxd: Waiting for vpxd to start listening for requests on 8089
Waiting for vpxd to initialize: ..........................................................Tue Aug 7 18:40:13 UTC 2018 Captured live core: /var/core/live_core.vpxd.16169.08-07-2018-18-40-13
[INFO] writing vpxd process dump retry:2 Time(Y-M-D H:M:S):2018-08-07 18:40:11
.Tue Aug 7 18:40:25 UTC 2018 Captured live core: /var/core/live_core.vpxd.16169.08-07-2018-18-40-25
[INFO] writing vpxd process dump retry:1 Time(Y-M-D H:M:S):2018-08-07 18:40:23
.failed
failed
vmware-vpxd: vpxd failed to initialize in time.
vpxd is already starting up. Aborting the request.
Stderr =
2018-08-07T18:40:35.856Z {
"resolution": null,
"detail": [
{
"args": [
"Command: ['/sbin/service', u'vmware-vpxd', 'start']\nStderr: "
],
"id": "install.ciscommon.command.errinvoke",
"localized": "An error occurred while invoking external command : 'Command: ['/sbin/service', u'vmware-vpxd', 'start']\nStderr: '",
"translatable": "An error occurred while invoking external command : '%(0)s'"
}
],
"componentKey": null,
"problemId": null
}
ERROR:root:Unable to start service vmware-vpxd, Exception: {
"resolution": null,
"detail": [
{
"args": [
"vmware-vpxd"
],
"id": "install.ciscommon.service.failstart",
"localized": "An error occurred while starting service 'vmware-vpxd'",
"translatable": "An error occurred while starting service '%(0)s'"
}
],
"componentKey": null,
"problemId": null
}
Unable to start service vmware-vpxd, Exception: {
"resolution": null,
"detail": [
{
"args": [
"vmware-vpxd"
],
"id": "install.ciscommon.service.failstart",
"localized": "An error occurred while starting service 'vmware-vpxd'",
"translatable": "An error occurred while starting service '%(0)s'"
}
],
"componentKey": null,
"problemId": null
}
You may want to restart the PSC first, and don't restart the vCenter appliance until you have verified that the PSC appliance has come online and all of its services are started. Then restart the vCenter appliance and retry the task you are trying to do.
Hi
Adding to what Vijay said, there is a similar thread going on in the community that you can check:
vcenter service 6.5 stopped and is not starting
Also make sure that NTP, DNS, or low storage space is not causing any issues here.
Please consider marking this answer as "correct" or "helpful" if you think your questions have been answered.
regards
Gayathri
Thanks for responding. One thing I forgot to mention: according to the vpxd log file, the PSC was refusing connections on port 443, and when I tried to start the services automatically using "service-control --start --all", it reported each service as set to manual and skipped it, so I had to start them one by one manually.
Every time I restart the PSC it comes up immediately, as if all services get skipped. I'll restart vCenter one more time, since I'm not sure where to set the PSC services to auto from the shell.
vCenter services depend on the PSC because of the SSO architecture. Since it is skipping services, the problem resides on the PSC itself. Please make sure the services below come up on the PSC without any errors.
VMware Appliance Management Service
VMware License Service
VMware Component Manager
VMware Identity Management Service
VMware HTTP Reverse Proxy
VMware Service Control Agent
VMware Security Token Service
VMware Common Logging Service
VMware Syslog Health Service
VMware Authentication Framework
VMware Certificate Service
VMware Directory Service
Then log in to the VAMI console of the PSC and check the health status.
GayathriS:
I took your recommendation and did find that the date and time were different between servers, so I updated the ntp.conf file and restarted ntp and now they show in sync. I'll try again in a bit
Vijay:
The services listed when doing a "service-control --start --all" were the following, which I started manually without errors the last time I restarted the PSC. Are you able to tell whether they match the long names in your list of PSC services? I'm trying to find a list I can use to correlate them to make sure.
vmafdd
vmware-rhttpproxy
vmdird
vmcad
vmware-stsd
vmware-cm
vmware-cis-license
vmware-psc-client
vmware-sca
applmgmt
vmware-syslog
vmware-syslog-health
Here you have each service's description and details: Platform Services Controller Services
But since you found that the time was not in sync, please verify the NTP source and sync configuration. Time sync is a main requirement for communication between the PSC and vCenter.
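A quick way to put a number on the time difference is to compare epoch seconds from both appliances. This is only a rough sketch: clock_skew is a made-up helper name, and the commented usage assumes SSH access to the PSC as root, which may not match your setup.

```shell
# clock_skew EPOCH_A EPOCH_B
# Hypothetical helper: prints the absolute difference, in seconds,
# between two Unix timestamps.
clock_skew() {
    cs_diff=$(( $1 - $2 ))
    if [ "$cs_diff" -lt 0 ]; then
        cs_diff=$(( 0 - cs_diff ))
    fi
    echo "$cs_diff"
}

# Usage on the VCSA (assumes SSH access to the PSC):
#   local_now=$(date -u +%s)
#   psc_now=$(ssh root@mlxpsc1.corp.com date -u +%s)
#   clock_skew "$local_now" "$psc_now"
```

The SSH round trip adds a little delay of its own, so treat a result of a second or two as noise and only worry about larger gaps.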
vCenter is still restarting. It's taking an unusually long time. I also notice that when I browse to the name or IP, the certificates don't show the entire certificate chain, only the device certificate. The CA is not visible in the chain.
OK, I think I'm a little closer now. In the logs I see that the vCenter appliance is now able to see the PSC, but because the CA is missing from the chain, there are entries saying that they don't trust each other. I just need to find out how to import the CA using the shell. I found this article but I'm not sure it's the right one: https://docs.vmware.com/en/VMware-vSphere/6.5/vsphere-esxi-vcenter-server-65-platform-services-contr...
2018-08-07T21:10:42.893Z info vpxd[7FEEE01BE7A0] [Originator@6876 sub=HostGateway] Created ComponentManagerGatewaySource!
2018-08-07T21:10:42.893Z info vpxd[7FEEE01BE7A0] [Originator@6876 sub=HostGateway] Created CmConnectionFSM
2018-08-07T21:10:42.893Z info vpxd[7FEEE01BE7A0] [Originator@6876 sub=HostGateway] Created ComponentManagerClient.
2018-08-07T21:10:42.893Z info vpxd[7FEEE01BE7A0] [Originator@6876 sub=HostGateway] CmConnectionFSM::RunFSM(ST_INIT)
2018-08-07T21:10:42.894Z info vpxd[7FEEE01BE7A0] [Originator@6876 sub=[SSO][SsoCertificateManagerImpl]] [CreateServiceContent] Try to connect to SSO VMOMI endpoint.
2018-08-07T21:10:42.897Z error vpxd[7FEECEF8C700] [Originator@6876 sub=HttpConnectionPool-000001] [ConnectComplete] Connect failed to <cs p:00007feee41fef60, TCP:mlxpsc1.corp.com:443>; cnx: (null), error: N7Vmacore3Ssl18SSLVerifyExceptionE(SSL Exception: Verification parameters:
--> PeerThumbprint: 13:A3:98:1C:1B:84:FB:4D:EF:FA:1B:9E:3E:82:D4
--> ExpectedThumbprint: 0C:34:98:7B:2D:CA:F8:57:4E:1C:CC:A4:78:4B:8A:3V:89
--> ExpectedPeerName: mlxpsc1.corp.com
--> The remote host certificate has these problems:
-->
--> * unable to get local issuer certificate)
2018-08-07T21:10:42.898Z error vpxd[7FEEE01BE7A0] [Originator@6876 sub=HostGateway] [CisConnection]: Error getting trusted STS certificates: SSL Exception: Verification parameters:
--> PeerThumbprint: 13:A3:98:1C:1B:84:FB:4D:EF:FA:1B:9E:3E:82:D4
--> ExpectedThumbprint: 0C:34:98:7B:2D:CA:F8:57:4E:1C:CC:A4:78:4B:8A:3V:89
--> ExpectedPeerName: mlxpsc1.corp.com
--> The remote host certificate has these problems:
-->
--> * unable to get local issuer certificate
2018-08-07T21:10:42.898Z warning vpxd[7FEEE01BE7A0] [Originator@6876 sub=HostGateway] State(ST_INIT) failed with: SSL Exception: Verification parameters:
--> PeerThumbprint: 13:A3:98:1C:1B:84:FB:4D:EF:FA:1B:9E:3E:82:D4
--> ExpectedThumbprint: 0C:34:98:7B:2D:CA:F8:57:4E:1C:CC:A4:78:4B:8A:3V:89
--> ExpectedPeerName: mlxpsc1.corp.com
--> The remote host certificate has these problems:
-->
--> * unable to get local issuer certificate
2018-08-07T21:10:42.898Z warning vpxd[7FEEE01BE7A0] [Originator@6876 sub=HostGateway] ComponentManager service is not available! Will attempt a lazy init of CmClient on first use!
How many PSCs are in this environment?
Given it's reporting a thumbprint mis-match from the given and expected thumbprints when trying to connect to the SSO VMOMI endpoint, I suspect that you have an issue with your SSL trust anchors.
When did you replace the PSC Machine SSL certificate, and by what method did you replace it (using the certificate-manager tool, using vecs-cli, etc)?
Did you replace your vCenter certificate from a custom one or are you using VMCA certificates?
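To compare the thumbprint the PSC actually presents with the one vpxd expects, something along these lines might help. It is a sketch only: normalize_thumbprint and same_thumbprint are made-up helper names, and the expected value is assumed to be copied by hand from the ExpectedThumbprint line in vpxd.log.

```shell
# normalize_thumbprint STRING
# Hypothetical helper: drops an optional "SHA1 Fingerprint=" style prefix
# and upper-cases the hex digits so two thumbprints compare cleanly.
normalize_thumbprint() {
    printf '%s\n' "$1" | sed 's/^.*=//' | tr 'a-f' 'A-F'
}

# same_thumbprint A B
# Succeeds (exit 0) when both thumbprints are equal after normalizing.
same_thumbprint() {
    [ "$(normalize_thumbprint "$1")" = "$(normalize_thumbprint "$2")" ]
}

# Usage from the VCSA shell:
#   peer=$(openssl s_client -connect mlxpsc1.corp.com:443 </dev/null 2>/dev/null |
#          openssl x509 -noout -fingerprint -sha1)
#   expected='...'   # paste the ExpectedThumbprint value from vpxd.log
#   same_thumbprint "$peer" "$expected" && echo match || echo MISMATCH
```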
Thanks for replying. Update so far: I followed the article "Dude! Where's my vCSA SSL Cert chain?" (vRyan.co.uk Virtualization Blog) by someone who experienced this certificate issue. As theVElement mentions, the trustpoints.pem file in the rhttpproxy SSL container was empty and config.xml had it commented out on both the VCSA and the PSC appliances. I don't know how it was working before the restore, since that is how the rhttpproxy config looks in the backup I had.
However, the problem still remains even after a reboot of both devices. I now think the problem this whole time has been that the vpxd service in the VCSA never listens on TCP port 8089, as seen in my manual restart attempt of vmware-vpxd. Looking at logs from months before the restore, the message "vmware-vpxd: VC SSL Certificate does not exist, it will be generated by vpxd" has always been there, and vpxd used to start and listen on 8089 after a few seconds.
If I open the VCSA shell and type "iptables -L port_filter -n --line-numbers", port 8089 is never listed, so I don't think it's a firewall issue here. The output of the restart is not very helpful as to what is going on.
mgmlxvcs1:~ # /etc/init.d/vmware-vpxd restart
vmware-vpxd: already stopped
vmware-vpxd: VC SSL Certificate does not exist, it will be generated by vpxd
Waiting for the embedded database to start up: success
Executing pre-startup scripts...
vmware-vpxd: Starting vpxd by administrative request.
success
vmware-vpxd: Waiting for vpxd to start listening for requests on 8089
Waiting for vpxd to initialize: ..........................................................Wed Aug 8 00:26:27 UTC 2018 Captured live core: /var/core/live_core.vpxd.6497.08-08-2018-00-26-27
[INFO] writing vpxd process dump retry:2 Time(Y-M-D H:M:S):2018-08-08 00:26:25
.Wed Aug 8 00:26:39 UTC 2018 Captured live core: /var/core/live_core.vpxd.6497.08-08-2018-00-26-39
[INFO] writing vpxd process dump retry:1 Time(Y-M-D H:M:S):2018-08-08 00:26:37
.failed
failed
vmware-vpxd: vpxd failed to initialize in time.
vpxd is already starting up. Aborting the request.
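Note that iptables only lists firewall rules, not listening sockets, so a socket listing is the more direct check for whether vpxd ever binds 8089. Here is a small sketch; port_listening is a made-up helper name, and it assumes the usual six-column "netstat -tln" layout with the local address in column 4.

```shell
# port_listening PORT
# Hypothetical helper: reads `netstat -tln` output on stdin and succeeds
# (exit 0) if any socket in LISTEN state has a local address ending in :PORT.
port_listening() {
    awk -v port="$1" '
        $NF == "LISTEN" {
            n = split($4, parts, ":")      # handles 0.0.0.0:8089 and :::8089
            if (parts[n] == port) found = 1
        }
        END { exit (found ? 0 : 1) }
    '
}

# Usage while vpxd is starting:
#   netstat -tln | port_listening 8089 && echo "listening on 8089" || echo "nothing on 8089"
```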
Is this vSphere 6.0 or 6.5?
You'll want to look in /var/log/vmware/vpxd/vpxd.log to see if there's any indication why vCenter is taking so long to start.
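Since vpxd.log is mostly info lines, filtering on the level column makes the problems easier to spot. A minimal sketch, assuming the log format shown elsewhere in this thread (timestamp first, level second); vpxd_problems is a made-up helper name.

```shell
# vpxd_problems
# Hypothetical helper: reads vpxd.log on stdin and prints only the lines
# whose second field (the log level) is "error" or "warning".
vpxd_problems() {
    awk '$2 == "error" || $2 == "warning"'
}

# Usage on the VCSA:
#   vpxd_problems < /var/log/vmware/vpxd/vpxd.log | tail -n 50
```

Continuation lines that start with "-->" are dropped by this filter, so open the full log around the timestamps it reports to see backtraces and certificate details.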
It's running 6.0
Alright, it's back up! Success! Going through the VCSA's /var/log/vmware/vpxd/vpxd.log line by line, I saw that after the SSL certs are exchanged between the VCSA and the PSC, the VCSA was complaining that it could not find the SSO Admin server (/sso-adminserver/sdk/vcenter.local) when connecting to the PSC, with a 404 Not Found error as seen below:
2018-08-08T17:29:41.070Z info vpxd[7F5E96A7F7A0] [Originator@6876 sub=dbPortgroup] [VpxdInvtDVPortGroup::PreLoadDvpgConfig] loaded [9] dvpg config objects
2018-08-08T17:29:41.074Z warning vpxd[7F5E84185700] [Originator@6876 sub=Default] Failed to connect socket; <io_obj p:0x00007f5e80e73f10, h:25, <TCP '0.0.0.0:0'>, <TCP '127.0.0.1:18090'>>, e: system:111(Connection refused)
2018-08-08T17:29:41.075Z error vpxd[7F5E84185700] [Originator@6876 sub=HttpConnectionPool-000001] [ConnectComplete] Connect failed to <cs p:00007f5e80f162e0, TCP:localhost:18090>; cnx: (null), error: N7Vmacore15SystemExceptionE(Connection refused)
2018-08-08T17:29:41.075Z error vpxd[7F5E96A7F7A0] [Originator@6876 sub=LSClient] Caught exception while connecting to LS: N7Vmacore15SystemExceptionE(Connection refused)
2018-08-08T17:29:41.077Z info vpxd[7F5E96A7F7A0] [Originator@6876 sub=[SSO][SsoFactory_CreateFacade]] Solution user set to: vpxd-ec2ad075-6aed-89cb-frd5-95b89dfe0140@vcenter.local
2018-08-08T17:29:41.077Z info vpxd[7F5E96A7F7A0] [Originator@6876 sub=[SSO][SsoFactory_CreateFacade]] VC's ServiceId in LookupService:
2018-08-08T17:29:41.077Z info vpxd[7F5E96A7F7A0] [Originator@6876 sub=[SSO][SsoFactory_CreateFacade]] STS URI set to: https://mlxpsc1.corp.com/sts/STSService/vcenter.local
2018-08-08T17:29:41.077Z info vpxd[7F5E96A7F7A0] [Originator@6876 sub=[SSO][SsoFactory_CreateFacade]] Admin URI set to: https://mlxpsc1.corp.com/sso-adminserver/sdk/vcenter.local
2018-08-08T17:29:41.077Z info vpxd[7F5E96A7F7A0] [Originator@6876 sub=[SSO][SsoFactory_CreateFacade]] Groupcheck URI set to: https://mlxpsc1.corp.com/sso-adminserver/sdk/vcenter.local
2018-08-08T17:29:41.077Z info vpxd[7F5E96A7F7A0] [Originator@6876 sub=[SSO][SsoFactory_CreateFacade]] VC SSL certificate location: /etc/vmware-vpx/ssl/rui.crt
2018-08-08T17:29:41.077Z info vpxd[7F5E96A7F7A0] [Originator@6876 sub=[SSO][CreateSsoFacade]] [CreateUserDirectory] STS URI set to: https://mlxpsc1.corp.com/sts/STSService/vcenter.local
2018-08-08T17:29:41.078Z info vpxd[7F5E96A7F7A0] [Originator@6876 sub=[SSO][CreateSsoFacade]] [CreateUserDirectory] Admin URI set to: https://mlxpsc1.corp.com/sso-adminserver/sdk/vcenter.local
2018-08-08T17:29:41.078Z info vpxd[7F5E96A7F7A0] [Originator@6876 sub=[SSO][CreateSsoFacade]] [CreateUserDirectory] Groupcheck URI set to: https://mlxpsc1.corp.com/sso-adminserver/sdk/vcenter.local
2018-08-08T17:29:41.078Z info vpxd[7F5E96A7F7A0] [Originator@6876 sub=[SSO][SsoCertificateManagerImpl]] [CreateServiceContent] Try to connect to SSO VMOMI endpoint.
2018-08-08T17:29:41.086Z warning vpxd[7F5E96A7F7A0] [Originator@6876 sub=Default] Closing Response processing in unexpected state: 3
2018-08-08T17:29:41.087Z warning vpxd[7F5E96A7F7A0] [Originator@6876 sub=[SSO][SsoCertificateManagerImpl]] [RetryOnConnectionFailure] Vmomi::Fault::SystemError while trying to connect to SSO Admin server: N7Vmacore4Soap24InvalidResponseExceptionE(Invalid response code: 404 Not Found)
2018-08-08T17:29:41.087Z info vpxd[7F5E96A7F7A0] [Originator@6876 sub=[SSO][SsoCertificateManagerImpl]] [CreateServiceContent] Will attempt to connect again in 10 seconds.
2018-08-08T17:29:51.087Z info vpxd[7F5E96A7F7A0] [Originator@6876 sub=[SSO][SsoCertificateManagerImpl]] [CreateServiceContent] Try to connect to SSO VMOMI endpoint.
2018-08-08T17:29:51.097Z warning vpxd[7F5E96A7F7A0] [Originator@6876 sub=Default] Closing Response processing in unexpected state: 3
2018-08-08T17:29:51.097Z warning vpxd[7F5E96A7F7A0] [Originator@6876 sub=[SSO][SsoCertificateManagerImpl]] [RetryOnConnectionFailure] Vmomi::Fault::SystemError while trying to connect to SSO Admin server: N7Vmacore4Soap24InvalidResponseExceptionE(Invalid response code: 404 Not Found)
.
.
.## A bunch of retries with the exact same output ###
.
.
2018-08-08T17:31:01.186Z info vpxd[7F5E96A7F7A0] [Originator@6876 sub=[SSO][SsoCertificateManagerImpl]] [CreateServiceContent] Will attempt to connect again in 10 seconds.
2018-08-08T17:31:11.186Z info vpxd[7F5E96A7F7A0] [Originator@6876 sub=[SSO][SsoCertificateManagerImpl]] [CreateServiceContent] Try to connect to SSO VMOMI endpoint.
2018-08-08T17:31:11.198Z warning vpxd[7F5E96A7F7A0] [Originator@6876 sub=Default] Closing Response processing in unexpected state: 3
2018-08-08T17:31:11.199Z warning vpxd[7F5E96A7F7A0] [Originator@6876 sub=[SSO][SsoCertificateManagerImpl]] [RetryOnConnectionFailure] Vmomi::Fault::SystemError while trying to connect to SSO Admin server: N7Vmacore4Soap24InvalidResponseExceptionE(Invalid response code: 404 Not Found)
2018-08-08T17:31:11.199Z error vpxd[7F5E96A7F7A0] [Originator@6876 sub=[SSO][SsoCertificateManagerImpl]] [RetryOnConnectionFailure] Max attempts (10) reached. Giving up ...
2018-08-08T17:31:11.199Z error vpxd[7F5E96A7F7A0] [Originator@6876 sub=[SSO][SsoFactory_CreateFacade]] Unable to create SSO facade: N7Vmacore4Soap24InvalidResponseExceptionE(Invalid response code: 404 Not Found).
2018-08-08T17:31:11.199Z warning vpxd[7F5E96A7F7A0] [Originator@6876 sub=VpxProfiler] Init [Vpx::Common::Sso::SsoFactory_CreateFacade(sslContext, ssoFacadeConstPtr)] took 90122 ms
2018-08-08T17:31:11.199Z error vpxd[7F5E96A7F7A0] [Originator@6876 sub=Main] [Init] Init failed: Vpx::Common::Sso::SsoFactory_CreateFacade(sslContext, ssoFacadeConstPtr)
--> Backtrace:
-->
--> [backtrace begin] product: VMware VirtualCenter, version: 6.0.0, build: build-7462485, tag: vpxd
--> backtrace[00] libvmacore.so[0x003C5FC4]: Vmacore::System::Stacktrace::CaptureWork(unsigned int)
--> backtrace[01] libvmacore.so[0x001F0743]: Vmacore::System::SystemFactoryImpl::CreateQuickBacktrace(Vmacore::Ref<Vmacore::System::Backtrace>&)
--> backtrace[02] libvmacore.so[0x0019A69D]: Vmacore::Throwable::Throwable(std::string const&)
--> backtrace[03] vpxd[0x00BD0D8E]: Vmomi::Fault::SystemError::Exception::Exception(std::string const&)
--> backtrace[04] vpxd[0x00BCE80A]
--> backtrace[05] vpxd[0x00BBAAD0]
--> backtrace[06] vpxd[0x00AF8E99]
--> backtrace[07] libc.so.6[0x0001EC36]
--> backtrace[08] vpxd[0x00AF88FD]
--> [backtrace end]
-->
2018-08-08T17:31:11.203Z warning vpxd[7F5E96A7F7A0] [Originator@6876 sub=VpxProfiler] ServerApp::Init [TotalTime] took 94019 ms
2018-08-08T17:31:11.204Z error vpxd[7F5E96A7F7A0] [Originator@6876 sub=Default] Failed to intialize VMware VirtualCenter. Shutting down...
2018-08-08T17:31:11.204Z info vpxd[7F5E96A7F7A0] [Originator@6876 sub=SupportMgr] Wrote uptime information
2018-08-08T17:33:11.205Z info vpxd[7F5E96A7F7A0] [Originator@6876 sub=Default] Forcing shutdown of VMware VirtualCenter now
So I looked up the error string "Vmomi::Fault::SystemError while trying to connect to SSO Admin server: InvalidResponseExceptionE(Invalid response code: 404 Not Found)" and found VMware KB 2061412 (https://kb.vmware.com/s/article/2061412), which says to restart the VMware Secure Token Service by issuing the following commands (in my case I had to run them on the Platform Services Controller instead of within the VCSA):
/etc/init.d/vmware-stsd restart
/etc/init.d/vmware-sts-idmd restart
The article asks you to then restart just the vpxd service in the VCSA shell by running /etc/init.d/vmware-vpxd restart, but that only gave me back the vCenter homepage; loading the vSphere Web Client page displayed another error. So I decided to restart all services from the VCSA shell by running:
service-control --stop --all
service-control --start --all
And after that I was able to sing out loud the chorus from G. F. Handel's Messiah, "Haaaaaallelujah! Haaaaallelujah!", and was able to log in and start managing my VMs.
Note: I also followed VMware KB 2065630 (https://kb.vmware.com/s/article/2065630), adding the entry <ThreadStackSizeKb>1024</ThreadStackSizeKb> to the vpxd.cfg file right before doing all the steps mentioned above, so I'm not sure whether this played a part.
To recap: I had to update DNS and NTP, add the CA I found in the SSO folder on the PSC to the TrustedCerts.pem file and enable it in config.xml (per the article mentioned before), add the <ThreadStackSizeKb>1024</ThreadStackSizeKb> entry I just mentioned, then restart the VMware Secure Token Service on the PSC and restart all services on the VCSA.
I had taken a look before at the output of /var/log/vmware/vpxd/vpxd.log and noticed that the SSO connection to the PSC was complaining, but concentrated on the SSL certs since they were not being trusted. I honestly think that it would have probably worked if I had just restarted the Secure Token Service and then all services in the VCSA, but I'll never know unless I restore the whole thing again and try to solve this puzzle.
It's working now, so thanks everyone for your guidance with this adventure! Au revoir!
Oh yeah, I forgot that I also started all the VMware services on the Platform Services Controller manually using:
service-control --start vmafdd
service-control --start vmware-rhttpproxy
service-control --start vmdird
service-control --start vmcad
service-control --start vmware-stsd
service-control --start vmware-cm
service-control --start vmware-cis-license
service-control --start vmware-psc-client
service-control --start vmware-sca
service-control --start applmgmt
service-control --start vmware-syslog
service-control --start vmware-syslog-health
I just noticed that this list is missing the second service mentioned in KB 2061412, vmware-sts-idmd. Had I started it too, it probably would have worked a long time ago. Oh well!
************************************
*** This is the actual Answer ***
************************************
OK, I actually got to the root of the problem by calling VMware support, since my own search for PSC services not starting on auto was not yielding any results. After I explained the issue and showed that the PSC services were set to Manual, and that I just needed the right commands to set them to Auto from the shell, they asked me to follow KB 2151528 (https://kb.vmware.com/s/article/2151528#/s/article/2151528), which gives the following steps:
1. Run this command to check the start status : chkconfig -A
For example:
#chkconfig -A
You see output similar to:
applmgmt off
vmafdd off
vmcad off
vmci on
vmdird off
vmware-cis-license off
vmware-cm off
vmware-psc-client off
vmware-rhttpproxy off
vmware-sca off
vmware-sts-idmd off
vmware-stsd off
vmware-syslog off
vmware-syslog-health off
vmware-tools-services on
vmware-tools-vgauth on
vsock on
In the above example, most of the VMware service status is OFF.
2. If the service status is set to off, then perform these steps to turn it ON.
a. Run this command to stop all the running services:
service-control --stop --all
b. Run this command to change the service state from Off to On state:
chkconfig <servicename> on
For example:
chkconfig vmware-cm on
c. Once all the service status is set to ON, run this command to start all the services:
service-control --start --all
Note: Change the status of the dependent service to ON before changing the status of any service that you are trying to start. Attempting to change the status without changing the status of the dependent service will fail.
For example:
#chkconfig vmware-cis-license on
insserv: FATAL: service vmware-cm has to be enabled to use service vmware-cis-license
insserv: Service syslog is missed in the runlevels 2 to use service cgrulesengd
insserv: exiting now!
/sbin/insserv failed, exit code 1
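If you would rather not build the list by hand, the names of the disabled services can be pulled straight out of that listing. A minimal sketch, assuming the two-column "name state" output shown in the KB example above; services_off is a made-up helper name.

```shell
# services_off
# Hypothetical helper: reads `chkconfig -A` style output on stdin and
# prints the name of every service whose state column is "off".
services_off() {
    awk '$2 == "off" { print $1 }'
}

# Usage on the PSC:
#   chkconfig -A | services_off
```

On a real appliance the listing may include non-VMware services that are meant to stay off, so review the names before enabling anything.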
So I used my Excel chops to create this list and pasted it into the PSC shell a few times until I got no errors:
chkconfig applmgmt on
chkconfig vmafdd on
chkconfig vmcad on
chkconfig vmdird on
chkconfig vmware-cis-license on
chkconfig vmware-cm on
chkconfig vmware-psc-client on
chkconfig vmware-rhttpproxy on
chkconfig vmware-sca on
chkconfig vmware-sts-idmd on
chkconfig vmware-stsd on
chkconfig vmware-syslog on
chkconfig vmware-syslog-health on
And Voila! After a restart of both PSC and VCSA all is well again!
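Pasting the list several times works because each pass enables the services whose dependencies were enabled on the pass before. The same idea as a loop, as a rough sketch only: enable_services is a made-up function name, and it simply retries "<command> <service> on" for whatever failed until everything succeeds or a full pass makes no progress.

```shell
# enable_services ENABLE_CMD SERVICE...
# Hypothetical helper: runs "ENABLE_CMD SERVICE on" for every service and
# retries the failures, since a service cannot be enabled before the ones
# it depends on. Gives up when a whole pass enables nothing new.
enable_services() {
    es_cmd=$1
    shift
    es_pending="$*"
    while [ -n "$es_pending" ]; do
        es_failed=""
        es_progress=0
        for es_svc in $es_pending; do
            if "$es_cmd" "$es_svc" on >/dev/null 2>&1; then
                es_progress=1
            else
                es_failed="$es_failed $es_svc"
            fi
        done
        if [ "$es_progress" -eq 0 ]; then
            echo "no progress; still failing:$es_failed" >&2
            return 1
        fi
        es_pending=$es_failed
    done
    return 0
}

# Usage on the PSC, after "service-control --stop --all":
#   enable_services chkconfig applmgmt vmafdd vmcad vmdird vmware-cis-license \
#       vmware-cm vmware-psc-client vmware-rhttpproxy vmware-sca \
#       vmware-sts-idmd vmware-stsd vmware-syslog vmware-syslog-health
```

If it stops with "no progress", run chkconfig on one of the listed services by hand to see the insserv message it is hiding.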