jreal
Contributor

vCenter Appliance VPXD service does not start and returns an error


Hi everyone,

We recently had to recover the vCenter Appliance from a vmdk, as well as the Platform Services Controller, but whenever I try to stop and start the VPXD service, it produces the following error output. Any ideas?

INFO:root:Service: vmware-vpxd, Action: start

Service: vmware-vpxd, Action: start

2018-08-07T18:30:21.431Z   Running command: ['/sbin/chkconfig', u'vmware-vpxd']

2018-08-07T18:30:21.507Z   Done running command

2018-08-07T18:30:21.507Z   Running command: ['/sbin/service', u'vmware-vpxd', 'status']

2018-08-07T18:30:22.126Z   Done running command

2018-08-07T18:30:22.126Z   Running command: ['/sbin/chkconfig', '--force', u'vmware-vpxd', 'on']

2018-08-07T18:30:22.187Z   Done running command

2018-08-07T18:30:22.187Z   Running command: ['/sbin/service', u'vmware-vpxd', 'start']

2018-08-07T18:40:35.855Z   Done running command

2018-08-07T18:40:35.855Z   Invoked command: ['/sbin/service', u'vmware-vpxd', 'start']

2018-08-07T18:40:35.855Z   RC = 1

Stdout = vmware-vpxd: VC SSL Certificate does not exist, it will be generated by vpxd

Waiting for the embedded database to start up: success

Executing pre-startup scripts...

vmware-vpxd: Starting vpxd by administrative request.

success

vmware-vpxd: Waiting for vpxd to start listening for requests on 8089

Waiting for vpxd to initialize: ..........................................................Tue Aug  7 18:40:13 UTC 2018 Captured live core: /var/core/live_core.vpxd.16169.08-07-2018-18-40-13

[INFO] writing vpxd process dump retry:2 Time(Y-M-D H:M:S):2018-08-07 18:40:11

.Tue Aug  7 18:40:25 UTC 2018 Captured live core: /var/core/live_core.vpxd.16169.08-07-2018-18-40-25

[INFO] writing vpxd process dump retry:1 Time(Y-M-D H:M:S):2018-08-07 18:40:23

.failed

failed

vmware-vpxd: vpxd failed to initialize in time.

vpxd is already starting up. Aborting the request.

Stderr =

2018-08-07T18:40:35.856Z   {

    "resolution": null,

    "detail": [

        {

            "args": [

                "Command: ['/sbin/service', u'vmware-vpxd', 'start']\nStderr: "

            ],

            "id": "install.ciscommon.command.errinvoke",

            "localized": "An error occurred while invoking external command : 'Command: ['/sbin/service', u'vmware-vpxd', 'start']\nStderr: '",

            "translatable": "An error occurred while invoking external command : '%(0)s'"

        }

    ],

    "componentKey": null,

    "problemId": null

}

ERROR:root:Unable to start service vmware-vpxd, Exception: {

    "resolution": null,

    "detail": [

        {

            "args": [

                "vmware-vpxd"

            ],

            "id": "install.ciscommon.service.failstart",

            "localized": "An error occurred while starting service 'vmware-vpxd'",

            "translatable": "An error occurred while starting service '%(0)s'"

        }

    ],

    "componentKey": null,

    "problemId": null

}

Unable to start service vmware-vpxd, Exception: {

    "resolution": null,

    "detail": [

        {

            "args": [

                "vmware-vpxd"

            ],

            "id": "install.ciscommon.service.failstart",

            "localized": "An error occurred while starting service 'vmware-vpxd'",

            "translatable": "An error occurred while starting service '%(0)s'"

        }

    ],

    "componentKey": null,

    "problemId": null

}

0 Kudos
1 Solution

Accepted Solutions
jreal
Contributor

It's running 6.0

Alright, it's back up! Success! I took a closer look at the VCSA's /var/log/vmware/vpxd/vpxd.log, line by line, starting after the point where the SSL certs are exchanged between the VCSA and the PSC. The VCSA was complaining that it could not find the SSO Admin server (/sso-adminserver/sdk/vcenter.local) when connecting to the PSC, failing with a 404 Not Found error as seen below:

2018-08-08T17:29:41.070Z info vpxd[7F5E96A7F7A0] [Originator@6876 sub=dbPortgroup] [VpxdInvtDVPortGroup::PreLoadDvpgConfig] loaded [9] dvpg config objects

2018-08-08T17:29:41.074Z warning vpxd[7F5E84185700] [Originator@6876 sub=Default] Failed to connect socket; <io_obj p:0x00007f5e80e73f10, h:25, <TCP '0.0.0.0:0'>, <TCP '127.0.0.1:18090'>>, e: system:111(Connection refused)

2018-08-08T17:29:41.075Z error vpxd[7F5E84185700] [Originator@6876 sub=HttpConnectionPool-000001] [ConnectComplete] Connect failed to <cs p:00007f5e80f162e0, TCP:localhost:18090>; cnx: (null), error: N7Vmacore15SystemExceptionE(Connection refused)

2018-08-08T17:29:41.075Z error vpxd[7F5E96A7F7A0] [Originator@6876 sub=LSClient] Caught exception while connecting to LS: N7Vmacore15SystemExceptionE(Connection refused)

2018-08-08T17:29:41.077Z info vpxd[7F5E96A7F7A0] [Originator@6876 sub=[SSO][SsoFactory_CreateFacade]] Solution user set to: vpxd-ec2ad075-6aed-89cb-frd5-95b89dfe0140@vcenter.local

2018-08-08T17:29:41.077Z info vpxd[7F5E96A7F7A0] [Originator@6876 sub=[SSO][SsoFactory_CreateFacade]] VC's ServiceId in LookupService:

2018-08-08T17:29:41.077Z info vpxd[7F5E96A7F7A0] [Originator@6876 sub=[SSO][SsoFactory_CreateFacade]] STS URI set to: https://mlxpsc1.corp.com/sts/STSService/vcenter.local

2018-08-08T17:29:41.077Z info vpxd[7F5E96A7F7A0] [Originator@6876 sub=[SSO][SsoFactory_CreateFacade]] Admin URI set to: https://mlxpsc1.corp.com/sso-adminserver/sdk/vcenter.local

2018-08-08T17:29:41.077Z info vpxd[7F5E96A7F7A0] [Originator@6876 sub=[SSO][SsoFactory_CreateFacade]] Groupcheck URI set to: https://mlxpsc1.corp.com/sso-adminserver/sdk/vcenter.local

2018-08-08T17:29:41.077Z info vpxd[7F5E96A7F7A0] [Originator@6876 sub=[SSO][SsoFactory_CreateFacade]] VC SSL certificate location: /etc/vmware-vpx/ssl/rui.crt

2018-08-08T17:29:41.077Z info vpxd[7F5E96A7F7A0] [Originator@6876 sub=[SSO][CreateSsoFacade]] [CreateUserDirectory] STS URI set to: https://mlxpsc1.corp.com/sts/STSService/vcenter.local

2018-08-08T17:29:41.078Z info vpxd[7F5E96A7F7A0] [Originator@6876 sub=[SSO][CreateSsoFacade]] [CreateUserDirectory] Admin URI set to: https://mlxpsc1.corp.com/sso-adminserver/sdk/vcenter.local

2018-08-08T17:29:41.078Z info vpxd[7F5E96A7F7A0] [Originator@6876 sub=[SSO][CreateSsoFacade]] [CreateUserDirectory] Groupcheck URI set to: https://mlxpsc1.corp.com/sso-adminserver/sdk/vcenter.local

2018-08-08T17:29:41.078Z info vpxd[7F5E96A7F7A0] [Originator@6876 sub=[SSO][SsoCertificateManagerImpl]] [CreateServiceContent] Try to connect to SSO VMOMI endpoint.

2018-08-08T17:29:41.086Z warning vpxd[7F5E96A7F7A0] [Originator@6876 sub=Default] Closing Response processing in unexpected state: 3

2018-08-08T17:29:41.087Z warning vpxd[7F5E96A7F7A0] [Originator@6876 sub=[SSO][SsoCertificateManagerImpl]] [RetryOnConnectionFailure] Vmomi::Fault::SystemError while trying to connect to SSO Admin server: N7Vmacore4Soap24InvalidResponseExceptionE(Invalid response code: 404 Not Found)

2018-08-08T17:29:41.087Z info vpxd[7F5E96A7F7A0] [Originator@6876 sub=[SSO][SsoCertificateManagerImpl]] [CreateServiceContent] Will attempt to connect again in 10 seconds.

2018-08-08T17:29:51.087Z info vpxd[7F5E96A7F7A0] [Originator@6876 sub=[SSO][SsoCertificateManagerImpl]] [CreateServiceContent] Try to connect to SSO VMOMI endpoint.

2018-08-08T17:29:51.097Z warning vpxd[7F5E96A7F7A0] [Originator@6876 sub=Default] Closing Response processing in unexpected state: 3

2018-08-08T17:29:51.097Z warning vpxd[7F5E96A7F7A0] [Originator@6876 sub=[SSO][SsoCertificateManagerImpl]] [RetryOnConnectionFailure] Vmomi::Fault::SystemError while trying to connect to SSO Admin server: N7Vmacore4Soap24InvalidResponseExceptionE(Invalid response code: 404 Not Found)

.

.

.## A bunch of retries with the exact same output ###

.

.

2018-08-08T17:31:01.186Z info vpxd[7F5E96A7F7A0] [Originator@6876 sub=[SSO][SsoCertificateManagerImpl]] [CreateServiceContent] Will attempt to connect again in 10 seconds.

2018-08-08T17:31:11.186Z info vpxd[7F5E96A7F7A0] [Originator@6876 sub=[SSO][SsoCertificateManagerImpl]] [CreateServiceContent] Try to connect to SSO VMOMI endpoint.

2018-08-08T17:31:11.198Z warning vpxd[7F5E96A7F7A0] [Originator@6876 sub=Default] Closing Response processing in unexpected state: 3

2018-08-08T17:31:11.199Z warning vpxd[7F5E96A7F7A0] [Originator@6876 sub=[SSO][SsoCertificateManagerImpl]] [RetryOnConnectionFailure] Vmomi::Fault::SystemError while trying to connect to SSO Admin server: N7Vmacore4Soap24InvalidResponseExceptionE(Invalid response code: 404 Not Found)

2018-08-08T17:31:11.199Z error vpxd[7F5E96A7F7A0] [Originator@6876 sub=[SSO][SsoCertificateManagerImpl]] [RetryOnConnectionFailure] Max attempts (10) reached. Giving up ...

2018-08-08T17:31:11.199Z error vpxd[7F5E96A7F7A0] [Originator@6876 sub=[SSO][SsoFactory_CreateFacade]] Unable to create SSO facade: N7Vmacore4Soap24InvalidResponseExceptionE(Invalid response code: 404 Not Found).

2018-08-08T17:31:11.199Z warning vpxd[7F5E96A7F7A0] [Originator@6876 sub=VpxProfiler] Init [Vpx::Common::Sso::SsoFactory_CreateFacade(sslContext, ssoFacadeConstPtr)] took 90122 ms

2018-08-08T17:31:11.199Z error vpxd[7F5E96A7F7A0] [Originator@6876 sub=Main] [Init] Init failed: Vpx::Common::Sso::SsoFactory_CreateFacade(sslContext, ssoFacadeConstPtr)

--> Backtrace:

-->

--> [backtrace begin] product: VMware VirtualCenter, version: 6.0.0, build: build-7462485, tag: vpxd

--> backtrace[00] libvmacore.so[0x003C5FC4]: Vmacore::System::Stacktrace::CaptureWork(unsigned int)

--> backtrace[01] libvmacore.so[0x001F0743]: Vmacore::System::SystemFactoryImpl::CreateQuickBacktrace(Vmacore::Ref<Vmacore::System::Backtrace>&)

--> backtrace[02] libvmacore.so[0x0019A69D]: Vmacore::Throwable::Throwable(std::string const&)

--> backtrace[03] vpxd[0x00BD0D8E]: Vmomi::Fault::SystemError::Exception::Exception(std::string const&)

--> backtrace[04] vpxd[0x00BCE80A]

--> backtrace[05] vpxd[0x00BBAAD0]

--> backtrace[06] vpxd[0x00AF8E99]

--> backtrace[07] libc.so.6[0x0001EC36]

--> backtrace[08] vpxd[0x00AF88FD]

--> [backtrace end]

-->

2018-08-08T17:31:11.203Z warning vpxd[7F5E96A7F7A0] [Originator@6876 sub=VpxProfiler] ServerApp::Init [TotalTime] took 94019 ms

2018-08-08T17:31:11.204Z error vpxd[7F5E96A7F7A0] [Originator@6876 sub=Default] Failed to intialize VMware VirtualCenter. Shutting down...

2018-08-08T17:31:11.204Z info vpxd[7F5E96A7F7A0] [Originator@6876 sub=SupportMgr] Wrote uptime information

2018-08-08T17:33:11.205Z info vpxd[7F5E96A7F7A0] [Originator@6876 sub=Default] Forcing shutdown of VMware VirtualCenter now

So I looked up the error string "Vmomi::Fault::SystemError while trying to connect to SSO Admin server: InvalidResponseExceptionE(Invalid response code: 404 Not Found)" and found VMware KB2061412 (https://kb.vmware.com/s/article/2061412), which says to restart the VMware Secure Token Service by issuing the following commands (in my case I had to run them on the Platform Services Controller instead of within the VCSA):

/etc/init.d/vmware-stsd restart
/etc/init.d/vmware-sts-idmd restart

The article asks you to restart just the vpxd service in the VCSA shell by running /etc/init.d/vmware-vpxd restart, but that only gave me back the vCenter homepage, and the vSphere Web Client page still displayed another error when I tried to load it. So I decided to restart all services from the VCSA shell by running:

service-control --stop --all

service-control --start --all

And after that I was able to sing out loud the chorus from G. F. Handel's Messiah, "Haaaaaallelujah! Haaaaallelujah!", and was able to log in and start managing my VMs.

Note: I also followed VMware KB2065630 (https://kb.vmware.com/s/article/2065630), adding the entry <ThreadStackSizeKb>1024</ThreadStackSizeKb> to the vpxd.cfg file right before doing all the steps mentioned above, so I'm not sure whether this played a part.

To recap: I had to update DNS and NTP, add the CA I found in the SSO folder on the PSC to the TrustedCerts.pem file and enable it in config.xml (per the article mentioned before), add the <ThreadStackSizeKb>1024</ThreadStackSizeKb> entry I just mentioned, then restart the VMware Secure Token Service on the PSC and restart all services on the VCSA.

I had taken a look before at the output of /var/log/vmware/vpxd/vpxd.log and noticed that the SSO connection to the PSC was complaining, but concentrated on the SSL certs since they were not being trusted. I honestly think that it would have probably worked if I had just restarted the Secure Token Service and then all services in the VCSA, but I'll never know unless I restore the whole thing again and try to solve this puzzle.
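
For anyone hitting the same wall, here is a minimal sketch of how the SSO 404 retries could be picked out of vpxd.log without reading it line by line. The function names are made up for illustration, and the match strings are simply the ones visible in the log excerpt above; other builds may word these messages differently.

```shell
# Count how many times vpxd logged a 404 from the SSO Admin server.
# Reads log text on stdin, so it works with a file or a pipe.
count_sso_404() {
    grep -c 'trying to connect to SSO Admin server.*404 Not Found' || true
}

# Print the timestamp of the first entry where vpxd gave up on SSO, if any.
first_sso_giveup() {
    grep 'Unable to create SSO facade' | head -n 1 | cut -d' ' -f1
}

# Usage on the appliance (log path as given in the post):
#   count_sso_404 < /var/log/vmware/vpxd/vpxd.log
#   first_sso_giveup < /var/log/vmware/vpxd/vpxd.log
```

A non-zero count followed by a give-up timestamp would point at the SSO endpoint on the PSC rather than at the certificates.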

It's working now, so thanks everyone for your guidance with this adventure! Au revoir!


15 Replies
vijayrana968
Virtuoso

You may want to restart the PSC first, and don't restart the vCenter appliance until you have verified that the PSC appliance has come online and all of its services have started. Then restart the vCenter appliance and retry the task you were trying to do.

0 Kudos
GayathriS
Expert

Hi,

Adding to what Vijay said, there is a similar thread going on in the community that you can check:
vcenter service 6.5 stopped and is not starting

--> Make sure NTP, DNS, and storage space are not causing any issues here.

Please consider marking this answer as "correct" or "helpful" if you think your questions have been answered.

regards

Gayathri

0 Kudos
jreal
Contributor

Thanks for responding. One thing I forgot to mention: according to the VPXD log file, the PSC was refusing connections on port 443. Then, when I tried to start the services automatically using "service-control --start --all", it reported that each service was set to manual and skipped each one, so I had to start them one by one manually.

Every time I restart the PSC it comes up immediately, as if all services get skipped. I'll restart vCenter one more time, since I'm not sure where to set all PSC services to auto using the shell.

0 Kudos
vijayrana968
Virtuoso

vCenter services depend on the PSC because of the SSO architecture. Since it is skipping services, the problem resides on the PSC itself. Please make sure the services below come up on the PSC without any errors.

VMware Appliance Management Service

VMware License Service

VMware Component Manager

VMware Identity Management Service

VMware HTTP Reverse Proxy

VMware Service Control Agent

VMware Security Token Service

VMware Common Logging Service

VMware Syslog Health Service

VMware Authentication Framework

VMware Certificate Service

VMware Directory Service

Then log in to the VAMI console of the PSC and check the health status.

0 Kudos
jreal
Contributor

GayathriS:

I took your recommendation and did find that the date and time were different between the servers, so I updated the ntp.conf file and restarted ntp, and now they show in sync. I'll try again in a bit.

Vijay:

The services listed when running "service-control --start --all" were the following, which I started manually without errors the last time I restarted the PSC. Can you tell whether they match the long names in your list of PSC services? I'm trying to find a list that I can use to correlate them, to make sure.

vmafdd

vmware-rhttpproxy

vmdird

vmcad

vmware-stsd

vmware-cm

vmware-cis-license

vmware-psc-client

vmware-sca

applmgmt

vmware-syslog

vmware-syslog-health

0 Kudos
vijayrana968
Virtuoso

Here you have each service's description and details: Platform Services Controller Services

But as you identified that the time was not in sync, please verify the NTP source and sync configuration. Time sync is the main requirement for communication between the PSC and vCenter.
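
As a rough illustration of checking time sync between the two appliances, the sketch below compares two epoch timestamps. The function names, the `psc-host` placeholder, and the 5 second tolerance in the usage comment are all assumptions for the example, not values taken from VMware documentation.

```shell
# Absolute difference in seconds between two epoch timestamps.
clock_skew() {
    diff=$(( $1 - $2 ))
    if [ "$diff" -lt 0 ]; then
        diff=$(( 0 - diff ))
    fi
    echo "$diff"
}

# Succeeds when the skew between $1 and $2 is at most $3 seconds.
skew_ok() {
    [ "$(clock_skew "$1" "$2")" -le "$3" ]
}

# Usage from the VCSA shell (psc-host is a placeholder hostname):
#   skew_ok "$(date -u +%s)" "$(ssh root@psc-host date -u +%s)" 5 \
#       && echo "clocks agree" || echo "clocks differ, check NTP"
```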

0 Kudos
jreal
Contributor

VCenter is still restarting. It's taking an unusually long time. I also notice that the certificates whenever I browse to the name or IP don't show the entire certificate chain and only the device certificate. The CA is not visible in the chain.

0 Kudos
jreal
Contributor

OK, I think I'm a little closer now. In the logs I see that the vCenter appliance is now able to see the PSC, but because the CA is missing from the chain, there are entries saying they don't trust each other. I just need to find out how to import the CA using the shell. I found this article but I'm not sure it's the right one: https://docs.vmware.com/en/VMware-vSphere/6.5/vsphere-esxi-vcenter-server-65-platform-services-contr...

2018-08-07T21:10:42.893Z info vpxd[7FEEE01BE7A0] [Originator@6876 sub=HostGateway] Created ComponentManagerGatewaySource!

2018-08-07T21:10:42.893Z info vpxd[7FEEE01BE7A0] [Originator@6876 sub=HostGateway] Created CmConnectionFSM

2018-08-07T21:10:42.893Z info vpxd[7FEEE01BE7A0] [Originator@6876 sub=HostGateway] Created ComponentManagerClient.

2018-08-07T21:10:42.893Z info vpxd[7FEEE01BE7A0] [Originator@6876 sub=HostGateway] CmConnectionFSM::RunFSM(ST_INIT)

2018-08-07T21:10:42.894Z info vpxd[7FEEE01BE7A0] [Originator@6876 sub=[SSO][SsoCertificateManagerImpl]] [CreateServiceContent] Try to connect to SSO VMOMI endpoint.

2018-08-07T21:10:42.897Z error vpxd[7FEECEF8C700] [Originator@6876 sub=HttpConnectionPool-000001] [ConnectComplete] Connect failed to <cs p:00007feee41fef60, TCP:mlxpsc1.corp.com:443>; cnx: (null), error: N7Vmacore3Ssl18SSLVerifyExceptionE(SSL Exception: Verification parameters:

--> PeerThumbprint: 13:A3:98:1C:1B:84:FB:4D:EF:FA:1B:9E:3E:82:D4

--> ExpectedThumbprint: 0C:34:98:7B:2D:CA:F8:57:4E:1C:CC:A4:78:4B:8A:3V:89

--> ExpectedPeerName: mlxpsc1.corp.com

--> The remote host certificate has these problems:

-->

--> * unable to get local issuer certificate)

2018-08-07T21:10:42.898Z error vpxd[7FEEE01BE7A0] [Originator@6876 sub=HostGateway] [CisConnection]: Error getting trusted STS certificates: SSL Exception: Verification parameters:

--> PeerThumbprint: 13:A3:98:1C:1B:84:FB:4D:EF:FA:1B:9E:3E:82:D4

--> ExpectedThumbprint: 0C:34:98:7B:2D:CA:F8:57:4E:1C:CC:A4:78:4B:8A:3V:89

--> ExpectedPeerName: mlxpsc1.corp.com

--> The remote host certificate has these problems:

-->

--> * unable to get local issuer certificate

2018-08-07T21:10:42.898Z warning vpxd[7FEEE01BE7A0] [Originator@6876 sub=HostGateway] State(ST_INIT) failed with: SSL Exception: Verification parameters:

--> PeerThumbprint: 13:A3:98:1C:1B:84:FB:4D:EF:FA:1B:9E:3E:82:D4

--> ExpectedThumbprint: 0C:34:98:7B:2D:CA:F8:57:4E:1C:CC:A4:78:4B:8A:3V:89

--> ExpectedPeerName: mlxpsc1.corp.com

--> The remote host certificate has these problems:

-->

--> * unable to get local issuer certificate

2018-08-07T21:10:42.898Z warning vpxd[7FEEE01BE7A0] [Originator@6876 sub=HostGateway] ComponentManager service is not available! Will attempt a lazy init of CmClient on first use!

0 Kudos
TheVElement
VMware Employee

How many PSCs are in this environment?

Given it's reporting a thumbprint mis-match from the given and expected thumbprints when trying to connect to the SSO VMOMI endpoint, I suspect that you have an issue with your SSL trust anchors.

When did you replace the PSC Machine SSL certificate, and by what method did you replace it (using the certificate-manager tool, using vecs-cli, etc)?
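
The thumbprint comparison described above can be sketched as a small filter over the log text. This is only an illustration: the function names are invented, and it assumes the "PeerThumbprint:" and "ExpectedThumbprint:" labels appear exactly as in the vpxd.log excerpt earlier in the thread.

```shell
# Print the value that follows "<field>:" on the first matching line.
# $1 = field name (e.g. PeerThumbprint); log text is read from stdin.
thumbprint_field() {
    sed -n "s/^.*$1: *\([0-9A-Za-z:]*\).*$/\1/p" | head -n 1
}

# Report whether the peer and expected thumbprints in a log excerpt agree.
compare_thumbprints() {
    log=$(cat)
    peer=$(printf '%s\n' "$log" | thumbprint_field PeerThumbprint)
    expected=$(printf '%s\n' "$log" | thumbprint_field ExpectedThumbprint)
    if [ -n "$peer" ] && [ "$peer" = "$expected" ]; then
        echo "match"
    else
        echo "mismatch peer=$peer expected=$expected"
    fi
}

# Usage:
#   grep -A 3 'Verification parameters' /var/log/vmware/vpxd/vpxd.log | compare_thumbprints
```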

0 Kudos
Lalegre
Commander

Did you replace your vCenter certificate with a custom one, or are you using VMCA certificates?

0 Kudos
jreal
Contributor

Thanks for replying. Update so far: I followed the article "Dude! Where's my vCSA SSL Cert chain?" (vRyan.co.uk - Virtualization Blog), written by someone experiencing this certificate issue. As TheVElement mentions, the trustpoints.pem file in the rhttpproxy SSL container was empty, and config.xml had it commented out on both the VCSA and the PSC appliances. I don't know how it was working before the restore, since the rhttpproxy config in the backup I had looks the same.

However, the problem still remains even after a reboot of both devices. At this point I think the problem all along has been that the VPXD service in the VCSA never listens on TCP port 8089, as seen in my manual restart attempt of vmware-vpxd. Looking at logs from months before the restore, the message "vmware-vpxd: VC SSL Certificate does not exist, it will be generated by vpxd" has always been there, and VPXD used to start and listen on 8089 after a few seconds.

If I open the shell of the VCSA and type "iptables -L port_filter -n --line-numbers", port 8089 is never listed as listening, so I don't think it's a firewall issue here. The output below is not very helpful as to what is going on.

mgmlxvcs1:~ # /etc/init.d/vmware-vpxd restart

vmware-vpxd: already stopped

vmware-vpxd: VC SSL Certificate does not exist, it will be generated by vpxd

Waiting for the embedded database to start up: success

Executing pre-startup scripts...

vmware-vpxd: Starting vpxd by administrative request.

success

vmware-vpxd: Waiting for vpxd to start listening for requests on 8089

Waiting for vpxd to initialize: ..........................................................Wed Aug  8 00:26:27 UTC 2018 Captured live core: /var/core/live_core.vpxd.6497.08-08-2018-00-26-27

[INFO] writing vpxd process dump retry:2 Time(Y-M-D H:M:S):2018-08-08 00:26:25

.Wed Aug  8 00:26:39 UTC 2018 Captured live core: /var/core/live_core.vpxd.6497.08-08-2018-00-26-39

[INFO] writing vpxd process dump retry:1 Time(Y-M-D H:M:S):2018-08-08 00:26:37

.failed

failed

vmware-vpxd: vpxd failed to initialize in time.

vpxd is already starting up. Aborting the request.
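
A side note on the port check: iptables lists firewall rules rather than listening sockets, so a socket listing answers the "is vpxd listening on 8089" question more directly. The sketch below is a hypothetical helper that parses `netstat -ltn`-style output; it assumes netstat is available on the appliance and that the local address is in the fourth column.

```shell
# Succeeds if the given TCP port shows up as a LISTEN socket in
# `netstat -ltn`-style output supplied on stdin.
port_is_listening() {
    awk -v port="$1" '
        $NF == "LISTEN" {
            n = split($4, parts, ":")   # local address is column 4
            if (parts[n] == port) found = 1
        }
        END { exit(found ? 0 : 1) }
    '
}

# Usage:
#   netstat -ltn | port_is_listening 8089 && echo "vpxd port is open" \
#       || echo "nothing is listening on 8089"
```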

0 Kudos
TheVElement
VMware Employee

Is this vSphere 6.0 or 6.5?

You'll want to look in /var/log/vmware/vpxd/vpxd.log to see if there's any indication why vCenter is taking so long to start.
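
One possible way to narrow the log down, sketched here under the assumption that every vpxd.log entry starts with a timestamp followed by the severity (as in the excerpts in this thread). The function name is made up for the example.

```shell
# Print only warning and error entries from vpxd.log-style input on stdin.
# The severity is the second whitespace-separated field of each entry;
# "-->" continuation lines are not matched and are dropped.
vpxd_problems() {
    awk '$2 == "warning" || $2 == "error"'
}

# Usage:
#   vpxd_problems < /var/log/vmware/vpxd/vpxd.log | tail -n 50
```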

0 Kudos
jreal
Contributor

Oh yeah, I forgot that I also started all the Platform Services Controller VMware services manually using:

service-control --start vmafdd

service-control --start vmware-rhttpproxy

service-control --start vmdird

service-control --start vmcad

service-control --start vmware-stsd

service-control --start vmware-cm

service-control --start vmware-cis-license

service-control --start vmware-psc-client

service-control --start vmware-sca

service-control --start applmgmt

service-control --start vmware-syslog

service-control --start vmware-syslog-health

And I just noticed that this list is missing the second service mentioned in KB2061412, vmware-sts-idmd. Had I included it, this would probably have worked a long time ago. Oh well!
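
A gap like this is easy to miss by eye. As a minimal sketch, a hand-written start list can be checked against the list of services you expect; the function name is hypothetical and both lists are supplied by you.

```shell
# Print every service named in the "expected" list ($1) that is absent
# from the "started" list ($2). Both are whitespace-separated names.
missing_services() {
    expected=$1
    started=$2
    for svc in $expected; do
        case " $(echo $started) " in
            *" $svc "*) ;;          # already in the started list
            *) echo "$svc" ;;       # expected but never started
        esac
    done
}

# Usage:
#   missing_services "vmware-stsd vmware-sts-idmd" "vmware-stsd"
```

Fed the twelve services above plus vmware-sts-idmd as the expected list, it prints only vmware-sts-idmd.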

0 Kudos
jreal
Contributor

************************************

*** This is the actual Answer ***

************************************

OK, I actually got to the root of the problem by calling VMware support, since my own search for PSC services not starting automatically was not yielding any results. After I explained the issue and tried to show that the problem was that the PSC services were set to Manual, and that I just needed the right commands to set them to Auto from the shell, they asked me to follow KB2151528 (https://kb.vmware.com/s/article/2151528#/s/article/2151528), which gives the following steps:

1. Run this command to check the start status : chkconfig -A

For example:

#chkconfig -A

You see output similar to:

applmgmt off

vmafdd off
vmcad off
vmci on
vmdird off
vmware-cis-license off
vmware-cm off
vmware-psc-client off
vmware-rhttpproxy off
vmware-sca off
vmware-sts-idmd off
vmware-stsd off
vmware-syslog off
vmware-syslog-health off
vmware-tools-services on
vmware-tools-vgauth on
vsock on

In the above example, most of the VMware service status is OFF.


2. If the service status is set to off, then perform these steps to turn it ON.

a. Run this command to stop all the running services:

service-control --stop --all

b. Run this command to change the service state from Off to On state:

chkconfig <servicename> on

For example:
chkconfig vmware-cm on

c. Once all the service status is set to ON, run this command to start all the services:

service-control --start --all

Note: Change the status of the dependent service to ON before changing the status of any service that you are trying to start. Attempting to change the status without changing the status of the dependent service will fail.

For example:

#chkconfig vmware-cis-license on
insserv: FATAL: service vmware-cm has to be enabled to use service vmware-cis-license
insserv: Service syslog is missed in the runlevels 2 to use service cgrulesengd
insserv: exiting now!
/sbin/insserv failed, exit code 1
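
Step 1 of the KB can be sketched as a filter that lists only the services currently set to off. It assumes the two-column "name state" output shown in the KB example above; the function name is invented for illustration.

```shell
# Print the names of services reported as "off" in `chkconfig -A`-style
# output ("<name> <on|off>" per line) read from stdin.
services_off() {
    awk 'NF == 2 && $2 == "off" { print $1 }'
}

# Usage on the PSC:
#   chkconfig -A | services_off
```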

So I used my Excel chops to create this list, and pasted it into the shell of the PSC a few times until I got no errors:

chkconfig  applmgmt on

chkconfig  vmafdd on

chkconfig  vmcad on

chkconfig  vmdird on

chkconfig  vmware-cis-license on

chkconfig  vmware-cm on

chkconfig  vmware-psc-client on

chkconfig  vmware-rhttpproxy on

chkconfig  vmware-sca on

chkconfig  vmware-sts-idmd on

chkconfig  vmware-stsd on

chkconfig  vmware-syslog on

chkconfig  vmware-syslog-health on

And voilà! After a restart of both the PSC and the VCSA, all is well again!
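
Pasting the list repeatedly works because each pass enables whatever had its dependencies satisfied by the previous pass. A rough sketch of the same idea as a loop is below. The function names are hypothetical, and it assumes `chkconfig <name> on` returns a non-zero exit code when insserv refuses because of a dependency, as the KB's "exit code 1" output suggests.

```shell
# Enable one service. Defaults to chkconfig; override this function to
# dry-run the loop below without touching a real appliance.
enable_service() {
    chkconfig "$1" on >/dev/null 2>&1
}

# Try to enable every named service, repeating passes until all succeed
# or a pass makes no progress (for example, a dependency is missing).
enable_all() {
    pending="$*"
    while [ -n "$pending" ]; do
        still_pending=""
        for svc in $pending; do
            if ! enable_service "$svc"; then
                still_pending="$still_pending $svc"
            fi
        done
        # Nothing changed in this pass: stop rather than loop forever.
        if [ "$(echo $still_pending)" = "$(echo $pending)" ]; then
            echo "could not enable:$still_pending" >&2
            return 1
        fi
        pending=$still_pending
    done
    return 0
}

# Usage on the PSC:
#   enable_all applmgmt vmafdd vmcad vmdird vmware-cis-license vmware-cm
```

The no-progress check is a design choice so the loop ends with a message instead of spinning if a service can never be enabled.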

0 Kudos