VMware Cloud Community
billdossett
Hot Shot

Upgrading VCSA 6.7 to 7 failed - now services are not starting on the original 6.7 VCSA

So security is on my *ss about upgrading vCenter, and I tried to upgrade from 6.7 with an external PSC to 7. It failed, saying that the source may have been turned off during the upgrade... well, something happened for sure. I shut down the vCenter 7 appliance, which was not functioning anyway, and tried again. This was about 10 hours ago and I can't remember exactly what the error was, but it failed in stage 2: the data had been copied to the new vCenter, but importing it failed.

So I fired up the old vCenter and it's not working. Not all of the services are starting:

stopped:
vmcam vmware-content-library vmware-imagebuilder vmware-mbcs vmware-netdumper vmware-sps vmware-updatemgr vmware-vcha vmware-vpxd vmware-vpxd-svcs vmware-vsan-health vsan-dps

Those are stopped; vmware-vsan-health and one other service stay in a pending state for ages. The result is that I am getting the dreaded

503 Service Unavailable (Failed to connect to endpoint: [N7Vmacore4Http20NamedPipeServiceSpecE:0x00007f435c0076b0] _serverNamespace = / action = Allow _pipeName =/var/run/vmware/vpxd-webserver-pipe)

when I try to log in to the server. I made backups and have tried a restore; it had a few warnings and didn't exactly finish cleanly, but it fails in the same way when I try to start it, with the exact same services not starting. That makes me think something on the PSC may be failing (it has an external PSC), but I can log in to the VAMI on the PSC and it looks fine; everything reports good health.

Upgrading this way should be non-destructive, so I am really scratching my head about what to look at next. Anyone have any ideas? I know that not all of those services start normally, but I'm not sure which ones are causing the problem and therefore which logs to look in. After I get some food and sleep I will go through them one by one, I guess, but if anyone has advice on where I should be looking or what could be wrong I would HUGELY appreciate it, as I am supposed to be on vacation this week and I won't be until this is fixed. 😞 Thanks. Bill

Bill Dossett
2 Replies
billdossett

So I found errors in my vmdird log on the PSC:

19-05-15T20:27:41.457011+00:00 err vmdird  t@140166960015104: SASLSessionStart: sasl error (-20)(SASL(-13): user not found: no secret in database)
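A minimal sketch for picking this kind of entry out of a vmdird log, assuming lines shaped like the one quoted above. The regular expression, the `SaslError` type, and the `find_sasl_errors` helper are hypothetical names made up for illustration, not part of any VMware tooling, and the pattern is inferred from that single sample line only.

```python
import re
from typing import Iterable, Iterator, NamedTuple


class SaslError(NamedTuple):
    """Hypothetical container for the fields of one SASL error entry."""
    outer_code: int   # the first code, e.g. -20
    sasl_code: int    # the code inside SASL(...), e.g. -13
    message: str      # the text after the SASL code


# Assumption: the format is inferred from the one log line quoted in this thread.
SASL_ERROR_RE = re.compile(
    r"sasl error \((-?\d+)\)\(SASL\((-?\d+)\): (.*)\)\s*$"
)


def find_sasl_errors(lines: Iterable[str]) -> Iterator[SaslError]:
    """Yield one SaslError for every line that matches the assumed format."""
    for line in lines:
        match = SASL_ERROR_RE.search(line)
        if match:
            yield SaslError(int(match.group(1)), int(match.group(2)), match.group(3))


# The log line from this post, used as sample input.
sample = (
    "19-05-15T20:27:41.457011+00:00 err vmdird  t@140166960015104: "
    "SASLSessionStart: sasl error (-20)(SASL(-13): user not found: "
    "no secret in database)"
)

errors = list(find_sasl_errors([sample]))
for error in errors:
    print(error.outer_code, error.sasl_code, error.message)
```

To scan a whole log, pass an open file object to `find_sasl_errors` in place of the one-element list.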

in KB Recreate vCenter Server Machine Account in Platform Services Controller after Failed Convergence (68...

it says this can happen when you are converging a PSC. I was under the impression that upgrading from 6.7 to 7.0 would not change anything in my 6.7 environment. That is obviously not true, as the SSO user had been removed, hence the failure in the log :-(. I restored my PSC, and now vpxd-svcs is starting. However, the vpxd service is still not starting, and two services (one of them sps, the other I think updatemgr) sit in pending for ages and then go to stopped. Currently these are my stopped services:

vmcam vmware-content-library vmware-imagebuilder vmware-mbcs vmware-netdumper vmware-sps vmware-updatemgr vmware-vcha vmware-vpxd vmware-vsan-health vsan-dps
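A quick way to see what the PSC restore changed is to compare the two stopped-service lists posted in this thread. The sketch below just does a set difference on the two lists as pasted; the variable names are made up for illustration.

```python
# Stopped services from the first post (before the PSC restore).
before = (
    "vmcam vmware-content-library vmware-imagebuilder vmware-mbcs "
    "vmware-netdumper vmware-sps vmware-updatemgr vmware-vcha vmware-vpxd "
    "vmware-vpxd-svcs vmware-vsan-health vsan-dps"
).split()

# Stopped services after restoring the PSC.
after = (
    "vmcam vmware-content-library vmware-imagebuilder vmware-mbcs "
    "vmware-netdumper vmware-sps vmware-updatemgr vmware-vcha vmware-vpxd "
    "vmware-vsan-health vsan-dps"
).split()

# Services that were stopped before but are no longer in the stopped list.
recovered = sorted(set(before) - set(after))

# Services that are stopped in both lists.
still_stopped = sorted(set(before) & set(after))

print("recovered:", recovered)
print("still stopped:", still_stopped)
```

The only service that drops out of the stopped list is vmware-vpxd-svcs, which matches the observation that vpxd-svcs now starts while vpxd itself does not.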

Obviously I need to get vpxd to start. Is there anything else in there that I should be focusing on?

thanks

Bill Dossett
billdossett

So, in conclusion: the removal of the external PSC's SSO user account during the upgrade from 6.7 to 7 was the linchpin. I am still not 100% sure why the original vCenter would not start once I restored the PSC, which had the required SSO user account. However, I restored the vCenter appliance from backup once more, and I now realize why the first vCenter appliance restore failed: at the end of restoring the data from the FTP server, it tried to start the vCenter Server services. I had stepped away from the machine when that happened, and when I returned it just said it had failed. It failed because at that point I had not yet restored the PSC, so the SSO user account was missing and the services would not start on the restored vCenter appliance either. When I tried a last-ditch restore of the vCenter after restoring the PSC, the services started and the restore worked. All up and running again.

I don't believe it is documented anywhere that your external PSC SSO account will be removed during an upgrade from 6.7 to 7. The KB above was the only reference. I did not run the script to recreate the account, as I had a backup. Also, the KB seems to say to run the script on the vCenter appliance, but the SSO user is clearly defined on the external PSC, so that ambiguity made me decide to do the PSC restore instead.

So, hopefully this helps some poor soul who runs into the same problem in the future. If not, at least it is here for my reference. Merry Christmas.

Bill Dossett