10 Replies Latest reply on Nov 22, 2018 4:38 PM by adminjam

    vCenter 6.5 - vcenter appliance stops working out of the blue, AGAIN!!

    jhboricua Novice

      This happened to me a few months ago on a fresh install of the vcenter appliance 6.5. It just stopped working a week or two after applying an update. Services would not start and there was no indication as to why. It wasn't a space issue, it wasn't that other issue with a duplicate value in the vpostgres database I read about either. I finally gave up and wiped it out to redeploy from scratch.

       

      Well, lo and behold, sometime last night vCenter stopped working again. This time it wasn't even a full week after applying the 6.5.0c patch. Only two services start; none of the others will. My deployment is two appliances, a PSC and the vCenter. The PSC appears fine and its services show as healthy. The vCenter turned to garbage again. Here's the output of service-control:

       

      root@mp1vsivcs501 [ ~ ]# service-control --status
      Running:
       lwsmd vmafdd
      Stopped:
       applmgmt vmcam vmonapi vmware-cm vmware-content-library vmware-eam vmware-imagebuilder vmware-mbcs vmware-netdumper vmware-perfcharts vmware-rbd-watchdog vmware-rhttpproxy vmware-sca vmware-sps vmware-statsmonitor vmware-updatemgr vmware-vapi-endpoint vmware-vcha vmware-vmon vmware-vpostgres vmware-vpxd vmware-vpxd-svcs vmware-vsan-health vmware-vsm vsphere-client vsphere-ui
      
      

       

      Trying to start any service produces a similar output:

       

      root@mp1vsivcs501 [ ~ ]# service-control --start vmware-vpxd-svcs
      Perform start operation. vmon_profile=None, svc_names=['vmware-vpxd-svcs'], include_coreossvcs=False, include_leafossvcs=False
      2017-04-24T19:36:49.136Z   Running command: ['/usr/bin/systemctl', 'set-environment', 'VMON_PROFILE=NONE']
      2017-04-24T19:36:49.140Z   Done running command
      2017-04-24T19:36:49.143Z   Running command: ['/usr/bin/systemctl', 'daemon-reload']
      2017-04-24T19:36:49.222Z   Done running command
      2017-04-24T19:36:49.222Z   Running command: ['/usr/bin/systemctl', 'set-property', u'vmware-vmon.service', 'MemoryAccounting=true', 'CPUAccounting=true', 'BlockIOAccounting=true']
      2017-04-24T19:36:49.227Z   Done running command
      2017-04-24T19:36:49.231Z   RC = 1
      Stdout = 
      Stderr = Failed to execute operation: Unit file is masked
      
      
      2017-04-24T19:36:49.231Z   {
          "resolution": null, 
          "detail": [
              {
                  "args": [
                      "Stderr: Failed to execute operation: Unit file is masked\n"
                  ], 
                  "id": "install.ciscommon.command.errinvoke", 
                  "localized": "An error occurred while invoking external command : 'Stderr: Failed to execute operation: Unit file is masked\n'", 
                  "translatable": "An error occurred while invoking external command : '%(0)s'"
              }
          ], 
          "componentKey": null, 
          "problemId": null
      }
      2017-04-24T19:36:49.231Z   Running command: ['/usr/bin/systemctl', 'unset-environment', 'VMON_PROFILE']
      2017-04-24T19:36:49.235Z   Done running command
      Error executing start on service vpxd-svcs. Details {
          "resolution": null, 
          "detail": [
              {
                  "args": [
                      "vmware-vmon"
                  ], 
                  "id": "install.ciscommon.service.failstart", 
                  "localized": "An error occurred while starting service 'vmware-vmon'", 
                  "translatable": "An error occurred while starting service '%(0)s'"
              }
          ], 
          "componentKey": null, 
          "problemId": null
      }
      Service-control failed. Error {
          "resolution": null, 
          "detail": [
              {
                  "args": [
                      "vmware-vmon"
                  ], 
                  "id": "install.ciscommon.service.failstart", 
                  "localized": "An error occurred while starting service 'vmware-vmon'", 
                  "translatable": "An error occurred while starting service '%(0)s'"
              }
          ], 
          "componentKey": null, 
          "problemId": null
      }
      

       

      The first thing that pops out for me is line 11, "Failed to execute operation: Unit file is masked". I get that on every service I attempt to start and I'm not finding anything in VMware's knowledge portal about it. This is extremely frustrating.

       

      **Additional info**

      Running a search on just "unit file is masked" took me to a generic Ubuntu thread about systemctl showing masked unit files. Here's the output of systemctl list-unit-files:

       

      root@mp1vsivcs501 [ ~ ]# systemctl list-unit-files | grep vmware
      vmware-bigsister.service               static  
      vmware-cm.service                      masked  
      vmware-content-library.service         masked  
      vmware-eam.service                     masked  
      vmware-firewall.service                enabled 
      vmware-imagebuilder.service            masked  
      vmware-mbcs.service                    masked  
      vmware-netdump.service                 masked  
      vmware-perfcharts.service              masked  
      vmware-rbd-watchdog.service            masked  
      vmware-rhttpproxy.service              masked  
      vmware-sca.service                     masked  
      vmware-sps.service                     masked  
      vmware-statsmonitor.service            masked  
      vmware-updatemgr.service               masked  
      vmware-vapi.service                    masked  
      vmware-vcha.service                    masked  
      vmware-vmon.service                    masked  
      vmware-vmonapi.service                 masked  
      vmware-vpostgres.service               masked  
      vmware-vpxd-svcs.service               masked  
      vmware-vpxd.service                    masked  
      vmware-vsan-health.service             masked  
      vmware-vsm.service                     masked  
      vmware-bigsister.timer                 disabled
      

       

      Not sure if that's normal or not, but it appears to be what the error message is complaining about?
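For context: in systemd, a unit shown as "masked" is one whose unit file (typically under /etc/systemd/system) is a symlink to /dev/null, and systemd refuses to start such a unit. That is what "Unit file is masked" refers to. A minimal sketch for pulling only the masked units out of the listing above follows; the function name `masked_units` is a hypothetical helper, not part of the appliance.

```shell
#!/bin/sh
# Hypothetical helper (not part of VCSA): read `systemctl list-unit-files`
# output on stdin and print the names of units whose state column is "masked".
masked_units() {
    awk '$2 == "masked" { print $1 }'
}

# Usage on the appliance (commented out so the sketch has no side effects):
#   systemctl list-unit-files | masked_units
#   readlink /etc/systemd/system/vmware-vmon.service   # a masked unit points at /dev/null
```

Filtering on the second column rather than grepping for "masked" avoids matching a unit that merely has the word in its name.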

       

      Message was edited by: jhboricua

        • 1. Re: vCenter 6.5 - vcenter appliance stops working out of the blue, AGAIN!!
          petermie Novice

          I ran into the same issue during a very long (20+ hour) P1 call about my vpxd service crashing whenever a VM gets assigned an invalid VDS network port group. Unfortunately, the only resolution was to restore from backup or redeploy.

          • 2. Re: vCenter 6.5 - vcenter appliance stops working out of the blue, AGAIN!!
            jhboricua Novice

            There's gotta be something else to this. I'm not running a VDS in my setup. It's all standard vSwitches.

            • 3. Re: vCenter 6.5 - vcenter appliance stops working out of the blue, AGAIN!!
              eai Novice

              We had this issue last week.  After a 3-hr support call with a vCenter support engineer, he came up with the idea of looking around the forums.  The fix is to UNMASK the vmon service (listed as vmware-vmon.service in the systemctl output above):

              systemctl unmask vmware-vmon.service

              Then reboot your appliance.  This fixes the issue.

               

              We still do not know why the vmon service got masked to begin with.  Maybe it is some kind of race condition during shutdown, since the appliance startup and shutdown scripts do a lot of systemctl masking/unmasking?
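The unmask step can be sketched as a small function that also reports the unit's state afterwards. This is an illustration under assumptions: the function name `unmask_unit` and the `SYSTEMCTL` override variable are hypothetical, and the unit name vmware-vmon.service is taken from the systemctl listing earlier in the thread.

```shell
#!/bin/sh
# Hypothetical helper: unmask one unit, reload systemd, and print the unit's
# resulting state. SYSTEMCTL can be overridden for a dry run, for example
# SYSTEMCTL="echo systemctl" prints the commands instead of running them.
SYSTEMCTL="${SYSTEMCTL:-systemctl}"

unmask_unit() {
    unit="$1"
    $SYSTEMCTL unmask "$unit" || return 1
    $SYSTEMCTL daemon-reload || return 1
    # is-enabled prints the unit file state, e.g. "masked", "enabled", "disabled" or "static"
    $SYSTEMCTL is-enabled "$unit"
}

# Usage on the appliance (commented out so the sketch has no side effects):
#   unmask_unit vmware-vmon.service
```

If the last command still prints "masked", the symlink to /dev/null is still in place and the unit will not start.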

              • 4. Re: vCenter 6.5 - vcenter appliance stops working out of the blue, AGAIN!!
                jemfields Novice

                Hi -

                 

                Warning 1. The following is a generic Linux solution to the problem and does not take into account the appliance's configuration or the reasons for the masking.

                Warning 2. The ongoing failures seem to be caused by the system boot / shutdown process, so external issues may still be in play. Be careful; I suggest this only for lab testing.

                 

                login as root ...

                enter

                 

                shell <cr>

                 

                cd /etc/systemd/system

                 

                <Please note this may just seem to be a directory, BUT systemd reads its unit configuration from here, so changes directly affect which services can start>

                 

                ls -lisa

                 

                <here are the units I found masked>

                 

                root@localhost [ ~ ]# systemctl list-unit-files | grep masked
                applmgmt.service                       masked
                vmcam.service                          masked
                vmware-cis-license.service             masked
                vmware-cm.service                      masked
                vmware-content-library.service         masked
                vmware-eam.service                     masked
                vmware-imagebuilder.service            masked
                vmware-mbcs.service                    masked
                vmware-netdump.service                 masked
                vmware-perfcharts.service              masked
                vmware-pschealth.service               masked
                vmware-rbd-watchdog.service            masked
                vmware-rhttpproxy.service              masked
                vmware-sca.service                     masked
                vmware-sps.service                     masked
                vmware-statsmonitor.service            masked
                vmware-updatemgr.service               masked
                vmware-vapi.service                    masked
                vmware-vcha.service                    masked
                vmware-vmonapi.service                 masked
                vmware-vpostgres.service               masked
                vmware-vpxd-svcs.service               masked
                vmware-vpxd.service                    masked
                vmware-vsan-health.service             masked
                vmware-vsm.service                     masked
                vsphere-client.service                 masked
                vsphere-ui.service                     masked
                ctrl-alt-del.target                    masked

                 

                then look in the directory

                root@localhost [ /etc/systemd/system ]# ls -lisa
                total 108
                451046 4 drwxr-xr-x 24 root root 4096 May  9 03:03 .
                450562 4 drwxr-xr-x  7 root root 4096 May  9 01:35 ..
                452681 0 lrwxrwxrwx  1 root root    9 May  8 08:11 applmgmt.service -> /dev/null
                467464 4 drwxr-xr-x  2 root root 4096 May  8 08:19 applmgmt.service.d
                451876 0 lrwxrwxrwx  1 root root   40 Oct 22  2016 default.target -> /usr/lib/systemd/system/runlevel3.target
                451048 4 drwxr-xr-x  2 root root 4096 May  8 17:08 getty.target.wants
                467484 4 drwxr-xr-x  2 root root 4096 May  8 08:19 halt.target.wants
                451195 4 drwxr-xr-x  2 root root 4096 Oct 22  2016 local-fs.target.wants
                467460 4 drwxr-xr-x  2 root root 4096 May  8 08:19 lwsmd.service.d
                451050 4 drwxr-xr-x  2 root root 4096 May  8 09:03 multi-user.target.wants
                451054 4 drwxr-xr-x  2 root root 4096 Oct 22  2016 network-online.target.wants
                467486 4 drwxr-xr-x  2 root root 4096 May  8 08:19 poweroff.target.wants
                467482 4 drwxr-xr-x  2 root root 4096 May  8 08:19 reboot.target.wants
                451834 4 -rw-r--r--  1 root root  268 Jun  7  2016 sendmail.service
                467117 4 drwxr-xr-x  2 root root 4096 May  8 08:19 shutdown.target.wants
                452083 4 -rw-r--r--  1 root root  476 Aug 22  2016 snmpd.service
                451056 4 drwxr-xr-x  2 root root 4096 Oct 22  2016 sockets.target.wants
                451058 4 drwxr-xr-x  2 root root 4096 Oct 22  2016 sysinit.target.wants
                451107 0 lrwxrwxrwx  1 root root   39 Oct 22  2016 syslog.service -> /usr/lib/systemd/system/rsyslog.service
                452464 4 -r-xr-xr-x  1 root root  470 Jan 18 10:08 vcha-hacheck.service
                452104 4 drwxr-xr-x  2 root root 4096 May  8 08:19 vmafdd.service.d
                452121 4 drwxr-xr-x  2 root root 4096 May  8 08:19 vmcad.service.d
                452752 0 lrwxrwxrwx  1 root root    9 May  8 08:17 vmcam.service -> /dev/null
                467023 4 drwxr-xr-x  2 root root 4096 May  8 17:10 vmcam.service.d
                452116 4 drwxr-xr-x  2 root root 4096 May  8 08:19 vmdird.service.d
                452157 4 drwxr-xr-x  2 root root 4096 May  8 08:19 vmdnsd.service.d
                451129 4 drwxr-xr-x  2 root root 4096 Oct 22  2016 vmtoolsd.service.requires
                452654 0 lrwxrwxrwx  1 root root    9 May  8 08:10 vmware-cis-license.service -> /dev/null
                452651 0 lrwxrwxrwx  1 root root    9 May  8 08:09 vmware-cm.service -> /dev/null
                452726 0 lrwxrwxrwx  1 root root    9 May  8 08:14 vmware-content-library.service -> /dev/null
                452734 0 lrwxrwxrwx  1 root root    9 May  8 08:16 vmware-eam.service -> /dev/null
                452761 0 lrwxrwxrwx  1 root root    9 May  8 08:18 vmware-imagebuilder.service -> /dev/null
                452707 0 lrwxrwxrwx  1 root root    9 May  8 08:12 vmware-mbcs.service -> /dev/null
                452684 0 lrwxrwxrwx  1 root root    9 May  8 08:11 vmware-netdump.service -> /dev/null
                452763 0 lrwxrwxrwx  1 root root    9 May  8 08:18 vmware-perfcharts.service -> /dev/null
                467114 4 drwxr-xr-x  2 root root 4096 May  8 08:19 vmware-psc-client.service.d
                452247 0 lrwxrwxrwx  1 root root    9 May  8 08:11 vmware-pschealth.service -> /dev/null
                452745 0 lrwxrwxrwx  1 root root    9 May  8 08:16 vmware-rbd-watchdog.service -> /dev/null
                452646 0 lrwxrwxrwx  1 root root    9 May  8 08:09 vmware-rhttpproxy.service -> /dev/null
                452664 0 lrwxrwxrwx  1 root root    9 May  8 08:10 vmware-sca.service -> /dev/null
                452435 0 lrwxrwxrwx  1 root root    9 May  8 08:16 vmware-sps.service -> /dev/null
                452690 0 lrwxrwxrwx  1 root root    9 May  8 08:11 vmware-statsmonitor.service -> /dev/null
                467472 4 drwxr-xr-x  2 root root 4096 May  8 08:19 vmware-stsd.service.d
                467476 4 drwxr-xr-x  2 root root 4096 May  8 08:19 vmware-sts-idmd.service.d
                452749 0 lrwxrwxrwx  1 root root    9 May  8 08:17 vmware-updatemgr.service -> /dev/null
                452667 0 lrwxrwxrwx  1 root root    9 May  8 08:10 vmware-vapi.service -> /dev/null
                452751 0 lrwxrwxrwx  1 root root    9 May  8 08:17 vmware-vcha.service -> /dev/null
                467468 4 drwxr-xr-x  2 root root 4096 May  8 08:19 vmware-vmon.service.d
                452692 0 lrwxrwxrwx  1 root root    9 May  8 08:11 vmware-vpostgres.service -> /dev/null
                452718 0 lrwxrwxrwx  1 root root    9 May  8 08:12 vmware-vpxd.service -> /dev/null
                452704 0 lrwxrwxrwx  1 root root    9 May  8 08:11 vmware-vpxd-svcs.service -> /dev/null
                452483 0 lrwxrwxrwx  1 root root    9 May  8 17:11 vmware-vsan-health.service -> /dev/null
                452758 0 lrwxrwxrwx  1 root root    9 May  8 08:18 vmware-vsm.service -> /dev/null

                 

                 

                 

                 

                <This will display all the files and, more importantly, the links in the directory. We need to remove all the links to /dev/null>

                <I have removed them all, but it may be that only some of them should be removed. Remember this directory controls what systemd will start>

                <and there are usually good reasons to stop root from doing things. The masking is a protective mechanism, a bit like a database holding a process until it finishes a write>

                 

                <this command will remove all the vmware* links; rm skips the matching directories (it reports an error for each one, since -r is not given)>

                 

                rm vmware*

                 

                <I also removed the other files that were linked to /dev/null ...>

                 

                then reboot

                 

                reboot <cr>

                 

                Hope that helps. I am not a VMware specialist; my knowledge is Linux, and this is a systems-level solution that may not fix an underlying issue.

                 

                regards

                 

                Jeremy
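A more conservative variant of the manual removal above, as a sketch only: in the listing, `rm vmware*` misses applmgmt.service and vmcam.service (also linked to /dev/null) and also matches the vmware-*.service.d directories. The helper below selects entries by link target instead of by name and records what it removes. The function name `remove_mask_links` and the default log path are assumptions for illustration, and `-lname` and `-maxdepth` assume GNU find.

```shell
#!/bin/sh
# Hypothetical helper (name and default log path are assumptions, not VMware tooling).
# Removes only symlinks that point at /dev/null directly inside DIR, and writes
# the list of removed paths to LOG first so the change can be reviewed or undone.
remove_mask_links() {
    dir="${1:-/etc/systemd/system}"
    log="${2:-/root/removed-mask-links.txt}"

    # -lname is a GNU find test: true for symlinks whose target matches the pattern.
    find "$dir" -maxdepth 1 -lname /dev/null -print > "$log" || return 1

    # Delete each recorded link; directories and regular files were never listed.
    while IFS= read -r link; do
        rm -- "$link" || return 1
    done < "$log"
}

# Usage on the appliance (commented out so the sketch has no side effects):
#   remove_mask_links /etc/systemd/system /root/removed-mask-links.txt
#   systemctl daemon-reload
```

Because the removed paths are logged, any unit that turns out to have been masked on purpose can be re-masked with `ln -s /dev/null <path>`.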

                • 5. Re: vCenter 6.5 - vcenter appliance stops working out of the blue, AGAIN!!
                  JacobDEvans Novice

                  Still a bug in the December patch. Here are some easier steps to unmask all services.

                   

                   

                   


                  # List all masked services (unit files symlinked to /dev/null) for removal.
                  find /etc/systemd/system/ -lname '/dev/null' -exec ls {} \;

                  # Automatically remove them (or rm each file)
                  find /etc/systemd/system/ -lname '/dev/null' -exec rm {} \;

                  # Reload systemctl daemon
                  systemctl daemon-reload

                  # Start services or reboot
                  service-control --start --all
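After running the steps above, it may be worth confirming that nothing under /etc/systemd/system is still linked to /dev/null before starting services. A minimal sketch, assuming GNU find as used in the commands above; the function name `no_mask_links_left` is hypothetical.

```shell
#!/bin/sh
# Hypothetical check: succeed only if no symlink to /dev/null remains under DIR.
# Any remaining links are printed to stderr so they can be inspected.
no_mask_links_left() {
    dir="${1:-/etc/systemd/system}"
    left=$(find "$dir" -lname /dev/null -print)
    if [ -n "$left" ]; then
        printf 'still masked:\n%s\n' "$left" >&2
        return 1
    fi
    return 0
}

# Usage on the appliance (commented out so the sketch has no side effects):
#   no_mask_links_left && service-control --start --all
```

Note that a unit masked on purpose would also be reported here, so a non-zero result is a prompt to look, not proof of a problem.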
                  
                  
                  • 6. Re: vCenter 6.5 - vcenter appliance stops working out of the blue, AGAIN!!
                    csmith70 Lurker

                    Those 4 lines did the trick for me JacobDEvans, thanks for the post.

                    • 7. Re: vCenter 6.5 - vcenter appliance stops working out of the blue, AGAIN!!
                      jo.strasser Hot Shot

                      Hi all,

                       

                      This issue isn't fixed in the latest VCSA 6.5 U1g update.

                       

                      Information from VMware Support:

                      This issue is tracked and will be fixed with vCenter 6.5 Update 2.

                       

                      For now the workaround described by JacobDEvans is supported.

                       

                      Thanks and BR/JO!

                      • 8. Re: vCenter 6.5 - vcenter appliance stops working out of the blue, AGAIN!!
                        AlexPotter Lurker

                        Just wanted to update on status of this. Still having the same problem with 6.5 U2b. Solution still works.

                        • 9. Re: vCenter 6.5 - vcenter appliance stops working out of the blue, AGAIN!!
                          EddieJvCORE Novice

                          Thanks for the solution. It was driving me crazy. I would still like to know the cause.

                          • 10. Re: vCenter 6.5 - vcenter appliance stops working out of the blue, AGAIN!!
                            adminjam Lurker

                            So what caused it for me was snapshotting the vCenter server while it was powered on.  After restoring from that snapshot, it had this issue.