9 Replies Latest reply on Sep 10, 2015 12:53 PM by philzy

    Problem with VSAN Health Check Plugin installation

    philzy Enthusiast

      Hi!

       

      After executing /usr/lib/vmware-vpx/vsan-health/health-rpm-post-install.sh

       

      I get this output:

      /usr/lib/vmware-vpx/vsan-health/health-rpm-post-install.sh --force

      /usr/lib/vmware-vpx/workflow/bin

      2015-05-15T21:32:05.625Z   Getting value for install-parameter: workflow.int.service-port

      2015-05-15T21:32:05.633Z   Getting value for install-parameter: workflow.int.jmx-port

      2015-05-15T21:32:05.643Z   Getting value for install-parameter: vpxd.int.sdk-port

      2015-05-15T21:32:05.650Z   Getting value for install-parameter: vpxd.int.sdk-tunnel-port

      2015-05-15T21:32:05.658Z   Getting value for install-parameter: rhttpproxy.ext.port1

      2015-05-15T21:32:05.665Z   Getting value for install-parameter: rhttpproxy.ext.port2

      {'vpxd_sdk_tunnel_port': '8089', 'rhttpproxy_https_port': '443', 'rhttpproxy_http_port': '80', 'workflow_service_port': '8088', 'vpxd_sdk_port': '8085', 'PASSWORD': '', 'workflow_jmx_port': '19999'}

      2015-05-15T21:32:05.673Z   Getting value for install-parameter: syslog.ext.port

      2015-05-15T21:32:05.682Z   Getting value for install-parameter: vc.home.path

      2015-05-15T21:32:05.690Z   Getting value for install-parameter: vc.conf.path

      2015-05-15T21:32:05.691Z   VSAN Health service firstboot started

      2015-05-15T21:32:05.702Z   User %s already exists, skipping creation.

      2015-05-15T21:32:05.710Z   Getting value for install-parameter: rhttpproxy.cert

      2015-05-15T21:32:05.710Z   WARNING Value for install-parameter rhttpproxy.cert is empty

      Traceback (most recent call last):

        File "/usr/lib/vmware-vpx/firstboot/vsanhealth_firstboot.py", line 292, in Main

          res = vsanhealth_fb.get_rp_cert_info()

        File "/usr/lib/vmware/site-packages/cis/firstboot.py", line 185, in get_rp_cert_info

          thumbprint, ssl_trust, crt = get_certinfo(rp_cert_file)

        File "/usr/lib/vmware/site-packages/cis/tools.py", line 184, in get_certinfo

          f.readFile(cert_file)

        File "/usr/lib/vmware/site-packages/cis/utils.py", line 1028, in readFile

          loErrMsg = localizedString(errMsg, file_name, e)

      TypeError: localizedString() takes at most 2 arguments (3 given)

      2015-05-15T21:32:05.712Z   VSAN Health firstboot failed

      Traceback (most recent call last):

        File "/usr/lib/vmware-vpx/firstboot/vsanhealth_firstboot.py", line 343, in <module>

          Main()

        File "/usr/lib/vmware-vpx/firstboot/vsanhealth_firstboot.py", line 333, in Main

          if eInfo and eInfo.detail:

      UnboundLocalError: local variable 'eInfo' referenced before assignment

      vmware-vpxd: Stopping vpxd by administrative request. process id was 9301

      success

      vmware-vpxd: VC SSL Certificate does not exist, it will be generated by vpxd

      Waiting for the embedded database to start up: success

      Executing pre-startup scripts...

      vmware-vpxd: Starting vpxd by administrative request.

      success

      vmware-vpxd: Waiting for vpxd to start listening for requests on 8089

      Waiting for vpxd to initialize: .success

      vmware-vpxd: vpxd has initialized.

      Last login: Fri May 15 21:18:53 UTC 2015 on console

      Stopping VMware vSphere Web Client...

      Stopped VMware vSphere Web Client.

      Last login: Fri May 15 21:32:20 UTC 2015 on pts/1

      Starting VMware vSphere Web Client...

      Waiting for VMware vSphere Web Client......

      running: PID:30348

      [Screenshot: 2015-05-16 00_48_38-vSphere Web Client.png]

      As a result, there are no buttons.

       

      As far as I understand, there is some problem with the certificate.

      So, please help me with troubleshooting.

      Thank you.

        • 1. Re: Problem with VSAN Health Check Plugin installation
          jonretting Enthusiast

          Running into the same issue here with VCSA. I was able to replicate this problem on two fresh vSphere 6.0 clusters. Looking into "vsanhealth_firstboot.py" and "firstboot.py", I noticed some notes and some commented-out code related to the problem. It looks like these scripts are still a work in progress.

           

          Some snippets of the previously mentioned python scripts:

          -- vsanhealth_firstboot.py -- LINE: 292

                   res = vsanhealth_fb.get_rp_cert_info()

                   print str(res)

                   # XXX: Generating certs doesn't work when invoked after the initial boot

                   #res = vsanhealth_fb.generate_certs()

                   #print str(res)

           

          So certificate generation doesn't take place: only "res = vsanhealth_fb.get_rp_cert_info()" is called, and the call to "generate_certs()" is commented out. The function "generate_certs()" seems to have other issues as well, including password generation (the password is hardcoded), and other parts that need debugging. Hopefully this gets shored up in the next update/patch/bugfix.
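A minimal sketch of the kind of guard that would surface the real cause here, assuming the failure starts from the empty rhttpproxy.cert value reported in the log. `check_cert_path` is a hypothetical helper for illustration only; it is not part of the VMware firstboot scripts.

```python
import os


def check_cert_path(cert_path):
    """Return a description of the problem, or None if the path looks usable.

    Hypothetical helper, not part of the VMware scripts.
    """
    if not cert_path:
        # This is the case the WARNING line in the log points at.
        return "install-parameter returned an empty certificate path"
    if not os.path.isfile(cert_path):
        return "certificate file does not exist: %s" % cert_path
    if not os.access(cert_path, os.R_OK):
        return "certificate file is not readable: %s" % cert_path
    return None
```

Calling `check_cert_path("")` returns the "empty certificate path" message, which is easier to act on than the TypeError raised while the script tries to report the read failure.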

           

          Here are the mentioned functions from "firstboot.py".

          <code>

             def get_rp_cert_info(self):

                rp_cert_file = wait_for_install_parameter('rhttpproxy.cert')

                thumbprint, ssl_trust, crt = get_certinfo(rp_cert_file)

                self._rp_crt_info = {

                   'cert_file' : rp_cert_file,

                   'thumbprint' : thumbprint,

                   'ssl_trust' : ssl_trust,

                   'crt' : crt

                }

           

             # XXX TODO: Delete the generate_certs function after all firstboot scripts

             # switch to use certs and solution user generated in soluser_firstboot.py

             def generate_certs(self, generate_jks=False, component_name=None):

                #

                # TODO: Currently, the certs are generated in a temp location, need to

                # modify this code to directly create certs in the location provided

                # by the component for storing the certs

                #

                if component_name is None:

                   component_name = self._component_name

                vmca = CerTool()

                vmca.GenCert(component_name)

                cert_info = {}

                cert_info['cert_file'] = vmca.GetCertFileName()

                cert_info['private_key_file'] = vmca.GetPrivateKeyFileName()

                cert_info['public_key_file'] = vmca.GetPublicKeyFileName()

                cert_info['pfx_file'] = vmca.GetPfxFileName()

                cert_info['password'] = "foo" #vmca.GetPassword()

           

                create_dir(self.get_ssl_path())

                copyfile(cert_info['cert_file'], self.get_public_crt())

                copyfile(cert_info['private_key_file'], self.get_private_key())

                copyfile(cert_info['pfx_file'], self.get_pfx_file())

           

                if generate_jks:

                    log('Creating JKS keystore ...')

                    # If -deststorepass and -srcstorepass arguments are not specified

                    # while invoking keytool, keytool will prompt for the destination

                    # keystore password twice and the source keystore password once:

                    # Enter destination keystore password:

                    # Re-enter new password:

                    # Enter source keystore password:

                    # Since, we reuse the src keystore password as the destination

                    # keystore password, we repeat it thrice in stdin. We do not

                    # specify the passwords in the command line for security reasons

                    # as well as the fact that keytool does not like passwords that start

                    # with "-J"

                    pwd_stdin = 3 * ('%s\n' % cert_info['password'])

                    try:

                       invoke_command([get_keytool(),

                                      '-importkeystore',

                                      '-destkeystore', self.get_jks_file(),

                                      '-srckeystore', self.get_pfx_file(),

                                      '-srcstoretype', 'PKCS12',

                                      '-alias', self._constants['key_alias']],\

                                      pwd_stdin)

                    except InvokeCommandException as e:

                       err = _T('install.ciscommon.firstboot.create.jkskeystore',

                                'ERROR: Failed to create JKS Keystore.')

                       err_lmsg = localizedString(err)

                       e.appendErrorStack(err_lmsg)

                       raise e

           

                    self.import_rp_cert_in_jks(self.get_jks_file(), cert_info['password'])

           

                return cert_info

          </code>

           

          Cheers

          • 2. Re: Problem with VSAN Health Check Plugin installation
            philzy Enthusiast

            Ok, thank you.

            But as far as I can see from these code snippets, there is no workaround yet.

            So I'm going to wait for the next release of the plug-in.

            • 3. Re: Problem with VSAN Health Check Plugin installation
              rbolgerTrace3 Novice

              I was having the same issue with my installation. The root cause of the problem seems to be the warning about rhttpproxy.cert being empty.  I noticed that in other people's installations this parameter was returning /etc/vmware-rhttpproxy/ssl/rui.crt, which is basically the path to the certificate that the vCenter web client serves.  That file existed on my installation and I verified the cert details with openssl.  So I was left wondering why rhttpproxy.cert was being read as empty.  After extensive googling, I stumbled across one of William Lam's blog posts (vCenter Server 6.0 Tidbits Part 1: What install & deployment parameters did I use? | virtuallyGhetto) mentioning the /bin/install-parameter utility.

               

              And indeed, running "/bin/install-parameter rhttpproxy.cert" returned an empty value on my system.  So I took a look at the source (Python) for that utility, and it appeared to have an optional argument called --setdefault which would supposedly let you set a default value for the parameter.  So I ran "/bin/install-parameter rhttpproxy.cert -s /etc/vmware-rhttpproxy/ssl/rui.crt", which appears to have worked.  Now when I run the original command to query the value, it returns the default path.
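The query-then-set sequence above can be sketched in Python as follows. This is only an illustration: it assumes the utility prints the current value on stdout (empty when unset) and that `-s` takes the default value exactly as in the command above. `ensure_parameter` and its injectable `run` argument are hypothetical names, not part of the utility.

```python
import subprocess

INSTALL_PARAMETER = "/bin/install-parameter"  # utility path named in the post


def query_command(name):
    """Build the command that prints the current value of a parameter."""
    return [INSTALL_PARAMETER, name]


def set_default_command(name, value):
    """Build the command that sets a default, using -s as in the post."""
    return [INSTALL_PARAMETER, name, "-s", value]


def ensure_parameter(name, default, run=subprocess.check_output):
    """Set `default` for `name` only when the current value is empty.

    `run` is injectable so the logic can be tried without the real utility.
    Assumption: the utility writes the value to stdout.
    """
    current = run(query_command(name)).decode().strip()
    if current:
        return current
    run(set_default_command(name, default))
    # Query again to confirm what the utility now reports.
    return run(query_command(name)).decode().strip()
```

On an appliance this would be called as `ensure_parameter("rhttpproxy.cert", "/etc/vmware-rhttpproxy/ssl/rui.crt")`; an existing non-empty value is left alone.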

               

              And finally, I tried re-running the health-rpm-post-install.sh and it claims to have worked.  But unfortunately, I'm still not quite there.  The "Enable" button is still missing from the web client health service page.

              • 4. Re: Problem with VSAN Health Check Plugin installation
                jonretting Enthusiast

                I just tested your solution out on one heavily used VCSA and a vanilla one, and it was a full success. Much appreciated!

                 

                Thanks,

                -Jon

                • 5. Re: Problem with VSAN Health Check Plugin installation
                  rbolgerTrace3 Novice

                  Glad it worked! I managed to solve the rest of my problem getting the plugin loaded as well.  Setting rhttpproxy.cert let health-rpm-post-install.sh finish successfully.  But after starting the vmware-vsan-health service, the health page in the web client still never loaded the buttons like "Enable".

                   

                  I checked /var/log/vmware/vsan-health/vmware-vsan-health-service.log and noticed it was spamming "Failed to log into VC, retrying in 10 seconds" over and over.  So I went digging through the python source in /usr/lib/vmware-vpx/vsan-health. I managed to figure out that while starting up the web service that hosts the plugin, it tries to connect to vCenter using the vCenter's own SSL cert and private key (rui.crt and rui.key) in /etc/vmware-vpx/ssl. On my VCSA, the permissions in that folder looked like this:

                   

                  myvcsa:/etc/vmware-vpx/ssl # ls -la

                  total 28

                  drwxr-x---  2 root cis  4096 Jul 20 05:00 .

                  drwxr-xr-x 14 root root 4096 Jul 21 04:05 ..

                  -rw-------  1 root root 3416 Apr 30 05:36 rui.crt

                  -rw-------  1 root root 1704 Apr 30 05:36 rui.key

                  -rw-------  1 root root   65 Apr 30 05:19 symkey.dat

                  -rw-------  1 root root 3343 Apr 30 05:36 vcsoluser.crt

                  -rw-------  1 root root 1704 Apr 30 05:36 vcsoluser.key

                   

                  Now I knew that the health service was running as a local user called vsan-health. So there's no way it would be able to read those files.  Luckily, I had a mostly vanilla VCSA that I could compare it with.  Here's what the vanilla VCSA folder looked like:

                   

                  myvcsa:/etc/vmware-vpx/ssl # ls -la

                  total 28

                  drwxr-x---  2 root cis  4096 Jul 20 05:00 .

                  drwxr-xr-x 14 root root 4096 Jul 21 04:24 ..

                  -rw-r-----  1 root cis  3416 Apr 30 05:36 rui.crt

                  -rw-r-----  1 root cis  1704 Apr 30 05:36 rui.key

                  -rw-------  1 root root   65 Apr 30 05:19 symkey.dat

                  -rw-r-----  1 root cis  3343 Apr 30 05:36 vcsoluser.crt

                  -rw-r-----  1 root cis  1704 Apr 30 05:36 vcsoluser.key

                   

                  Notice the group ownership difference on the cert-related files and the change from 600 to 640 permissions.  When I saw this, I also remembered seeing in the vsan firstboot script that the vsan-health user was being added to the cis group.  As soon as I made my broken VCSA's permissions match the vanilla one, the service started up and everything started working.  I'm guessing the reason my permissions were out of whack is a bug in the SSL replacement scripts.  One of the first things I do on my vCenter is update the SSL certs with custom ones from our PKI.  I'm guessing that process is currently not working quite right and screws up the permissions on the files that get replaced.
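A rough sketch of the comparison and fix described above, assuming the target state is the vanilla listing (group cis, mode 640 on the four cert and key files). `find_mismatches` and `fix_permissions` are hypothetical helper names, and changing group ownership generally requires root.

```python
import os
import stat

# Files that differ between the two listings above.
CERT_FILES = ("rui.crt", "rui.key", "vcsoluser.crt", "vcsoluser.key")


def find_mismatches(ssl_dir, expected_gid, expected_mode=0o640, names=CERT_FILES):
    """Return (name, mode, gid) for each existing file that differs from the expected state."""
    mismatches = []
    for name in names:
        path = os.path.join(ssl_dir, name)
        if not os.path.exists(path):
            continue
        info = os.stat(path)
        mode = stat.S_IMODE(info.st_mode)
        if mode != expected_mode or info.st_gid != expected_gid:
            mismatches.append((name, mode, info.st_gid))
    return mismatches


def fix_permissions(ssl_dir, gid, mode=0o640, names=CERT_FILES):
    """Apply the group and mode seen in the working listing to each existing file."""
    for name in names:
        path = os.path.join(ssl_dir, name)
        if os.path.exists(path):
            os.chown(path, -1, gid)  # -1 leaves the owner unchanged
            os.chmod(path, mode)
```

On the appliance, the group id could be looked up with `grp.getgrnam("cis").gr_gid` and passed to both functions together with /etc/vmware-vpx/ssl. Running `find_mismatches` first shows what would change before anything is modified.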

                  • 6. Re: Problem with VSAN Health Check Plugin installation
                    jonretting Enthusiast

                    That's a great find, well done, and hopefully the plug-in developers take a look at this. I took note of your fix for the permission difference, as I am sure it will spring up for me in the same fashion you experienced. Thanks, -Jon

                    • 7. Re: Problem with VSAN Health Check Plugin installation
                      Bleeder Hot Shot

                      I noticed that a new version of the VSAN Health plugin was released yesterday. 

                       

                      VMware Virtual SAN Health Check Plug-in 6.0.1 Release Notes

                      • 8. Re: Problem with VSAN Health Check Plugin installation
                        OITVIRT Lurker

                        We had the exact same problem with the VCSA.  I had to add additional read permissions to rui.crt and rui.key, then the button showed up and everything worked.

                         

                        Good luck!
                        Jill

                        • 9. Re: Problem with VSAN Health Check Plugin installation
                          philzy Enthusiast

                          To