16 Replies. Latest reply on Sep 29, 2020 5:49 AM by JulietDeltaGolf

    Permissions problems with linked VCs after VCSA migration

    JulietDeltaGolf Novice

      Hi,

       

      We have just migrated/upgraded two linked-mode Windows 6.0 vCenters to 6.7 VCSAs. Everything seemed to go OK during the migrations, but now that they are complete there seems to be a problem with the trust or sharing of credentials between the two VCSAs. They still see each other as linked, and VCSA 'A' is visible at the top level from VCSA 'B' and vice versa. However, attempting to expand the tree of the opposite site from either A or B gives the 'You have no privileges to view this object or it does not exist' error message. This occurs whichever site we are logged in to, and whether we are logged in as 'administrator@vsphere.local' or as any AD account that should have the permissions to do this.

       

      I.e. AD account 'X' is able to log in to both VCSA 'A' and VCSA 'B' and can explore the 'local' VCSA in full but can't view 'A' from 'B' and can't view 'B' from 'A'.

       

      We can, however, see recent tasks initiated at either site, e.g. if we delete a VM snapshot at site 'A' (using VCSA 'A') and at site 'B' (using VCSA 'B'), then the recent tasks pane is populated with both tasks in both VCSAs.

        • 1. Re: Permissions problems with linked VCs after VCSA migration
          scott28tt Guru
          User Moderators, VMware Employees, Community Warriors

          Moderator: Thread moved to the vCenter Server area.

          • 2. Re: Permissions problems with linked VCs after VCSA migration
            Lalegre Expert

            Hey JulietDeltaGolf,

             

            Check that the time is the same on all PSCs and vCenters (which appliances are involved depends on whether you have an embedded or external deployment).

             

            Also check if the replication between PSCs is working:

             

            1. vdcrepadmin -f showservers -h PSC_FQDN -u administrator -w Administrator_Password

            2. vdcrepadmin -f showpartners -h PSC_FQDN -u administrator -w Administrator_Password

            3. vdcrepadmin -f showpartnerstatus -h localhost -u administrator -w Administrator_Password

             

            You should see the remote PSC as a connected partner, and the sync status should also be shown. I will also paste the KB that lists all the checks and the expected outputs, in case you are not familiar with the commands: VMware Knowledge Base

            • 3. Re: Permissions problems with linked VCs after VCSA migration
              JulietDeltaGolf Novice

              Thanks, but things have changed (seemingly for the worse, though perhaps it actually helps point us in the right direction). Site A still seems to be fine, but attempting to log in to the site B VC, using either a domain user or the administrator@vsphere.local account, now produces the same error: "Empty SSO response string."

              We have just restarted the site B VCSA, in case this allows us to re-connect.

              • 4. Re: Permissions problems with linked VCs after VCSA migration
                JulietDeltaGolf Novice

                The VCSA reboot has resolved the 500 problem; we're back to the same behaviour as in the original post.

                Both VCSAs are set to sync their time with the host, and their clocks appear to match.

                • 5. Re: Permissions problems with linked VCs after VCSA migration
                  Lalegre Expert

                  However, you should still run the checks I mentioned, because they verify the communication and syncing of the SSO domain between the PSCs, and that could be why you are not able to connect.

                  You should also look at the logs of the SSO service.

                  • 6. Re: Permissions problems with linked VCs after VCSA migration
                    JulietDeltaGolf Novice

                    For site A we see this:

                     

                    ./vdcrepadmin -f showpartnerstatus -h localhost -u administrator

                    Partner: site_A

                    Host available:   Yes

                    Status available: Yes

                    My last change number:             105292

                    Partner has seen my change number: 88906

                    Partner is 16386 changes behind.

                     

                    But for site B we see this:

                     

                    ./vdcrepadmin -f showpartnerstatus -h localhost -u administrator

                    Partner: site_A

                    Host available:   Yes

                    Status available: No

                     

                    Other than that, the 'showservers' and 'showpartners' commands appear to give correct results. It looks like site B is the problem: it is not seeing the changes from site A or sending its changes to site A.
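In case it helps anyone comparing several nodes, here is a minimal Python sketch that parses text in the format of the 'showpartnerstatus' output above and flags a partner that has no status available or is behind on changes. It assumes the field labels appear exactly as they do in our output; the function names and sample strings are made up for illustration, and the script only parses text, it does not run vdcrepadmin itself.

```python
import re


def parse_partner_status(text):
    """Parse showpartnerstatus-style text into one dict per 'Partner:' entry."""
    partners = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Partner:"):
            current = {
                "partner": line.split(":", 1)[1].strip(),
                "host_available": None,
                "status_available": None,
                "changes_behind": None,
            }
            partners.append(current)
        elif current is None:
            continue  # ignore anything before the first 'Partner:' line
        elif line.startswith("Host available:"):
            current["host_available"] = line.split(":", 1)[1].strip() == "Yes"
        elif line.startswith("Status available:"):
            current["status_available"] = line.split(":", 1)[1].strip() == "Yes"
        else:
            match = re.match(r"Partner is (\d+) changes behind", line)
            if match:
                current["changes_behind"] = int(match.group(1))
    return partners


def needs_attention(partner):
    """True if the partner is unreachable, has no status, or is lagging."""
    return (
        not partner["host_available"]
        or not partner["status_available"]
        or (partner["changes_behind"] or 0) > 0
    )


# Sample text copied from the two outputs above (blank lines removed).
SITE_A_OUTPUT = """\
Partner: site_A
Host available:   Yes
Status available: Yes
My last change number:             105292
Partner has seen my change number: 88906
Partner is 16386 changes behind.
"""

SITE_B_OUTPUT = """\
Partner: site_A
Host available:   Yes
Status available: No
"""

for name, output in (("site A", SITE_A_OUTPUT), ("site B", SITE_B_OUTPUT)):
    for partner in parse_partner_status(output):
        print(name, partner, "needs attention:", needs_attention(partner))
```

Both of our nodes get flagged by this check: site A because its partner is 16386 changes behind (105292 minus 88906), and site B because no status is available at all.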

                    • 7. Re: Permissions problems with linked VCs after VCSA migration
                      Lalegre Expert

                      So basically that's it: the replication agreement between the two PSCs is broken. I would recommend going over the "createagreement" section of the KB I sent you to recreate it. Do it from the PSC on Site B.

                      • 8. Re: Permissions problems with linked VCs after VCSA migration
                        JulietDeltaGolf Novice

                        Not sure if it changes anything or just confirms but in the logs we see:

                         

                        Site A:

                         

                        2020-09-14T10:37:08.530672+00:00 err vmdird  t@140406020888320: SASLSessionStep: sasl error (-13)(SASL(-13): authentication failure: client evidence does not match what we calculated. Probably a password error)

                        2020-09-14T10:37:08.531184+00:00 err vmdird  t@140406020888320: VmDirSendLdapResult: Request (Bind), Error (49), Message ((49)(SASL step failed.)), (0) socket (128.33.45.35)

                        2020-09-14T10:37:08.531493+00:00 err vmdird  t@140406020888320: Bind Request Failed (128.33.45.35) error 49: Protocol version: 3, Bind DN: "cn=SITE_B_FQDN,ou=Domain Controllers,dc=vsphere,dc=local", Method: SASL

                        2020-09-14T10:37:33.966230+00:00 err vmdird  t@140406750693120: VmDirSafeLDAPBind to (ldap://SITE_B.FQDN:389) failed. SRP(9234)

                         

                        and on Site B:

                         

                        2020-09-14T10:38:33.973211+00:00 err vmdird  t@140126311143168: SASLSessionStep: sasl error (-13)(SASL(-13): authentication failure: client evidence does not match what we calculated. Probably a password error)

                        2020-09-14T10:38:33.973810+00:00 err vmdird  t@140126311143168: VmDirSendLdapResult: Request (Bind), Error (49), Message ((49)(SASL step failed.)), (0) socket (10.172.252.16)

                        2020-09-14T10:38:33.974271+00:00 err vmdird  t@140126311143168: Bind Request Failed (10.172.252.16) error 49: Protocol version: 3, Bind DN: "cn=SITE_B_FQDN,ou=Domain Controllers,dc=vsphere,dc=local", Method: SASL

                        2020-09-14T10:38:38.593819+00:00 err vmdird  t@140127267452672: VmDirSafeLDAPBind to (ldap://SITE_A.FQDN:389) failed. SRP(9234)

                         

                        That sounded a bit like this: VMware Knowledge Base (not quite but similar), so we followed those steps but it doesn't seem to have made any difference.

                         

                        Do you think we should just try 'createagreement', or should we 'removeagreement' and then 'createagreement'?

                        • 9. Re: Permissions problems with linked VCs after VCSA migration
                          Lalegre Expert

                          I would go with that and give it a try, because clearly your replication is not working and maybe the agreement got lost during the upgrade. However, take a snapshot of all the PSCs and vCenters before going ahead with the procedure.

                          • 10. Re: Permissions problems with linked VCs after VCSA migration
                            JulietDeltaGolf Novice

                            We tried this, all the steps seemed to work (removing and then creating) but the end result is the same:

                             

                            Site A still says:

                             

                            2020-09-14T13:11:36.780423+00:00 err vmdird  t@140484043319040: VmDirSafeLDAPBind to (ldap://site_b_fqdn:389) failed. SRP(9234)

                            2020-09-14T13:12:04.211459+00:00 err vmdird  t@140483967784704: SASLSessionStep: sasl error (-13)(SASL(-13): authentication failure: client evidence does not match what we calculated. Probably a password error)

                            2020-09-14T13:12:04.327158+00:00 err vmdird  t@140483967784704: VmDirSendLdapResult: Request (Bind), Error (49), Message ((49)(SASL step failed.)), (0) socket (128.33.45.35)

                            2020-09-14T13:12:04.327454+00:00 err vmdird  t@140483967784704: Bind Request Failed (128.33.45.35) error 49: Protocol version: 3, Bind DN: "cn=site_b_fqdn,ou=Domain Controllers,dc=vsphere,dc=local", Method: SASL

                             

                            Site B still says:

                             

                            2020-09-14T13:07:03.814559+00:00 err vmdird  t@139719966971648: VmDirSafeLDAPBind to (ldap://site_a_fqdn:389) failed. SRP(9234)

                            2020-09-14T13:07:05.284057+00:00 err vmdird  t@139719111341824: SASLSessionStep: sasl error (-13)(SASL(-13): authentication failure: client evidence does not match what we calculated. Probably a password error)

                            2020-09-14T13:07:05.284427+00:00 err vmdird  t@139719111341824: VmDirSendLdapResult: Request (Bind), Error (49), Message ((49)(SASL step failed.)), (0) socket (10.172.252.16)

                            2020-09-14T13:07:05.284618+00:00 err vmdird  t@139719111341824: Bind Request Failed (10.172.252.16) error 49: Protocol version: 3, Bind DN: "cn=site_a_fqdn,ou=Domain Controllers,dc=vsphere,dc=local", Method: SASL
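Since the same three-line pattern keeps repeating in the vmdird log, a small Python sketch like the one below can condense the 'Bind Request Failed' lines into a count per peer address and machine account. It assumes the line format shown in the excerpts above; the regular expression, function name and sample text are our own illustration, not anything shipped with the VCSA. LDAP result code 49 is the standard "invalid credentials" code, which fits the "Probably a password error" SASL message.

```python
import re
from collections import Counter

# Matches lines such as:
#   ... Bind Request Failed (10.0.0.1) error 49: ..., Bind DN: "cn=host,ou=...", Method: SASL
BIND_FAILED = re.compile(
    r'Bind Request Failed \((?P<peer>[^)]+)\) error (?P<code>\d+):'
    r'.*?Bind DN: "(?P<dn>[^"]+)"'
)


def summarize_bind_failures(log_text):
    """Count failed binds per (peer address, account CN, LDAP error code)."""
    counts = Counter()
    for line in log_text.splitlines():
        match = BIND_FAILED.search(line)
        if not match:
            continue
        first_rdn = match.group("dn").split(",", 1)[0]
        cn = first_rdn[3:] if first_rdn.lower().startswith("cn=") else first_rdn
        counts[(match.group("peer"), cn, int(match.group("code")))] += 1
    return counts


# Two lines copied from the excerpts above, one from each site.
SAMPLE_LOG = '''\
2020-09-14T13:12:04.327454+00:00 err vmdird  t@140483967784704: Bind Request Failed (128.33.45.35) error 49: Protocol version: 3, Bind DN: "cn=site_b_fqdn,ou=Domain Controllers,dc=vsphere,dc=local", Method: SASL
2020-09-14T13:07:05.284618+00:00 err vmdird  t@139719111341824: Bind Request Failed (10.172.252.16) error 49: Protocol version: 3, Bind DN: "cn=site_a_fqdn,ou=Domain Controllers,dc=vsphere,dc=local", Method: SASL
'''

for (peer, cn, code), count in summarize_bind_failures(SAMPLE_LOG).items():
    print(f"{count} failed bind(s) from {peer} as {cn} (LDAP error {code})")
```

Run against a full log, this makes it easier to see which machine account is being rejected on which node.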

                            • 11. Re: Permissions problems with linked VCs after VCSA migration
                              Lalegre Expert

                              Well, it was worth a try. Now try following this procedure: https://vstack.it/2020/03/10/400-an-error-occurred-while-processing-the-authentication-response-from-the-vcenter-single-…

                              It deals with exactly the same errors as yours, so I think it can help.

                              • 12. Re: Permissions problems with linked VCs after VCSA migration
                                JulietDeltaGolf Novice

                                We have moved things forward (using various machine account reset procedures, including this one: VMware Knowledge Base) and now Site B is able to browse Site A. However, it seems the root problem was with Site B, as Site A still cannot browse Site B.

                                 

                                Site A now reports:

                                 

                                2020-09-14T16:03:32.473567+00:00 err vmdird  t@139750677653248: _VmDirFetchReplicationPage: error: 53 filter: 'uSNChanged>=90807' requested: 1000 received: 0 usn: 90806 utd: '765c1341-c05f-11e5-ae51-000c29865313:95793,'

                                 

                                That isn't very interesting; however, site B now says this:

                                 

                                2020-09-14T15:59:31.962756+00:00 err vmdird  t@140568298514176: VmDirSendLdapResult: Request (Search), Error (53), Message (Server in not in normal mode, not allowing outward replication.), (0) socket (ip.ip.ip.ip)

                                 

                                A bit more googling around led us to this command and output:

                                 

                                # /usr/lib/vmware-vmafd/bin/dir-cli state get

                                Enter password for administrator@vsphere.local:

                                Directory Server State: Failure (5)
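For anyone scripting this check, the state line can be pulled out of the output with a few lines of Python. This is only a sketch that assumes the "Directory Server State: NAME (CODE)" format shown above; the function name is made up, and it parses text rather than calling dir-cli.

```python
import re

STATE_LINE = re.compile(
    r"Directory Server State:\s*(?P<name>.+?)\s*\((?P<code>\d+)\)"
)


def parse_directory_state(output):
    """Return (state name, numeric code) from dir-cli 'state get' output, or None."""
    for line in output.splitlines():
        match = STATE_LINE.search(line)
        if match:
            return match.group("name"), int(match.group("code"))
    return None


# Output copied from the command above.
SAMPLE = """\
Enter password for administrator@vsphere.local:
Directory Server State: Failure (5)
"""

state = parse_directory_state(SAMPLE)
print(state)
```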

                                 

                                There are a couple of articles about this that suggest simply changing the state, but this CLI output implies that is no longer available with the 6.7 VCSA:

                                 

                                # /usr/lib/vmware-vmafd/bin/dir-cli state set --state NORMAL
                                  Enter password for administrator@vsphere.local:
                                  dir-cli failed. Error 9001: Possible errors:
                                  LDAP error: Operations error
                                  Win Error: Operation failed with error ERROR_INVALID_FUNCTION (1)

                                 

                                and trying to follow other options to reset the password doesn't seem to work either:

                                 

                                # vcenter-restore -u administrator
                                  Please enter SSO Admin Password:
                                  Restore of embedded node is not supported via this script. Exiting.

                                 

                                This command seemed to run successfully but didn't resolve the problem:

                                 

                                /usr/lib/vmware-vmafd/bin/dir-cli computer password-reset --login administrator --live-dc-hostname fbsshefvc.fletchers.corp --password XXXXXXX

                                 

                                It feels like we are at the root cause now, though: there is a problem with the PSC or 'Directory Server' on Site B.

                                • 13. Re: Permissions problems with linked VCs after VCSA migration
                                  Lalegre Expert

                                  If you check the vmdird logs, do you see any issues? Have you also tried restarting the PSCs? Replication issues can be really tough, so at this stage I recommend you open a ticket with VMware GSS.

                                  Also, if you cannot find a fix for the directory service, one option is to deploy a new PSC, join it to the same SSO domain pointing to Site A, and then repoint the vCenter Server to it.

                                  • 14. Re: Permissions problems with linked VCs after VCSA migration
                                    JulietDeltaGolf Novice

                                    This is the embedded PSC in the VCSA.

                                     

                                    We raised it with support; they tried some magic scripts that were supposed to:

                                     

                                    1: Change domain state of broken PSC to 0

                                    2: Decommission broken PSC from the healthy PSC

                                    3: Re-join the domain using the data.mdb of a healthy PSC (and therefore fix replication)

                                     

                                    But they failed pretty hard and we had to revert everything to snapshots again. We're worried the next suggestion is just going to be 'deploy a new VCSA'.
