stevewalker74
Enthusiast

Should the /psc URL work on both HA load balanced PSC nodes?


I have run into a strange issue that occurs after enabling two PSC 6.5 nodes in an HA configuration as part of a rolling upgrade from 5.5.

The first PSC node in a new site was migrated from an original Windows vCenter 5.5 SSO to PSC 6.5, and a second new node was then joined to the first site so that replication could be established. I'm using a Citrix NetScaler to load balance the configuration, and I noticed at some point after the successful HA repointing that I am unable to access the https://hosso01.sbcpureconsult.internal/psc URL.

The second node, https://hosso02.sbcpureconsult.internal/psc, works correctly and redirects to the load-balanced address psc-ha-vip.sbcpureconsult.internal for authentication before displaying the PSC client UI. Irrespective of which node is selected, I am able to log in to vCenter, then choose Administration, System Configuration, select a node and then Manage, Settings or CA without receiving any errors.

If I deliberately drop the first node out of the load-balancing config on the NetScaler, I don't have any issues accessing the /psc URL by either host name or load-balancer name, but if I try to connect to the first node by its own DNS name or IP I get an HTTP 400 error and the following entry in:

/storage/log/vmware/psc-client/psc-client.log

[2018-10-08 12:05:20.347] [ERROR] tomcat-http--3 com.vmware.vsphere.client.security.websso.MetadataGeneratorImpl - Error when creating idp metadata.

java.lang.RuntimeException: java.io.IOException: HTTPS hostname wrong:  should be <psc-ha-vip.sbcpureconsult.internal>

It appears that the HTTP 400 error occurs because the psc-client Tomcat application no longer starts up correctly on the first node, along with an error in:

/storage/log/vmware/rhttpproxy/rhttpproxy.log

2018-10-08T13:27:10.691Z warning rhttpproxy[7FEA4B941700] [Originator@6876 sub=Default] SSL Handshake failed for stream <SSL(<io_obj p:0x00007fea2c098010, h:27, <TCP '192.168.0.117:443'>, <TCP '192.168.0.121:26417'>>)>: N7Vmacore3Ssl12SSLExceptionE(SSL Exception: error:140000DB:SSL routines:SSL routines:short read)
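As background, the "HTTPS hostname wrong" message above is Java's SSL hostname verification rejecting a certificate whose names don't include the host being contacted. A minimal illustration of that style of check (not VMware's actual code; the hostnames are the ones from this thread):

```python
# Illustrative only: a strict hostname check of the kind Java performs,
# which fails when the requested name is absent from the certificate's SANs.
def hostname_matches(requested: str, san_dns_names: list) -> bool:
    """Return True if the requested hostname appears in the SAN list
    (exact match; wildcard handling omitted for brevity)."""
    return requested.lower() in (n.lower() for n in san_dns_names)

sans = ["hosso01.sbcpureconsult.internal",
        "hosso02.sbcpureconsult.internal",
        "psc-ha-vip.sbcpureconsult.internal"]

print(hostname_matches("psc-ha-vip.sbcpureconsult.internal", sans))  # True
# A certificate issued only to some other subject fails the same check:
print(hostname_matches("psc-ha-vip.sbcpureconsult.internal", ["ssoserver"]))  # False
```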

I've repeated in my lab environment the same steps carried out at the customer site and can confirm the same behaviour. Let me be clear, however, that all other vCenter functionality is correct and this issue only affects the /psc URL.

Could this be deemed 'correct' behaviour? If I choose https://psc-ha-vip.sbcpureconsult.internal/psc (the load-balancer address) I am initially only able to connect if the second node is online and happens to be selected.

I am happy to provide detailed steps of the upgrade process, but first I would like to confirm: should it be possible to access the /psc URL on each node directly?


Accepted Solutions
stevewalker74
Enthusiast

Good news, I was able to roll back my lab and re-run the updateSSOConfig.py and UpdateLsEndpoint.py scripts - only to find that the /psc URL did indeed load successfully on both nodes with the NetScaler load balancing in place. So at least I know that the correct behaviour is that you should be able to open /psc on both appliances.

By examining my snapshots at different stages I have now been able to identify a difference between the original migration node and the clean appliance:

When you run the updateSSOConfig.py Python script to repoint the SSO URL to the load-balanced address, it reports that hostname.txt and server.xml were modified:

# python updateSSOConfig.py --lb-fqdn=psc-ha-vip.sbcpureconsult.internal

script version:1.1.0

executing vmafd-cli command

Modifying hostname.txt

modifying server.xml

Executing StopService --all

Executing StartService --all

I was able to locate hostname.txt files (containing the load balancer address) in:

/etc/vmware/service-state/vmidentity/hostname.txt

/etc/vmware-sso/keys/hostname.txt (missing on node 2, but contained the local name on node 1)

/etc/vmware-sso/hostname.txt

This second hostname.txt file was missing on the second node. Why is this? My guess is that it is used transiently during script execution to inject the correct value into the server.xml file.

The server.xml file is located at:

/usr/lib/vmware-sso/vmware-sts/conf/server.xml

My faulty node contained the following certificate entries under the Connector definition:

..store="STS_INTERNAL_SSL_CERT"

certificateKeystoreFile="STS_INTERNAL_SSL_CERT"..

My working node contained:

..store="MACHINE_SSL_CERT"

certificateKeystoreFile="MACHINE_SSL_CERT"..
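To compare the two nodes quickly, the Connector attributes can be pulled out of server.xml programmatically. A small sketch of that check (the SAMPLE string below is a simplified stand-in for the real file, keeping only the attributes shown above):

```python
# Diagnostic sketch: report which VECS store each Connector in a
# vmware-sts server.xml references. Attribute names follow the excerpts
# in this thread; this is not an official tool.
import xml.etree.ElementTree as ET

SAMPLE = """<Server>
  <Service>
    <Connector port="443"
               store="STS_INTERNAL_SSL_CERT"
               certificateKeystoreFile="STS_INTERNAL_SSL_CERT"/>
  </Service>
</Server>"""

def connector_stores(xml_text: str) -> list:
    """Return the certificateKeystoreFile value of every Connector."""
    root = ET.fromstring(xml_text)
    return [c.get("certificateKeystoreFile", "")
            for c in root.iter("Connector")]

print(connector_stores(SAMPLE))  # ['STS_INTERNAL_SSL_CERT'] -> the faulty setup
```

On a healthy node the same check would return `['MACHINE_SSL_CERT']`, matching the working excerpt above.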

So I was able simply to copy the server.xml file from the working node (overwriting the original on the faulty node) and remove the /etc/vmware-sso/keys/hostname.txt file to match that configuration. Following a reboot, my first SSO node now responds correctly, redirecting https://hosso01.sbcpureconsult.internal/psc to https://psc-ha-vip.sbcpureconsult.internal/websso to obtain its SAML token before ultimately displaying the PSC client UI.
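The manual fix above can also be scripted. This is only a sketch of the steps described, with backups added; the paths are the ones from this thread, and you should snapshot the appliance before touching a live node:

```python
# Sketch of the remediation described above: install the working node's
# server.xml on the faulty node and move the stale hostname.txt aside,
# keeping .bak copies of everything we change.
import os
import shutil

SERVER_XML = "/usr/lib/vmware-sso/vmware-sts/conf/server.xml"
STALE_HOSTNAME = "/etc/vmware-sso/keys/hostname.txt"

def apply_fix(good_server_xml: str,
              server_xml: str = SERVER_XML,
              stale_hostname: str = STALE_HOSTNAME) -> None:
    """Overwrite the faulty node's server.xml with the copy taken from
    the working node, and set aside the leftover hostname.txt."""
    shutil.copy2(server_xml, server_xml + ".bak")   # back up the original
    shutil.copy2(good_server_xml, server_xml)       # install the working copy
    if os.path.exists(stale_hostname):
        os.replace(stale_hostname, stale_hostname + ".bak")

# Reboot (or restart the SSO services) afterwards, as described above.
```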

As a follow-up, by examining the STS_INTERNAL_SSL_CERT store I can see that it was issued by the original Windows vCenter Server 5.5 SSO CA with the subject name:

ssoserver,dc=vsphere,dc=local

This store is not present on the other node, so the load-balancing certificate replacement must somehow be omitted by one of the upgrade scripts in this scenario (5.5 SSO to 6.5 PSC).

Hope that helps someone else one day!

7 Replies
Vijay2027
Expert

To configure 6.5 PSCs in HA mode, did you replace the Machine SSL certificates on both PSCs?

Ref: VMware Knowledge Base

stevewalker74
Enthusiast

Yes, the certificate on both nodes now contains subject alternative names as follows:

/usr/lib/vmware-vmafd/bin/vecs-cli entry list --store MACHINE_SSL_CERT --text

X509v3 Subject Alternative Name:

  DNS:hosso01.sbcpureconsult.internal, DNS:hosso02.sbcpureconsult.internal, DNS:psc-ha-vip.sbcpureconsult.internal
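For anyone checking their own environment, the SAN line printed by vecs-cli can be validated with a few lines of Python. The hostnames below are the ones from this thread; substitute your own node and VIP FQDNs:

```python
# Sketch: confirm the MACHINE_SSL_CERT SAN list covers both node FQDNs
# and the load-balancer FQDN, given the text line printed by
# `vecs-cli entry list --text` as shown above.
REQUIRED = {"hosso01.sbcpureconsult.internal",
            "hosso02.sbcpureconsult.internal",
            "psc-ha-vip.sbcpureconsult.internal"}

def san_dns_names(san_line: str) -> set:
    """Extract the DNS names from an OpenSSL-style SAN line."""
    return {part.strip()[len("DNS:"):]
            for part in san_line.split(",")
            if part.strip().startswith("DNS:")}

line = ("DNS:hosso01.sbcpureconsult.internal, "
        "DNS:hosso02.sbcpureconsult.internal, "
        "DNS:psc-ha-vip.sbcpureconsult.internal")

missing = REQUIRED - san_dns_names(line)
print("missing SANs:", missing or "none")  # missing SANs: none
```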

Vijay2027
Expert

I hope you used the same certs on both PSCs.

Run the commands below on both PSCs and verify that the endpoints point to the LB VIP.

# /usr/lib/vmidentity/tools/scripts/lstool.py list --url https://localhost:7080/lookupservice/sdk --site sitename --type cs.license | grep "URL:"

# /usr/lib/vmidentity/tools/scripts/lstool.py list --url https://PSC_FQDN/lookupservice/sdk --site sitename --type cs.identity | grep "URL:"

Get the site name by running the command below:

/usr/lib/vmware-vmafd/bin/vmafd-cli get-site-name --server-name localhost

stevewalker74
Enthusiast

Yes, the certificates were generated on the first node and copied to the second.

The first command you provided (shown on the next line) needed to be modified to plain http instead of https; otherwise I would get 'com.vmware.vim.vmomi.client.exception.SslException: javax.net.ssl.SSLException: Server certificate chain not verified (no details)'.

/usr/lib/vmidentity/tools/scripts/lstool.py list --url http://localhost:7080/lookupservice/sdk --site Default-First-Site --type cs.license | grep "URL:"

  URL: https://psc-ha-vip.sbcpureconsult.internal:443/ls/sdk

  URL: https://psc-ha-vip.sbcpureconsult.internal:443/ls/ph/sdk

  URL: https://psc-ha-vip.sbcpureconsult.internal:443/ls/healthstatus

  URL: https://psc-ha-vip.sbcpureconsult.internal:443/ls/resourcebundle

  URL: https://psc-ha-vip.sbcpureconsult.internal:443/ls/sdk

  URL: https://psc-ha-vip.sbcpureconsult.internal:443/ls/healthstatus

  URL: https://psc-ha-vip.sbcpureconsult.internal:443/ls/ph/sdk

  URL: https://psc-ha-vip.sbcpureconsult.internal:443/ls/resourcebundle

/usr/lib/vmidentity/tools/scripts/lstool.py list --url https://psc-ha-vip.sbcpureconsult.internal/lookupservice/sdk --site Default-First-Site --type cs.identity | grep "URL:"

  URL: https://psc-ha-vip.sbcpureconsult.internal/sts/STSService/vsphere.local

  URL: https://psc-ha-vip.sbcpureconsult.internal/sso-adminserver/sdk/vsphere.local

  URL: https://psc-ha-vip.sbcpureconsult.internal/sso-adminserver/sdk/vsphere.local

  URL: https://psc-ha-vip.sbcpureconsult.internal/websso/SAML2/Metadata/vsphere.local

  URL: https://psc-ha-vip.sbcpureconsult.internal/websso/HealthStatus

  URL: https://psc-ha-vip.sbcpureconsult.internal/sso-adminserver/idp

  URL: https://psc-ha-vip.sbcpureconsult.internal/openidconnect/vsphere.local/.well-known/openid-configurat...

  URL: https://psc-ha-vip.sbcpureconsult.internal/idm

  URL: https://psc-ha-vip.sbcpureconsult.internal/sso-adminserver/sdk/vsphere.local

  URL: https://psc-ha-vip.sbcpureconsult.internal/sso-adminserver/sdk/vsphere.local

  URL: https://psc-ha-vip.sbcpureconsult.internal/sso-adminserver/idp

  URL: https://psc-ha-vip.sbcpureconsult.internal/sts/STSService/vsphere.local

  URL: https://psc-ha-vip.sbcpureconsult.internal/websso/HealthStatus

  URL: https://psc-ha-vip.sbcpureconsult.internal/openidconnect/vsphere.local/.well-known/openid-configurat...

  URL: https://psc-ha-vip.sbcpureconsult.internal/websso/SAML2/Metadata/vsphere.local

  URL: https://psc-ha-vip.sbcpureconsult.internal/idm
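Since the two lstool.py listings produce many URL: lines, a small helper can confirm that every registered endpoint uses the load-balancer FQDN. This is just a sketch assuming the output format shown above, with the VIP name from this thread:

```python
# Sketch: given lstool.py output, list any endpoint URLs whose host is
# not the load-balancer FQDN. An empty result means all endpoints have
# been repointed to the VIP.
from urllib.parse import urlparse

LB_FQDN = "psc-ha-vip.sbcpureconsult.internal"

def non_lb_endpoints(lstool_output: str) -> list:
    urls = [ln.split("URL:", 1)[1].strip()
            for ln in lstool_output.splitlines() if "URL:" in ln]
    return [u for u in urls if urlparse(u).hostname != LB_FQDN]

sample = """  URL: https://psc-ha-vip.sbcpureconsult.internal:443/ls/sdk
  URL: https://psc-ha-vip.sbcpureconsult.internal/sts/STSService/vsphere.local"""

print(non_lb_endpoints(sample))  # [] -> every endpoint uses the VIP
```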

I've repeated both checks on each node, and the results are identical. I appreciate your assistance so far!

stevewalker74
Enthusiast

Here's the text displayed during the HTTP error when accessing /psc on the load-balanced node:

HTTP Status 400 – Bad Request

Type Status Report

Message An error occurred while sending an authentication request to the PSC Single Sign-On server - null

Description The server cannot or will not process the request due to something that is perceived to be a client error (e.g., malformed request syntax, invalid request message framing, or deceptive request routing).

Apache Tomcat/8.5.13

stevewalker74
Enthusiast

Further to the outline above, I have rolled back my lab snapshots and repeated the process of creating new machine certificates and applying them to the PSC nodes. The log files showing the certificate replacement process are attached and match the process described in the VMware article quoted above.

Interestingly, the number of services updated during the machine cert replacement process is different on each of the two nodes:

Node 1:

Updated 7 service(s)

Status : 100% Completed [All tasks completed successfully]

Node 2:

Updated 10 service(s)

Status : 100% Completed [All tasks completed successfully]

Additionally, at the end of the replacement process you'll see in the logs that the pschealth service does not start correctly on node 1, but it can be restarted manually. Following a reboot, the psc-client service is also stopped on node 1 but can again be restarted manually. Prior to the certificate replacement the service was stable. This is the node that was created by running the appliance installer in migration mode from an existing Windows vCenter Server 5.5 SSO instance.

Once the certificate replacement process is complete and the psc-client/pschealth services are started manually, I'm able to log in independently to both SSO nodes using the /psc URL. At this point I'm wondering whether the psc-client and pschealth services arbitrate between active nodes in a site to decide which one will serve the /psc URL?

I will continue with the updateSSOConfig.py and UpdateLsEndpoint.py scripts in the lab to see if I can once again reproduce the behaviour explained at the beginning of this post; it's happened twice now, in both the lab and customer environments, so I think it's reproducible.
