VMware Cloud Community
daphnissov

Gaining health insights into PSC replication partners

In vSphere 6.5, multiple PSCs can replicate among themselves. Although vCenter is pointed at only a single PSC, that PSC can have replication agreements with other peers. For example, if a vCenter is pointed at one PSC that replicates with a second PSC in the same site and SSO domain, both PSCs show up in the System Configuration portion of the Flex client.

[Screenshot: System Configuration in the Flex client listing both PSCs]

In this case, vc-01 is pointed at psc-02, which replicates with psc-01, and their status is good. Both PSCs are in the same domain and the same site.

root@psc-01 [ /usr/lib/vmware-vmdir/bin ]# ./vdcrepadmin -f showpartnerstatus -h localhost -u administrator -w VMware1!

Partner: psc-02.domain.com
Host available:   Yes
Status available: Yes
My last change number:             1525
Partner has seen my change number: 1525
Partner is 0 changes behind.
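For anyone who wants to work with this output programmatically, here is a minimal sketch that turns it into one record per partner. It assumes the field labels are exactly the ones in the sample above (taken from this post only, not from any documented output format), and `parse_partner_status` is a hypothetical helper name.

```python
import re
from typing import Dict, List, Optional


def parse_partner_status(text: str) -> List[Dict[str, object]]:
    """Split `vdcrepadmin -f showpartnerstatus` output into one dict per partner.

    Assumes the labels seen in the sample output; unrecognised lines are ignored.
    Fields that never appear (e.g. when the host is unavailable) stay None.
    """
    partners: List[Dict[str, object]] = []
    current: Optional[Dict[str, object]] = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("Partner:"):
            # A "Partner:" line opens a new record.
            current = {
                "partner": line.split(":", 1)[1].strip(),
                "host_available": None,
                "status_available": None,
                "my_change": None,
                "partner_seen": None,
                "behind": None,
            }
            partners.append(current)
            continue
        if current is None:
            continue  # text before the first partner block
        value = line.split(":", 1)[1] if ":" in line else ""
        if line.startswith("Host available:"):
            current["host_available"] = value.strip() == "Yes"
        elif line.startswith("Status available:"):
            current["status_available"] = value.strip() == "Yes"
        elif line.startswith("My last change number:"):
            current["my_change"] = int(value)
        elif line.startswith("Partner has seen my change number:"):
            current["partner_seen"] = int(value)
        else:
            match = re.match(r"Partner is (\d+) changes behind", line)
            if match:
                current["behind"] = int(match.group(1))
    return partners
```

Comparing `my_change - partner_seen` with the reported `behind` value is a cheap sanity check; that the two should agree is inferred from this one sample (1525 - 1525 = 0), not from documentation.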

The question then becomes: how does one gain health insight into the replica partners? If these PSCs were behind a load balancer, the pool would show one of the nodes as down if either the entire appliance or some of its services failed. Without a load balancer, however, vCenter does not seem to alert on critical health changes of the replica. For example, if I were to stop vmafdd, replication between the PSCs would fail, and interrogating the current PSC would show the following:

root@psc-02 [ /usr/lib/vmware-vmdir/bin ]# ./vdcrepadmin -f showpartnerstatus -h localhost -u administrator -w VMware1!

Partner: psc-01.domain.com
Host available:   No
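Note that in the failed case the output stops after "Host available: No", so a check cannot rely on the change-number lines being present. A hedged sketch of a classifier over the raw output, assuming the labels shown in these two samples; `classify_partners`, its OK/WARNING/CRITICAL levels and the `max_behind` threshold are all hypothetical choices, not anything vCenter defines.

```python
import re
from typing import Dict


def classify_partners(text: str, max_behind: int = 0) -> Dict[str, str]:
    """Map each partner hostname in showpartnerstatus output to a health level.

    Hypothetical rules:
      CRITICAL if "Host available: Yes" or "Status available: Yes" is missing,
               which also covers the truncated output seen when a host is down;
      WARNING  if the partner lags by more than `max_behind` changes;
      OK       otherwise.
    """
    results: Dict[str, str] = {}
    # Each block starts at a "Partner: <host>" line and runs to the next one.
    blocks = re.split(r"^\s*Partner:\s*", text, flags=re.MULTILINE)[1:]
    for block in blocks:
        host = block.split()[0]
        host_ok = re.search(r"Host available:\s*Yes\b", block)
        status_ok = re.search(r"Status available:\s*Yes\b", block)
        behind = re.search(r"Partner is (\d+) changes behind", block)
        if not (host_ok and status_ok):
            results[host] = "CRITICAL"
        elif behind and int(behind.group(1)) > max_behind:
            results[host] = "WARNING"
        else:
            results[host] = "OK"
    return results
```

Treating a missing "Yes" as CRITICAL, rather than only matching an explicit "No", is deliberate: an unexpected or truncated output then fails loudly instead of reading as healthy.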

If one of the critical services is stopped, the health status in the Flex client also changes, listing one or more services as critical, yet no alarm is triggered. The only alarm definition that seems relevant is the PSC Service Health Alarm, which by default is defined as follows:

[Screenshot: default definition of the PSC Service Health Alarm]

Even when stopping pschealth on the replica partner, this alarm does not seem to trigger. Adding OR conditions for other service names (I'm guessing the service name equals the component ID) does not trigger it either.

Has anyone succeeded in getting some sort of health status for the PSC replicas through vCenter (or any other tool that doesn't involve manual scripting)? The main use case is an architecture that doesn't allow for a load balancer, or one where none is available, so the status of the PSC replica is unknown until it comes time to repoint. If the replica has been in a failed state, the repoint would fail and vCenter could be left unavailable. That obviously needs to be mitigated in the design, but there don't appear to be any mechanisms for understanding this health relationship and being alerted when it degrades.
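If polling does turn out to be unavoidable, the alerting side matters as much as the check itself: one missed poll should not page anyone, but a replica that stays down should be surfaced long before a repoint is attempted. A minimal debounce sketch, assuming some external poller feeds it one healthy/unhealthy result per partner per interval; `PartnerAlertState` and its `threshold` are hypothetical names, not part of vCenter or vROps.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PartnerAlertState:
    """Raise an alert only after `threshold` consecutive unhealthy polls.

    Hypothetical debounce: a single failed poll is tolerated, a sustained
    failure raises one ALERT, and the first healthy poll afterwards
    raises one RECOVERED event.
    """

    threshold: int = 3
    failures: Dict[str, int] = field(default_factory=dict)
    alerting: Dict[str, bool] = field(default_factory=dict)

    def observe(self, partner: str, healthy: bool) -> List[str]:
        """Record one poll result for `partner` and return any events it triggers."""
        events: List[str] = []
        if healthy:
            if self.alerting.get(partner):
                events.append(f"RECOVERED {partner}")
            self.failures[partner] = 0
            self.alerting[partner] = False
            return events
        self.failures[partner] = self.failures.get(partner, 0) + 1
        # Fire once when the threshold is reached; stay quiet until recovery.
        if self.failures[partner] >= self.threshold and not self.alerting.get(partner):
            self.alerting[partner] = True
            events.append(f"ALERT {partner} unhealthy for {self.failures[partner]} polls")
        return events
```

With a threshold of 3, three consecutive failed polls of psc-01 produce a single "ALERT psc-01 unhealthy for 3 polls" event, further failures stay silent, and the next healthy poll produces "RECOVERED psc-01".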

Accepted Solution
daphnissov

Since there are really no existing solutions for monitoring replication, I decided to create my own using Log Insight and vROps. I write about the problem and show how to create an alert from the relevant log entries, then forward it to vROps. For anyone interested, the post is here: Detecting PSC Replication Failure with Log Insight. I also include the vRLI alert, which can easily be imported into your own environment.

daphnissov

Just to note: I am familiar with William Lam's script, available in his blog post here, but it is both unsupported and not ideal for a variety of reasons. My preference would be to make the aforementioned vCenter alarm function properly or design a new one that does, or, secondarily, to use vROps to understand the health state.
