VMware Cloud Community
zszalai
Contributor

hbrsrv daemon on VRMS appliance fails to start - solved

I am publishing this problem and its resolution in the hope that it will help others. A few months after installation, the HBR component of SRM partly failed. Our setup has one VRMS and two VRS appliances, and a few dozen replicated VMs. One third of the replications got stuck, and it turned out that all of them were handled by the VRMS appliance at the destination site.

The source ESXi hosts logged a lot of refused connections to the VRMS appliance. Because of this I suspected a firewall misconfiguration, but tcpdump proved that the TCP requests arrived at the appliance and were refused by it. Finally it turned out that the hbrsrv daemon wasn't running on the VRMS, so TCP ports 31031 and 44046 were closed.
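For anyone who wants to confirm the same symptom quickly, here is a minimal sketch of a port probe. It is not what I ran at the time (I used tcpdump); it assumes a Linux machine with bash and the coreutils `timeout` command that can reach the appliance. The function names `port_open` and `check_vr_ports` are made up for this example, and the two port numbers are the ones passed to `--lwdport` in the watchdog log quoted in this post.

```shell
#!/usr/bin/env bash
# Return 0 if a TCP connection to host:port succeeds within 3 seconds.
# Uses bash's /dev/tcp pseudo-device, so bash is invoked explicitly.
port_open() (
    host=$1
    port=$2
    timeout 3 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null
)

# Print "open" or "closed" for each replication port on the given appliance.
check_vr_ports() (
    host=$1
    for port in 31031 44046; do
        if port_open "$host" "$port"; then
            echo "$port open"
        else
            echo "$port closed"
        fi
    done
)
```

Usage would be something like `check_vr_ports vrms.example.local`, where the host name is a placeholder. A "closed" result with no firewall in between suggests that nothing is listening on the appliance.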

I tried to start hbrsrv manually on the appliance with the "/etc/init.d/hbrsrv restart" command, without success. However, the appliance logged the following lines to /var/log/messages:

Sep 17 13:34:42 xxxxx watchdog-hbrsrv: [8818] Begin 'cgexec -g memory:/hbrsrv /usr/bin/hbrsrv --daemon --pidfile /var/run/vmware/hbrsrv.pid --vmodlport 8123 --lwdport 31031,44046', min-uptime = 60, max-quick-failures = 5, max-total-failures = 1000000, bg_pid_file = '/var/run/vmware/hbrsrv.pid'

Sep 17 13:34:42 xxxxx watchdog-hbrsrv: [8818] Executing 'cgexec -g memory:/hbrsrv /usr/bin/hbrsrv --daemon --pidfile /var/run/vmware/hbrsrv.pid --vmodlport 8123 --lwdport 31031,44046'

Sep 17 13:34:42 xxxxx su: FAILED SU (to hbrsrv) root on none

And the last line was the key to the solution. I checked the hbrsrv account:

xxxxx:/var/log # chage -l hbrsrv
Minimum:        1
Maximum:        90
Warning:        7
Inactive:       -1
Last Change:            Jun 05, 2013
Password Expires:       Sep 03, 2013
Password Inactive:      Never
Account Expires:        Never

Yes, the account's password had expired. I set the account to never expire with "chage hbrsrv", rebooted the VRMS appliance, and all the stuck replications started to work again. Just to be on the safe side I checked the same account on the VRS appliances, but both of them were already set to never expire.
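The two checks that led to the diagnosis can be scripted. This is only a sketch under a few assumptions: the syslog line looks like the "FAILED SU" entry quoted in this post, `chage -l` prints a "Password Expires" (or "Password expires") label, and GNU `date` is available to parse the date. The function names `failed_su_lines` and `password_expired` are hypothetical helpers, not part of the appliance.

```shell
#!/usr/bin/env bash
# Print log lines that record a failed `su` to the given account.
# $1 = account name, $2 = log file (defaults to /var/log/messages).
failed_su_lines() (
    user=$1
    logfile=${2:-/var/log/messages}
    grep -F "su: FAILED SU (to ${user})" "$logfile"
)

# Read `chage -l` output on stdin; exit 0 if the password has expired.
# $1 = optional reference date understood by GNU date (defaults to now).
password_expired() (
    ref=${1:-now}
    # Accept both label spellings used by different chage implementations.
    expires=$(sed -n 's/^Password [Ee]xpires[[:space:]]*:[[:space:]]*//p')
    case $expires in
        ''|Never|never) exit 1 ;;   # no expiry date set
    esac
    [ "$(date -d "$expires" +%s)" -lt "$(date -d "$ref" +%s)" ]
)
```

For example, `failed_su_lines hbrsrv` lists the matching log entries, and `chage -l hbrsrv | password_expired && echo "hbrsrv password expired"` flags the condition that stopped the daemon here.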

Is there anybody here with similar experiences? Our VRMS is version 5.1.1.0 Build 1079383

1 Reply
paradise1967
Enthusiast

Hi

I had the same thing last week with the VRMS appliances. Like you, I was looking at the network to see where the replication ports were being blocked. After reading the thread "Re: Replication Server Error:127.0.0.1:8123" I came across yours, which pointed me in the right direction. Thanks.

I also found that the self-signed SSL certificates had expired, so after generating new certificates (which by default are valid for only one year) I thought it was fixed. Not yet.

In VAMI, I restarted the VRM service, which updated the SSL thumbprint registry for each host and replication started again.
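If you want to check a certificate's expiry before it bites you, a rough sketch with the `openssl` command line follows. It assumes you have the certificate as a PEM file; the path is up to you, since I am not stating where the appliance keeps it. `cert_expired` is a made-up helper name.

```shell
#!/usr/bin/env bash
# Exit 0 if the PEM certificate in $1 has expired, 1 if it is still valid,
# 2 if the file cannot be read.
cert_expired() (
    cert=$1
    [ -r "$cert" ] || { echo "cannot read $cert" >&2; exit 2; }
    # `-checkend 0` succeeds only if the certificate is still valid right now.
    if openssl x509 -in "$cert" -noout -checkend 0 >/dev/null 2>&1; then
        exit 1
    else
        exit 0   # expired, or openssl could not parse the file
    fi
)
```

Something like `cert_expired /path/to/cert.pem && echo "certificate expired"` would then work, with the path replaced by wherever your exported certificate lives.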

Cheers

Kevin
