VMware Cloud Community
Meathook
Enthusiast

VCSA 6.7: Postgres Archiver Service stopped & PSC Health/vCenter Server "degraded"

Hi Everyone,

I'm currently setting up a new environment using ESXi 6.7 with a vCenter appliance deployed on the host. Setting up the host and the vCenter went pretty well, and so far everything seems to be working, but I noticed some strange behaviour in the vCenter services:

VMware Postgres Archiver:

This service changes to status "stopped" every once in a while. I searched the KB and found a tip to change the wal_sender_timeout parameter to 600s so the service doesn't run into timeouts, but sadly this didn't help. I also tried setting wal_sender_timeout to 0 (which disables the timeout), but the issue still persists.
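For anyone who wants to double-check which wal_sender_timeout value is actually in effect in the config file, here is a minimal Python sketch. It only parses the text of a postgresql.conf; the helper names `read_pg_setting` and `timeout_to_seconds` are made up for illustration, and where the file lives on the appliance is not stated in this thread, so the sample text below is hypothetical.

```python
import re

# Factors to convert PostgreSQL time units to milliseconds.
# A bare number for wal_sender_timeout is interpreted as milliseconds.
_UNIT_MS = {"ms": 1, "s": 1000, "min": 60_000, "h": 3_600_000, "d": 86_400_000}


def read_pg_setting(conf_text, name):
    """Return the last uncommented value of `name` in postgresql.conf text, or None."""
    pattern = re.compile(
        r"^\s*" + re.escape(name) + r"(?:\s*=\s*|\s+)('[^']*'|[^#\s]+)"
    )
    value = None
    for line in conf_text.splitlines():
        match = pattern.match(line)
        if match:
            # Later entries override earlier ones, as in PostgreSQL itself.
            value = match.group(1).strip("'")
    return value


def timeout_to_seconds(value):
    """Convert a time value such as '600s', '10min' or '60000' to seconds."""
    match = re.fullmatch(r"(\d+)\s*(ms|s|min|h|d)?", value.strip())
    if match is None:
        raise ValueError(f"unrecognised time value: {value!r}")
    number = int(match.group(1))
    unit = match.group(2) or "ms"
    return number * _UNIT_MS[unit] / 1000


if __name__ == "__main__":
    # Hypothetical config excerpt, not copied from a real appliance.
    sample = "#wal_sender_timeout = 60s\nwal_sender_timeout = '600s'\n"
    raw = read_pg_setting(sample, "wal_sender_timeout")
    print(raw, timeout_to_seconds(raw))
```

A value of 0 converts to 0 seconds here, which matches the "timeout disabled" case mentioned above.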

VMware Knowledge Base

My logfiles present the same Messages as described in this KB-article:

/var/log/vmware/vpostgres/pg_archiver.log-[n].stderr:

ERROR  pg_archiver could not receive data from WAL stream: server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.

/var/log/vmware/vpostgres/postgresql-[nn].log:

[unknown] archiver LOG:  terminating walsender process due to replication timeout
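To see how often the two messages above show up, a rough Python sketch like the following can count them across log lines. The two search strings are taken from the log excerpts in this post; the function name `count_signatures` and the dictionary keys are just illustrative, and reading the actual files under /var/log/vmware/vpostgres/ is left to the caller.

```python
# Substrings taken from the two log excerpts quoted in this post.
SIGNATURES = {
    "archiver_disconnect": "pg_archiver could not receive data from WAL stream",
    "replication_timeout": "terminating walsender process due to replication timeout",
}


def count_signatures(lines):
    """Count how many lines contain each known error signature."""
    counts = {key: 0 for key in SIGNATURES}
    for line in lines:
        for key, needle in SIGNATURES.items():
            if needle in line:
                counts[key] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "ERROR  pg_archiver could not receive data from WAL stream: "
        "server closed the connection unexpectedly",
        "[unknown] archiver LOG:  terminating walsender process due to replication timeout",
    ]
    print(count_signatures(sample))
```

Passing an open file object works as well, since it iterates line by line.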

wal.png

VMware PSC Health & VMware vCenter Server:

Both of these services currently show up as "degraded":

services.jpg

Meanwhile the summary page reports everything as "good":

good.jpg

The appliance has been assigned 16GB of RAM and 2vCPUs. Current Version:

version.jpg

Any ideas on how to fix these services, or where to look for further information about the status of the PSC Health service and the vCenter Server?

With regards,

Fabian

9 Replies
Meathook
Enthusiast

Quick update, it seems like the services PSC Health and vCenter Server both switched back to "Healthy", but the Postgres Archiver changed to stopped again. Very curious...

suhaakin
Contributor

Click below for resolution

VMware Knowledge Base

josh0875
Contributor

We are running vCenter Appliance 6.7 Update 2a (6.7.0.31000) and confirmed that wal_sender_timeout is already set to 600s by default during installation. We also upgraded to vCenter Appliance 6.7 Update 3 (6.7.0.40000), but the Postgres Archiver service keeps shutting down.

PaulAdamFeB
Contributor

Yep, exactly the same issue here: running the latest VCSA and still seeing the Postgres Archiver service stop (I confirmed that the timeout is set to 600s in the config on this version of the VCSA).

However, I also have a second VCSA running at our DR site with exactly the same config (no VMs other than SRM placeholders and the VCSA, SRM and VRA appliances), and it does not show the same issue. Go figure.

If I get time, I will study logs and report back any findings.

EDIT (Still had an open SSH session to the VCSA):

2019-09-17T21:35:21.947Z ERROR  pg_archiver could not receive data from WAL stream: server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.

Exactly the same error then!

ericdamwhite37
Contributor

Has this been resolved? I'm getting the same error.

Eric

eridamwhite@gmail.com

scott28tt
VMware Employee

Moderator: Moved to vCenter Server



Although I am a VMware employee I contribute to VMware Communities voluntarily (ie. not in any official capacity)
SvenVanRoeyen
Contributor

Any update on this issue? I am getting this Postgres Archiver Service Health alarm as well after installing the latest vCenter updates. There is no special vCenter configuration and no backups are configured.

The appliance management console does not report problems but the vCenter itself always generates the alarm.

JPRasquin
Contributor

Hi,

(Note: I'm still new to vCenter)

I'm facing the same issue. I also checked the KB article and found wal_sender_timeout already set to 600s, yet the issue still occurs.

The drawback is that scheduled backups fail because of this.

Is there any update on this issue?

Regards, JP.

PS: I'm on 6.7.0.44100.

Ajay1988
Expert

This is already fixed in the latest 6.7 versions. The wal_sender_timeout is already set to 600s.

Please check for intermittent time sync issues. Check /var/log/messages for "Time has been changed". If that message is there, then you have a time sync issue, which is causing the archiver service problems.

Also, /var/core will contain pschealthd core dumps if you have time sync issues.
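The two checks above could be scripted roughly like this. It is only a sketch: the search string and the paths /var/log/messages and /var/core come from this post, while the function names `find_time_jumps` and `find_pschealthd_cores` are made up, and matching core files by the substring "pschealthd" in the file name is an assumption about how the dumps are named.

```python
from pathlib import Path

TIME_CHANGE_MARKER = "Time has been changed"


def find_time_jumps(lines):
    """Return the log lines that report a clock change."""
    return [line.rstrip("\n") for line in lines if TIME_CHANGE_MARKER in line]


def find_pschealthd_cores(core_dir="/var/core"):
    """Return sorted file names in core_dir that mention pschealthd.

    Assumption: core dump file names contain the string 'pschealthd'.
    """
    directory = Path(core_dir)
    if not directory.is_dir():
        return []
    return sorted(p.name for p in directory.iterdir() if "pschealthd" in p.name)


if __name__ == "__main__":
    log_path = Path("/var/log/messages")
    if log_path.is_file():
        with log_path.open(errors="replace") as handle:
            for hit in find_time_jumps(handle):
                print(hit)
    for name in find_pschealthd_cores():
        print("core dump:", name)
```

If both functions return nothing, time sync is probably not the cause on that appliance and the archiver logs are the next place to look.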


Regards,
AJ