VMware Cloud Community
randylawrence42
Contributor

vCenter 6.7 vPostgres SQL Crashing vCenter Appliance

vCenter 6.7.0.21000 Build 11726888

vCenter vPostgres keeps eating all of the vCenter memory.

CPU and memory get pegged, vCenter goes unresponsive, and eventually the web services crash.

[Screenshot attached: Screen Shot 2019-04-01 at 9.51.45 AM.png]

daphnissov
Immortal

Sounds like you need to open an SR here.

randylawrence42
Contributor

Tell me about it.   LOL

sk84
Expert

The appliance isn't swapping and still has more than 500% IOWAIT? At first glance, I would assume the storage is too slow and the vPostgres database therefore has trouble with queries that cannot be processed fast enough. But it's just a guess. I would also recommend opening an SR: https://my.vmware.com/group/vmware/get-help
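If you want to double-check from inside the appliance, here is a quick sketch (vmstat and top are normally available on the vCSA shell):

# Sample CPU and memory stats every 5 seconds; the 'wa' column is IOWAIT, 'si'/'so' show swap activity
vmstat 5 5

# Or watch live; the '%wa' value in the CPU summary line is IOWAIT
top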

--- Regards, Sebastian VCP6.5-DCV // VCP7-CMA // vSAN 2017 Specialist Please mark this answer as 'helpful' or 'correct' if you think your question has been answered correctly.
randylawrence42
Contributor

I like your thinking. That is what I thought, so I moved the vCenter from a vSAN datastore to a Datrium datastore, but the problems keep happening. It is really weird. I am thinking of moving the vCenter to some tried-and-true FC storage running on Kaminario. The Datrium has been super fast, unless there is some networking issue I am not seeing. I have vRealize Log Insight and vROps, and nothing really shows up from the networking side. It seems to be an out-of-control SQL process, like a bad DB schema or something.

Vijay2027
Expert

Can you attach the latest postgresql log from /var/log/vmware/vpostgres?
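Something like this should grab the newest one (just a sketch; the file names rotate):

# List the vPostgres logs newest-first, then tail the most recent one
ls -lt /var/log/vmware/vpostgres/
tail -n 200 "$(ls -t /var/log/vmware/vpostgres/postgresql*.log | head -1)"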

randylawrence42
Contributor

Good idea. It looks like a bunch of locks on the DB queries are holding everything up. If you can show me how to kill the locks from the command line, that would be fantastic!

Attached is the postgresql-01.log
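For anyone following along, here is a rough sketch of how the blocking sessions could be inspected and terminated from the appliance shell. It assumes the bundled psql client and the default VCDB database; terminating backends on a production vCenter should be done with care, ideally under guidance from support.

# Open a psql session against the vCenter database
/opt/vmware/vpostgres/current/bin/psql -U postgres -d VCDB

-- Show sessions that are waiting on locks, together with the query they are running
SELECT l.pid, a.usename, a.state, a.query
FROM pg_locks l
JOIN pg_stat_activity a ON a.pid = l.pid
WHERE NOT l.granted;

-- Terminate a specific backend by pid (replace 12345 with an actual pid from the query above)
SELECT pg_terminate_backend(12345);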

randylawrence42
Contributor

It is happening again right now. It started at 8:50 AM. Here are the live logs.

Vijay2027
Expert

Which monitoring tool are you using?

There are hundreds of connections to the DB:

2019-04-05 17:08:31.617 UTC 5ca78b8f.9110 0 VCDB vc FATAL:  remaining connection slots are reserved for non-replication superuser connections

2019-04-05 17:08:31.618 UTC 5ca78b8f.9111 0 VCDB vc FATAL:  remaining connection slots are reserved for non-replication superuser connections

2019-04-05 17:08:31.618 UTC 5ca78b8f.9112 0 VCDB vc FATAL:  remaining connection slots are reserved for non-replication superuser connections

2019-04-05 17:08:31.619 UTC 5ca78b8f.9113 0 VCDB vc FATAL:  remaining connection slots are reserved for non-replication superuser connections

2019-04-05 17:08:31.677 UTC 5ca78b8f.9114 0 VCDB vc FATAL:  remaining connection slots are reserved for non-replication superuser connections

Please share the output of the commands below:

cat /storage/db/vpostgres/postgresql.conf | grep -i "max_connections"

netstat -tulnap | grep -i 443 --> and check whether there are several connections from a specific IP
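You can also count the connections inside the database to see which service or host is holding them (a sketch; it assumes the bundled psql and the default VCDB name):

/opt/vmware/vpostgres/current/bin/psql -U postgres -d VCDB -c "SELECT usename, application_name, client_addr, count(*) FROM pg_stat_activity GROUP BY usename, application_name, client_addr ORDER BY count(*) DESC;"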

randylawrence42
Contributor

cat /storage/db/vpostgres/postgresql.conf | grep -i "max_connections"

max_connections = 100 # (change requires restart)

root@vc-irv [ ~ ]# netstat -tulnap | grep -i 443

tcp        0      0 0.0.0.0:443             0.0.0.0:*               LISTEN      2390/rhttpproxy

tcp        0      0 0.0.0.0:5443            0.0.0.0:*               LISTEN      2548/vsphere-ui.lau

tcp        0      0 0.0.0.0:9443            0.0.0.0:*               LISTEN      2547/vsphere-client

tcp        1      0 127.0.0.1:33700         127.0.0.1:443           CLOSE_WAIT  5400/vmware-content

tcp        0      0 127.0.0.1:47412         127.0.0.1:443           TIME_WAIT   -              

tcp        1      0 127.0.0.1:42442         127.0.0.1:443           CLOSE_WAIT  5400/vmware-content

tcp        1      0 127.0.0.1:5432          127.0.0.1:44330         CLOSE_WAIT  35050/postgres: vc

tcp        0      0 10.10.98.47:44434       10.10.67.16:9543        ESTABLISHED 1439/liagent   

tcp        0      0 127.0.0.1:443           127.0.0.1:47584         ESTABLISHED 2390/rhttpproxy

tcp        0      0 127.0.0.1:443           127.0.0.1:47838         TIME_WAIT   -              

tcp        0      0 127.0.0.1:47578         127.0.0.1:443           TIME_WAIT   -              

tcp        1      0 127.0.0.1:47032         127.0.0.1:443           CLOSE_WAIT  3844/python    

tcp        1      0 127.0.0.1:5432          127.0.0.1:44302         CLOSE_WAIT  35036/postgres: vc

tcp        0      0 127.0.0.1:46702         127.0.0.1:443           TIME_WAIT   -              

tcp        1      0 127.0.0.1:33692         127.0.0.1:443           CLOSE_WAIT  5400/vmware-content

tcp        1      0 127.0.0.1:33694         127.0.0.1:443           CLOSE_WAIT  5400/vmware-content

tcp        1      0 127.0.0.1:52374         127.0.0.1:443           CLOSE_WAIT  2548/vsphere-ui.lau

tcp        0      0 127.0.0.1:47460         127.0.0.1:443           TIME_WAIT   -              

tcp        1      0 127.0.0.1:33698         127.0.0.1:443           CLOSE_WAIT  5400/vmware-content

tcp        0      0 10.10.98.47:443         10.10.98.124:50626      TIME_WAIT   -              

tcp        1      0 127.0.0.1:60624         127.0.0.1:443           CLOSE_WAIT  5400/vmware-content

tcp        0      0 127.0.0.1:47628         127.0.0.1:443           TIME_WAIT   -              

tcp        1      0 127.0.0.1:46542         127.0.0.1:443           CLOSE_WAIT  2548/vsphere-ui.lau

tcp        1      0 127.0.0.1:5432          127.0.0.1:44308         CLOSE_WAIT  35042/postgres: vc

tcp        0      0 127.0.0.1:47282         127.0.0.1:443           ESTABLISHED 5366/vmware-sps.lau

tcp        0      0 127.0.0.1:47752         127.0.0.1:443           TIME_WAIT   -              

tcp        1      0 127.0.0.1:5432          127.0.0.1:44306         CLOSE_WAIT  35039/postgres: vc

tcp        1      0 127.0.0.1:55964         127.0.0.1:443           CLOSE_WAIT  3844/python    

tcp        1      0 127.0.0.1:60628         127.0.0.1:443           CLOSE_WAIT  5400/vmware-content

tcp        1      0 127.0.0.1:33716         127.0.0.1:443           CLOSE_WAIT  5400/vmware-content

tcp        1      0 127.0.0.1:33650         127.0.0.1:443           CLOSE_WAIT  5400/vmware-content

tcp        1      0 127.0.0.1:57548         127.0.0.1:443           CLOSE_WAIT  5320/python    

tcp        0      0 127.0.0.1:443           127.0.0.1:47526         TIME_WAIT   -              

tcp        0      0 127.0.0.1:47656         127.0.0.1:443           TIME_WAIT   -              

tcp        0      0 127.0.0.1:47582         127.0.0.1:443           TIME_WAIT   -              

tcp        1      0 127.0.0.1:5432          127.0.0.1:44314         CLOSE_WAIT  35044/postgres: vc

tcp        0      0 127.0.0.1:443           127.0.0.1:47832         TIME_WAIT   -              

tcp        0      0 127.0.0.1:47734         127.0.0.1:443           TIME_WAIT   -              

tcp        0      0 127.0.0.1:443           127.0.0.1:47448         TIME_WAIT   -              

tcp        1      0 127.0.0.1:5432          127.0.0.1:44340         CLOSE_WAIT  35057/postgres: vc

tcp        1      0 127.0.0.1:33712         127.0.0.1:443           CLOSE_WAIT  5400/vmware-content

tcp        1      0 127.0.0.1:5432          127.0.0.1:44338         CLOSE_WAIT  35056/postgres: vc

tcp        0      0 127.0.0.1:443           127.0.0.1:41990         TIME_WAIT   -              

tcp        1      0 127.0.0.1:41142         127.0.0.1:443           CLOSE_WAIT  5400/vmware-content

tcp        1      0 127.0.0.1:5432          127.0.0.1:44348         CLOSE_WAIT  35060/postgres: vc

tcp        0      0 127.0.0.1:47648         127.0.0.1:443           TIME_WAIT   -              

tcp        0      0 10.10.98.47:443         10.10.98.124:50622      TIME_WAIT   -              

tcp        0      0 127.0.0.1:47746         127.0.0.1:443           TIME_WAIT   -              

tcp        0      0 127.0.0.1:47830         127.0.0.1:443           ESTABLISHED 5366/vmware-sps.lau

tcp        0      0 127.0.0.1:47622         127.0.0.1:443           TIME_WAIT   -              

tcp        0      0 127.0.0.1:443           127.0.0.1:47282         ESTABLISHED 2390/rhttpproxy

tcp        0      0 127.0.0.1:47668         127.0.0.1:443           TIME_WAIT   -              

tcp        0      0 127.0.0.1:47674         127.0.0.1:443           TIME_WAIT   -              

tcp        1      0 127.0.0.1:5432          127.0.0.1:44350         CLOSE_WAIT  35063/postgres: vc

tcp        1      0 127.0.0.1:5432          127.0.0.1:44310         CLOSE_WAIT  35043/postgres: vc

tcp        1      0 127.0.0.1:33690         127.0.0.1:443           CLOSE_WAIT  5400/vmware-content

tcp        1      0 127.0.0.1:5432          127.0.0.1:44334         CLOSE_WAIT  35053/postgres: vc

tcp        0      0 127.0.0.1:47634         127.0.0.1:443           TIME_WAIT   -              

tcp        1      0 127.0.0.1:5432          127.0.0.1:44303         CLOSE_WAIT  52434/postgres: vc

tcp        1      0 127.0.0.1:60406         127.0.0.1:443           CLOSE_WAIT  5400/vmware-content

tcp        1      0 127.0.0.1:33710         127.0.0.1:443           CLOSE_WAIT  5400/vmware-content

tcp        0      0 127.0.0.1:443           127.0.0.1:47830         ESTABLISHED 2390/rhttpproxy

tcp        1      0 127.0.0.1:33706         127.0.0.1:443           CLOSE_WAIT  5400/vmware-content

tcp        1      0 127.0.0.1:42612         127.0.0.1:443           CLOSE_WAIT  5400/vmware-content

tcp       32      0 10.10.98.47:55568       184.27.114.65:443       CLOSE_WAIT  5319/updatemgr

tcp        0      0 10.10.98.47:443         10.10.98.124:50631      TIME_WAIT   -              

tcp        1      0 10.10.98.47:34874       208.91.0.89:443         CLOSE_WAIT  2547/vsphere-client

tcp        1      0 127.0.0.1:33714         127.0.0.1:443           CLOSE_WAIT  5400/vmware-content

tcp        0      0 127.0.0.1:443           127.0.0.1:47722         TIME_WAIT   -              

tcp        0      0 127.0.0.1:47584         127.0.0.1:443           ESTABLISHED 5320/python    

tcp        1      0 127.0.0.1:5432          127.0.0.1:44346         CLOSE_WAIT  35058/postgres: vc

tcp        1      0 127.0.0.1:56164         127.0.0.1:443           CLOSE_WAIT  5348/vmware-vsm.lau

tcp        0      0 127.0.0.1:443           127.0.0.1:47922         TIME_WAIT   -              

tcp        0      0 127.0.0.1:47740         127.0.0.1:443           TIME_WAIT   -              

tcp        0      0 127.0.0.1:47610         127.0.0.1:443           TIME_WAIT   -              

tcp        0      0 127.0.0.1:47662         127.0.0.1:443           TIME_WAIT   -              

tcp        1      0 127.0.0.1:33696         127.0.0.1:443           CLOSE_WAIT  5400/vmware-content

tcp        1      0 127.0.0.1:5432          127.0.0.1:44324         CLOSE_WAIT  35049/postgres: vc

tcp6       0      0 :::443                  :::*                    LISTEN  

I am not familiar with this. Thanks again for all the help. 

Vijay2027
Expert

One way to address this issue is to increase max_connections to 250, restart the vCSA, and monitor.

If this doesn't help, contact VMware support.
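Roughly like this (a sketch only; back up the file first, and note that restarting vPostgres interrupts vCenter services, so a full appliance reboot is often simpler):

# Back up the config, raise the connection limit, then restart the database service
cp /storage/db/vpostgres/postgresql.conf /storage/db/vpostgres/postgresql.conf.bak
sed -i 's/^max_connections = 100/max_connections = 250/' /storage/db/vpostgres/postgresql.conf
service-control --stop vmware-vpostgres && service-control --start vmware-vpostgres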

randylawrence42
Contributor

Can you educate me on what these connections are for? I know my team does a lot of automation, as this is a lab environment.

Vijay2027
Expert

Usually this comes from the application level: one or more services connecting to PostgreSQL are not releasing their connections, causing the available connection slots to run out.

Attach the vpxd and vpxd-profiler logs; we can track the client IP from them.
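In the meantime, a quick shell check on the vCSA while the problem is happening can show which source holds the most connections (a sketch using plain netstat/awk):

# Count connections to the database (5432) and the reverse proxy (443), grouped by foreign address
netstat -an | grep ':5432 ' | awk '{print $5}' | sort | uniq -c | sort -rn | head
netstat -an | grep ':443 ' | awk '{print $5}' | sort | uniq -c | sort -rn | head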

randylawrence42
Contributor

Very cool. Finally, something that makes sense. I will give it a go and let you know what happens. Cheers. Making changes now.

randylawrence42
Contributor

Attached is the vpxd-profiler.log

Vijay2027
Expert

Attach vpxd.log as well.

randylawrence42
Contributor

vpxd.log attached.

Vijay2027
Expert

I've sent you a DM.

wreedMH
Hot Shot

Did you ever get this fixed? Mine isn't crashing, but it's been nailing the CPU.

[Screenshot attached: pastedImage_0.png]

randylawrence42
Contributor

No, we were not able to fix it. I do not have VMware support. I would suggest opening a case with VMware support.
