vCenter 6.7.0.21000 Build 11726888
vCenter vPostgres keeps eating all of the vCenter memory.
CPU and memory get pegged, vCenter goes unresponsive, and eventually the web services crash.
Sounds like you need to open an SR here.
Tell me about it. LOL
Is the appliance swapping, or does it still show more than 500% IOWAIT? At first glance I would assume the storage is too slow, and the vPostgres database therefore has problems with queries that cannot be processed fast enough. But it's just a guess. I would also recommend opening an SR: https://my.vmware.com/group/vmware/get-help
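If you want to rule that out quickly, a couple of checks from the appliance shell (a minimal sketch; free and top ship with the vCSA's Photon OS base, vmstat availability may vary by build):

free -h        --> any swap actually in use?
vmstat 5 3     --> the 'wa' column is CPU time spent waiting on I/O
top            --> press '1' to see per-CPU %wa (iowait)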
I like your thinking. That is what I thought too, so I moved the vCenter from a vSAN datastore to a Datrium datastore, but the problem keeps happening. It is really weird. I am thinking of moving the vCenter to some tried-and-true FC storage running on Kaminario. The Datrium has been super fast, unless there is some networking issue I am not seeing. I have vRealize Log Insight and vROps, and nothing really shows up from the networking side. It seems to be an out-of-control SQL process, like a bad DB schema or something.
Can you attach the latest postgresql log from /var/log/vmware/vpostgres?
Which monitoring tool are you using?
There are hundreds of connections to the DB:
2019-04-05 17:08:31.617 UTC 5ca78b8f.9110 0 VCDB vc FATAL: remaining connection slots are reserved for non-replication superuser connections
2019-04-05 17:08:31.618 UTC 5ca78b8f.9111 0 VCDB vc FATAL: remaining connection slots are reserved for non-replication superuser connections
2019-04-05 17:08:31.618 UTC 5ca78b8f.9112 0 VCDB vc FATAL: remaining connection slots are reserved for non-replication superuser connections
2019-04-05 17:08:31.619 UTC 5ca78b8f.9113 0 VCDB vc FATAL: remaining connection slots are reserved for non-replication superuser connections
2019-04-05 17:08:31.677 UTC 5ca78b8f.9114 0 VCDB vc FATAL: remaining connection slots are reserved for non-replication superuser connections
Please share the output of the commands below:
cat /storage/db/vpostgres/postgresql.conf | grep -i "max_connections"
netstat -tulnap | grep -i 443 --> and check if there are several connections from a specific IP
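You can also count how many connections each local process is holding to vPostgres with something like this (a sketch built on the same netstat output; field 7 is the PID/program column):

netstat -tanp | grep ':5432' | awk '{print $7}' | sort | uniq -c | sort -rn | head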
cat /storage/db/vpostgres/postgresql.conf | grep -i "max_connections"
max_connections = 100 # (change requires restart)
root@vc-irv [ ~ ]# netstat -tulnap | grep -i 443
tcp 0 0 0.0.0.0:443 0.0.0.0:* LISTEN 2390/rhttpproxy
tcp 0 0 0.0.0.0:5443 0.0.0.0:* LISTEN 2548/vsphere-ui.lau
tcp 0 0 0.0.0.0:9443 0.0.0.0:* LISTEN 2547/vsphere-client
tcp 1 0 127.0.0.1:33700 127.0.0.1:443 CLOSE_WAIT 5400/vmware-content
tcp 0 0 127.0.0.1:47412 127.0.0.1:443 TIME_WAIT -
tcp 1 0 127.0.0.1:42442 127.0.0.1:443 CLOSE_WAIT 5400/vmware-content
tcp 1 0 127.0.0.1:5432 127.0.0.1:44330 CLOSE_WAIT 35050/postgres: vc
tcp 0 0 10.10.98.47:44434 10.10.67.16:9543 ESTABLISHED 1439/liagent
tcp 0 0 127.0.0.1:443 127.0.0.1:47584 ESTABLISHED 2390/rhttpproxy
tcp 0 0 127.0.0.1:443 127.0.0.1:47838 TIME_WAIT -
tcp 0 0 127.0.0.1:47578 127.0.0.1:443 TIME_WAIT -
tcp 1 0 127.0.0.1:47032 127.0.0.1:443 CLOSE_WAIT 3844/python
tcp 1 0 127.0.0.1:5432 127.0.0.1:44302 CLOSE_WAIT 35036/postgres: vc
tcp 0 0 127.0.0.1:46702 127.0.0.1:443 TIME_WAIT -
tcp 1 0 127.0.0.1:33692 127.0.0.1:443 CLOSE_WAIT 5400/vmware-content
tcp 1 0 127.0.0.1:33694 127.0.0.1:443 CLOSE_WAIT 5400/vmware-content
tcp 1 0 127.0.0.1:52374 127.0.0.1:443 CLOSE_WAIT 2548/vsphere-ui.lau
tcp 0 0 127.0.0.1:47460 127.0.0.1:443 TIME_WAIT -
tcp 1 0 127.0.0.1:33698 127.0.0.1:443 CLOSE_WAIT 5400/vmware-content
tcp 0 0 10.10.98.47:443 10.10.98.124:50626 TIME_WAIT -
tcp 1 0 127.0.0.1:60624 127.0.0.1:443 CLOSE_WAIT 5400/vmware-content
tcp 0 0 127.0.0.1:47628 127.0.0.1:443 TIME_WAIT -
tcp 1 0 127.0.0.1:46542 127.0.0.1:443 CLOSE_WAIT 2548/vsphere-ui.lau
tcp 1 0 127.0.0.1:5432 127.0.0.1:44308 CLOSE_WAIT 35042/postgres: vc
tcp 0 0 127.0.0.1:47282 127.0.0.1:443 ESTABLISHED 5366/vmware-sps.lau
tcp 0 0 127.0.0.1:47752 127.0.0.1:443 TIME_WAIT -
tcp 1 0 127.0.0.1:5432 127.0.0.1:44306 CLOSE_WAIT 35039/postgres: vc
tcp 1 0 127.0.0.1:55964 127.0.0.1:443 CLOSE_WAIT 3844/python
tcp 1 0 127.0.0.1:60628 127.0.0.1:443 CLOSE_WAIT 5400/vmware-content
tcp 1 0 127.0.0.1:33716 127.0.0.1:443 CLOSE_WAIT 5400/vmware-content
tcp 1 0 127.0.0.1:33650 127.0.0.1:443 CLOSE_WAIT 5400/vmware-content
tcp 1 0 127.0.0.1:57548 127.0.0.1:443 CLOSE_WAIT 5320/python
tcp 0 0 127.0.0.1:443 127.0.0.1:47526 TIME_WAIT -
tcp 0 0 127.0.0.1:47656 127.0.0.1:443 TIME_WAIT -
tcp 0 0 127.0.0.1:47582 127.0.0.1:443 TIME_WAIT -
tcp 1 0 127.0.0.1:5432 127.0.0.1:44314 CLOSE_WAIT 35044/postgres: vc
tcp 0 0 127.0.0.1:443 127.0.0.1:47832 TIME_WAIT -
tcp 0 0 127.0.0.1:47734 127.0.0.1:443 TIME_WAIT -
tcp 0 0 127.0.0.1:443 127.0.0.1:47448 TIME_WAIT -
tcp 1 0 127.0.0.1:5432 127.0.0.1:44340 CLOSE_WAIT 35057/postgres: vc
tcp 1 0 127.0.0.1:33712 127.0.0.1:443 CLOSE_WAIT 5400/vmware-content
tcp 1 0 127.0.0.1:5432 127.0.0.1:44338 CLOSE_WAIT 35056/postgres: vc
tcp 0 0 127.0.0.1:443 127.0.0.1:41990 TIME_WAIT -
tcp 1 0 127.0.0.1:41142 127.0.0.1:443 CLOSE_WAIT 5400/vmware-content
tcp 1 0 127.0.0.1:5432 127.0.0.1:44348 CLOSE_WAIT 35060/postgres: vc
tcp 0 0 127.0.0.1:47648 127.0.0.1:443 TIME_WAIT -
tcp 0 0 10.10.98.47:443 10.10.98.124:50622 TIME_WAIT -
tcp 0 0 127.0.0.1:47746 127.0.0.1:443 TIME_WAIT -
tcp 0 0 127.0.0.1:47830 127.0.0.1:443 ESTABLISHED 5366/vmware-sps.lau
tcp 0 0 127.0.0.1:47622 127.0.0.1:443 TIME_WAIT -
tcp 0 0 127.0.0.1:443 127.0.0.1:47282 ESTABLISHED 2390/rhttpproxy
tcp 0 0 127.0.0.1:47668 127.0.0.1:443 TIME_WAIT -
tcp 0 0 127.0.0.1:47674 127.0.0.1:443 TIME_WAIT -
tcp 1 0 127.0.0.1:5432 127.0.0.1:44350 CLOSE_WAIT 35063/postgres: vc
tcp 1 0 127.0.0.1:5432 127.0.0.1:44310 CLOSE_WAIT 35043/postgres: vc
tcp 1 0 127.0.0.1:33690 127.0.0.1:443 CLOSE_WAIT 5400/vmware-content
tcp 1 0 127.0.0.1:5432 127.0.0.1:44334 CLOSE_WAIT 35053/postgres: vc
tcp 0 0 127.0.0.1:47634 127.0.0.1:443 TIME_WAIT -
tcp 1 0 127.0.0.1:5432 127.0.0.1:44303 CLOSE_WAIT 52434/postgres: vc
tcp 1 0 127.0.0.1:60406 127.0.0.1:443 CLOSE_WAIT 5400/vmware-content
tcp 1 0 127.0.0.1:33710 127.0.0.1:443 CLOSE_WAIT 5400/vmware-content
tcp 0 0 127.0.0.1:443 127.0.0.1:47830 ESTABLISHED 2390/rhttpproxy
tcp 1 0 127.0.0.1:33706 127.0.0.1:443 CLOSE_WAIT 5400/vmware-content
tcp 1 0 127.0.0.1:42612 127.0.0.1:443 CLOSE_WAIT 5400/vmware-content
tcp 32 0 10.10.98.47:55568 184.27.114.65:443 CLOSE_WAIT 5319/updatemgr
tcp 0 0 10.10.98.47:443 10.10.98.124:50631 TIME_WAIT -
tcp 1 0 10.10.98.47:34874 208.91.0.89:443 CLOSE_WAIT 2547/vsphere-client
tcp 1 0 127.0.0.1:33714 127.0.0.1:443 CLOSE_WAIT 5400/vmware-content
tcp 0 0 127.0.0.1:443 127.0.0.1:47722 TIME_WAIT -
tcp 0 0 127.0.0.1:47584 127.0.0.1:443 ESTABLISHED 5320/python
tcp 1 0 127.0.0.1:5432 127.0.0.1:44346 CLOSE_WAIT 35058/postgres: vc
tcp 1 0 127.0.0.1:56164 127.0.0.1:443 CLOSE_WAIT 5348/vmware-vsm.lau
tcp 0 0 127.0.0.1:443 127.0.0.1:47922 TIME_WAIT -
tcp 0 0 127.0.0.1:47740 127.0.0.1:443 TIME_WAIT -
tcp 0 0 127.0.0.1:47610 127.0.0.1:443 TIME_WAIT -
tcp 0 0 127.0.0.1:47662 127.0.0.1:443 TIME_WAIT -
tcp 1 0 127.0.0.1:33696 127.0.0.1:443 CLOSE_WAIT 5400/vmware-content
tcp 1 0 127.0.0.1:5432 127.0.0.1:44324 CLOSE_WAIT 35049/postgres: vc
tcp6 0 0 :::443 :::* LISTEN
I am not familiar with this. Thanks again for all the help.
One way to address this issue is to increase max_connections to 250, restart the vCSA, and monitor.
If this doesn't help, contact VMware support.
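For reference, a sketch of that change (the sed pattern assumes the line reads exactly as in your grep output above, so back up the file and verify first; restarting only vPostgres can upset dependent services, so a full appliance reboot is the safer route):

cp /storage/db/vpostgres/postgresql.conf /storage/db/vpostgres/postgresql.conf.bak
sed -i 's/^max_connections = 100/max_connections = 250/' /storage/db/vpostgres/postgresql.conf
service-control --stop vmware-vpostgres && service-control --start vmware-vpostgres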
Can you educate me on what these connections are for? I know my team does a lot of automation, as this is a lab environment.
Usually this comes from the application level: one or more services connecting to PostgreSQL are evidently not releasing connections, causing the pool of available connection slots to run out.
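To see who is actually holding the slots, you can query pg_stat_activity directly (a sketch; the psql path is the usual one on a 6.7 vCSA, and VCDB is the database name taken from your log above, so adjust if yours differ):

/opt/vmware/vpostgres/current/bin/psql -U postgres -d VCDB -c "SELECT usename, application_name, client_addr, state, count(*) FROM pg_stat_activity GROUP BY 1, 2, 3, 4 ORDER BY 5 DESC;"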
Attach the vpxd and vpxd-profiler logs; we can track the client IP from them.
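For example, a rough first pass (log locations are the vCSA defaults; the exact entries worth grepping for vary by build, so treat this as a starting point, not the definitive method):

grep -i "SessionStats" /var/log/vmware/vpxd/vpxd-profiler.log | tail -20
grep -iE "session|client" /var/log/vmware/vpxd/vpxd.log | tail -20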
Very cool, finally something that makes sense. I will give it a go and let you know what happens. Cheers, making the changes now.
Attach vpxd.log as well.
I've sent you a DM.
Did you ever get this fixed? Mine isn't crashing, but it's been nailing the CPU.
No, we were not able to fix it, and I do not have VMware support. I would suggest opening a case with VMware support.