We are running a vCenter Server Appliance, version 5.5 U2d. We recently updated to this version in an attempt to resolve our problems with vCenter crashing. Roughly every 1-2 days vCenter crashes, and when it does a dump file (core.vpxd-worker) is created in the /storage/core folder.
I have a support case open with VMware, but so far I don't have any resolution as to why my vCenter is crashing so frequently.
While picking through vpxd.log after the latest crash, I noticed out-of-memory errors from PostgreSQL right before the crash. I believe this may be the culprit. Does anyone have any ideas on how to resolve these errors?
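For anyone wanting to check their own appliance for the same symptom, this is roughly how I pulled the errors out (the log path is the default location on a 5.5 appliance; verify it on your build):

```shell
# Search the vpxd log on the appliance for out-of-memory errors,
# with one line of context before and three after each hit.
# Default log location on the vCenter Server Appliance 5.5:
grep -B1 -A3 "out of memory" /var/log/vmware/vpx/vpxd.log
```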
Our vCenter appliance has 24 GB of memory and hardly ever goes beyond 25% usage; the same goes for the CPU. I'm going to try running the manual vacuum command outlined here, but other than that I'm out of ideas on what to try.
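For reference, a manual vacuum against the embedded vPostgres database looks roughly like this. The psql path, the `vc` user, and the `VCDB` database name are what I believe the 5.5 appliance uses (the DB credentials live in /etc/vmware-vpx/vcdb.properties); double-check them on your system before running anything:

```shell
# Connect to the embedded vPostgres instance on the appliance.
# Credentials for the vc user are in /etc/vmware-vpx/vcdb.properties.
/opt/vmware/vpostgres/current/bin/psql -U vc -d VCDB

-- Inside psql: vacuum and re-analyze the historical stats partition
-- that was failing in vpxd.log:
VACUUM (ANALYZE) vpx_hist_stat1_227;

-- Or vacuum/analyze the whole database (can take a while):
VACUUM ANALYZE;
```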
2015-02-07T02:00:05.656Z [7F717AEDD700 warning 'Default' opID=SWI-3f804862] [VdbStatement] SQL execution failed: vacuum (analyze) vpx_hist_stat1_227
2015-02-07T02:00:05.656Z [7F717AEDD700 warning 'Default' opID=SWI-3f804862] [VdbStatement] Execution elapsed time: 617 ms
2015-02-07T02:00:05.656Z [7F717AEDD700 warning 'Default' opID=SWI-3f804862] [VdbStatement] Diagnostic data from driver is 53200:1:7:ERROR: out of memory
--> Failed on request of size 71.;
--> Error while executing the query
2015-02-07T02:00:05.656Z [7F7136244700 info 'commonvpxLro' opID=1852ed36] [VpxLRO] -- BEGIN task-internal-54802710 -- -- vmodl.query.PropertyCollector.retrieveContents -- 40675e8b-09b0-7864-8924-58652fc267da(525eca50-84c7-4d37-746a-92b7f9e93dae)
2015-02-07T02:00:05.657Z [7F719F5B8700 info 'commonvpxLro' opID=55c1e2d6] [VpxLRO] -- BEGIN task-internal-54802713 -- -- vim.SessionManager.sessionIsActive -- 81106db1-3765-372b-d642-b669c3f2f54e(52fdfe72-ff88-eb87-d25c-3319b0c665f0)
2015-02-07T02:00:05.657Z [7F7136244700 info 'commonvpxLro' opID=1852ed36] [VpxLRO] -- FINISH task-internal-54802710 -- -- vmodl.query.PropertyCollector.retrieveContents --
2015-02-07T02:00:05.656Z [7F717AEDD700 warning 'Default' opID=SWI-3f804862] [VdbStatement] Bind parameters:
2015-02-07T02:00:05.657Z [7F719F5B8700 info 'commonvpxLro' opID=55c1e2d6] [VpxLRO] -- FINISH task-internal-54802713 -- -- vim.SessionManager.sessionIsActive --
2015-02-07T02:00:05.658Z [7F719C5D9700 info 'commonvpxLro' opID=2d146636] [VpxLRO] -- BEGIN task-internal-54802714 -- -- vmodl.query.PropertyCollector.retrieveContents -- 9008cfb1-d611-14ad-8b3e-202cb5d30d42(52caec42-b1ee-461f-a49d-f6d3f0f243dd)
2015-02-07T02:00:05.658Z [7F719C5D9700 info 'commonvpxLro' opID=2d146636] [VpxLRO] -- FINISH task-internal-54802714 -- -- vmodl.query.PropertyCollector.retrieveContents --
2015-02-07T02:00:05.658Z [7F717AEDD700 error 'Default' opID=SWI-3f804862] [Vdb::IsRecoverableErrorCode] Unable to recover from 53200:7
2015-02-07T02:00:05.658Z [7F717AEDD700 error 'Default' opID=SWI-3f804862] [VdbStatement] SQLError was thrown: "ODBC error: (53200) - ERROR: out of memory
--> Failed on request of size 71.;
--> Error while executing the query" is returned when executing SQL statement "vacuum (analyze) vpx_hist_stat1_227"
And here is the log from the postgres service:
TopMemoryContext: 105688 total in 9 blocks; 9072 free (15 chunks); 96616 used
Statistics snapshot: 0 total in 0 blocks; 0 free (0 chunks); 0 used
Per-database function: 8192 total in 1 blocks; 1680 free (0 chunks); 6512 used
Per-database table: 122880 total in 4 blocks; 23920 free (22 chunks); 98960 used
Per-database function: 8192 total in 1 blocks; 1680 free (0 chunks); 6512 used
Per-database table: 57344 total in 3 blocks; 26144 free (11 chunks); 31200 used
Per-database function: 8192 total in 1 blocks; 1680 free (0 chunks); 6512 used
Per-database table: 57344 total in 3 blocks; 26144 free (11 chunks); 31200 used
Databases hash: 8192 total in 1 blocks; 1680 free (0 chunks); 6512 used
Start worker tmp cxt: 8192 total in 1 blocks; 7672 free (0 chunks); 520 used
smgr relation table: 24576 total in 2 blocks; 13920 free (4 chunks); 10656 used
TransactionAbortContext: 32768 total in 1 blocks; 32736 free (0 chunks); 32 used
Autovacuum Launcher: 0 total in 0 blocks; 0 free (0 chunks); 0 used
AV dblist: 0 total in 0 blocks; 0 free (0 chunks); 0 used
tmp AV dblist: 8192 total in 1 blocks; 7672 free (0 chunks); 520 used
db hash: 8192 total in 1 blocks; 2704 free (0 chunks); 5488 used
Portal hash: 8192 total in 1 blocks; 1680 free (0 chunks); 6512 used
PortalMemory: 0 total in 0 blocks; 0 free (0 chunks); 0 used
Relcache by OID: 8192 total in 1 blocks; 640 free (0 chunks); 7552 used
CacheMemoryContext: 555696 total in 19 blocks; 129536 free (6 chunks); 426160 used
MdSmgr: 8192 total in 1 blocks; 8128 free (0 chunks); 64 used
LOCALLOCK hash: 24576 total in 2 blocks; 15984 free (5 chunks); 8592 used
Timezones: 83472 total in 2 blocks; 3744 free (0 chunks); 79728 used
Postmaster: 24576 total in 2 blocks; 22608 free (73 chunks); 1968 used
ErrorContext: 8192 total in 1 blocks; 8160 free (0 chunks); 32 used
5958 tm:2015-02-07 02:00:56.626 UTC db: pid:3816 ERROR: out of memory
5959 tm:2015-02-07 02:00:56.626 UTC db: pid:3816 DETAIL: Failed on request of size 8056.
1 tm:2015-02-07 02:43:00.160 UTC db:VCDB pid:598 LOG: unexpected EOF on client connection
1 tm:2015-02-07 02:43:00.170 UTC db:VCDB pid:27129 LOG: could not receive data from client: Connection reset by peer
2 tm:2015-02-07 02:43:00.213 UTC db:VCDB pid:27129 LOG: unexpected EOF on client connection
1 tm:2015-02-07 02:43:00.136 UTC db:VCDB pid:20462 LOG: could not receive data from client: Connection reset by peer
2 tm:2015-02-07 02:43:00.213 UTC db:VCDB pid:20462 LOG: unexpected EOF on client connection
1 tm:2015-02-07 02:43:00.160 UTC db:VCDB pid:27124 LOG: unexpected EOF on client connection
1 tm:2015-02-07 02:43:00.160 UTC db:VCDB pid:601 LOG: unexpected EOF on client connection
1 tm:2015-02-07 02:43:00.170 UTC db:VCDB pid:27120 LOG: could not receive data from client: Connection reset by peer
1 tm:2015-02-07 02:43:00.160 UTC db:VCDB pid:13014 LOG: unexpected EOF on client connection
1 tm:2015-02-07 02:43:00.161 UTC db:VCDB pid:599 LOG: unexpected EOF on client connection
2 tm:2015-02-07 02:43:00.213 UTC db:VCDB pid:27120 LOG: unexpected EOF on client connection
1 tm:2015-02-07 02:43:00.180 UTC db:VCDB pid:27119 LOG: unexpected EOF on client connection
1 tm:2015-02-07 02:43:00.218 UTC db:VCDB pid:27130 LOG: unexpected EOF on client connection
4 tm:2015-02-07 02:43:00.219 UTC db:VCDB pid:26857 LOG: unexpected EOF on client connection
1 tm:2015-02-07 02:43:00.222 UTC db:VCDB pid:27123 LOG: unexpected EOF on client connection
1 tm:2015-02-07 02:43:00.160 UTC db:VCDB pid:20464 LOG: unexpected EOF on client connection
1 tm:2015-02-07 02:43:00.223 UTC db:VCDB pid:27126 LOG: could not receive data from client: Connection reset by peer
2 tm:2015-02-07 02:43:00.223 UTC db:VCDB pid:27126 LOG: unexpected EOF on client connection
1 tm:2015-02-07 02:43:00.227 UTC db:VCDB pid:27118 LOG: unexpected EOF on client connection
1 tm:2015-02-07 02:43:00.230 UTC db:VCDB pid:26790 LOG: could not receive data from client: Connection reset by peer
2 tm:2015-02-07 02:43:00.230 UTC db:VCDB pid:26790 LOG: unexpected EOF on client connection
1 tm:2015-02-07 02:43:00.252 UTC db:VCDB pid:22736 LOG: unexpected EOF on client connection
1 tm:2015-02-07 02:43:00.195 UTC db:VCDB pid:27125 LOG: unexpected EOF on client connection
1 tm:2015-02-07 02:45:55.044 UTC db:VCDB pid:13887 LOG: could not send data to client: Connection reset by peer
2 tm:2015-02-07 02:45:55.044 UTC db:VCDB pid:13887 STATEMENT: SELECT * FROM VPX_EVENT ORDER BY EVENT_ID DESC
3 tm:2015-02-07 02:45:55.044 UTC db:VCDB pid:13887 LOG: could not send data to client: Broken pipe
4 tm:2015-02-07 02:45:55.044 UTC db:VCDB pid:13887 STATEMENT: SELECT * FROM VPX_EVENT ORDER BY EVENT_ID DESC
5 tm:2015-02-07 02:46:08.329 UTC db:VCDB pid:13887 LOG: unexpected EOF on client connection
It turns out my problem was due to VDP 5.8 making connections to vCenter and never closing them: http://kb.vmware.com/kb/2094879
Working with support, they provided a patch for my VDP 5.8 environment. I have not had any crashes since it was put in place.
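If anyone else suspects a connection leak like this, you can check how many connections each client is holding before opening a case. Both commands below are a sketch assuming the default 5.5 appliance layout (psql path, `VCDB` database, postgres listening on 5432); adjust as needed:

```shell
# Count ESTABLISHED connections to postgres (port 5432) grouped by
# source address - a steadily growing count from one host (e.g. the
# VDP appliance) points at a client that never closes its connections.
netstat -ant | awk '$4 ~ /:5432$/ && $6 == "ESTABLISHED" {split($5, a, ":"); print a[1]}' \
  | sort | uniq -c | sort -rn

# Or ask postgres itself which clients hold backend connections:
/opt/vmware/vpostgres/current/bin/psql -U postgres -d VCDB -c \
  "SELECT client_addr, count(*) FROM pg_stat_activity GROUP BY client_addr ORDER BY count DESC;"
```

Running the count a few times over an hour made the leak obvious in my case: the VDP host's number only ever went up.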