Hi all,
We are running vCenter Server 5.0 U2 on MS SQL Server 2008. Our vCenter has nearly 500 hosts and over 5000 guests.
For several weeks now we've been experiencing massive gaps in the performance rollups for all objects in vCenter. We get 1-2 hours of data followed by a 1-2 hour gap. We have an open SR with VMware but are pretty much at a stalemate as to the next step, other than truncating the vpx_hist_stat1 table. I'm concerned that even this step won't alleviate the problem. Would very much appreciate a sanity check and any suggestions you guys may have!
The vpxd.log file consistently logs errors like this:
"SQL execution took too long: UPDATE VPX_SDRS_STATS_DATASTORE WITH (ROWLOCK) SET QUANTILES = ? WHERE ROW_ID = ?
2013-09-20T15:26:26.254Z [06472 warning 'Default' opID=SWI-4a765b63] [VdbStatement] Execution elapsed time: 3027 ms"
We have also seen:
2013-09-20T15:15:10.805Z [06816 error 'Default' opID=SWI-a6af8cb3] Had to drop performance data coming from host host-9640 because it has been waiting for 1806 secs to be processed. Perhaps the DB is too slow
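For anyone who wants to quantify these entries rather than eyeball them, here is a rough Python sketch that tallies both message types. The regular expressions are modelled only on the two entries quoted above, and the function name summarize is made up for illustration, so treat it as a starting point rather than a tested tool.

```python
import re
from collections import Counter

# Patterns modelled on the two vpxd.log entries quoted above (assumption:
# other vCenter builds may word these messages differently).
SLOW_SQL = re.compile(r"Execution elapsed time: (\d+) ms")
DROPPED = re.compile(
    r"Had to drop performance data coming from host (\S+) "
    r"because it has been waiting for (\d+) secs"
)


def summarize(lines):
    """Return (slow SQL durations in ms, drop count per host, longest wait in secs)."""
    slow_ms = []
    drops = Counter()
    max_wait = 0
    for line in lines:
        m = SLOW_SQL.search(line)
        if m:
            slow_ms.append(int(m.group(1)))
            continue
        m = DROPPED.search(line)
        if m:
            drops[m.group(1)] += 1
            max_wait = max(max_wait, int(m.group(2)))
    return slow_ms, drops, max_wait


# The two entries from this post, used as sample input.
sample = [
    "2013-09-20T15:26:26.254Z [06472 warning 'Default' opID=SWI-4a765b63] "
    "[VdbStatement] Execution elapsed time: 3027 ms",
    "2013-09-20T15:15:10.805Z [06816 error 'Default' opID=SWI-a6af8cb3] "
    "Had to drop performance data coming from host host-9640 because it has "
    "been waiting for 1806 secs to be processed. Perhaps the DB is too slow",
]

slow_ms, drops, max_wait = summarize(sample)
print("slow statements:", len(slow_ms), "worst (ms):", max(slow_ms, default=0))
print("hosts with dropped data:", dict(drops), "longest wait (s):", max_wait)
```

To run it against a real log, pass an open file handle instead of the sample list, e.g. summarize(open(path, errors="replace")). Comparing the drop counts per host against the timestamps of the gaps should show whether the drops are spread across all hosts or concentrated on a few.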
Our DBA assures us there are no errors in the SQL Event Log, the transaction logs are fine, and there is no recent evidence of blocking. We have also verified that network performance is optimal. vCenter functionality is otherwise fine.
Troubleshooting so far:
- Performance statistics levels are all set to 1.
- Truncated the vpx_temptable(s) per http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=103089... but no change.
- Enabled the database retention policy and have been gradually reducing the number of days of data retained, but there has been zero change and the vpx_hist_stat1 table still contains approx 60 million rows. Not sure whether that row count is a problem given the size of our environment.
The only other option appears to be to truncate the vpx_hist_stat1 table per
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=100745...
My questions are:
1. What is the likelihood that truncating the vpx_hist_stat1 table will resolve the performance gap problem? This is listed as a last resort option, and for good reason. We would hate to lose all that data unnecessarily.
2. Anything else we can try?
Thanks in advance for your assistance!