VMware Cloud Community
jterpstr
Contributor

Possible issue found? VirtualCenter performance graphs

I'm starting to get tired of VirtualCenter's performance graphs not working properly. I've had numerous tickets open with VMware and they can never seem to find the solution to my problem. I truncate the data, and it all works for a while, then stops working.

I started investigating the packets going to our Oracle database server (we keep VC and the DB on separate servers) and noticed that the FIRST select statement run when a performance tab is clicked is a "select * from vpx_hist_stat", against a table with 13.7 million rows. I've run this a few times and it appears to be consistently the first query after I select the tab.

Is this possible? Is VirtualCenter really doing this? If that's the case, it's no wonder VC performance keeps getting slower and slower until it eventually stops working.

I'm hoping someone from the VC team is watching this thread, as first level support doesn't seem to be getting me anywhere with this.

26 Replies
tupelo_operatio
Contributor

I found something at http://www.yellow-bricks.com/2008/02/11/issue-with-performance-stats-in-virtualcenter/ in the comments that kinda sorta works. The fifth comment offers some settings to try on the VI Client....

They told me to do the following 2 steps:

- (VI Client): Administration -> VirtualCenter Management Server Configuration -> Statistics -> System Resources -> Statistics collection thread limit: value changed to 16

- (VI Client): Administration -> VirtualCenter Management Server Configuration -> Timeout Settings -> Client Connection Timeout -> Normal Operations: value changed from 30 to 300

The problem is that I cannot get it to perform consistently on all of my ESX hosts. I currently have 9 ESX hosts that are grouped into 3 clusters within my datacenter. The first cluster has ESX hosts running on 3 IBM HS20 blade servers and is called HS20. The second cluster has ESX hosts running on 3 IBM HS21 blade servers and is called HS21. The third cluster has ESX hosts running on 3 IBM HS20 blade servers and is called VDI.

So far I have been able to get stats on all 3 hosts in the HS20 cluster as well as the random guests that I selected within that cluster, though it sometimes takes the entire 5 minutes to return the data. The same can be said for the HS21 cluster. The VDI cluster is where I cannot get consistent results. One time I can get stats on 2 of the 3 hosts and any guest I want to see; another time I cannot get stats on any of the hosts or guests. Having to do a service vmware-vpxa restart and a service mgmt-vmware restart every time I need to get stats is unacceptable in my opinion.

We are somewhat hamstrung here where I work since we do not have a DBA on staff, either MS or Oracle. I would give anything to be able to find a good guide for maintaining the Virtual Center database running on SQL 2000.

Do any of you guys think just starting over with a clean database might fix the problem? Or might reinstalling the VirtualCenter server altogether and letting it overwrite the database fix things? It's getting rather desperate, since the last email I got from the VMware Support Engineer said he didn't know what else to tell me. I need performance stats because some of our VDI users are seeing performance issues and I need some data to go on.

GBromage
Expert

> Do any of you guys think just starting over with a clean database might fix the problem?

Yes and no. The underlying problem seems to be that the stats rollup isn't working. Therefore, the statistics table is growing very large, and so queries are taking too long to run. Therefore, you get timeouts.

Clearing the database will solve the problem in the short term. The table will be emptied and speed will improve. But once the table fills up again, you'll be back to where you are now.
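
One rough way to check whether the rollup really has stalled is to count stat rows per sample interval and watch whether the finest interval keeps piling up. The sketch below only illustrates that idea: it runs against an in-memory SQLite stand-in rather than a real VirtualCenter database, the two-table schema is a simplifying assumption built from the column names in the query quoted further down this thread (vpx_sample.id, sample_time, sample_interval; vpx_hist_stat.sample_id, entity_id), the interval values 300 and 1800 are made up, and the function names are hypothetical.

```python
import sqlite3


def make_demo_db():
    """Build a tiny in-memory stand-in; the schema is an assumption, not VC's real one."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE vpx_sample ("
        "id INTEGER PRIMARY KEY, sample_time INTEGER, sample_interval INTEGER)"
    )
    conn.execute("CREATE TABLE vpx_hist_stat (sample_id INTEGER, entity_id INTEGER)")
    # Six fine-grained samples (interval 300) and one coarse one (interval 1800).
    samples = [(i, i * 300, 300) for i in range(1, 7)] + [(7, 1800, 1800)]
    conn.executemany("INSERT INTO vpx_sample VALUES (?, ?, ?)", samples)
    # Three hypothetical entities report one stat row for every sample.
    stats = [(sid, ent) for sid, _, _ in samples for ent in (1, 2, 3)]
    conn.executemany("INSERT INTO vpx_hist_stat VALUES (?, ?)", stats)
    conn.commit()
    return conn


def rows_per_interval(conn):
    """Return {sample_interval: number of stat rows} to show where rows pile up."""
    cur = conn.execute(
        "SELECT sm.sample_interval, COUNT(*) "
        "FROM vpx_hist_stat st JOIN vpx_sample sm ON st.sample_id = sm.id "
        "GROUP BY sm.sample_interval ORDER BY sm.sample_interval"
    )
    return dict(cur.fetchall())


if __name__ == "__main__":
    print(rows_per_interval(make_demo_db()))
```

Running the same kind of grouped count a few days apart on the real database (in its own SQL dialect) would show whether one interval is growing without bound.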

I hope this information helps you. If it does, please consider awarding points with the 'Helpful' or 'Correct' buttons. If it doesn't help you, please ask for clarification!

RParker
Immortal

> I'd hold off on the VC upgrade until 3.5 patch 1 comes out.

Why? We use VC 2.5 w/ SQL 2005 SP2, no problems here. I'd say the problem is with this setup, since we see no performance tab problems, latency, or unneeded traffic. So the problem isn't with VC itself: it's isolated, and it's not happening to everyone.

BenConrad
Expert

Let me put it this way. I'm running 3.0.1, he's running 3.0.2 and they both work 'well'. I think the graphing issues for me are performance related and for him they may have something to do with the 'select * from vpx_hist_stat' query.

VMware should be releasing an update for VirtualCenter 2.5 soon, which should only make the 2.5 release better. I'm waiting for the update and will reap the benefits of the Community troubleshooting and fixed bugs. :)

Shafti
Contributor

I have the same issues with VC 2.5.0 and SQL Server 2000. Please check whether the category and jobs for the stats rollup exist in the msdb database.

I think the missing or corrupted jobs are a result of the update history and of missing dbowner rights on the msdb database during the first VC installation.

Queries to check for the job and category in the msdb database:

SELECT * FROM msdb.dbo.syscategories WHERE name = N'Stats Rollup';

SELECT * FROM msdb.dbo.sysjobs WHERE name = N'Past Week stats rollup';

My question: is there a way to fix this without restoring the database to a point prior to the upgrade to VC 2.5 (db version 3) and rerunning the db upgrade (db version 4)?

vmrulz
Hot Shot

Man, we decided to run this script as part of our upgrade to 2.5, without being able to test it on a like-sized db. Our 2.0.2 db is 47 GB, of which VPX_HIST_STAT is 215 million rows! We let it run for 3 hours and it got down to 180 million rows and was still going with no end in sight. We killed it and will try pruning over a weekend. Good grief. Now on to the actual upgrade.
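
A common way to keep a long purge from running as one open-ended statement is to delete in small batches and commit after each one, so the job can be stopped and picked up again later. The following is a minimal sketch of that idea only, not the script mentioned above: it runs against an in-memory SQLite stand-in, the two-table schema is an assumption built from column names quoted elsewhere in this thread, and prune_in_batches, the cutoff and the batch size are all hypothetical.

```python
import sqlite3


def make_demo_db():
    """Tiny in-memory stand-in; the schema is an assumption, not VC's real one."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE vpx_sample ("
        "id INTEGER PRIMARY KEY, sample_time INTEGER, sample_interval INTEGER)"
    )
    conn.execute("CREATE TABLE vpx_hist_stat (sample_id INTEGER, entity_id INTEGER)")
    # Ten samples with sample_time 1..10; five hypothetical entities per sample.
    conn.executemany(
        "INSERT INTO vpx_sample VALUES (?, ?, ?)", [(i, i, 300) for i in range(1, 11)]
    )
    conn.executemany(
        "INSERT INTO vpx_hist_stat VALUES (?, ?)",
        [(sid, ent) for sid in range(1, 11) for ent in range(1, 6)],
    )
    conn.commit()
    return conn


def prune_in_batches(conn, cutoff, batch_size=1000):
    """Delete stat rows whose sample is older than cutoff, batch_size rows at a time."""
    total = 0
    while True:
        cur = conn.execute(
            "DELETE FROM vpx_hist_stat WHERE rowid IN ("
            "  SELECT st.rowid FROM vpx_hist_stat st"
            "  JOIN vpx_sample sm ON st.sample_id = sm.id"
            "  WHERE sm.sample_time < ? LIMIT ?)",
            (cutoff, batch_size),
        )
        conn.commit()  # each batch is its own short transaction
        if cur.rowcount == 0:
            break  # nothing older than the cutoff is left
        total += cur.rowcount
    return total


if __name__ == "__main__":
    db = make_demo_db()
    print(prune_in_batches(db, cutoff=7, batch_size=7))
```

The SQL dialect differs on SQL Server and Oracle, so treat the statement itself as illustrative; the point is the loop with a commit per batch.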

Mothers, don't let your children do production support for a living!

icor
Contributor

My config is:

Oracle 10G 10.2.0.2.0 on Linux 64-bit (Itanium) (now unsupported) and VC 2.0.2

We have seen this:

VPX_HIST_STAT is about 2 GB with 80 million rows.

The query generated by the client via ODBC forces a regeneration of the index VPXII_HIST_STAT (every time).

VPXII_HIST_STAT is about 1.5 GB. We tried renaming the index, and the query time dropped from 90 sec to 1 or 2 sec.

Query example:

SELECT /*+ INDEX(st VPXII_HIST_STAT) */ DISTINCT sm.sample_time, sm.sample_interval FROM vpx_sample sm, vpx_hist_stat st WHERE st.entity_id = :1 AND st.sample_id = sm.id AND sm.sample_time BETWEEN :2 AND :3 AND sm.sample_interval <= :4 ORDER BY sm.sample_time
