VMware Cloud Community
jterpstr
Contributor

Possible issue found? VirtualCenter performance graphs

I'm starting to get tired of VirtualCenter's performance graphs not working properly. I've had numerous tickets open with VMware and they can never seem to find the solution to my problem. I truncate the data, and it all works for a while, then stops working.

I started to investigate packets that are going to our Oracle database server (we keep VC and the DB on separate servers) and I noticed the FIRST select statement that is run when a performance tab is clicked is a "select * from vpx_hist_stat", a table with 13.7 million rows. I've run this a few times and it appears to be consistently the first query after I select the tab.

Is this possible? Is VirtualCenter really doing this? If that's the case, it's no wonder VC performance keeps getting slower and slower until it eventually stops working.
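To illustrate why an unfiltered select would get slower as the table grows, here is a minimal sketch using an in-memory SQLite database rather than Oracle. The table name comes from the capture above; the columns (`sample_id`, `entity_id`, `stat_val`) and the row counts are assumptions made up for the example and do not reflect the real VirtualCenter schema.

```python
import sqlite3

# Hypothetical, simplified stand-in for vpx_hist_stat; the real schema differs.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE vpx_hist_stat (sample_id INTEGER, entity_id INTEGER, stat_val INTEGER)"
)
conn.execute("CREATE INDEX idx_entity ON vpx_hist_stat (entity_id)")

# 100 entities x 1,000 samples each = 100,000 rows.
rows = [(s, e, s % 100) for e in range(100) for s in range(1000)]
conn.executemany("INSERT INTO vpx_hist_stat VALUES (?, ?, ?)", rows)

# Unfiltered: every row in the table is returned to the client.
all_rows = conn.execute("SELECT * FROM vpx_hist_stat").fetchall()

# Filtered: only the rows for the one entity whose graph is being drawn.
one_entity = conn.execute(
    "SELECT sample_id, stat_val FROM vpx_hist_stat WHERE entity_id = ?", (42,)
).fetchall()

print(len(all_rows), len(one_entity))
```

The unfiltered result grows with the whole table, while the filtered one grows only with the history kept for a single entity, which is the behaviour one would expect a graph to need.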

I'm hoping someone from the VC team is watching this thread, as first level support doesn't seem to be getting me anywhere with this.

26 Replies
BenConrad
Expert

I just sniffed the connection between VC and the DB and don't see this statement. Where are you seeing this? In SQL Profiler or a packet sniffer? Paste a few lines before and after; I'd be interested because I think graph performance sucks in general.

Ben

jterpstr
Contributor

I am using the Ethereal packet sniffer. Our Oracle database resides on a separate Oracle cluster so I can capture data coming in and out of the system. If I set up Ethereal to watch the Oracle Listener port, then click on the performance tab of a VM, it's always the first statement I see. Yesterday I spent an hour on the phone with first level support showing them this. I can only hope this gets escalated to someone on the VC team to investigate further.

BenConrad
Expert

Wow, that's messed up. On SQL I see 'select sm.id from vpx_sample where sm.sample.time = ......'

FYI, when I do a 'select count (*) from vpx_hist_stat' it usually takes about 60 seconds to return.

For the last few days I've been trimming my SQL database in order to bring it from 50 million to about 20 million rows (60 days of data).

juchestyle
Commander

Physical or virtual?

Respectfully,

Matthew

Kaizen!

BenConrad
Expert

app is virtual, DB is physical.

jterpstr
Contributor

Mine is physical/physical. I have a dual-CPU application server talking to a quad-CPU RAC environment, all over gigabit networking.

GBromage
Expert

I have the same issue in a physical/physical environment. The DB is on Oracle for Linux.

My DBA has noted to me that there's an intermittent error with the stats rollup query, which seems to be timing out a lot or taking a very long time.

My current working theory is that whilst the hour/day/week stats are being aggregated, the old rows aren't being removed from the table and thus the table is growing faster than my overdraft. ;)

Please let me know what support line has to say.

-


I hope this information helps you. If it does, please consider awarding points with the 'Helpful' or 'Correct' buttons. If it doesn't help you, please ask for clarification!

jterpstr
Contributor

GBromage, that's interesting. I also have the same issue. I was provided a third-party SQL script from VMware a while back to check whether stats rollup was working. It was working for the most part, however the yearly rollup is now marked as failed.

depping
Leadership

Noticed the same at a couple of customer sites; it tends to get slow after a while and this might actually explain why...

Duncan

tupelo_operatio
Contributor

I am fighting with this issue at this very moment. VMware tech support has had me update the Virtual Center Server to 2.0.2 Patch 2 but this didn’t help…still not getting performance data. I was then told to run the MSSQL purge script referenced in Knowledge Base Article 1000125. I ran the script and 2 hours later I am still not able to get performance data. Since it is starting to look like I might have to whack the current VCDB database…I am thinking it might be just as easy to go ahead and upgrade to VC 2.5. What do you guys think? We are not overly concerned with historic data....we just want to be able to get this data when we need it. I currently have the following:

6 ESX hosts at 3.0.2 61618

2 ESX hosts at 3.0.1 42829

1 ESX host at 3.0.2 62488

Those 2 at 3.0.1 won't be a problem will they?

Grant

BenConrad
Expert

I'd hold off on the VC upgrade until 3.5 patch 1 comes out.

Have you run the VCDB_table_cleanup_MSSQL.sql script? It takes a really long time to run but will clean up the history in your database. I ran it, deleted about 25 million rows (50%) and have seen a mild improvement. I think I need to move from statistics level 2 -> 1 and then clean again. The way VMware implemented graphing in VC is terribly inefficient.
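For anyone trimming history by hand, one general way to keep a multi-million-row purge from running as a single huge transaction is to delete in small batches. The sketch below is not the VMware cleanup script and says nothing about what that script does; it uses SQLite for portability, and the `sample_day` column, the `purge_in_batches` helper, and the batch size are all hypothetical.

```python
import sqlite3

def purge_in_batches(conn, cutoff, batch_size=1000):
    """Delete rows older than `cutoff` in small transactions; return rows deleted."""
    total = 0
    while True:
        cur = conn.execute(
            "DELETE FROM vpx_hist_stat WHERE rowid IN ("
            "SELECT rowid FROM vpx_hist_stat WHERE sample_day < ? LIMIT ?)",
            (cutoff, batch_size),
        )
        conn.commit()  # commit per batch so each transaction stays short
        if cur.rowcount == 0:
            return total
        total += cur.rowcount

# Hypothetical stand-in table: 100 days of history, 100 rows per day.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vpx_hist_stat (sample_day INTEGER, stat_val INTEGER)")
conn.executemany(
    "INSERT INTO vpx_hist_stat VALUES (?, ?)",
    [(day, n) for day in range(100) for n in range(100)],
)
conn.commit()

# Keep only days 40 and later.
deleted = purge_in_batches(conn, cutoff=40)
remaining = conn.execute("SELECT COUNT(*) FROM vpx_hist_stat").fetchone()[0]
print(deleted, remaining)
```

On SQL Server or Oracle the batching syntax differs, so treat this only as an outline of the idea and take a backup before deleting anything from a production VC database.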

tupelo_operatio
Contributor

I did run the SQL cleanup script...ran it this morning as a matter of fact. It ran for 2 hours and deleted 27 million rows. I have heard back from VMware support and they want to do a webex sometime to take a look at things themselves. I am assuming you mean VC 2.5 patch 1 instead of 3.5 patch 1, right?

GBromage
Expert

Ben, jterpstr, depping and tupelo - are you running your DB on MS SQL Server or Oracle? If Oracle, on Windows or Linux?

I had wondered if it was an Oracle specific (or VMware's implementation of the scripts on Oracle) problem, but based on what you've said I'm not so sure now.

-


I hope this information helps you. If it does, please consider awarding points with the 'Helpful' or 'Correct' buttons. If it doesn't help you, please ask for clarification!

BenConrad
Expert

I'm on SQL 2000

BenConrad
Expert

Hmm, your DB server is faster than mine :(

I did mean to say 2.5 patch 1.

Ben

jterpstr
Contributor

I am running Oracle 10g RAC on DL385 Linux servers.

Does anyone have MS SQL running on a different server than their VC server, so that they too can run a packet capture to see if the "select * from vpx_hist_stat" shows up? This topic seems to be straying a bit back to just "my performance graphs aren't working". I was hoping we could speak more specifically of this one very large select statement that is performed in Oracle, and possibly MSSQL if it can be verified.
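If someone does capture the traffic, a rough way to check it is to scan the captured payloads for the first piece of SQL text and see whether it is the unfiltered select. This is only a sketch: it assumes the payload bytes have already been exported from the sniffer into a list, that the SQL appears as plain text in them, and the function name `first_statement_is_full_scan` is made up for the example.

```python
import re

# The unfiltered statement: "select * from vpx_hist_stat" with no WHERE clause after it.
TARGET = re.compile(
    rb"select\s+\*\s+from\s+vpx_hist_stat\b(?!\s+where)", re.IGNORECASE
)
# Any payload that looks like it carries a SQL statement.
SQL_START = re.compile(rb"\b(select|insert|update|delete)\b", re.IGNORECASE)

def first_statement_is_full_scan(payloads):
    """Return True if the first payload containing SQL is the unfiltered select."""
    for data in payloads:
        if SQL_START.search(data):
            return TARGET.search(data) is not None
    return False

# Hypothetical payloads, in capture order.
capture = [
    b"\x00\x01handshake",
    b"\x00select * from vpx_hist_stat\x00",
    b"select sm.id from vpx_sample",
]
print(first_statement_is_full_scan(capture))
```

Running the same check against captures from both an Oracle and an MS SQL setup would show whether the statement is specific to one database backend.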

tupelo_operatio
Contributor

I am on SQL 2000.

tupelo_operatio
Contributor

For what it's worth, I did a webex with VMware Tech Support yesterday afternoon and the Support Engineer seemed to think it was an issue with port 903 being blocked. The webex was done on the Virtual Center Server and I opened up a PuTTY session to one of the ESX hosts there as well. The engineer referenced http://kb.vmware.com/selfservice/microsites/search.do?cmd=displayKC&docType=kc&externalId=749640&sli... as he worked on this issue. He edited the config file just like it says in the KB and then did the xinetd restart. As of this very moment, I am able to get stats on that one particular host and that is all. I tried the steps in the KB on two other hosts and was not able to get the stats...just the timeout error. I updated the support ticket with what I was seeing after the webex ended and the engineer replied back to do a service mgmt-vmware restart on the two other hosts...which I did...but I am still not able to get stats on them.

tupelo_operatio
Contributor

got posted twice for some reason....

Is anyone having problems with the performance of the forums?
