I'm starting to get tired of VirtualCenter's performance graphs not working properly. I've had numerous tickets open with VMware and they can never seem to find the solution to my problem. I truncate the data, and it all works for a while, then stops working.
I started to investigate packets that are going to our Oracle database server (we keep VC and the DB on separate servers) and I noticed that the FIRST select statement run when a performance tab is clicked is a "select * from vpx_hist_stat", against a table with 13.7 million rows. I've run this a few times and it appears to be consistently the first query after I select the tab.
Is this possible? Is VirtualCenter really doing this? If that's the case, it's no wonder VC performance keeps getting slower and slower until it eventually stops working.
I'm hoping someone from the VC team is watching this thread, as first level support doesn't seem to be getting me anywhere with this.
I just sniffed the connection between VC and the DB and don't see this statement. Where are you seeing this? In SQL Profiler or a packet sniffer? Paste a few lines before and after; I'd be interested because I think graph performance sucks in general.
Ben
I am using the Ethereal packet sniffer. Our Oracle database resides on a separate Oracle cluster, so I can capture data coming in and out of the system. If I set up Ethereal to watch the Oracle Listener port, then click on the performance tab of a VM, it's always the first statement I see. Yesterday I spent an hour on the phone with first level support showing them this. I can only hope this gets escalated to someone on the VC team to investigate further.
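For anyone who wants to repeat this check without eyeballing the capture, here is a minimal sketch of the idea: walk the captured payloads in order and report the first thing that looks like a SELECT. It assumes the SQL text travels as plain ASCII inside the payload bytes and that you have already exported the payloads from your sniffer; the `first_select` helper and the sample bytes are hypothetical, not taken from a real capture.

```python
import re

# Matches "select ... from <table>" anywhere in a raw payload (case-insensitive).
SELECT_RE = re.compile(rb"select\s+.+?\s+from\s+\w+", re.IGNORECASE | re.DOTALL)

def first_select(payloads):
    """Return the first SELECT-like text found in an ordered list of raw payloads."""
    for raw in payloads:
        match = SELECT_RE.search(raw)
        if match:
            return match.group(0).decode("ascii", errors="replace")
    return None

# Hypothetical payloads standing in for bytes exported from a capture tool.
sample = [
    b"\x00\x1a\x00\x00\x06\x00\x00\x00",
    b"\x03\x5e\x00select * from vpx_hist_stat\x00\x01",
    b"\x03\x5e\x00select count(*) from vpx_hist_stat\x00",
]

print(first_select(sample))
```

If the first match on your own capture is something other than the full-table select, that would be worth posting here for comparison.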
Wow, that's messed up. On SQL I see 'select sm.id from vpx_sample where sm.sample.time = ......'
FYI, when I do a 'select count (*) from vpx_hist_stat' it usually takes about 60 seconds to return.
For the last few days I've been trimming my SQL database in order to bring it from 50 million to about 20 million rows (60 days of data).
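Trimming to a fixed retention window can be sketched roughly as below. This is only an illustration using an in-memory SQLite database: the `hist_stat_demo` table, its `sample_time` column, and the `purge_older_than` helper are all made up for the example and are not the VirtualCenter schema or the VMware purge script. Deleting in small batches keeps each transaction short, which is the main point.

```python
import sqlite3
from datetime import datetime, timedelta

def purge_older_than(conn, cutoff, batch_size=1000):
    """Delete rows older than cutoff in small batches; return total rows deleted."""
    total = 0
    while True:
        cur = conn.execute(
            "DELETE FROM hist_stat_demo WHERE rowid IN ("
            "SELECT rowid FROM hist_stat_demo WHERE sample_time < ? LIMIT ?)",
            (cutoff.isoformat(), batch_size),
        )
        conn.commit()  # commit per batch so no single transaction grows large
        if cur.rowcount == 0:
            return total
        total += cur.rowcount

# Demo data: one hypothetical sample per day for 90 days, counted back from a fixed date.
now = datetime(2008, 1, 1)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hist_stat_demo (sample_time TEXT, value REAL)")
conn.executemany(
    "INSERT INTO hist_stat_demo VALUES (?, ?)",
    [((now - timedelta(days=d)).isoformat(), float(d)) for d in range(90)],
)
conn.commit()

# Keep 60 days of data; everything strictly older than the cutoff is removed.
deleted = purge_older_than(conn, now - timedelta(days=60), batch_size=10)
remaining = conn.execute("SELECT COUNT(*) FROM hist_stat_demo").fetchone()[0]
print(deleted, remaining)
```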
Physical or virtual?
Respectfully,
Matthew
Kaizen!
app is virtual, DB is physical.
Mine is physical/physical. I have a dual CPU application server talking to a quad CPU RAC environment all over GB networking.
I have the same issue in a physical/physical environment. The DB is on Oracle for Linux.
My DBA has noted to me that there's an intermittent error with the stats rollup query, which seems to be timing out a lot or taking a very long time.
My current working theory is that whilst the hour/day/week stats are being aggregated, the old rows aren't being removed from the table and thus the table is growing faster than my overdraft.
Please let me know what support line has to say.
-
I hope this information helps you. If it does, please consider awarding points with the 'Helpful' or 'Correct' buttons. If it doesn't help you, please ask for clarification!
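To make the theory above concrete (aggregation happening but the source rows never being removed), here is a rough sketch of what a rollup that also purges would look like. It uses an in-memory SQLite database with invented tables `raw_samples_demo` and `hourly_rollup_demo`; none of this is the actual VirtualCenter rollup logic, which I haven't seen. The aggregate and the delete run in one transaction, so the raw rows only disappear if the rollup succeeded.

```python
import sqlite3

def rollup_and_purge(conn, cutoff):
    """Aggregate raw samples older than cutoff into hourly averages, then delete them."""
    with conn:  # one transaction: both statements commit together or roll back together
        conn.execute(
            "INSERT INTO hourly_rollup_demo (stat_id, hour, avg_value) "
            "SELECT stat_id, substr(sample_time, 1, 13), AVG(value) "
            "FROM raw_samples_demo WHERE sample_time < ? "
            "GROUP BY stat_id, substr(sample_time, 1, 13)",
            (cutoff,),
        )
        conn.execute("DELETE FROM raw_samples_demo WHERE sample_time < ?", (cutoff,))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_samples_demo (stat_id INTEGER, sample_time TEXT, value REAL)")
conn.execute("CREATE TABLE hourly_rollup_demo (stat_id INTEGER, hour TEXT, avg_value REAL)")
conn.executemany(
    "INSERT INTO raw_samples_demo VALUES (?, ?, ?)",
    [
        (1, "2008-01-01T10:05:00", 10.0),
        (1, "2008-01-01T10:35:00", 20.0),
        (1, "2008-01-01T11:05:00", 30.0),
        (1, "2008-01-02T09:00:00", 40.0),  # newer than the cutoff, stays in the raw table
    ],
)
conn.commit()

rollup_and_purge(conn, "2008-01-02T00:00:00")
rollups = conn.execute(
    "SELECT stat_id, hour, avg_value FROM hourly_rollup_demo ORDER BY hour"
).fetchall()
raw_left = conn.execute("SELECT COUNT(*) FROM raw_samples_demo").fetchone()[0]
print(rollups, raw_left)
```

If the delete step is skipped or fails on its own, the raw table keeps growing even though the rollup tables look healthy, which would match the symptom described.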
GBromage, that's interesting. I also have the same issue. I was provided a 3rd party SQL script from VMware a while back to check whether stats rollup was working. It was working for the most part; however, the yearly rollup is now marked as failed.
I am fighting with this issue at this very moment. VMware tech support has had me update the Virtual Center Server to 2.0.2 Patch 2 but this didn’t help…still not getting performance data. I was then told to run the MSSQL purge script referenced in Knowledge Base Article 1000125. I ran the script and 2 hours later I am still not able to get performance data. Since it is starting to look like I might have to whack the current VCDB database…I am thinking it might be just as easy to go ahead and upgrade to VC2.5. What do you guys think? We are not overly concerned with historic data....we just want to be able to get this data when we need it. I currently have the following:
6 ESX hosts at 3.0.2 61618
2 ESX hosts at 3.0.1 42829
1 ESX host at 3.0.2 62488
Those 2 at 3.0.1 won't be a problem will they?
Grant
I'd hold off on the VC upgrade until 3.5 patch 1 comes out.
Have you run the VCDB_table_cleanup_MSSQL.sql script? It takes a really long time to run but will clean up the history in your database. I ran it, deleted about 25 million rows (50%), and have seen a mild improvement. I think I need to move from statistics level 2 -> 1 and then clean again. The way VMware implemented graphing in VC is terribly inefficient.
I did run the sql cleanup script...ran it this morning as a matter of fact. It ran for 2 hours and deleted 27 million rows. I have heard back from vmware support and they want to do a webex sometime to take a look at things themselves. I am assuming you mean VC 2.5 patch 1 instead of 3.5 patch 1 right?
Ben, jterpstr, depping and tupelo - are you running your DB on MS SQL Server or Oracle? If Oracle, on Windows or Linux?
I had wondered if it was an Oracle specific (or VMware's implementation of the scripts on Oracle) problem, but based on what you've said I'm not so sure now.
I'm on SQL 2000
Hmm, your DB server is faster than mine
I did mean to say 2.5 patch 1.
Ben
I am running Oracle 10g RAC on DL385 Linux servers.
Does anyone have MS SQL running on a different server than their VC server, so that they too can run a packet capture to see if the "select * from vpx_hist_stat" statement shows up? This topic seems to be straying a bit back to just "my performance graphs aren't working." I was hoping we could speak more specifically about this one very large select statement that is performed in Oracle, and possibly MSSQL if it can be verified.
I am on SQL 2000.
For what it's worth, I did a webex with VMware Tech Support yesterday afternoon and the Support Engineer seemed to think it was an issue with port 903 being blocked. The webex was done on the Virtual Center Server and I opened up a PuTTY session to one of the ESX hosts there as well. The engineer referenced http://kb.vmware.com/selfservice/microsites/search.do?cmd=displayKC&docType=kc&externalId=749640&sli... as he worked on this issue. He edited the config file just like it says in the KB and then did the xinetd restart. As of this very moment, I am able to get stats on that one particular host and that is all. I tried the steps in the KB on two other hosts and was not able to get the stats...just the timeout error. I updated the support ticket with what I was seeing after the webex ended and the engineer replied back to do a service mgmt-vmware restart on the two other hosts...which I did...but I am still not able to get stats on them.
Is anyone having problems with the performance of the forums?