Contributors: drummonds, RDellimmagine
I have moved my blog home to a new location. Come visit and read at vPivot.com.
A couple of days ago we finally published one of my favorite papers from our ongoing vSphere launch activities. This paper on ESX memory management, written by Fei Guo in performance engineering, has three graphs that are absolute gems. They show balloon driver memory savings next to throughput numbers for three common benchmarks. The conclusion is inescapable: the balloon driver reclaims memory from over-provisioned VMs with virtually no impact on performance. This holds for every workload but one: Java.
I spent a great deal of time answering customers' questions about the scheduler. Never have so many questions been asked about such an abstruse component over which users have so little influence. But CPU scheduling is central to system performance, so VMware strives to provide as much information on the subject as possible. In this blog entry, I want to point out a few nuggets of information on the CPU scheduler. These four bullets answer 95% of the questions I get asked.
Just over a week ago I had the privilege of riding along with VMware's Professional Services Organization as they piloted a possible performance offering. We are considering two possible services: one for performance troubleshooting and another for infrastructure optimization. During this trip we piloted the troubleshooting service, focusing on the customer's disappointing experience with SQL Server's performance on vSphere.
If you have read my blog entries (SQL Server Performance Problems Not Due to VMware) or heard me speak, you know that SQL performance is a major focus of my work. SQL Server is the most common source of performance discontent among our customers, yet none of the problems I have diagnosed were due to vSphere. When this customer described the problem, I knew this SQL Server issue was typical of many of my engagements:
"We virtualized our environment nearly a year ago and quickly determined that virtualization was not right for our SQL Servers. Performance dropped by 75%, and we know this is VMware's fault because we virtualized on much newer hardware on the exact same SAN. We have since moved the SQL instance back to native."

Most professionals in the industry stop here, incorrectly bin this problem as a deficiency of virtualization, and move on with their deployments. But I know that vSphere's abilities with SQL Server are phenomenal, so I expect to make every user happy with their virtual SQL deployment. I start by challenging assumptions and trusting nothing that I have not seen for myself. Here are my first steps on the hunt for the source of the problem:
| RAID Group | Configuration | Performance Estimate |
|---|---|---|
| A | RAID5 using 4 15K disks | 4 x 200 = 800 IOPS |
| B | RAID5 using 7 10K disks | 7 x 150 = 1050 IOPS |

| SQL Instance | Peak IOPS | Average IOPS |
|---|---|---|
| X (physical) | 1800 | 850 |
| Y (virtual) | 1000 | 400 |
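The RAID group estimates are simple arithmetic: spindle count multiplied by a rule-of-thumb IOPS figure per disk. A minimal Python sketch reproduces them and compares each instance's observed IOPS against each group's estimate. It deliberately ignores the RAID5 write penalty and any array cache, and it makes no assumption about which instance ran on which RAID group; the names `estimate_iops` and `utilization` are illustrative only.

```python
# Rule-of-thumb per-disk IOPS from the table above; these are estimates, not measurements.
RAID_GROUPS = {
    "A": {"disks": 4, "iops_per_disk": 200},  # RAID5 using 4 15K disks
    "B": {"disks": 7, "iops_per_disk": 150},  # RAID5 using 7 10K disks
}

# Observed IOPS for each SQL instance, also taken from the table above.
SQL_INSTANCES = {
    "X (physical)": {"peak": 1800, "average": 850},
    "Y (virtual)": {"peak": 1000, "average": 400},
}


def estimate_iops(disks: int, iops_per_disk: int) -> int:
    """Estimate a RAID group's IOPS as spindle count x per-spindle IOPS.

    Deliberately naive: ignores the RAID5 write penalty and any array cache.
    """
    return disks * iops_per_disk


def utilization(demand_iops: int, capacity_iops: int) -> float:
    """Ratio of demanded IOPS to estimated capacity; above 1.0 means oversubscribed."""
    return demand_iops / capacity_iops


def report() -> None:
    """Print peak and average utilization of every instance against every group."""
    capacities = {name: estimate_iops(**group) for name, group in RAID_GROUPS.items()}
    for instance, load in SQL_INSTANCES.items():
        for group, capacity in capacities.items():
            print(
                f"{instance} vs group {group} (~{capacity} IOPS): "
                f"peak {utilization(load['peak'], capacity):.2f}x, "
                f"average {utilization(load['average'], capacity):.2f}x"
            )


if __name__ == "__main__":
    report()
```

Any ratio above 1.0 marks a demand that the spindles alone cannot be expected to serve. For example, instance X's peak of 1800 IOPS is about 1.71x the estimate for even the larger group B (1050 IOPS).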
Last week Chris Wolf moderated a debate on virtual platform performance between Simon Crosby, CTO of Citrix, and me. A recording of the debate was posted online shortly after it concluded.
Simon and I disagreed on a few issues and took different approaches in the discussion. My goal in representing the fine efforts of our performance team was to show the audience VMware's commitment to product performance. That commitment is demonstrated through a never-ending series of benchmark publications and continual product improvement. In the years since I joined VMware we have quantified ESX's ability to serve web pages (SPECweb), enable massive numbers of database transactions (TPC-C, with disclaimers), and establish industry leadership in consolidated workloads (VMmark). While we have released these and dozens of other numbers, Citrix has remained silent on its own product's performance.
I was pleased that the event's format gave me the opportunity to discuss our accomplishments. My only regret was that I lacked the time to correct the most important of Simon's several factual inaccuracies. At one point Simon claimed that VMmark is not run by anyone except VMware. In fact, it is closer to the truth to say that VMmark is run by everyone except VMware. A quick look at the VMmark results page shows results from every major server vendor, with no submissions from VMware.
Thanks to the Burton Group and Chris Wolf for letting me participate. It was a pleasure.
I was recently copied on an internal thread discussing a performance tweak for VMware vSphere: an adjustment to the CPU scheduler and the gains it can deliver. In ESX 3.5, ESX's cell construct limited vCPU mobility between sockets. ESX 4.0 has no such limitation, and its aggressive migrations are suboptimal in some cases.
The thread details how to apply this change in ESX 4 and provides some insight into its impact. This scheduler modification will be baked into the first update to ESX 4.
My colleague in product management, Praveen Kannan, has been working to extend Perfmon to show some ESX performance counters. This capability is installed automatically with VMware Tools on vSphere 4, but Praveen and I have made a stand-alone version available to those of you who are still on VI3. Download it here to give it a try.
To install, place the file in an appropriately named directory on any Windows VM running on VI3. Double-click the executable, which self-extracts its files into the same directory. Then run "install.bat" and you're done.
Once you bring up Perfmon you'll see two new performance objects on your computer: "VM Memory" and "VM Processor". These objects contain counters exposed by ESX that accurately reflect the VM's memory and CPU usage. Here's Perfmon on my test VM after I've installed the tool.
This makes collecting host statistics a breeze. Windows Management Instrumentation (WMI) programs can now easily access reliable host statistics, and anyone with access to Perfmon can see their VM's resource usage. Unlike guest-based statistics, the host statistics shown through these counters accurately reflect resource usage in the presence of virtualization overheads and time slicing of VMs.
Disclaimer:
This is a pre-release "sneak peek" version. Eventually this tool will be available for download on vmware.com and supported by VMware. But today there is no support for this tool and you're using it "as-is". Use it at your own risk and do not contact VMware support for help with this release. That's VMware's official position on this tool. But feel free to comment here with any ideas about this great new feature.
There's been no shortage of comments on the Hyper-V video I posted. I made a comment on this action in a VMTN blog entry. Read up and comment here or there.
A few weeks ago our communities' administrators set up an XML aggregation of all blogs in VMware's performance community. In addition to the regular postings from VROOM! and me, several other members of our performance team occasionally contribute new content. If you follow the aggregator's RSS feed, you'll be notified of new performance content as it goes live.
The aggregator can be found at http://www.vmware.com/vmtn/planet/vmware/performance.xml.
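If you would rather poll the aggregator from a script than from a feed reader, a minimal Python sketch using only the standard library is below. It assumes the feed is plain RSS 2.0 with un-namespaced `<item>` elements (an Atom feed would need different element names), and the URL dates from the original post and may no longer resolve. The helper names `parse_rss_items` and `fetch_feed` are hypothetical.

```python
from __future__ import annotations

import urllib.request
import xml.etree.ElementTree as ET

# Feed URL quoted in the post; it may no longer be live.
FEED_URL = "http://www.vmware.com/vmtn/planet/vmware/performance.xml"


def parse_rss_items(xml_text: str | bytes) -> list[tuple[str, str]]:
    """Return (title, link) pairs for every <item> in an RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        # findtext returns the default when the child element is missing.
        title = item.findtext("title", default="").strip()
        link = item.findtext("link", default="").strip()
        items.append((title, link))
    return items


def fetch_feed(url: str = FEED_URL, timeout: float = 10.0) -> list[tuple[str, str]]:
    """Download the feed and parse it; passing raw bytes lets the XML declaration set the encoding."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_rss_items(response.read())


if __name__ == "__main__":
    for title, link in fetch_feed():
        print(f"{title}\n  {link}")
```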
Enjoy!
Newer processors matter much more to virtualized environments than to physical, un-virtualized ones. Generational improvements haven't just increased raw compute power; they've also reduced the overheads associated with virtualization. This blog entry describes three key changes that have particularly affected virtual performance.


It's been about 10 days since I posted the YouTube video showing Hyper-V's stability problems in consolidated environments. I immediately received a lot of questions about the configuration, which I answered to the best of my ability in my "Video on Hyper-V Crashes" blog entry. Many respondents were not surprised by stability problems in a first-generation product, and some requested more detail on the issue for further discussion. But there were too many comments to address them all.
One of the more interesting emails I received pointed out that it is unreasonable to blame Hyper-V for the collapse of these very large and very busy websites. Hyper-V's stability issues would bring down individual VMs, or small groups of them, when the parent partition blue-screened. I think this is a reasonable observation, so it's worth including here. I can't say that Hyper-V was responsible for the MSDN and TechNet crashes. That would be for Microsoft to say, if and when they choose to disclose the issue behind the outage.
Lastly, all comments come from people who fall into one of two camps: one thinks the video captures are bogus, and the other believes they're based on a real, reasonable, repeatable workload. I'm not going to try to move you from one camp to the other.
It is clear that a small, vocal, and surprisingly profane group of you think that I made this whole thing up. This group's premise appears to be that Microsoft wouldn't make a product that a customer could crash under normal conditions. If that is your reasoning, then no video, discussion, or demonstration is going to change your mind. I'll let everyone else make their decisions based on Microsoft's track record and their own experience with Microsoft products.
Since I posted the YouTube video showing Hyper-V blue screens last Friday, I've received a lot of comments, questions, compliments, and complaints. The video and its descriptive text raised more questions than they answered, so here are a few details to help fill out the story.
At VMworld Europe 2009 my engineering colleague Chethan Kumar and I presented the results of a six-month investigation into the performance of SQL Server on ESX. Tomorrow (May 12 at 09:00 PDT) we're going to offer an updated version of this session to the general public. If you have any interest in virtualized SQL Server deployments, please register and attend the presentation to discover what we learned in our investigation.
I provided some notes on that presentation in a blog entry (SQL Server Performance Problems Not Due to VMware) right after the show. But the large number of attendees and exceptionally high ratings encouraged me to set up this encore session. And since Chethan's research on SQL Server performance tuning has continued, we have some updates to the experimental results.
In tomorrow's webinar we will tell the story of our exploration into persistent rumors of SQL Server performance problems. The search began after VMworld 2008 when I decided to engage every customer with a complaint on SQL Server performance. At the same time Chethan investigated every possible application, operating system, and hypervisor parameter that could impact SQL performance. I talked to dozens of customers and Chethan spent hundreds of hours on this work.
This presentation will detail the results of our investigation and leave attendees with a clear understanding of SQL Server performance on VMware. Our conclusions are surprisingly simple and certain to help you get the most out of your virtual infrastructure.
There's a lot of confusion out there about VMware's support for the CPU vendors' virtualization assist technologies. VMware has always led the industry in its support for hardware assist. We were the first vendor to support AMD-V and Intel VT-x in 2006, the first to support AMD RVI in 2008, and will be the first to support Intel EPT when vSphere 4 becomes publicly available. These technologies, which we call hardware assist, provide value to the part of ESX we call the monitor.
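To see which of these hardware assist features a given processor advertises, one option on a Linux machine is to read the CPU feature flags in `/proc/cpuinfo`: `vmx` for Intel VT-x, `svm` for AMD-V, `ept` for Intel EPT, and `npt` for AMD RVI (nested page tables). The Python sketch below is an illustration under that assumption, not a VMware tool. Recent kernels list some VMX features on a separate `vmx flags` line, so it reads every line whose key ends in `flags`. A missing flag can also mean the feature is disabled in firmware or hidden by a hypervisor. The function names are hypothetical.

```python
from pathlib import Path

# Linux /proc/cpuinfo flag names for the hardware assist technologies named above.
ASSIST_FLAGS = {
    "vmx": "Intel VT-x",
    "svm": "AMD-V",
    "ept": "Intel EPT",
    "npt": "AMD RVI (nested page tables)",
}


def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Collect flag tokens from every 'flags'-style line (e.g. 'flags', 'vmx flags')."""
    flags: set[str] = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().endswith("flags"):
            flags.update(value.split())
    return flags


def hardware_assist(cpuinfo_text: str) -> list[str]:
    """Return the hardware assist features advertised in the given cpuinfo text, sorted."""
    flags = cpu_flags(cpuinfo_text)
    return sorted(name for flag, name in ASSIST_FLAGS.items() if flag in flags)


if __name__ == "__main__":
    found = hardware_assist(Path("/proc/cpuinfo").read_text())
    print(found or "no hardware assist flags found")
```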
As we prepare for vSphere's general availability we're generating a lot of documentation to help people get the most out of the new version of ESX. One of my colleagues started a document that details the role of the monitor and how it flexibly uses different hardware assist technologies. I've summarized the default behavior of our monitor in several situations in ESX Monitor Modes. Of course vSphere's users will be able to override these defaults if they want to experiment with their workloads.
I wanted to include a textual summary of the role of the monitor in virtualization but found myself getting bogged down with the writing. So, I thought I'd try something new. Let me know what you think of this short video clip explaining the role of the monitor and how it might leverage hardware assist.
Video: http://www.youtube.com/watch?v=PYqsxIE5P-U
I recently attended a practice talk for next week's Partner Exchange given by Kit Colbert, one of our senior engineers, who is leading a whole bunch of cool efforts around performance. I wanted to "leak" one slide he showed us that we'll be touching up for publication. Those of you who are curious about memory counters and want a different take than Memory Performance Analysis and Monitoring may find it interesting.
Some of this won't make sense outside of Kit's presentation, but let me point out a few things that may help you digest the information in this incredible chart: