VMware

Virtual Performance

Scott Drummonds works in a variety of performance areas at VMware: VDI, application best practices, competitive analysis, customer performance investigations, and outbound communications. This blog details some of my musings on these subjects.

Microsoft SQL Server runs at roughly 80% of native performance on VI3 in most benchmark environments. In production environments, and under loads that model those conditions, SQL Server runs at 90-95% of native on ESX 3.5. I can say this with confidence, despite considerable skepticism in the industry, because I've spent much of the past six months studying SQL Server performance. I'd like to share some of my research and observations with you.
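
In this post, "percent of native" is taken to mean the throughput a workload achieves inside a VM divided by the throughput the same workload achieves on the same hardware without virtualization. The short sketch below just spells out that arithmetic; the transaction rates are hypothetical placeholders, not measurements from our tests.

```python
def percent_of_native(virtual_throughput: float, native_throughput: float) -> float:
    """Return virtual throughput as a percentage of native throughput.

    Both values must use the same unit (e.g. transactions per second) and
    come from the same workload on the same hardware.
    """
    if native_throughput <= 0:
        raise ValueError("native_throughput must be positive")
    return 100.0 * virtual_throughput / native_throughput


# Hypothetical numbers for illustration only.
native_tps = 1000.0
virtual_tps = 920.0
print(f"{percent_of_native(virtual_tps, native_tps):.1f}% of native")  # 92.0% of native
```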

Two weeks ago my colleague Chethan Kumar and I presented on SQL Server at VMworld Europe 2009 in Cannes, France. The presentation was the culmination of six months of investigation that began at VMworld 2008 in Las Vegas. At that event I heard so many concerns about SQL Server performance that I resolved to identify the problems. I talked with every customer I could find who claimed that SQL Server ran at anything less than 70% of native. So many of them claimed to have measured SQL Server at 25% of native or worse that I knew something was going wrong.

First, let me show you a slide that Chethan presented at the show in Cannes:

[Figure: sql_tuning.png]

Chethan spent three months investigating how much he could improve SQL Server's virtual performance over the "out of the box" experience. As this figure shows, the combined performance improvement from all of his tuning was 15%. Here's another breakdown of these results:

[Figure: sql_tuning_summary.png]

The only ESX option we found that improved virtual performance was static transmit coalescing, which is documented on page four of one of our SPECweb papers. Large pages and SQL Server's priority boost, both best practices Microsoft recommends for SQL Server configuration, provided the largest performance gains.
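
When several tuning changes each help a little, their relative gains combine multiplicatively rather than by simple addition, assuming each change's gain is independent of the others. The sketch below illustrates that arithmetic only; the three per-change gains are hypothetical and are not the individual results from Chethan's study.

```python
from math import prod
from typing import List


def combined_gain(gains: List[float]) -> float:
    """Combine relative gains multiplicatively (0.05 means +5%).

    Assumes each change's gain is independent of the others, which real
    tuning changes do not always satisfy.
    """
    return prod(1.0 + g for g in gains) - 1.0


# Hypothetical per-change gains (e.g. large pages, priority boost,
# transmit coalescing); these are not measured values.
gains = [0.06, 0.05, 0.03]
print(f"additive estimate: {100 * sum(gains):.1f}%")            # 14.0%
print(f"combined gain:     {100 * combined_gain(gains):.1f}%")  # 14.6%
```

The gap between the two lines is small for gains this size, but it grows as individual gains get larger.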

The key message we communicated to our audience was that a properly running SQL Server should run at 80% of native or better, and in most production cases its performance is indistinguishable from native. And if performance is lagging, very few changes to ESX will yield any performance gain at all.

This raises the question: "If ESX can't be tuned to double SQL Server performance, what is causing these reports of terrible SQL Server throughput?" The great majority of the problems come from misconfigured storage. But other factors, such as poor hardware selection or the use of the wrong virtualization software, contribute to the confusion as well. I've been documenting these issues in the Best Practices for SQL Server document on this community, and I will continue to update it as more problems are discovered.

If you have a SQL Server running un-virtualized in your environment, I'd like you to try virtualizing it again. Follow our best practices document and pay close attention to your storage configuration during deployment. I'm confident that once you've set up your environment properly, you're going to like what you see.

If you've heard me speak at any of the numerous events I've done in the past two years, you may have heard me describe the areas where virtualized performance can exceed native. Scalability limitations in traditional software leave nearly every enterprise application unable to use all of the cores available to it today. As the core explosion continues, this under-utilization of processors will worsen. Here is a graph that we've been showing to illustrate that point:

http://communities.vmware.com/servlet/JiveServlet/downloadImage/5369/core_explosion.png

At VMworld Europe 2008 I showed how running multiple virtual machines on a single physical host could circumvent the limitations in today's software. In that experiment we showed that 16,000 Exchange mailboxes could fit on a single physical server, when no one had ever put more than 8,000 on a single native instance. We called this approach designing with "building blocks," and we were confident that as core counts continued to increase, we'd keep finding more applications whose performance could be improved through virtualization.
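
The building-block idea reduces to simple sizing arithmetic: if one application instance stops scaling beyond a certain number of cores, running several right-sized VMs lets the rest of the host's cores do useful work. The sketch below is an idealization, assuming capacity adds linearly across VMs and ignoring hypervisor overhead and contention for shared memory, storage, and network; the core counts and per-instance capacity are hypothetical and are not the configuration of the Exchange experiment.

```python
from dataclasses import dataclass


@dataclass
class BuildingBlockPlan:
    vms: int             # number of VM "building blocks" on the host
    cores_used: int      # cores consumed by those VMs
    total_capacity: int  # aggregate capacity (e.g. users or mailboxes)


def plan_building_blocks(host_cores: int, cores_per_instance: int,
                         capacity_per_instance: int) -> BuildingBlockPlan:
    """Size a host as N identical VMs, each running one application instance.

    cores_per_instance is the most cores a single instance can use
    effectively. Assumes capacity scales linearly with the VM count.
    """
    vms = host_cores // cores_per_instance
    return BuildingBlockPlan(vms=vms,
                             cores_used=vms * cores_per_instance,
                             total_capacity=vms * capacity_per_instance)


if __name__ == "__main__":
    # Hypothetical: a 16-core host and an instance that scales to 4 cores
    # and 2,000 users. A single native instance would leave 12 cores idle.
    print(plan_building_blocks(host_cores=16, cores_per_instance=4,
                               capacity_per_instance=2_000))
```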

On Thursday last week SPEC accepted VMware's submission of a SPECweb2005 result. And last night we posted an article on VROOM! detailing the experiment and providing information on the submission. This submission is an incredible first for us: not only have we shown that we can circumvent limitations in web servers, but we posted a world record performance number in the process. Of course, if any of you have seen Sreekanth Setty's presentation at VMworld on his ongoing work on SPECweb2005, this result wouldn't surprise you:

http://communities.vmware.com/servlet/JiveServlet/downloadImage/5370/specweb_scaling.png

Getting a benchmark standardization body like SPEC to approve these results isn't always easy. Most of the industry remains stuck thinking of performance as a single instance's maximum throughput. But given the scale-out capabilities of many enterprise applications, I'd argue that benchmarks should account for scale-out on a single box. VMware's customers follow this practice faithfully when sizing their deployments, and everyone wants to know how well the platform handles this use case. SPEC's willingness to accept results showing building blocks on a single host is commendable and progressive. As more benchmarks accept submissions like these, VMware will continue to be able to show record numbers.

For years VMware has been providing products that enable a virtual desktop experience. Historically, this meant virtual desktops on our hosted products, although in some cases virtualizing Citrix XenApp (formerly Presentation Server) could deliver a large number of desktops from a single virtual machine. More recently, VMware View offers a way to host a large number of desktops on a single server, with each desktop granted its own operating system instance. As the number of virtual desktops, and of alternatives for implementing them, has grown, so has the need for a benchmark that can compare the performance of these alternatives.

Desktop benchmarking is not new to the industry; people have been using PCs for decades. But standards for virtual desktop benchmarking don't exist. Some might argue that traditional tools, common on PCs for years, should be used. But there are several reasons why they are not suitable:

  1. Traditional (pre-virtualization) desktop benchmarks are built to completely saturate all of the memory and CPU resources provided. Fully saturating the CPUs of a single multi-way VM, for example, results in far fewer VMs per host than is common in VDI deployments, and fewer VMs means less work for the hypervisor's scheduler.
  2. Existing desktop benchmarks are often throughput-based rather than latency-based. Because these tools aim to differentiate between powerful processors and large amounts of memory, they pack more and longer operations into each run than is common in virtual desktop deployments. Most desktop deployments won't run massive video renders, but the response times of individual button clicks and window appearances are critical.
  3. No existing benchmark accounts for the peculiarities of timekeeping inside a VM. VDI benchmarks need to handle this either by using host timing or by invoking and measuring operations from remote, non-virtual locations.

At VMworld 2008 VMware presented a VDI workload built through a collaborative effort among all of the VDI teams within VMware, with review and qualification by several of our partners. The first measurements on this workload came from Dell and EqualLogic, and we quickly published details on its characteristics in a white paper. Key features of this workload include:

  • A diverse set of applications (Word, PowerPoint, Excel, Acrobat, and Internet Explorer) common to business desktop deployments.
  • Load generation modeled after the most common VDI deployments.
  • Generation and measurement of short operations (less than 500 ms each).
  • Host-based measurement and an architecture to support remote command invocation in the next release.
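
To make latency-based, externally timed measurement concrete, here is a minimal sketch of how a harness might time individual desktop operations from a client outside the VM and summarize the response times. The operation names, the `open_document` stand-in, and the use of 500 ms as a summary threshold are illustrative assumptions; this is not the harness used by the VMware workload.

```python
import math
import statistics
import time
from typing import Callable, Dict, List


def time_operations(ops: Dict[str, Callable[[], None]],
                    repeats: int) -> Dict[str, List[float]]:
    """Run each operation `repeats` times and record latency in milliseconds.

    Timing uses time.perf_counter() on the machine running this harness,
    which is assumed to be a non-virtual client, so guest clock behavior
    does not distort the measurement.
    """
    results: Dict[str, List[float]] = {}
    for name, run_operation in ops.items():
        samples: List[float] = []
        for _ in range(repeats):
            start = time.perf_counter()
            run_operation()
            samples.append((time.perf_counter() - start) * 1000.0)
        results[name] = samples
    return results


def summarize(samples_ms: List[float],
              threshold_ms: float = 500.0) -> Dict[str, float]:
    """Report median, nearest-rank 95th percentile, and share under threshold."""
    if not samples_ms:
        raise ValueError("no samples to summarize")
    ordered = sorted(samples_ms)
    rank = math.ceil(0.95 * len(ordered))  # nearest-rank method
    return {
        "median_ms": statistics.median(ordered),
        "p95_ms": ordered[rank - 1],
        "under_threshold": sum(s < threshold_ms for s in ordered) / len(ordered),
    }


if __name__ == "__main__":
    def open_document() -> None:
        # Hypothetical stand-in for a remotely invoked desktop action.
        time.sleep(0.05)

    latencies = time_operations({"open_document": open_document}, repeats=5)
    print(summarize(latencies["open_document"]))
```

Reporting percentiles and the share of fast responses, rather than total work completed, keeps the focus on what a desktop user actually notices.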

As an attempt at the world's first VDI benchmark, we're very pleased with our efforts. We found that it met the unique requirements of measuring virtual desktops of all kinds. And since it was developed with a large group of internal collaborators and multiple partners, it's an excellent start on what the industry needs to standardize this process.

But we realize that it's just a beginning. I encourage everyone to bring comments to VMware, via this blog or the performance forums, on what the characteristics of an industry-standard virtual desktop benchmark should be. We'll never make one benchmark that meets everyone's needs, and I suspect that even some common needs will require significant development resources. But I expect that with your guidance and assistance in refining this workload, we'll accelerate the process of getting this benchmark into a shape that the industry can embrace.

DPM Power/Performance Video

Posted by drummonds VMware Nov 6, 2008

Back in September the performance team here at VMware embarked on a project to measure the power savings from using VI3's Distributed Power Management (DPM). This feature, experimentally supported in VI3 with full support planned for the next release, leverages DRS to consolidate idle and lightly loaded VMs onto as few servers as possible. Once the workload has been consolidated onto the minimum hardware required, spare servers are powered down. The end result is flexible performance, thanks to automated load balancing, and a halving of total power usage.
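
The consolidation step can be pictured as a bin-packing problem: place each VM's current demand onto as few hosts as possible, then power off the hosts left empty. The sketch below uses a simple first-fit-decreasing heuristic over a single CPU dimension. It illustrates the idea only and is not the algorithm DRS or DPM actually uses; the host capacity, the 80% utilization target, and the VM demands are hypothetical.

```python
from typing import Dict, List


def consolidate(vm_demand_mhz: Dict[str, float], host_capacity_mhz: float,
                num_hosts: int, target_utilization: float = 0.8) -> List[List[str]]:
    """Pack VMs onto hosts with first-fit decreasing.

    Each host is loaded to at most target_utilization * host_capacity_mhz,
    leaving headroom for load spikes. Returns one list of VM names per host;
    hosts with empty lists are candidates for power-off.
    """
    limit = target_utilization * host_capacity_mhz
    hosts: List[List[str]] = [[] for _ in range(num_hosts)]
    load = [0.0] * num_hosts
    # Placing the largest VMs first usually needs fewer hosts than arbitrary order.
    for vm, demand in sorted(vm_demand_mhz.items(), key=lambda kv: kv[1], reverse=True):
        for i in range(num_hosts):
            if load[i] + demand <= limit:
                hosts[i].append(vm)
                load[i] += demand
                break
        else:
            raise ValueError(f"cluster cannot host {vm} within the utilization target")
    return hosts


if __name__ == "__main__":
    # Hypothetical four-host cluster with 10,000 MHz per host.
    idle = {f"vm{i}": 100.0 for i in range(12)}
    busy = {f"vm{i}": 2000.0 for i in range(12)}
    for label, demand in (("idle", idle), ("busy", busy)):
        plan = consolidate(demand, host_capacity_mhz=10000.0, num_hosts=4)
        powered_off = sum(1 for vms in plan if not vms)
        print(f"{label}: VMs per host {[len(v) for v in plan]}, hosts off: {powered_off}")
```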

The experiment we performed was based on a workload derived from VMmark; in fact, it was precisely the VMmark workload. But running the test against a cluster of systems makes the results invalid for comparison against other systems, because VMmark run rules require that the test run against VMs on a single server.

We started the test with 13 tiles' worth of VMs (108 VMs in all) on the DRS cluster. With all of these VMs idle, DPM consolidated them onto a single host and turned off three servers. As load was applied to the VMs at 9:00 AM and driven through an eight-hour workday, DRS and DPM powered on servers and balanced load as needed. When the day ended at 5:00 PM, the load was again consolidated and servers were powered down. The video we shot includes power meters of the systems under test and screenshots of activity induced by DRS and DPM.

Check out the video on YouTube and let me know what you think. I'm considering recording some of the other amazing things that we're doing with our products and would love your feedback on what you'd like to see.

EMC World 2008

Posted by drummonds VMware May 19, 2008

I'm going to be at EMC World 2008 from May 19 through May 23. I'll be presenting on VMware's VI3 architecture with respect to performance, tips and techniques for performance monitoring and analysis, and best practices for getting the best performance. My session, titled "VMware ESX Server Performance Analysis", is offered on Monday at 4:30 and Tuesday at 11:30. I'll also be hosting a birds-of-a-feather session on the same subject at 2:30 on Tuesday. Please drop by if you're at the conference!
