VMware

Virtual Performance

Scott Drummonds works in a variety of performance areas at VMware: VDI, application best practices, competitive analysis, customer performance investigations, and outward bound communications. This blog will detail some of my musings on these subjects.

Just over a week ago I had the privilege of riding along with VMware's Professional Services Organization as they piloted a possible performance offering. We are considering two possible services: one for performance troubleshooting and another for infrastructure optimization. During this trip we piloted the troubleshooting service, focusing on the customer's disappointing experience with SQL Server's performance on vSphere.

If you have read my blog entry (SQL Server Performance Problems Not Due to VMware) or heard me speak, you know that SQL Server performance is a major focus of my work. SQL Server is the most common source of performance discontent among our customers, yet none of the problems I have diagnosed have been due to vSphere. When this customer described the problem, I recognized a story typical of my many engagements:

"We virtualized our environment nearly a year ago and quickly determined that virtualization was not right for our SQL Servers. Performance dropped by 75% and we know this is VMware's fault because we virtualized on much newer hardware on the exact same SAN. We have since moved the SQL instance back to native."

Most professionals in the industry stop here, incorrectly bin this problem as a deficiency of virtualization, and move on with their deployments. But I know that vSphere's abilities with SQL Server are phenomenal, so I expect to make every user happy with their virtual SQL deployment. I start by challenging assumptions and trusting nothing I have not seen for myself. Here are my first steps in the hunt for the source of the problem:

  1. Instrument the SQL instance that has been moved back to native to profile its resource utilization. Do this by running Perfmon to collect stats on the database's memory, CPU, and disk usage.
  2. Audit the infrastructure and document the SAN configuration. Primarily I will need RAID group and LUN configuration and an itemized list of VMDKs on each VMFS volume.
  3. Use esxtop and vscsiStats to measure resource utilization of important VMs under peak production load.

There are about a dozen other things that I could do here, but my experience in these issues is that I can find 90% of all performance problems with just these three steps. Let me start by showing you the two RAID groups that were most important to the environment. I have greatly simplified the process of estimating these groups' performance, but the rough estimate will serve for this example:

RAID Group   Configuration                  Performance Estimate
A            RAID5 using 4 15K RPM disks    4 x 200 = 800 IOPS
B            RAID5 using 7 10K RPM disks    7 x 150 = 1050 IOPS
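The arithmetic behind this table is simple enough to script. The minimal Python sketch below reproduces the two estimates. It uses only the per-disk figures from the table: 200 IOPS for a 15K disk and 150 IOPS for a 10K disk. Like the table, it ignores the RAID5 write penalty and array cache, so its output is a rough planning number and not a measured capability. The names `PER_DISK_IOPS`, `RAID_GROUPS`, and `estimate_raid_iops` are illustrative.

```python
# Simplified RAID group capacity estimate, mirroring the table above:
# estimated IOPS = number of disks x rule-of-thumb IOPS per disk.
# RAID5 write overhead and array cache are deliberately ignored.

PER_DISK_IOPS = {"15K": 200, "10K": 150}  # per-disk figures from the table


def estimate_raid_iops(disk_count: int, spindle_speed: str) -> int:
    """Return the simplified IOPS estimate for one RAID group."""
    return disk_count * PER_DISK_IOPS[spindle_speed]


# RAID group name -> (number of disks, spindle speed), as in the table.
RAID_GROUPS = {"A": (4, "15K"), "B": (7, "10K")}

for name, (disks, speed) in RAID_GROUPS.items():
    iops = estimate_raid_iops(disks, speed)
    print(f"RAID group {name}: {disks} x {PER_DISK_IOPS[speed]} = {iops} IOPS")
```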

We found two SQL instances in their environment that were generating significant IO: one that had been moved back to native and one that remained in a virtual machine. By using Perfmon for the native instance and vscsiStats for the virtual one, we documented the following demands during a one-hour window:

SQL Instance    Peak IOPS    Average IOPS
X (physical)    1800         850
Y (virtual)     1000         400
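Peak and average figures like these come from reducing a series of per-interval readings. The small Python sketch below shows that reduction. Each sample is assumed to be one interval's I/O rate, such as a reading of Perfmon's Disk Transfers/sec counter on the native host. The sample values are made up for illustration and are not the customer's measurements. The function name `summarize_iops` is hypothetical.

```python
from statistics import mean


def summarize_iops(samples):
    """Return (peak, average) IOPS from per-interval IOPS samples.

    Each sample is assumed to be one interval's I/O rate, for example a
    reading of Perfmon's Disk Transfers/sec counter on the native host.
    """
    if not samples:
        raise ValueError("need at least one sample")
    return max(samples), mean(samples)


# Hypothetical readings for illustration only; these are not the
# customer's measurements.
samples = [620.0, 910.0, 1800.0, 700.0, 450.0, 1100.0]
peak, average = summarize_iops(samples)
print(f"peak = {peak:.0f} IOPS, average = {average:.0f} IOPS")
```

With these made-up samples the script reports a peak of 1800 IOPS and an average of 930 IOPS. Keeping both numbers matters: a RAID group sized only for the average can still fall behind at peak.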


In the customer's first implementation of the virtual infrastructure, both SQL Servers, X and Y, were placed on RAID group A. But in the native configuration SQL Server X was placed on RAID group B. This meant that the physical configuration had approximately 1850 IOPS of storage capacity (800 from group A plus 1050 from group B). In the virtual configuration the two databases shared a single 800 IOPS RAID group.
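To make the comparison concrete, the Python sketch below tallies capacity and demand per RAID group for both layouts. It uses the simplified RAID estimates and the measured peak and average IOPS. Placing Y on group A in the physical layout is inferred from the 1850 IOPS total (800 + 1050). The even split of a group's IOPS among its tenants is a simplifying assumption, not a model of how an array actually schedules I/O. The function `layout_report` and its dictionary keys are illustrative names.

```python
# Compare the storage available to SQL Servers X and Y in the two layouts
# described above. Capacities are the simplified RAID estimates; demand is
# the measured peak/average IOPS from the one-hour window.

RAID_IOPS = {"A": 800, "B": 1050}
DEMAND = {
    "X": {"peak": 1800, "avg": 850},
    "Y": {"peak": 1000, "avg": 400},
}


def layout_report(placement):
    """placement maps SQL instance -> RAID group; returns stats per used group."""
    report = {}
    for group, capacity in RAID_IOPS.items():
        tenants = [db for db, g in placement.items() if g == group]
        if not tenants:
            continue
        report[group] = {
            "tenants": tenants,
            "capacity": capacity,
            "peak_demand": sum(DEMAND[db]["peak"] for db in tenants),
            "avg_demand": sum(DEMAND[db]["avg"] for db in tenants),
            # Naive assumption: tenants split the group's IOPS evenly.
            "share_per_tenant": capacity / len(tenants),
        }
    return report


# Physical layout as inferred from the text: X on group B, Y on group A.
physical = {"X": "B", "Y": "A"}
# First virtual layout: both instances on group A.
virtual = {"X": "A", "Y": "A"}

for label, placement in (("physical", physical), ("virtual", virtual)):
    report = layout_report(placement)
    total = sum(stats["capacity"] for stats in report.values())
    print(f"{label}: {total} IOPS available")
    for group, stats in report.items():
        print(f"  group {group}: {stats}")
```

Under these numbers, the virtual layout puts a combined average demand of 1250 IOPS (peak 2800) on a group estimated at 800 IOPS. The even split leaves each instance about 400 IOPS.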

It does not take a rocket scientist to realize that users are going to complain when a critical SQL Server instance goes from 1050 IOPS to roughly 400, its half of RAID group A's 800. And this was not news to the on-site VI admin, either. What we found as we investigated further was that virtual disks requested by the application owners were used in unexpected and undocumented ways and frequently demanded more throughput than originally estimated. In fact, through vscsiStats analysis (Using vscsiStats for Storage Performance Analysis), my contact and I identified an "unused" VMDK with moderate sequential IO that we immediately recognized as log traffic. Inspection of the application's configuration confirmed this.
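vscsiStats can produce per-virtual-disk histograms, including seek distance. When nearly every command on a disk lands right next to the previous one, that is a strong hint of sequential traffic such as a transaction log. The Python sketch below illustrates the idea under stated assumptions. The histogram is represented as a plain dict mapping seek distance in logical blocks to a command count, which is a hypothetical simplification of the real vscsiStats output. The ±2 block window and the 80% threshold are arbitrary illustrative choices, not vscsiStats defaults. The histograms themselves are made up.

```python
# Flag a virtual disk whose I/O looks sequential, based on a seek-distance
# histogram. The dict format (seek distance in logical blocks -> command
# count) is a hypothetical simplification of vscsiStats' seekDistance output.


def sequential_fraction(seek_histogram, window=2):
    """Fraction of commands whose seek distance is within +/- window blocks."""
    total = sum(seek_histogram.values())
    if total == 0:
        return 0.0
    near = sum(n for dist, n in seek_histogram.items() if abs(dist) <= window)
    return near / total


def looks_sequential(seek_histogram, threshold=0.8, window=2):
    """True if at least `threshold` of commands land near the previous one."""
    return sequential_fraction(seek_histogram, window) >= threshold


# Made-up histograms for illustration only.
log_like = {-5000: 3, -1: 2, 0: 5, 1: 880, 2: 60, 64: 20, 5000: 30}
random_like = {-500000: 210, -5000: 190, -64: 100, 1: 40,
               64: 110, 5000: 170, 500000: 180}

for name, hist in (("log_like", log_like), ("random_like", random_like)):
    print(f"{name}: {sequential_fraction(hist):.1%} near-sequential, "
          f"sequential={looks_sequential(hist)}")
```

With these invented inputs, `log_like` comes out 94.7% near-sequential and is flagged, while `random_like` comes out at 4.0% and is not.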

Despite the explosion of VMware into the data center, we remain the new kid on the block. As soon as performance suffers, the first reaction is to blame the new kid. But the next time you see a performance problem in your production environment, I urge you to look at the issue as a consolidation challenge, not a virtualization problem. Follow the best practices you have been using for years and you can correct this problem without needing to call me and my colleagues to town.

Of course, if you want to fly us out to help you correct a specific problem or optimize your design, I promise we will make it worth your while.

At VMworld Europe 2009 my engineering colleague Chethan Kumar and I presented the results of a six-month investigation into the performance of SQL Server on ESX. Tomorrow (May 12 at 09:00 PDT) we're going to offer an updated version of this session to the general public. If you have any interest in virtualized SQL Server deployments, please register and attend the presentation to discover what we learned in our investigation.

I provided some notes on that presentation in a blog entry (SQL Server Performance Problems Not Due to VMware) right after the show. But the large number of attendees and exceptionally high ratings encouraged me to set up this encore session. And since Chethan's research on SQL Server performance tuning has continued, we have some updates to the experimental results.

In tomorrow's webinar we will tell the story of our exploration into persistent rumors of SQL Server performance problems. The search began after VMworld 2008 when I decided to engage every customer with a complaint on SQL Server performance. At the same time Chethan investigated every possible application, operating system, and hypervisor parameter that could impact SQL performance. I talked to dozens of customers and Chethan spent hundreds of hours on this work.

This presentation will detail the results of our investigation and leave attendees with a clear understanding of SQL Server performance on VMware. Our conclusions are surprisingly simple and certain to help you get the most out of your virtual infrastructure.

Microsoft SQL Server runs at roughly 80% of native on VI3 in most benchmarked environments. In production environments, and under loads that model those conditions, SQL Server runs at 90-95% of native on ESX 3.5. I can say this with confidence despite widespread skepticism in the industry because I've spent so much time on SQL Server in the past half year. I'd like to share some of my research and observations on the subject with you.

Two weeks ago my colleague Chethan Kumar and I presented on SQL Server in Cannes, France for VMworld Europe 2009. This presentation was the culmination of six months of investigation that started at VMworld 2008 in Las Vegas. At that event I heard so many concerns about SQL Server performance that I resolved to identify the problems. I talked with every customer I could find who claimed that SQL ran at anything less than 70% of native. So many of these contacts claimed to have measured SQL at 25% of native or worse that I knew something was going wrong.

First, let me show you a slide that Chethan presented at the show in Cannes:

[Figure: Chethan's VMworld Europe 2009 slide on SQL Server tuning results (sql_tuning.png)]

Chethan spent three months investigating SQL Server to find out how much he could improve virtual performance over the "out of the box" experience. As this figure details, the sum total of performance improvements was 15%. Here's another breakdown of these results:

[Figure: breakdown of SQL Server tuning results (sql_tuning_summary.png)]

The only ESX option we found that improved virtual performance was static transmit coalescing, which is documented on page four of one of our SPECweb papers. Large pages and SQL Server's priority boost, both best practices from Microsoft for SQL Server configuration, provided the largest gains in performance.

The key message we communicated to our audience was that a properly configured SQL Server should run at 80% of native or better. In most production cases it can run at a speed indistinguishable from native. And if performance is lagging, few changes can be made to ESX that yield any performance gains at all.
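Readers can compare their own numbers against this guideline with the minimal Python sketch below, which performs the percentage-of-native arithmetic. Throughput can be any consistent metric measured with the same workload on both platforms, for example transactions per second. The function names, verdict strings, and sample figures are hypothetical. The 80% floor is the figure from this post, and the storage hint reflects this post's observation that most problems come from misconfigured storage.

```python
# Express virtual performance as a percentage of native and compare it with
# the 80%-of-native floor from this post. Throughput can be any consistent
# metric (e.g. transactions per second from the same workload).


def percent_of_native(virtual_tput, native_tput):
    """Virtual throughput as a percentage of native throughput."""
    if native_tput <= 0:
        raise ValueError("native throughput must be positive")
    return 100.0 * virtual_tput / native_tput


def assess(virtual_tput, native_tput, floor_pct=80.0):
    """Return (percent of native, verdict) against the given floor."""
    pct = percent_of_native(virtual_tput, native_tput)
    if pct >= floor_pct:
        return pct, "within expectations"
    return pct, "below expectations: check configuration, starting with storage"


# Hypothetical throughput figures for illustration only.
print(assess(4200, 5000))
print(assess(1250, 5000))
```

The first call reports 84.0% of native, which is within expectations. The second reports 25.0%, the kind of result that, in my experience, points at configuration rather than the hypervisor.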

This raises the question: "If ESX can't be tuned to double SQL performance, what is causing these reports of terrible SQL Server throughput?" The great majority of the problems come from misconfigured storage. But a variety of other items, such as poor hardware selection or use of the wrong virtualization software, contribute to the confusion as well. I've been documenting these issues in Best Practices for SQL Server in this community and will continue to update that document as more problems are discovered.

If you have a SQL Server running un-virtualized in your environment, I'd like you to try virtualizing it again. Follow our best practices document and pay close attention to your storage configuration during deployment. I feel confident that once you've set up your environment properly, you're going to like what you see.
