At VMworld 2008 in Las Vegas several of us in our virtual performance team met with a variety of customers to talk about Microsoft SQL Server. We already had a large base of customers running very many SQL Server DBs on our products and we wanted to collect information on the challenges posed in the process of virtualizing this critical workload. We were pleased to see that ESX Server handled SQL VMs with excellent performance. But, for many customers, the first efforts at virtualizing SQL didn't yield high-performing SQL VM. After careful investigation and many, many discussions we've started to put together the puzzle as to where SQL Server performance problems come from. This page will document these common problems, borrowing slides from our presentations on the subject.
Virtualizing SQL: The Checklist
We've talked with dozens of customers in the past months to document the issues that resulted in poor SQL performance. Happily, none of the issues were due to underlying technologies. Here is a list of issues and an explanation of the impacts. These items are roughly listed in the order of decreasing likelihood of occurrence.
Item 1: Configure Storage Correctly
Storage configuration problems are the number one cause of SQL performance issues. Usually these problems arise because the DBA requests a virtual disk of the VI admin, the VI admin places the VMDK on a LUN that may or may not meet the DBA's performance needs. For instance:
- VMs' VMDK files placed on VMFS volumes without enough spindles.
- Many VMDK files placed on a single VMFS volume which could use more spindles.
- Database and log files placed on the same LUN which, you guessed it, could use more spindles.
This may be obvious to some, but this problem occurs again and again. The VI administrator should be aware of a few technical items that can help understand and avoid this problem:
- Based on the IO demands of the DB files, a certain number of spindles should be guaranteed to this file. This means that its VMDK must be placed on a VMFS volume to accout for the SQL Server's demands and all of the other demands on that volume.
- Mixing sequential activity (such as log file update) and random activity (such as database access) results in random behavior. This means that the LUN configuration in the pre-virtual physical environment may not be sufficient for the consolidated environment. This is discussed some in Storage Performance: VMFS and Protocols.
- When storage isn't meeting the SQL Server's demands, the device latency or kernel latency (queueing time) will increase. Read up on these counters in Storage Performance Analysis and Monitoring.
Item 2: Use Recent Hardware
Often companies that are dipping their metaphorical toes into
virtualization want to run proof-of-concept (POC) experiments to verify
that the virtual platform can meet their performance expectations. But
its surprising how many times these experiments are run on older,
poorly-performing hardware. Presumably the shiny, new systems were in
use for production applications so only the mothballed, cobweb-covered
servers from a previous generation were available for the POC. This
causes many problems. Check out this slide from a talk on SQL Server at VMworld Europe 2009:
The slide points out a couple of things. First, the larger caches and shorter pipelines on newer Intel processors results in a considerable drops in performance overheads. Second, the latency of the VMEXIT instruction, which determines the amount of time it takes to transition from the VM to the VMkernel, has shrunk by a large amount with subsequent generations of hardware. And don't forget the other additions from Intel and AMD such as hardware assisted memory management and IO virtualization.
Item 3: Follow SQL Server Best Practices
Microsoft has kindly provided a web page of best practices for SQL Storage configuration. These be practices should still be followed when configuring your virtual SQL deployments!
Item 4: Configure VM Identically to Native and Run The Right Test
For many SQL Server POCs the goal is to measure the VM's ability to perform, with respect to the virtual platform. If this comparison is to be performed, its critical that the VM be configured identically to the physical hardware. Obviously this means that the VM should be run on the same hardware using identically configured LUNs. Its also important to ensure that the VM has the same number of vCPUs and amount of memory as the physical baseline. This means restricting the number of pCPUs and amount of memory with NUMPROC and MAXMEM, respectively, in boot.ini.
It also means that the test being applied should be understood. If a benchmark is chosen that uses a very small database, the content will be cached and the storage system won't be used. This can skew the results and produce recommendations not consistent with production deployments. Here is another slide from the same VMworld Europe 2009 presentation detailing some of what we know about the SQL Server benchmarking alternatives:
We at VMware prefer DVD Store.
Item 5: Use VMware's ESX Server
VMware's hosting products, VMware Server, VMware Workstation, and even VMware Fusion, are all capable of running SQL Server. But if the database is going to be run in production on enterprise-class hardware, use VMware's enterprise-class hypervisor: ESX Server. These products are not often confused by the initiated but rogue members of large companies often run off-the-books proof-of-concept experiments on VMware's hosted products. When they produce results they don't like, the results get spread throughout the company which can slow the virtual deployment.
Consider the following data, again from the VMworld Europe 2009 SQL Server presentation:
This information is getting a bit dated now, as it was performed years ago on ESX Server 3.0. But the point stands: before believing results claiming that "VMware cannot run SQL Server" its worth investigating the platform used to generate the results.
Item 6: Understand Memory Management and Configure Correctly
Database performance is heavily dependent on the amount of memory available. Almost without exception, providing more memory to SQL Server will improve performance. However, if that memory is coming from a host that is already over-committed or is being provided through workarounds to 32-bit limitations, performance may suffer. Here are a few keys for SQL Server memory management:
- If more than 3 GB is desired, use 64-bit versions of the OS and application.
- If memory is over-committed on the box, set reservations for performance-critical SQL Server VMs to guarantee that those VMs' memory isn't ballooned or swapped out.
- If SQL Server's "lock pages in memory" parameter has been set, provide set the VM's reservations to the amount of memory in the VM. This setting can adversely interfere with ESX Server's balloon driver. Setting reservations will stop the balloon driver from inflating into the VM's memory space.
Item 7: Align Disk Partitions
This item is really a special but very important case of item two, follow best practices. Partition alignment can impact storage performance which can be critical to some SQL Server VMs' performance. See VMware's paper on partition alignment for more information on this.