2009

If you have heard me speak at any of the numerous events I've done in the past two years, you may have heard me detail the areas where virtualization performance can exceed native.  Scalability limitations in traditional software keep nearly every enterprise application from fully utilizing the cores available to it today.  As the core explosion continues, this under-utilization of processors will worsen.  Here is a graph that we've been showing to illustrate that point:

 

http://communities.vmware.com/servlet/JiveServlet/downloadImage/5369/core_explosion.png 

 

In 2008 I visited VMworld Europe and showed how using multiple virtual machines on a single physical host could circumvent the limitations in today's software.  In that experiment we showed that 16,000 Exchange mailboxes could fit on a single physical server when no one had ever put more than 8,000 on a single native instance.  We called this approach designing by "building blocks" and were confident that as the core count continued to increase, we'd continue to expose more applications whose performance could be improved through virtualization.
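
The arithmetic behind building blocks can be sketched in a few lines of Python.  This is only an illustration: the 16,000 and 8,000 figures come from the experiment described above, but the per-VM block size of 2,000 mailboxes and the function name `building_blocks_needed` are assumptions made up for this example, not details of the actual configuration.

```python
import math


def building_blocks_needed(total_mailboxes: int, mailboxes_per_block: int) -> int:
    """Return how many equally sized VM building blocks cover the mailbox count."""
    if total_mailboxes < 0:
        raise ValueError("total_mailboxes must not be negative")
    if mailboxes_per_block <= 0:
        raise ValueError("mailboxes_per_block must be positive")
    # Round up: a partially filled block still needs its own VM.
    return math.ceil(total_mailboxes / mailboxes_per_block)


# Figures quoted in the post.
HOST_MAILBOXES = 16_000          # mailboxes fit on one physical server
NATIVE_INSTANCE_CEILING = 8_000  # most anyone had put on one native instance

# Hypothetical block size, chosen only to make the example concrete.
ASSUMED_MAILBOXES_PER_VM = 2_000

blocks = building_blocks_needed(HOST_MAILBOXES, ASSUMED_MAILBOXES_PER_VM)
gain_over_native = HOST_MAILBOXES / NATIVE_INSTANCE_CEILING

print(blocks)            # 8
print(gain_over_native)  # 2.0
```

Each block stays well under the single-instance ceiling, so the host total grows with the number of blocks the hardware can carry rather than with what one instance of the software can scale to.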

 

On Thursday last week SPEC accepted VMware's submission of a SPECweb2005 result.  And last night we posted an article on VROOM! detailing the experiment and providing information on the submission.  This submission is an incredible first for us: not only have we shown that we can circumvent limitations in web servers, but we posted a world-record performance number in the process.  Of course, if any of you have seen Sreekanth Setty's presentation at VMworld on his ongoing work on SPECweb2005, this result won't surprise you:

 

http://communities.vmware.com/servlet/JiveServlet/downloadImage/5370/specweb_scaling.png

 

Getting a benchmark standardization body like SPEC to approve these results isn't always easy.  Most of the industry remains stuck in a mode of thinking of performance as a single instance's maximum throughput.  But given the scale-out capabilities of a large number of enterprise applications, I'd argue that benchmarking should account for scale-out capabilities on a single box.  VMware's customers follow this practice faithfully in sizing their deployments to match their needs, and everyone wants to know the platform's ability to handle this use case.  SPEC's willingness to accept results showing building blocks on a single host is commendable and progressive.  As more benchmarking bodies accept submissions like these, VMware will continue to be able to show record numbers.

For years now VMware has been providing products that enable a virtual desktop experience.  Historically, this would occur in virtual desktops on our hosted products, but in some cases virtualization of Citrix XenApp (formerly Presentation Server) could provide a large number of desktops off a single virtual machine.  And more recently VMware View offers a means of hosting a large number of desktops on a single server where each is granted its own operating system instance.  As the number of virtual desktops and the alternatives for implementing them have grown, the need has arisen for a benchmark that can compare the performance of these alternatives.

 

Desktop benchmarking is not new to the industry, as people have been using PCs for decades.  But standards in virtual desktop benchmarking are non-existent.  Some might argue that traditional tools, common to PCs for years, should be used.  But there are several reasons why those tools fall short:

  1. Benchmarks that predate virtual desktops are built to completely saturate all the memory and CPU resources provided.  Fully saturating CPU on a single multi-way VM, as an example, results in far fewer VMs per host than is common in VDI deployments.  Fewer VMs means less work for the hypervisor's scheduler, so the scheduler is exercised less than it would be in a real deployment.

  2. Existing desktop benchmarks are often throughput-based, as opposed to latency-based.  Because existing tools want to differentiate between powerful processors and large amounts of memory, they're designed to pack more and longer instructions in each run than is common in virtual desktop deployments.  Most desktop deployments won't run massive video renders, but the response times of individual button clicks and window appearances are critical.

  3. No existing benchmarks are aware of the peculiarities of VM-based timing.  VDI benchmarks need to be aware of this by either using host timing or invoking and measuring operations from remote, non-virtual locations.

At VMworld 2008 VMware presented a VDI workload that had been constructed from a collaborative effort between all VDI teams within VMware, with review and qualification by several of our partners.  The first measurements on this workload came from Dell and EqualLogic, and we quickly made details on its characteristics available via a white paper.  Key features of this workload include:

 

  • A diverse set of applications (Word, PowerPoint, Excel, Acrobat, and Internet Explorer) common to business desktop deployments.

  • Load generation modeled after the most common VDI deployments.

  • Generation and measurement of small (less than 500 ms) operations.

  • Host-based measurement and an architecture to support remote command invocation in the next release.
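
To make the latency-based, externally timed approach concrete, here is a minimal Python sketch.  It is not the workload's actual harness: the function names, the injectable `clock` parameter, and the use of 500 ms as a reporting cutoff are all assumptions for illustration.  The idea is simply that the clock lives on the measuring side (the host or a remote, non-virtual machine) rather than inside the guest, and that each short operation is timed individually instead of being folded into a throughput number.

```python
import math
import time
from typing import Callable, Dict, Iterable, List


def measure_latencies_ms(
    operations: Iterable[Callable[[], None]],
    clock: Callable[[], float] = time.perf_counter,
) -> List[float]:
    """Time each operation with a clock owned by the measuring side.

    `operations` are hypothetical stand-ins for invoking one desktop action
    (open a document, click a button) and waiting for it to complete.
    `clock` must return seconds and should run outside the guest VM.
    """
    latencies = []
    for operation in operations:
        start = clock()
        operation()
        latencies.append((clock() - start) * 1000.0)
    return latencies


def summarize(latencies_ms: List[float], cutoff_ms: float = 500.0) -> Dict[str, float]:
    """Report worst case, nearest-rank 95th percentile, and count over the cutoff."""
    if not latencies_ms:
        raise ValueError("no latencies to summarize")
    ordered = sorted(latencies_ms)
    p95_rank = max(1, math.ceil(0.95 * len(ordered)))
    return {
        "max_ms": ordered[-1],
        "p95_ms": ordered[p95_rank - 1],
        "over_cutoff": sum(1 for value in ordered if value > cutoff_ms),
    }


if __name__ == "__main__":
    # Placeholder operations; a real harness would drive desktop applications.
    sample_ops = [lambda: time.sleep(0.01) for _ in range(5)]
    print(summarize(measure_latencies_ms(sample_ops)))
```

Passing the clock in as a parameter keeps the timing source explicit, which is the point of item 3 above: the same measurement loop can be driven by a host clock or by a remote machine without trusting time kept inside the VM.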

 

As an attempt at the world's first VDI benchmark, we're very pleased with our efforts.  We found that it met the unique requirements in measuring virtual desktops of all kinds.  And since it was generated with a large group of internal collaborators and multiple partners, it's an excellent beginning at what the industry needs to standardize this process.

 

But today we realize that it's just a beginning.  I want to encourage everyone to bring your comments to VMware, via this blog or the performance forums, on what you think the characteristics of an industry-standard virtual desktop benchmark should be.  We'll never make one benchmark that meets everyone's needs, and I suspect that there are even some common needs that will require significant development resources.  But I expect that with your guidance and assistance in refining this workload we'll accelerate the process of getting this benchmark into a shape that the industry can embrace.