VMware Cloud Community
JonSnow_Stark
Contributor

Virtual Reality Check benchmarks (TS)

Hi,

I read the latest benchmark published by the Project Virtual Reality Check team on the Terminal Services workload across different platforms: vSphere, XenServer 5.5 and Hyper-V.

http://www.projectvrc.nl/

I attached the conclusion of the whitepaper.

I am very surprised by the results; could you explain them? What is VMware's position on these results?

Thanks,

cmacmillan
Hot Shot

Read the report. Interesting. Not a lot of detail is given on the test setup. vSphere, Hyper-V and Xen are not "black boxes", and the configuration of the virtual machines, network interfaces and storage interfaces WILL play a role in the outcome - especially where pushing the resources to the maximum is your test objective. It would therefore be helpful to include an appendix with the architecture of each platform, covering the omitted details.

For instance, which VMware virtual storage driver was used for application data (LSI SCSI, LSI SAS or pvscsi)? And which virtual NIC (e1000 or vmxnet3)? How was networking configured? These are all important factors in comparing the results (analysis) and tuning the respective environments for platform performance. For example, there's a reason why most top VMmark posts use vmxnet3 as opposed to e1000 for their vNICs (and manage TSO settings).
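To illustrate (these are example entries of my own, not details taken from the VRC paper), the relevant .vmx settings for a guest tuned with the paravirtual devices would look something like:

scsi0.present = "TRUE"
scsi0.virtualDev = "pvscsi"
ethernet0.present = "TRUE"
ethernet0.virtualDev = "vmxnet3"

versus "lsilogic"/"lsisas1068" and "e1000" for the emulated alternatives. Whether the VRC team used any of these is exactly the kind of detail an appendix should capture.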

As for the triple-channel issue, it brings up another issue: NUMA management. Certainly, a performance test should have the hardware tuned and configured (down to proper BIOS settings), including the RAM build-out. In Nehalem and Opteron configurations, balancing memory banks (equal size/banks per processor node) is important as well, since NUMA scheduling will be used by default. It is assumed from the discussion of 2-channel vs. 3-channel that the configurations were 4GB DIMMs at 2-DPC x 3-C x 2-N for the 48GB configuration (the IMC falls to DDR3-1066 at 2-DPC) and 4GB DIMMs at (3-DPC x 2-C + 2-DPC x 1-C) x 2-N for the 64GB configuration (the IMC falls to DDR3-800 at 3-DPC). Moving from "single-channel DDR3-800" to "triple-channel DDR3-1066" should have registered a larger impact - or at least a measurable impact across the board - which suggests the 8VM x 2vCPU configuration may have been bound by a resource other than memory bandwidth.
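Spelling out the assumed DIMM math (my reading of the configurations, not something documented in the report):

48GB = 4GB/DIMM x 2 DPC x 3 channels x 2 nodes (IMC drops to DDR3-1066 at 2-DPC)
64GB = 4GB/DIMM x (3 DPC x 2 channels + 2 DPC x 1 channel) x 2 nodes (IMC drops to DDR3-800 at 3-DPC)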

As for CPU tuning, the Barcelona-series Opteron is not a great representative of "modern" AMD gear, as Shanghai and later series provide better-functioning RVI implementations. At least in VMware's case, significant guidance has been given to use RVI/EPT for any Shanghai (or newer) Opteron and Nehalem-EP/EX processor. However, while it is clear that EPT/RVI was enabled, it was not clear (i.e. nothing was documented) whether the Windows systems were configured to support large memory pages as well. A 5-10% increase in performance could be expected under vSphere with RVI/EPT and large memory pages in the guest OS.
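For reference (an illustration of the knobs involved, not what VRC actually configured): in the Windows guests, large pages require granting the workload's account the "Lock pages in memory" right, and on the ESX side large-page backing is governed by an advanced memory option:

secpol.msc > Local Policies > User Rights Assignment > "Lock pages in memory"
Mem.AllocGuestLargePage = 1

(the latter is enabled by default on vSphere 4 when EPT/RVI is active, if memory serves).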

Judging from the Phase I/II documents, shared storage was not a part of the test platform: a terrible mistake IMHO, as local storage is not the norm in enterprise virtualization. In fact, using direct-attached storage ignores a significant component of virtualization platform differentiation. It's not enough to invalidate the tests, but it is significant enough to limit their applicability (e.g. you cannot extrapolate direct-attached performance to shared/network storage performance).

Still, it's good raw information. But since vSphere provides great performance monitoring capabilities, it would be more informative to report where the bottlenecks were encountered: storage, processor, scheduling, etc.
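For example (an illustrative capture of my own, not the VRC methodology), running esxtop in batch mode during a test run would show where the guests were bound:

esxtop -b -d 5 -n 720 > vrc-run.csv

High %RDY/%CSTP per world points at scheduling contention, while DAVG/KAVG/GAVG per storage device points at the disk path.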

--Collin C. MacMillan

SOLORI - Solution Oriented, LLC

If you find this information useful, please award points for "correct" or "helpful".

Collin C. MacMillan, VCP4/VCP5, VCAP-DCD4, Cisco CCNA/CCNP, Nexenta CNE, VMware vExpert 2010-2012 | http://blog.solori.net
MKguy
Virtuoso

So, as a short synopsis of the VRC report (I encourage everyone to read the paper themselves):

- the test platform was a 2-socket Nehalem DL380 G6

- on 4 VMs @ 2 vCPUs without Hyper-Threading enabled, ESX 4 outperforms XenServer 5.5 and Hyper-V 2.0 by 5%

- on 8 VMs @ 2 vCPUs with Hyper-Threading enabled, ESX 4 is outperformed by XenServer 5.5 and Hyper-V 2.0 by 15%

This really got me quite confused. Performance guru Scott Drummonds and VMware are apparently already on it, though:

http://vpivot.com/2010/03/06/hyper-threading-on-vsphere/#comment-438

Yeah, I am aware of that work. We researched the reason for the disappointing results and discovered something interesting about our scheduler. I wrote up a summary that was distributed internally. I think that I will share those comments on this blog.

The interesting thing about those VRC results is that ESX did not benefit from HT the way it should. HT did not slow things down, but neither did it provide value (on vSphere).

More to come…

Scott

I hope we will hear from them soon.

-- http://alpacapowered.wordpress.com