To give you a little background, I now have 6 ESX hosts with 58 VMs. Each host has dual iSCSI HBAs with 1GbE connections. All Exchange 2007 roles have been virtualized; however, only 1 of our 5 mailbox servers currently runs as a virtual machine. We have a number of other workload types virtualized, including file, print, SQL, and web servers.
Management has decided to stop virtualizing Exchange servers. Why? Fear generated by the FUD that surrounds the performance characteristics of various storage transports, in this case iSCSI via GbE. The only way to fight FUD is with facts. To that end, I have performed some calculations in an attempt to answer 2 questions:
1. How well is our storage transport performing given current virtualized workloads?
2. How much "performance capacity" do we have remaining?
I added up the average bandwidth utilization of all 6 of my ESX hosts, which totaled 11008KBps (kilobytes per second). This converts to 0.09Gbps out of the 2Gbps provided by two 1GbE connections, or 4.5% bandwidth utilization. I then added up the maximum utilization of all 6 ESX hosts. This is the high point of the peaks or bursts in utilization. The result was 0.48Gbps.
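As a sanity check, the unit conversion above can be reproduced in a few lines of Python. This is only a sketch: it assumes decimal units (1 Gbps = 1,000,000 kilobits per second), and the function and constant names are made up for illustration.

```python
# Figures taken from the post; decimal units are an assumption.
AVG_TOTAL_KBYTES_PER_SEC = 11008   # summed average across the 6 ESX hosts
LINK_GBPS = 2.0                    # two 1GbE iSCSI connections


def kbytes_per_sec_to_gbps(kbytes_per_sec: float) -> float:
    """Convert kilobytes per second to gigabits per second (decimal units)."""
    return kbytes_per_sec * 8 / 1_000_000


avg_gbps = kbytes_per_sec_to_gbps(AVG_TOTAL_KBYTES_PER_SEC)
utilization_pct = avg_gbps / LINK_GBPS * 100

print(f"average: {avg_gbps:.3f} Gbps, {utilization_pct:.1f}% of {LINK_GBPS:.0f} Gbps")
```

Without intermediate rounding the utilization comes out closer to 4.4%; the 4.5% figure results from rounding to 0.09Gbps first. Either way the conclusion is the same.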
Assuming we can get 800Mbps of actual bandwidth per connection, we have 1.6Gbps of usable bandwidth. Note that, based on VMware's testing, we should be able to reach near wire speed (2Gbps) if the environment is configured correctly, which makes 1.6Gbps a conservative assumption.
So even if I use the maximum bandwidth measurement of 0.48Gbps, that leaves 1.1Gbps usable. Another way to state it is that my environment is reaching a max of 30% bandwidth utilization.
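The headroom arithmetic can be written down the same way. The sketch below uses the post's own inputs (0.48Gbps peak, 800Mbps usable per connection, 2 connections); the `headroom` helper is a hypothetical name, not part of any tool.

```python
PEAK_GBPS = 0.48              # summed peak utilization across all hosts
USABLE_PER_LINK_GBPS = 0.8    # the 800Mbps-per-connection assumption
LINKS = 2                     # two 1GbE iSCSI connections


def headroom(peak_gbps: float, usable_per_link_gbps: float, links: int):
    """Return (usable, remaining, peak utilization %) for the given links."""
    usable = usable_per_link_gbps * links
    remaining = usable - peak_gbps
    peak_pct = peak_gbps / usable * 100
    return usable, remaining, peak_pct


usable, remaining, peak_pct = headroom(PEAK_GBPS, USABLE_PER_LINK_GBPS, LINKS)
print(f"usable: {usable:.2f} Gbps")
print(f"remaining at peak: {remaining:.2f} Gbps")
print(f"peak utilization: {peak_pct:.0f}%")
```

Changing `USABLE_PER_LINK_GBPS` shows how sensitive the result is to the 800Mbps assumption.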
The results seemed unbelievable to me at first, so I dug a little deeper:
- I found this in an EqualLogic presentation from 2005: "With 2 iSCSI connections and free NIC teaming, payload equals approx. 234 MB/s (1.96Gb/s) or 823GB/Hour. We found 2Gb FC delivers 196 MB/s which equals approx. 689GB/Hour payload." http://communities.vmware.com/servlet/JiveServlet/downloadBody/1806-102-1-1554/VMUG.ppt
- I found this in an iSCSI Virtualization whitepaper from 2007: "For high-performance, mission-critical servers, the cost of Fibre Channel is often justified, because Fibre Channel provides higher bandwidth (4 Gbps vs. 1 Gbps) and lower latency than IP networks. However, many environments are over-served by 4Gbps Fibre Channel links. This is particularly true for hosts running applications characterized by random traffic, such as database applications and Exchange." http://www.dell.com/downloads/global/products/pvaul/en/iscsi_virtualization.pdf
- And here's one from Netapp: "...based on deployments, Netapp has proven over the past 3 years that a scalable, simple to use array with enterprise class reliability can safely be the iSCSI platform for mission-critical applications. Exchange is a perfect example of a mission critical application that is routinely deployed over iSCSI these days." http://storagefoo.blogspot.com/2006/05/iscsi-performance-and-deployment.html
- Finally, VMware's own testing of storage protocols and their corresponding physical medium from this year: "This paper demonstrates that the four network storage connection options available to ESX Server are all capable of reaching a level of performance limited only by the media and storage devices." http://www.vmware.com/files/pdf/storage_protocol_perf.pdf
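The throughput figures in the EqualLogic quote are easy to cross-check. The sketch below assumes the presentation used binary units (1 MB = 1024 x 1024 bytes, 1 GB = 1024 MB); that is a guess on my part, but under it the quoted numbers are reproduced.

```python
BYTES_PER_MB = 1024 * 1024   # assumption: binary megabytes


def mb_per_sec_to_gbps(mb_per_sec: float) -> float:
    """Megabytes per second to gigabits per second (10^9 bits)."""
    return mb_per_sec * BYTES_PER_MB * 8 / 1e9


def mb_per_sec_to_gb_per_hour(mb_per_sec: float) -> float:
    """Megabytes per second to gigabytes per hour (1 GB = 1024 MB)."""
    return mb_per_sec * 3600 / 1024


for label, rate in (("2x iSCSI", 234), ("2Gb FC", 196)):
    print(
        f"{label}: {rate} MB/s = {mb_per_sec_to_gbps(rate):.2f} Gb/s, "
        f"{mb_per_sec_to_gb_per_hour(rate):.0f} GB/hour"
    )
```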
Two caveats:
1. I typically read that FC has lower latency than IP. My somewhat empirical belief is that IP's additional latency will not be a big factor when added to the equation.
2. I've read different sources that state disk IOPS are more important to system performance than storage transport bandwidth utilization.
I'm still looking for a way to quantify these factors to better predict the performance characteristics of our IP storage implementation. This is the first part of what I'm sure will be an ongoing investigation. It sure would be nice to have a tool that did all of this for me! I have yet to find something that's comprehensive enough on any given storage platform I've managed (IBM DS, EMC Celerra, et al).
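One rough way to relate IOPS to transport bandwidth is to multiply the I/O rate by the I/O size. The sketch below is illustrative only: the 8KB I/O size and the 1,000 IOPS workload are assumed values, not measurements from this environment, and real workloads mix I/O sizes.

```python
def iops_to_gbps(iops: float, io_size_kb: float) -> float:
    """Bandwidth in Gbps consumed by `iops` operations of `io_size_kb` KB each."""
    return iops * io_size_kb * 1024 * 8 / 1e9


def max_iops_for_bandwidth(usable_gbps: float, io_size_kb: float) -> float:
    """IOPS at which the transport, rather than the disks, becomes the limit."""
    return usable_gbps * 1e9 / (io_size_kb * 1024 * 8)


IO_SIZE_KB = 8        # assumed I/O size for a random workload
USABLE_GBPS = 1.6     # the conservative usable-bandwidth figure from above

print(f"1,000 IOPS at {IO_SIZE_KB} KB uses {iops_to_gbps(1000, IO_SIZE_KB):.3f} Gbps")
print(f"{USABLE_GBPS} Gbps carries about "
      f"{max_iops_for_bandwidth(USABLE_GBPS, IO_SIZE_KB):,.0f} IOPS at {IO_SIZE_KB} KB")
```

With small random I/O, the transport can carry far more operations per second than the bandwidth numbers alone suggest; whether the disks can deliver them is a separate question, which is the point of item 2.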
Also note that I've been monitoring my bandwidth utilization more closely using VKernel's Capacity Analyzer and can safely say that 11008KBps is on the high side. It's dropped 30-40% over the last two months for various reasons.
Next month I hope to enable jumbo frames in this environment and expect to see some additional performance gain. I'm considering capturing before/after snapshots of various performance metrics and posting the results in a future blog post.
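For a rough idea of what jumbo frames can do for payload efficiency, the per-frame overhead can be estimated. This is a back-of-the-envelope sketch: it assumes 38 bytes of Ethernet framing per frame (preamble, header, FCS, inter-frame gap) and 40 bytes of IPv4 and TCP headers with no options, and it ignores VLAN tags and iSCSI PDU headers.

```python
ETHERNET_OVERHEAD = 8 + 14 + 4 + 12   # preamble/SFD, header, FCS, inter-frame gap
IP_TCP_HEADERS = 20 + 20              # IPv4 + TCP, assuming no options


def payload_efficiency(mtu: int) -> float:
    """Fraction of wire time carrying TCP payload for full-sized frames."""
    return (mtu - IP_TCP_HEADERS) / (mtu + ETHERNET_OVERHEAD)


for mtu in (1500, 9000):
    print(f"MTU {mtu}: {payload_efficiency(mtu) * 100:.2f}% payload")
```

By this estimate the gain in raw payload efficiency is only a few percent, so any larger improvement would have to come from elsewhere, such as fewer frames to process per megabyte. The before/after measurements will tell.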
In conclusion, this analysis makes me even more confident in the performance of our ESX hosts and the storage transport behind our virtual infrastructure, even if/when I get to virtualize the remaining Exchange mailbox servers.