Manual Automation : November 2008


In a previous blog post, Virtualizing Virtual Center, I discussed the benefits of virtualizing Virtual Center and why I did this in my environment. I recently heard another argument against this decision from my VMware SE. The argument is that if all of the ESX hosts in the cluster where the Virtual Center VM lives crash, you have to log on to each host to find the VM before you can power it back up. The recommendation is to configure DRS so that it keeps the VM on the same host, so you always know where it is. Let’s consider this a little further…

For environments that have multiple ESX hosts and are using HA, the VC VM should be restarted on another host if there are enough hosts left standing. So this argument really only applies to a scenario where all hosts have crashed, in which case you may have a bigger problem on your hands anyway!

But let’s say this does happen. How can I find out where any particular VM was when my DRS/HA cluster crashed? Well, besides logging on to every host, you could just query the Virtual Center database. Here’s a quick little query that will give you a list of VMs and the host each one is currently on.

SELECT VPX_VM.DNS_NAME AS VirtualMachine,
       VPX_HOST.DNS_NAME AS Host
FROM VPX_VM
INNER JOIN VPX_HOST ON VPX_VM.HOST_ID = VPX_HOST.ID

You could set this up as a scheduled task and save the results to a text file (or better yet, to a SharePoint server if your organization uses that for document sharing/management; this is a little further down my task list). Of course, you should save this information on a system “outside” your virtual infrastructure, such as a NAS-based CIFS share.
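As a rough illustration of the scheduled-task idea, here is a minimal Python sketch that runs the query and writes the result to a tab-delimited text file. It is only a sketch under assumptions: the function name `save_vm_locations` and the output path are hypothetical, and `connection` is assumed to be any Python DB-API 2.0 connection to the Virtual Center database (how you obtain it depends on your database and driver, which the sketch leaves out).

```python
import csv

# The query from this post; only the layout differs.
VM_HOST_QUERY = """
SELECT VPX_VM.DNS_NAME AS VirtualMachine, VPX_HOST.DNS_NAME AS Host
FROM VPX_VM INNER JOIN VPX_HOST ON VPX_VM.HOST_ID = VPX_HOST.ID
"""


def save_vm_locations(connection, out_path):
    """Run the VM-to-host query and write the rows to a tab-delimited file.

    `connection` is assumed to be a DB-API 2.0 connection (hypothetical setup);
    `out_path` should point somewhere outside the virtual infrastructure.
    Returns the number of VM rows written.
    """
    cursor = connection.cursor()
    cursor.execute(VM_HOST_QUERY)
    rows = cursor.fetchall()
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t")
        writer.writerow(["VirtualMachine", "Host"])  # header line
        writer.writerows(rows)                        # one line per VM
    return len(rows)
```

A scheduled job would call something like `save_vm_locations(connection, out_path)` with a path on the external share, so the most recent VM placement survives a full cluster outage.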

I’m sure you could do something similar with PowerShell and the “get-vm” command but I haven’t really looked into it. There are other tools that can help you track VMs as well such as Veeam Reporter.

The bottom line is that if you remove your VC VM from DRS it will not enjoy the load-balancing benefits that DRS brings to the table in the first place. I’m not running the Distributed Power Management (DPM) feature but I’d also have to wonder how this might impact environments where this feature is enabled.

As with many technical decisions, I think this is largely a matter of personal preference and what your comfort zone will allow. We VI admins have notoriously large comfort zones, so I’m guessing many are virtualizing their Virtual Center instances! If you’re okay with the downsides previously mentioned, by all means assign the VM to a specific host. I’ve lived through enough hardware failures and power outages that crashed my VI over the years that I’ve learned the hard way: track your VMs regularly, regardless of how you’ve implemented Virtual Center. You’ll thank me for it someday.

0 Comments 0 References Permalink

This thing must be beta! It inserts additional lines, doesn't convert Word-formatted documents very well, etc.

I hope they keep improving this resource.

0 Comments 0 References Permalink

To give you a little background, I now have 6 ESX hosts with 58 VMs. Each host has dual iSCSI HBAs with 1GbE connections. All Exchange 2007 roles have been virtualized; however, only 1 of our 5 mailbox servers currently runs as a virtual machine. We have a number of other workload types virtualized, including file, print, SQL, and web servers.

Management has decided to stop virtualizing Exchange servers. Why? Fear generated by the FUD that surrounds the performance characteristics of various storage transports, in this case iSCSI via GbE. The only way to fight FUD is with facts. To that end, I have performed some calculations in an attempt to answer 2 questions:

1. How well is our storage transport performing given current virtualized workloads?

2. How much "performance capacity" do we have remaining?


I added up the average bandwidth utilization of all 6 of my ESX hosts, which totaled 11008KBps. This converts to 0.09Gbps out of 2Gbps, or 4.5% bandwidth utilization. I then added up the maximum utilization of all 6 ESX hosts, that is, the high point of the peaks or bursts in utilization. The result was 0.48Gbps.
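The unit conversion is easy to get wrong, so here is a small Python sketch of the arithmetic. It assumes 1 KB = 1024 bytes and 1 Gbps = 10^9 bits per second; neither convention is stated in the monitoring data, but together they reproduce the 4.5% figure. The variable and function names are mine.

```python
# Assumptions (mine, not from the monitoring tool): 1 KB = 1024 bytes,
# 1 Gbps = 10**9 bits per second.
BYTES_PER_KB = 1024
BITS_PER_BYTE = 8
BITS_PER_GBIT = 10**9


def kbps_to_gbps(kilobytes_per_second):
    """Convert kilobytes per second to gigabits per second."""
    return kilobytes_per_second * BYTES_PER_KB * BITS_PER_BYTE / BITS_PER_GBIT


link_capacity_gbps = 2.0             # the 2Gbps figure used in this post
average_gbps = kbps_to_gbps(11008)   # summed average across all 6 hosts
average_utilization_pct = average_gbps / link_capacity_gbps * 100
peak_utilization_pct = 0.48 / link_capacity_gbps * 100  # summed peaks

print(f"average: {average_gbps:.2f} Gbps, {average_utilization_pct:.1f}% of 2 Gbps")
print(f"peak:    0.48 Gbps, {peak_utilization_pct:.0f}% of 2 Gbps")
```

With these assumptions the average works out to about 0.09Gbps (4.5% of 2Gbps), and the summed peak of 0.48Gbps is 24% of the 2Gbps wire speed.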


Assuming we can get 800Mbps of actual bandwidth per connection, we have 1.6Gbps of usable bandwidth across the 2 connections. Note that, based on VMware's testing, we should be able to reach near wire speed (2Gbps) if the environment is configured correctly, which makes 1.6Gbps a conservative assumption.


So even if I use the maximum bandwidth measurement of 0.48Gbps, that leaves 1.1Gbps usable. Another way to state it is that my environment peaks at 30% of its usable bandwidth.
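For anyone who wants to plug in their own numbers, the headroom calculation can be sketched in Python as follows. The function name `headroom` and its parameter names are hypothetical; the defaults are the figures from this post (2 connections, 800Mbps assumed usable per connection).

```python
def headroom(peak_gbps, links=2, usable_per_link_gbps=0.8):
    """Return (usable, remaining, utilization %) for a measured peak.

    usable_per_link_gbps is an assumption (800Mbps of a 1GbE link),
    not a measured value.
    """
    usable_gbps = links * usable_per_link_gbps
    remaining_gbps = usable_gbps - peak_gbps
    utilization_pct = peak_gbps / usable_gbps * 100
    return usable_gbps, remaining_gbps, utilization_pct


usable, remaining, pct = headroom(0.48)
print(f"usable: {usable:.1f} Gbps, remaining: {remaining:.2f} Gbps, peak utilization: {pct:.0f}%")
```

For a 0.48Gbps peak this gives 1.6Gbps usable, about 1.12Gbps remaining, and 30% peak utilization.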


The results seemed unbelievable to me at first, so I dug a little deeper:


  1. I found this in an EqualLogic presentation from 2005: "With 2 iSCSI connections and free NIC teaming, payload equals approx. 234 MB/s (1.96Gb/s) or 823GB/Hour. We found 2Gb FC delivers 196 MB/s which equals approx. 689GB/Hour payload." http://communities.vmware.com/servlet/JiveServlet/downloadBody/1806-102-1-1554/VMUG.ppt
  2. I found this in an iSCSI Virtualization whitepaper from 2007: "For high-performance, mission-critical servers, the cost of Fibre Channel is often justified, because Fibre Channel provides higher bandwidth (4 Gbps vs. 1 Gbps) and lower latency than IP networks. However, many environments are over-served by 4Gbps Fibre Channel links. This is particularly true for hosts running applications characterized by random traffic, such as database applications and Exchange."
    http://www.dell.com/downloads/global/products/pvaul/en/iscsi_virtualization.pdf
  3. And here's one from NetApp: "...based on deployments, Netapp has proven over the past 3 years that a scalable, simple to use array with enterprise class reliability can safely be the iSCSI platform for mission-critical applications. Exchange is a perfect example of a mission critical application that is routinely deployed over iSCSI these days."
    http://storagefoo.blogspot.com/2006/05/iscsi-performance-and-deployment.html
  4. Finally, VMware's own testing of storage protocols and their corresponding physical medium from this year: "This paper demonstrates that the four network storage connection options available to ESX Server are all capable of reaching a level of performance limited only by the media and storage devices."
    http://www.vmware.com/files/pdf/storage_protocol_perf.pdf

It's important to note that I'm leaving 2 things out of this consideration:

1. I typically read that FC has lower latency than IP. My somewhat empirical belief is that IP's additional latency will not be a big factor when added to the equation.

2. I've read different sources stating that disk IOPS matter more to system performance than storage transport bandwidth utilization.

I'm still looking for a way to quantify these factors to better predict the performance characteristics of our IP storage implementation. This is the first part of what I'm sure will be an ongoing investigation. It sure would be nice to have a tool that did all of this for me! I have yet to find something that's comprehensive enough on any given storage platform I've managed (IBM DS, EMC Celerra, et al).

Also note that I've been monitoring my bandwidth utilization more closely using VKernel's Capacity Analyzer and can safely say that 11008KBps is on the high side. Utilization has dropped 30-40% over the last two months for various reasons.


Next month I hope to enable jumbo frames in this environment and expect to see some additional performance gain. I'm considering capturing before/after snapshots of various performance metrics and posting the results in a future blog post.


In conclusion, this analysis makes me even more confident about the performance of our ESX hosts and virtual infrastructure backend storage transport even if/when I get to virtualize the remaining Exchange mailbox servers.

4 Comments 0 References Permalink