VMware

Manual Automation

A VMware, EMC and HP customer's experiences.


After reviewing the planning resources in the previous article I've identified some additional concerns/considerations that need to be addressed:

Hardware: BIOS/Firmware Updates

Since we're going to have the servers down anyway, we should update the system BIOS and the firmware of the system components as necessary. My QLogic iSCSI HBAs were on VMware's I/O HCL, but it was unclear what BIOS/firmware rev they should have. A quick email to support and the response was BIOS 1.15 and firmware rev "53". I was at 1.14/"49" so I needed to update all of my hosts.
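If you track HBA levels in an inventory, a check like the one below can flag which hosts are behind. This is only a minimal sketch: the required levels (BIOS 1.15, firmware "53") and the 1.14/"49" starting point come from the paragraph above, while the host names and the inventory layout are hypothetical.

```python
# Required levels quoted by support in the text above.
REQUIRED_BIOS = "1.15"
REQUIRED_FIRMWARE = "53"


def version_key(version: str) -> tuple:
    """Turn a dotted version string such as "1.15" into (1, 15) for comparison."""
    return tuple(int(part) for part in version.split("."))


def needs_update(bios: str, firmware: str) -> bool:
    """Return True if either the HBA BIOS or the firmware is below the required level."""
    return (version_key(bios) < version_key(REQUIRED_BIOS)
            or version_key(firmware) < version_key(REQUIRED_FIRMWARE))


# Hypothetical inventory: the host names are made up for illustration.
inventory = {
    "esx01": {"bios": "1.14", "firmware": "49"},
    "esx02": {"bios": "1.15", "firmware": "53"},
}

hosts_to_update = sorted(
    host for host, hba in inventory.items()
    if needs_update(hba["bios"], hba["firmware"])
)
print("Hosts needing an HBA update:", hosts_to_update)
```

Comparing the parsed tuples rather than the raw strings avoids the trap where "1.9" sorts after "1.15" as text.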

Platform: ESX or ESXi?

If you've been following, you know I've been pretty hard on the ESXi/USB combination in previous articles. I experienced a problem with ESXi, iSCSI HBAs and SRM recovery plan tests causing random reboots of my ESXi hosts. This has been fixed since then. Also, rumor has it that the next major release will be ESXi-only. VMware is certainly steering customers in this direction now. I believe this is really a now-or-later decision. And to add to the good ESXi news, HP has much better support for CIM providers with vSphere 4.0 in the form of an update package that can be installed on each host.

As for the USB keys, I won't be going back to those any time soon if I can help it. Having local disks on the host gives us another benefit with ESX 4 - it can now use these for the scratchconfig files as needed for the HA feature. See Duncan's post here: http://www.yellow-bricks.com/2009/12/03/esxi-lessons-learned-part-1 So ESXi embedded may work fine for you, but if I go back to ESXi, it will be the ESXi installable route. Another task to add to my upgrade plan.

Platform: Upgrade or Clean Install?

My experience with ESX upgrades goes back to my first ESX 1.5 server. I don't think there was an upgrade path from ESX 1.x to 2.0, and 2.x to 3.0 was a nightmare. I had limited success upgrading ESX hosts, and once they were upgraded you had to upgrade VMFS in a staged/controlled fashion. It wasn't pretty and my guess is that most admins performed a clean install. The good news is that upgrading ESX 3 to 4 is a much better experience. This seems to be due to VMware doing a better job this time around (now that they've had some more experience?) and to the fact that VMFS doesn't require an upgrade. You can upgrade VMFS but it is a minor point update and it doesn't sound like it buys you much based on what I heard listening to the VMware Communities podcast on vSphere 4.0.

However, I just can't seem to escape the fact that a clean install provides a good, well-known installation from which to start fresh. If I switch back from ESX to ESXi the point will be moot - I'll have to do a clean install. I'll get a better idea of this as I progress through the test plan.

vCenter 32 or 64bit?

Decisions, decisions! Another bit floating out there in rumor-land is that the next release of vCenter will be 64bit-only. I think this one makes more sense since most development is moving in this direction if it hasn't already (a.k.a. Windows Server 2008 R2). VMware does support vCenter running on a 64bit operating system using a 32bit DSN. There are instructions on how to do this that can be found in the vSphere Upgrade Center (see links in previous article).

And while we're at it, why not include a little more future-proofing and do a clean install of vCenter on Windows Server 2008? Please review VMware's vSphere compatibility matrix as there are some distinctions when considering the OS in either R1 or R2 flavors. In my environment, we don't use Update Manager for scanning or patching VMs, so Windows Server 2008 Standard R1 64bit will work quite nicely. The vSphere components you're using in your environment will largely drive your OS choice.


Planning is the most important step of any upgrade project and this one is no exception. Luckily VMware provides very good resources to help with this step. The first and most important stop is the VMware vSphere Upgrade Center: http://www.vmware.com/products/vsphere/upgrade-center/

Many of the following links can be found in the Upgrade Center. I recommend you watch VMware's "How to Upgrade to vSphere" webcast:
http://www.vmware.com/a/webcasts/details/248


And watch VMware's "How Do I Upgrade" video series:
http://download3.vmware.com/vsphere/vsphere-migration-intro.html
http://download3.vmware.com/vsphere/vsphere-migration-part1.html
http://download3.vmware.com/vsphere/vsphere-migration-part2.html
http://download3.vmware.com/vsphere/vsphere-migration-part3.html
http://download3.vmware.com/vsphere/vsphere-migration-part4.html


The Upgrade Center also hosts the VMware vSphere Upgrade Advisor tool. Run this to determine next steps:
http://www.vmware.com/products/vsphere/upgrade-center/advisor/


Per advisor tool:

  1. Check VMware Infrastructure Licenses
    1. vSphere 4.0 (vCenter/ESX)
    2. Site Recovery Manager 4.0
  2. Review Support Contracts
    1. Make sure the proper PLA has been identified and set up
  3. Check Hardware Compatibility (use VMware Certified Compatibility Guides portal here: http://www.vmware.com/resources/compatibility/search.php)
    1. Systems: HP ProLiant DL380 G5, in my case
    2. SAN: EMC Celerra, iSCSI
    3. iSCSI HBA: QLogic QLE4062c (requires firmware upgrade)
    4. NIC: Intel Pro/1000 PT Quad Port GbE
  4. Review Guest OS Support
  5. Review the Migration Pre-requisite Checklist

Some of the items in the checklist will be redundant but it doesn't hurt to be thorough given what we're trying to accomplish. After reviewing the previous resources, you will have a strong understanding of what the upgrade is going to require for your environment and should have at least a draft of your upgrade plan.

With plan in hand, it's time to test!


vSphere 4.0 Upgrade Project

Posted by Virtual_JTW Dec 10, 2009

In the next six posts I will describe the process I'm using to upgrade my Virtual Infrastructure 3.5 production cluster to vSphere 4.0. BTW, am I the only one that still dislikes the "vSphere" moniker? What was wrong with "Virtual Infrastructure"? At least that is more descriptive! What does a vSphere describe? Okay, I understand the "v" in front is short for virtual (maybe?). So is this a virtual sphere? The marketing folk need to rethink their branding practices. Now if they come out with some coherent brand strategy that ties the whole "sphere" thing together with their products, I'll have to take everything back, but as it stands it's worthless. But I digress...

I will publish everything in six parts:
Part 1: Planning
Part 2: Testing
Part 3: Upgrade to vCenter 4.0 and SRM 4.0
Part 4: Upgrade ESX Hosts
Part 5: Upgrade Virtual Machines
Part 6: Conclusion


VMware has moved the links to the SRM 4.0 compatibility matrix, so it took me a few minutes to track it down, but I eventually found it here:
http://www.vmware.com/pdf/srm_compat_matrix_4_0.pdf

Note that the link can be found in SRM's documentation page here:
http://www.vmware.com/support/pubs/srm_pubs.html

I'm betting this will be the go-to place for all SRM documents moving forward.


I gave a presentation at VMworld 2009 on my SRM implementation. Part of this presentation included a discussion on WAN requirements and, more specifically, how to determine the amount of bandwidth you're going to need to replicate virtual machines between sites.

I have decided to wrap up the main points of this discussion in this article for those that missed the session and for future reference in general.

I've summarized the process in three steps:
Step 1: Determine the size of data that needs to be replicated
Step 2: Determine the change rate
Step 3: Crunch the numbers

Determine the Size of Data that Needs to be Replicated
For SRM, you're going to group protected VMs on the same LUN or LUNs. View your datastores and add up all of the disk space in use (i.e. utilized by VMDKs, VMXs, logs, etc.). This will be the total size of data that needs to be replicated.

Note that the first replication is what I call the "seed" copy as it will replicate all of the data. Most modern replication technologies will replicate only the changes or deltas from that point forward. Because this can be quite a sizable amount of data, sometimes adding up to a terabyte or more, I highly recommend replicating over a LAN connection first. I had the good fortune of having both of my SAN nodes in the same location for several weeks. This allowed me to set up replication between the two and perform that first seed copy while connected via high-speed LAN links. If you don't have this luxury, some SAN vendors provide a facility whereby you can use alternate media such as tape to perform the first copy. Either way, please keep this in mind as it could dramatically impact the success of your efforts (i.e. replication could consume large amounts of bandwidth causing production applications to fail, it could run for a long time causing doubt that it will ever finish successfully, etc.).

Determine the Change Rate
This will be the hardest part for some storage administrators. You'll typically want to determine both daily and peak change rates. There are several different ways of determining your change rate. Here are a few examples:

  • Obtain from incremental backups
  • Utilize tools provided by your SAN vendor
    • If you haven't purchased your SAN yet, ask SAN vendors if they have tools that will help you manage and monitor bandwidth utilization
  • Utilize third-party WAN (or LAN) bandwidth monitoring tools
  • Perform several replications over at least a full week's time and measure the results

I used the last two methods to determine our daily and peak change rate. And after moving the second SAN unit to the remote site, we continued measuring the WAN bandwidth utilization over two weeks' time.
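Once you have a week or two of measurements, the daily and peak figures fall out of a simple summary. The sketch below assumes you have recorded one replication delta per day in GB; the sample values are hypothetical and only there to make the example run.

```python
from statistics import mean

# Hypothetical daily replication deltas in GB, one measurement per day for a week.
daily_change_gb = [41.0, 56.0, 38.5, 47.2, 52.3, 12.1, 9.8]


def summarize_change(samples):
    """Return (average, peak) of the daily change measurements in GB."""
    if not samples:
        raise ValueError("need at least one measurement")
    return mean(samples), max(samples)


average_gb, peak_gb = summarize_change(daily_change_gb)
print(f"average daily change: {average_gb:.1f} GB, peak: {peak_gb:.1f} GB")
```

Keeping the peak alongside the average matters because a link sized only for the average will fall behind on the busiest days.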

Crunch the Numbers
Now that you're armed with some raw data, it's time to try to make sense of it all. The general formula we'll use to calculate the change rate is:
(GB change/day)/(GB total size) = % change rate

So, for example, if we had 56GB of data change in one day out of 617GB total, the change rate would be 56/617 = 9% per day. You'll find that a 9% change rate is not that bad. From this you could surmise that you'll need a fractional DS3 (about 6 DS1s).

Alternatively, you could use an "industry average" change rate of 20% and apply that to the total size of data you discovered in the first step. I have seen this figure used as a general guideline in documents from multiple SAN vendors over the years. Hopefully it's overkill for your needs but it's better than nothing. Using this rate on the same example:
20% * 617GB = 124GB/day
124GB/24hrs = 5.2GB/hr = about 12 DS1s

As you can see, this will cause you to provision roughly double the amount of bandwidth that you really need.
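The arithmetic is easy to script so you can rerun it as your datastores grow. The sketch below uses the 617GB, 56GB and 20% figures from the example; the function names are my own, and the throughput helper assumes decimal units and an even spread over 24 hours. It ignores protocol overhead, bursts and replication windows, so treat the result as a floor for the average rate rather than as a circuit count.

```python
def change_rate(changed_gb: float, total_gb: float) -> float:
    """Daily change rate as a fraction: (GB change/day) / (GB total size)."""
    return changed_gb / total_gb


def average_mbps(changed_gb_per_day: float) -> float:
    """Average line rate needed to move one day's change in 24 hours.

    Assumes decimal units (1 GB = 8,000 megabits) and traffic spread
    evenly across the day, with no protocol overhead.
    """
    return changed_gb_per_day * 8000 / (24 * 3600)


total_gb = 617
measured_gb = 56
estimated_gb = 0.20 * total_gb  # the 20% "industry average" guideline

print(f"measured change rate: {change_rate(measured_gb, total_gb):.1%}")
print(f"measured change: {average_mbps(measured_gb):.1f} Mbps average")
print(f"20% estimate: {estimated_gb:.0f} GB/day, "
      f"{average_mbps(estimated_gb):.1f} Mbps average")
```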

Caveat Emptor
Please note that every environment is different. Consider these calculations to be estimates! These rates are never linear. For example, after all is said and done you may add several VMs to your replicated datastores dramatically increasing the total size. Consequently, I recommend revisiting your calculations on a regular basis and adjusting accordingly. Use a good network monitoring tool such as NetFlow Analyzer to help keep you on top of bandwidth utilization.

One final note: my WAN utilization is made up of several consumers - SAN replication, Active Directory replication, corporate applications such as warehouse inventory tracking, etc. Make sure you account for these other bandwidth needs and provision WAN links that meet your total requirements. We also added 30% to our total bandwidth requirements to allow for growth and room for some additional utilization that we knew we would need within the next year.
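To make that last point concrete, here is a small sketch that totals the WAN consumers and adds the growth allowance. Only the 30% figure comes from my environment as described above; the per-consumer Mbps values are hypothetical placeholders.

```python
GROWTH_HEADROOM = 0.30  # the 30% allowance mentioned above

# Hypothetical peak requirements per WAN consumer, in Mbps.
consumers_mbps = {
    "SAN replication": 12.0,
    "Active Directory replication": 1.0,
    "Corporate applications": 4.0,
}


def provisioned_mbps(consumers, headroom=GROWTH_HEADROOM):
    """Sum every consumer's requirement and add a growth allowance on top."""
    return sum(consumers.values()) * (1 + headroom)


print(f"provision at least {provisioned_mbps(consumers_mbps):.1f} Mbps")
```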

This certainly isn't the final word on the matter so please feel free to post your experiences on determining WAN bandwidth requirements.



As part of my vSphere 4.0 upgrade planning, I checked VMware's HCL for all of my hardware. I noticed that VMware's SAN HCL for vSphere 4 includes the Celerra NS350 for NAS but there's no listing for iSCSI unlike many other Celerra models that explicitly state "iSCSI" and "NAS".

Good News: I opened an SR with EMC support who verified that the Celerra NS350 is indeed supported as an iSCSI target for vSphere 4 just as it is for VI 3.5.

For future reference, one can verify this by using the e-Lab Navigator tool found on EMC PowerLink.




I don't normally post newsy items or just links to other sites but I'm making somewhat of an exception in this case since I've written articles related to this topic before. Duncan Epping has posted an unofficial set of best practices for virtualizing vCenter instances running in a vSphere environment. Serious VI admins should be reading his blog on a regular basis already - highly recommended!

Check it out at http://www.yellow-bricks.com/2009/10/09/best-practices-running-vcenter-virtual-vsphere/



VMworld Session Available!

Posted by Virtual_JTW Sep 21, 2009

If you would like to hear about my SRM implementation, VMware has made the slides and audio available on their VMworld site here:

http://www.vmworld.com/docs/DOC-3480


We had a great Q&A session.


I counted about 288 chairs and it was standing-room only so we had a great turn-out.


If you're looking at deploying SRM in your environment or curious as to how other companies are implementing and using the product I highly recommend you check it out.


VMworld must be my destiny! Let me explain...

I've been to every VMworld since the first was held in 2004. I really thought this was the first year I was going to miss the conference. The odds just weren't in my favor: the economy busted right around last year's VMworld and since then my employer has entered Chapter 11 (but should come back out of it relatively quickly) and I can't afford to cover the costs myself. I tried to reassure myself by remembering the last conference held in San Francisco was my least favorite thus far (which didn't really work anyway).

Then in March, my local VMware Systems Engineer, Dave, asked if I wanted to present my Site Recovery Manager (SRM) experience at this year's conference. My first reaction was not only "no", but "hell no". After a little more thought, I quickly changed my mind. I think SRM is a great product that isn't getting the attention it deserves. I'm also proud of what our little team has accomplished here in such a short amount of time. And finally, I enjoy the occasional challenge and believe this helps one to keep growing professionally and keep life interesting.

In case you didn't know, VMware typically covers the cost of the conference for speakers. The only thing left was to convince management to cover the travel costs which was not easy given the current financial circumstances. But in the end I was given approval and just like that, I'm VMworld bound again this year!

If you're considering SRM or are just getting started, check out my session "BC2704: Site Recovery Manager, a real user experience". I promise it will be worth your time. I can talk about this stuff for hours and Dave and I will answer all of your questions - from technical SRM product questions to general disaster recovery and everything in between.

Here's the abstract:

"Learn from a customer in the Midwest, all of their experiences implementing, testing and running Site Recovery Manager in a production environment. Hear their challenges and how their SRM implementation has worked for them. Find the facts you need to know to maximize the success of your disaster recovery solution with SRM."

See you there!


Bye, Bye ESXi

Posted by Virtual_JTW May 20, 2009

What a long, frustrating trip it's been! Don't get me wrong, I really like the idea of ESXi: thin, fast install, small footprint, BIOS-like host configuration, no Console OS (COS) to patch or support, can run from embedded USB key, etc. But my experience in supporting and managing an ESXi-based VI production environment tells a different story.

I've decided to convert all of my hosts from ESXi to ESX "Classic". There are three primary reasons:

  1. Support
  2. Reliability
  3. Compatibility

Support

Without the COS it's difficult to execute commands and view log files in real time. I've had more than one VMware support engineer complain about this during a troubleshooting session (so it must be true!). There are alternatives: using the unsupported trick to get to the command line from the host's console, hacking SSH to open it up (which is also unsupported), capturing logs/diagnostic bundles via vCenter Server, RCLI, VIMA, etc. But none of the alternatives are as fast/clean/easy as SSH'ing right into the COS and working from there.

Reliability

I purchased eleven Hewlett-Packard USB keys w/unlicensed ESXi to embed in my ProLiant DL380 G5s. When I upgraded them via VMware Update Manager (VUM), the entire installation on the key became corrupted. I was not even able to revert back to the previous ESXi image on some keys. HP has since issued a customer advisory and I have replaced all of the keys: http://h20000.www2.hp.com/bizsupport/TechSupport/Document.jsp?objectID=c01605187

Unfortunately, this experience still leaves me with a less-than-fuzzy feeling for running ESXi on said embedded USB keys in critical production environments.

Compatibility

ESXi seems to lag ESX Classic in updates - specifically when it comes to compatibility. This implies that ESX Classic is developed, tested and certified against first. I manage two VI environments located in different datacenters. I use SRM at the primary for DR/fail-over to the secondary (see previous articles). It took around a month for VMware to release a patch for ESXi Update 3 that made it compatible with SRM 1.0 Update 1. Read your compatibility guides! More on this in a future article.

Many third-party tools and scripts require the COS. There are many examples of this: Snap Hunter, Vizioncore vOptimizer Pro, etc.

Unlike ESX Classic, HA in ESXi requires a ScratchConfig folder created on separate VMFS datastores for each host. This may not be a big deal for smaller clusters, but for a cluster with many servers, many datastores will be required.

Finally, for those of us that run HP servers to host ESXi, we have a specific firmware/ISO that contains the HP management providers. Unfortunately, even with the built-in providers you still can't monitor disk status - which is, of course, the one hardware component that fails the most often(!). As of this writing, here's what I've been able to determine:

Licensed ESXi

  1. HP only supports ESXi with the proper "management providers" on Update 2 and Update 3 (as evidenced by VMware's downloads section of their web site). ESXi 3.5 Update 4 is not yet supported.
  2. Using VMware Update Manager to upgrade ESXi instances to Update 4 effectively breaks SIM manageability.

Free ESXi

  1. A new installable image is available for Update 4 with the management providers.
  2. Upgrading existing hosts via the VMware Infrastructure Update tool effectively breaks SIM manageability. These hosts will have to be reinstalled from the ISO.

Once you're sure you have the right ESXi firmware image installed, it's time to add the host to HP Systems Insight Manager (SIM) for hardware monitoring. I was able to add only 3 out of about 20 of my ESXi hosts successfully. HP support wasn't able to help me out. The main suggestions I got were to reinstall(?) and to call VMware. With ESX Classic you install the Insight agent in the COS, add the host to SIM, and you're done. It just works.

Conclusion

Like I mentioned previously, I still really like the idea of ESXi. Once I saw that Hitachi was embedding a virtualization solution in their servers I knew it was only a matter of time before VMware came out with something similar.

I have many free ESXi installable instances. This is a great solution in cases where the budget is tight or non-existent. Utilizing the free ESXi still gives you many of the benefits of virtualization, making it a better way to go than bare metal OS installation in most cases.

I think embedded/thin is the future. I hope vSphere 4 embedded improves on the issues described above.


5-20-2009 UPDATE:
Not being one to spread any FUD, I would like to add to my comment "Using VMware Update Manager to upgrade ESXi instances to Update 4 effectively breaks SIM manageability". According to this article on the Yellow Bricks blog (http://www.yellow-bricks.com/2009/05/14/updating-an-esxi-server-with-vendor-agents/), VUM has the intelligence to download and apply the correct ESXi firmware image (i.e. the one with the OEM's management providers pre-installed). And I believe it because Duncan is "the man". Note that I used the Virtual Infrastructure Update Tool, not VUM, so that may have made a difference. Regardless, I do know that after updating the image HP Insight Manager was no longer seeing all of the hardware components and in some cases failed to communicate with the server completely. This hasn't happened to me on my ESX "classic" hosts.


By the time I was twelve I knew I wanted to be in the IT business. I loved my Commodore 64, my 1541 disk drive and 300 baud modem! I loved dialing into bulletin board systems and typing in games with all of those peeks and pokes. I didn't know it at the time, but I was really learning a lot about computer hardware, storage media, telecommunications and software programming.

Responsibilities at my first corporate job included managing and maintaining LANtastic on a 10base2 network. Okay, I was mostly maintaining it - it was not very reliable as it ran on top of MS-DOS and the transceivers failed all of the time but hey, this was before the whole dot com boom era so you couldn't expect much back then. I also had a Novell NetWare 3.x NOS running for one department (remember NetWare Loadable Modules, NLMs?). Once NetWare 4.11 came out I convinced management to dump LANtastic and 10baseT and upgrade to NetWare 4.11 and Ethernet (remember NetWare Directory Services, NDS - can you say "AD"?). I didn't realize it at the time, but I was really learning a lot about network operating systems (NOS), networking hardware and standards.

It was around this time that I started to realize that I wanted to work with multiple technologies, especially newer technologies that were interesting because they solved a complex problem or saved businesses money. I didn't want to become a walking product manual for one piece of software or hardware component and focus on that for the rest of my life (or until I retired, which-ever came first)!

My next opportunity involved leading a team in the design, implementation and management of MS Systems Management Server (SMS) 2.0 to potentially hundreds of locations around the US. While the tie-in may not be obvious, working with SMS allowed me to learn more about MS SQL Server 2000, Windows management technologies (CIM, WBEM, etc.) and data synchronization across slow links (anyone heard of Starburst?). Not to mention it was a great opportunity to work for a 3 billion dollar publicly traded company - a different culture than my previous employer to say the least.

Unfortunately, management made a short-sighted, mostly political decision and shut down the SMS implementation after it was successfully deployed to the first remote site. However, this turned out to be good news: I discovered virtualization around this same time - the 2001/2002 timeframe - via VMware Workstation. Shortly thereafter I set up a production GSX server hosting three VMs that shared the same base image with each VM having a unique redo log (remember that VMware whitepaper?). The users couldn't tell the difference and were a bit surprised when I finally let them in on the secret! And as they say: the rest is history.

To expand on a theme, virtualization has allowed me to work with and learn more about operating systems, enterprise-level server hardware (such as how CPUs work), enterprise-level storage and all of its related technologies (SAN hardware, iSCSI, fiber switches, etc).

I don't pretend to have the most unique career path in the world - I know there are many other Systems Administrators and Engineers that have lived similar experiences. And for that reason I'm identifying a new breed of IT professional. Systems Analyst, Systems Administrator and Systems Engineer titles are all less meaningful in this context. I've had all of these titles at some point in my career.

Regardless of title, we're IT professionals that stand apart based on our past experiences and constant passion to always be working with current and newer technologies that ultimately allow businesses to operate faster, smarter and more efficiently. Virtualization, especially VMware Virtual Infrastructure and the coming vSphere products, has allowed us to take businesses to that next level.

And we won't stop there. We're constantly keeping an eye on cloud-based technologies, standards and initiatives. We'll be beta testing these products - and not just download, install, use for five minutes and throw-away. We're excited about the product and want to see it succeed so we'll provide feedback. We'll keep an eye on the bleeding edge and maybe sometimes participate in a limited way - such as a product install in a lab for evaluation, but we won't bet the company's business on it (at least those of us that have been around for awhile and have made that mistake before).

To bring us full circle I have to ask, what is the next NetWare? The next SMS? The next VMware VI? Is it more VMware? Quite possibly. Or maybe it's something we haven't thought of yet. Maybe it's mainframe 2.0 with pervasive high-speed wireless connectivity brought about by a technology such as WiMAX? It's hard, if not impossible, to predict. But one thing is certain: it will be "cool", we'll be among the first of the light bulbs popping on throughout the IT industry, and the businesses we work for will thank us for it.


I just read Eric Siebert’s Open Letter to VMware which inspired me to write this article. I have to disagree with his last suggestion: relaxing VCP requirements. His argument is that many admins can’t afford to take the training which is a prerequisite to taking the exam. I have to wonder why? I know it is not cheap, costing around $3000, but most IT professionals can come up with the money. There are several ways to accomplish this:

1. Paid for by employer
2. As a bonus for hiring on with a different employer
3. Use savings: set aside as much as you can from each paycheck and pay for it yourself
4. Use debt: get a credit card/get a loan


I’m sure there are others, but these are methods I’ve used over the years for every certification exam I’ve completed. Obviously, the first choice is ideal. Consider this: if your employer won’t pay for the training and the exam (if you pass), doesn’t that say something about them? I consider it one of those life lessons I’ve learned along the way: ask about the support you’ll get to continue your education during the interview! My experience is that businesses that are managed well and/or have a more mature IT organization know that it’s in their best interest to encourage the life-long learning of their employees. If your current or potential employer has no desire to do this, that should throw a mental red flag you should carefully consider. Note that I do not recommend immediately quitting your job(!), but this is one more piece of information that should be considered within the whole.

I know many will shudder at the thought of going into debt to pay for a training course. But one must consider the full scope of the situation. Why do most IT professionals want to achieve certification? To have some acronyms listed after their names on a business card? No, it’s to convey the value they bring to an organization based on their skill-sets. It may help hiring managers narrow down potential candidates to interview for a position. It may help Joe Admin get recognized by his peers and earn him a promotion. The point is that the training should be worth $3000 to the student in that it helps him further his goals at some point in the near future. Financially, it should also help him pay back the loan or pay off the credit card much faster than would be otherwise possible.

An administrator should only get the training and take the exam if it is required for them to accomplish their long-term goals. This at least implies that the exam need not be made any easier to take, since not every admin needs or wants to take it.

Which brings me to my next point: I think the exam should be harder, not easier! I enjoy the exclusivity that passing the exam brings. With the “multiple answers” format, it’s too easy for dishonest people to publish “brain dumps” with exact questions and answers they saw on the test. It’s even worse that thugs sell PDFs with many of these questions to losers that use these exclusively to pass the exam. It really brings down the value of the certification for everyone. Ever heard of “paper MCSE”? I worked with one many years ago and I can tell you the guy didn’t know how to do his job – even when I asked him to carry out the simplest of tasks. And there are many more of those folks out there today. What a waste.

I can’t say I have a solution to this problem. I dream of a day when testing methods become more robust in that they are better able to signify that the certificate holder knows their stuff. The VCDX may go a long way towards achieving this dream. In the meantime, I can highly recommend taking the course and the exam as the VCP is probably the hottest certification in the IT industry.

My VCP number is 001711.


In a previous blog post, Virtualizing Virtual Center, I discussed the benefits of virtualizing Virtual Center and why I have done this in my environment. I recently heard another argument against this decision from my VMware SE. The argument is that if all of the ESX hosts in the cluster running the Virtual Center VM crash, then you have to log on to each host to find where the VM is so you can power it back up. The recommendation is to then configure DRS such that it keeps the VM on the same host so you know where it is. Let’s consider this a little further…

For those environments that have multiple ESX hosts and are using HA, the VC VM should be powered-on to another host if there are enough hosts left standing. So this argument really only applies to a scenario where all hosts have crashed, in which case you may have a bigger problem on your hands anyway(!).

But let’s say this does happen. How can I find out where any particular VM was when my DRS/HA cluster crashed? Well besides logging on to every host you could just query the Virtual Center database. Here’s a quick little query that will give you a list of VMs and the host they’re currently on.

-- List every VM alongside the host it is currently on
SELECT VPX_VM.DNS_NAME AS VirtualMachine,
       VPX_HOST.DNS_NAME AS Host
FROM VPX_VM
INNER JOIN VPX_HOST ON VPX_VM.HOST_ID = VPX_HOST.ID

You could set this up as a scheduled task and save the results to a text file (or better yet, a SharePoint server if your organization uses that for document sharing/management – this is a little further down my task list). Of course you should save this information on a system “outside” your virtual infrastructure such as a NAS-based CIFS share.
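As a rough illustration of the scheduled-task idea, the sketch below runs the same join and writes the result as timestamped CSV. To keep it runnable anywhere it uses an in-memory SQLite database as a stand-in: the table and column names come from the query above, but the stand-in schema, the sample host and VM names, and the helper function names are all assumptions. A real job would connect to the actual Virtual Center database with the appropriate driver and write to a file on a share outside the virtual infrastructure.

```python
import csv
import sqlite3
import sys
from datetime import datetime

QUERY = """
SELECT VPX_VM.DNS_NAME AS VirtualMachine, VPX_HOST.DNS_NAME AS Host
FROM VPX_VM
INNER JOIN VPX_HOST ON VPX_VM.HOST_ID = VPX_HOST.ID
ORDER BY VirtualMachine
"""


def build_demo_db():
    """Create an in-memory stand-in holding only the columns the query touches."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE VPX_HOST (ID INTEGER PRIMARY KEY, DNS_NAME TEXT)")
    conn.execute(
        "CREATE TABLE VPX_VM (ID INTEGER PRIMARY KEY, DNS_NAME TEXT, HOST_ID INTEGER)"
    )
    # Hypothetical sample rows for illustration only.
    conn.executemany(
        "INSERT INTO VPX_HOST VALUES (?, ?)",
        [(1, "esx01.example.com"), (2, "esx02.example.com")],
    )
    conn.executemany(
        "INSERT INTO VPX_VM VALUES (?, ?, ?)",
        [(10, "vcenter.example.com", 2), (11, "mail01.example.com", 1)],
    )
    return conn


def fetch_vm_locations(conn):
    """Run the join and return a list of (vm, host) tuples."""
    return conn.execute(QUERY).fetchall()


def write_report(rows, handle):
    """Write the VM-to-host mapping as CSV, stamped with the capture time."""
    stamp = datetime.now().isoformat(timespec="seconds")
    writer = csv.writer(handle)
    writer.writerow(["VirtualMachine", "Host", "CapturedAt"])
    for vm, host in rows:
        writer.writerow([vm, host, stamp])


# Print to stdout here; a scheduled task would pass an open file instead.
write_report(fetch_vm_locations(build_demo_db()), sys.stdout)
```

The capture time is included on every row so that, after a crash, you can tell how stale the last saved mapping is.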

I’m sure you could do something similar with PowerShell and the “get-vm” command but I haven’t really looked into it. There are other tools that can help you track VMs as well such as Veeam Reporter.

The bottom line is that if you remove your VC VM from DRS it will not enjoy the load-balancing benefits that DRS brings to the table in the first place. I’m not running the Distributed Power Management (DPM) feature but I’d also have to wonder how this might impact environments where this feature is enabled.

As with many technical decisions, I think this is largely a matter of personal preference and what your comfort zone will allow. We VI admins have notoriously large comfort zones so I’m guessing many are virtualizing their Virtual Center instances! If you’re okay with the downsides previously mentioned, by all means assign the VM to a specific host. I’ve lived through enough hardware failures and power outages that crashed my VI over the years that I’ve learned the hard way: track your VMs regularly regardless of how you’ve implemented Virtual Center. You’ll thank me for it someday.


This thing must be beta! It inserts additional lines, doesn't convert Word-formatted documents very well, etc.

I hope they keep improving this resource.


To give you a little background, I now have 6 ESX hosts with 58 VMs. Each host has dual-iSCSI HBAs with 1GbE connections. All Exchange 2007 roles have been virtualized, however we currently only have 1 out of 5 mailbox servers running as a virtual machine. We have a number of other workload types virtualized including file, print, SQL, web servers, etc.

Management has decided to stop virtualizing Exchange servers. Why? Fear generated by the FUD that surrounds the performance characteristics of various storage transports - in this case iSCSI via GbE. The only way to fight FUD is with facts. Towards this effort I have performed some calculations in an attempt to answer 2 questions:

1. How well is our storage transport performing given current virtualized workloads?
2. How much "performance capacity" do we have remaining?

I added up the average bandwidth utilization of all 6 of my ESX hosts, which totaled 11008KBps. This converts to about 0.09Gbps out of 2Gbps, or roughly 4.5% bandwidth utilization. I then added up the maximum utilization of all 6 ESX hosts. This would be the high point of the peaks or bursts in utilization. The result was 0.48Gbps.
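
The conversion above can be sketched in a few lines of Python. This assumes decimal units (1 KB = 1000 bytes, 1 Gb = 10^9 bits); the function name is just for illustration. Without the intermediate rounding to 0.09Gbps, the utilization comes out at about 4.4% rather than 4.5%.

```python
def kbytes_per_sec_to_gbps(kbytes_per_sec):
    """Convert kilobytes/s to gigabits/s, assuming decimal units (1 KB = 1000 bytes)."""
    return kbytes_per_sec * 8 / 1_000_000


LINK_CAPACITY_GBPS = 2.0  # the 2Gbps total used in the post

avg_gbps = kbytes_per_sec_to_gbps(11008)         # summed average across all 6 hosts
avg_utilization = avg_gbps / LINK_CAPACITY_GBPS  # fraction of the 2Gbps total

print(f"{avg_gbps:.3f} Gbps, {avg_utilization:.1%} of {LINK_CAPACITY_GBPS:g} Gbps")
```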

Assuming we can get 800Mbps of actual bandwidth per connection, we have 1.6Gbps of usable bandwidth in total. Note that based on VMware's testing we should be able to reach near wire speed (2Gbps) if the environment is configured correctly, making 1.6Gbps a conservative assumption.

So even if I use the maximum bandwidth measurement of 0.48Gbps, that leaves 1.1Gbps usable. Another way to state it is that my environment is reaching a max of 30% of its usable bandwidth.
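
Here is the same headroom arithmetic as a small Python sketch. The count of 2 connections is inferred from the 2Gbps total and the 800Mbps per-connection figure; the function name is hypothetical.

```python
def headroom(connections, usable_gbps_per_connection, peak_gbps):
    """Return (usable total, remaining after peak, peak as a fraction of usable)."""
    usable = connections * usable_gbps_per_connection
    return usable, usable - peak_gbps, peak_gbps / usable


# Assumed inputs: 2 connections at 0.8Gbps usable each, 0.48Gbps measured peak.
usable, remaining, peak_utilization = headroom(2, 0.8, 0.48)

print(f"usable {usable:.1f} Gbps, remaining {remaining:.2f} Gbps, "
      f"peak utilization {peak_utilization:.0%}")
```

Swapping in a different per-connection figure (for example 1.0 for wire speed) shows how sensitive the remaining headroom is to that assumption.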

The results seemed unbelievable to me at first so I dug a little deeper:

  1. I found this in an EqualLogic presentation from 2005: "With 2 iSCSI connections and free NIC teaming, payload equals approx. 234 MB/s (1.96Gb/s) or 823GB/Hour. We found 2Gb FC delivers 196 MB/s which equals approx. 689GB/Hour payload." http://communities.vmware.com/servlet/JiveServlet/downloadBody/1806-102-1-1554/VMUG.ppt
  2. I found this in an iSCSI Virtualization whitepaper from 2007: "For high-performance, mission-critical servers, the cost of Fibre Channel is often justified, because Fibre Channel provides higher bandwidth (4 Gbps vs. 1 Gbps) and lower latency than IP networks. However, many environments are over-served by 4Gbps Fibre Channel links. This is particularly true for hosts running applications characterized by random traffic, such as database applications and Exchange."
    http://www.dell.com/downloads/global/products/pvaul/en/iscsi_virtualization.pdf
  3. And here's one from NetApp: "...based on deployments, Netapp has proven over the past 3 years that a scalable, simple to use array with enterprise class reliability can safely be the iSCSI platform for mission-critical applications. Exchange is a perfect example of a mission critical application that is routinely deployed over iSCSI these days."
    http://storagefoo.blogspot.com/2006/05/iscsi-performance-and-deployment.html
  4. Finally, VMware's own testing of storage protocols and their corresponding physical medium from this year: "This paper demonstrates that the four network storage connection options available to ESX Server are all capable of reaching a level of performance limited only by the media and storage devices."
    http://www.vmware.com/files/pdf/storage_protocol_perf.pdf

It's important to note that I'm leaving 2 things out of this consideration:

1. I typically read that FC has lower latency than IP. My somewhat empirical belief is that IP's additional latency will not be a big factor when added to the equation.
2. I've read different sources that state disk IOPS are more important to system performance than storage transport bandwidth utilization.

I'm still looking for a way to quantify these factors to better predict the performance characteristics of our IP storage implementation. This is the first part of what I'm sure will be an ongoing investigation. It sure would be nice to have a tool that did all of this for me! I have yet to find something that's comprehensive enough on any given storage platform I've managed (IBM DS, EMC Celerra, et al).

Also note that I've been monitoring my bandwidth utilization more closely using VKernel's Capacity Analyzer and can safely say that 11008KBps is high. It's dropped 30-40% over the last two months for various reasons.

Next month I hope to enable jumbo frames in this environment and expect to see some additional performance gain at some level. I'm considering capturing before/after snapshots of various performance metrics and posting the results in a future blog.

In conclusion, this analysis makes me even more confident about the performance of our ESX hosts and virtual infrastructure backend storage transport even if/when I get to virtualize the remaining Exchange mailbox servers.

Virtual_JTW, member since Nov 1, 2004
