<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/" xmlns:clearspace="http://www.jivesoftware.com/xmlns/clearspace/rss" xmlns:dc="http://purl.org/dc/elements/1.1/" version="2.0">
  <channel>
    <title>Manual Automation</title>
    <link>http://communities.vmware.com/blogs/ManualAutomation</link>
    <description>A virtualization journey.</description>
    <pubDate>Wed, 08 Oct 2008 04:23:25 GMT</pubDate>
    <generator>Clearspace 1.10.12 (http://jivesoftware.com/products/clearspace/)</generator>
    <dc:date>2008-10-08T04:23:25Z</dc:date>
    <item>
      <title>Site Recovery Manager is a Hit!</title>
      <link>http://communities.vmware.com/blogs/ManualAutomation/2008/10/07/site-recovery-manager-is-a-hit</link>
      <description>&lt;b&gt;A Little History&lt;/b&gt;&lt;br /&gt;
&lt;br /&gt;
In a previous article (&lt;a class="jive-link-blogpost" href="http://communities.vmware.com/blogs/ManualAutomation/2008/05/15/the-big-plan-business-continuity"&gt;http://communities.vmware.com/blogs/ManualAutomation/2008/05/15/the-big-plan-business-continuity&lt;/a&gt;) I discussed why I was looking closely at SRM and what I needed to get done before I could implement the product. Now that I've successfully tested the product I'd like to give an update.&lt;br /&gt;
&lt;p /&gt;
&lt;b&gt;The Celerra Code Upgrade&lt;/b&gt;&lt;br /&gt;
&lt;p /&gt;
The code of both of my Celerras was upgraded to 5.6 in mid-July. It wasn't pretty - no fun being in the data center until 3:00am. To EMC's credit, their CE hung in there with me, got the problems escalated and ultimately we got the VMware data stores working again. We were bitten by the LUN resignaturing "bug". EMC knows the code upgrade causes this but for some reason we were surprised and found out the hard way at about 12:00am.&lt;br /&gt;
&lt;p /&gt;
It took another month to recover other services such as CIFS and iSCSI replication. When I was young, my father insisted that when I handled someone else's property, I should always return it in the same or better condition than when I first received it. My main problem with EMC in this respect is that they left me with a system that didn't work like it did before they upgraded it. I'm past the CIFS and iSCSI replication problems now, but I'm still experiencing problems with CAVA that didn't exist before. Luckily, I don't think it's anything too difficult or serious and I will be calling EMC support to get this last problem resolved.&lt;br /&gt;
&lt;p /&gt;
While I've given feedback on this event to EMC support, note that I still am a fan of their unified storage product. It's not right for all companies or all situations but it is for my environment. Also, to be fair, many Celerra customers may never need to experience a code upgrade event. The only reason to do this is if you need some feature or improved capability that the upgrade provides. I've had an EMC CE tell me that they have retired EMC hardware that had the original code installed making it over three years old! This says volumes about the code's stability and reliability. In my case, I needed the expanded functionality of iSCSI LUN replication and compatibility with VMware Site Recovery Manager.&lt;br /&gt;
&lt;p /&gt;
&lt;b&gt;The Evaluation&lt;/b&gt;&lt;br /&gt;
&lt;p /&gt;
Anyone semi-familiar with installing VMware products will have no problem getting SRM installed. Note that you'll need to obtain the Storage Replication Adapter (SRA) from your storage OEM and install it in the proper sequence per the documentation. In my case I used documentation from EMC and VMware to install and configure the product. See the "Additional Resources" section at the end of this article.&lt;br /&gt;
&lt;p /&gt;
One of the things that's awesome about VMware is the amount of attention they've given me regardless of whether I was working for a large $3 billion enterprise or a mid-sized $500 million company. In this case, my sales rep offered to have a local VMware systems engineer (we'll call him "Dave") come out on-site and work with me to complete a proof-of-concept.&lt;br /&gt;
&lt;p /&gt;
I had SRM and the SRA components installed. I wanted a technical resource in case I needed it while performing that first test. Well, I needed it and got it. Keep in mind I hadn't purchased the product yet(!). Dave was able to help me work through a couple of issues we ran into during that first session such as file system sizing and licensing issues. It only took 2-3 hours but when finished, I had 4 VMs running in my remote data center 325 miles away! (Thanks to Dave and Ken!)&lt;br /&gt;
&lt;p /&gt;
Another tip I learned during this session: review the SRA log. In the case of the Celerra's SRA, it documents every command it executes and the results. It's a great way to learn what SRM is really doing behind the scenes with your storage in order to get the LUN(s) set up and ready to be used as a data store by ESX.&lt;br /&gt;
&lt;p /&gt;
&lt;b&gt;Subsequent Test Results&lt;/b&gt;&lt;br /&gt;
I have more testing to do but can report that I'm starting 4 VMs from a single replicated LUN in 8 minutes. And I'm not talking about from the time of just powering on, I'm talking about pressing the "big red (test) button" - powering-up the VMs - starting the Windows services - and the recovery plan completion. Try that using physical servers! Sorry, but even restoring servers from a B2D solution that's replicated to your DR site won't be as fast.&lt;br /&gt;
&lt;p /&gt;
I demonstrated SRM for the DR team and initially got a "that's all?" kind of reaction. I quickly realized that SRM, with the combination of array-based replication, &lt;i&gt;worked too well&lt;/i&gt;! Meaning, it did such a good job of hiding the complexity and number of steps required to get from A to Z that my non-technical DR teammates didn't understand what SRM was really bringing to the table. If there's only one thing you take away from this article, make sure it's that you're better off explaining in simple terms the steps SRM is executing in the background before running a demonstration.&lt;br /&gt;
&lt;p /&gt;
Talking about the virtues of SRM is one thing (the recovery run book, the steps it automates, the testing capabilities (which are awesome, by the way), etc.); demonstrating these product features for your DR team is another. If your experience is like mine, you'll find it dramatically influences the discussions on the project plan. In my case, we will be significantly changing the testing phases - actually streamlining those thanks to SRM.&lt;br /&gt;
&lt;p /&gt;
I wouldn't declare SRM to be a perfect specimen of engineering excellence; I reserve that title for Windows ME (yes that's a joke). But there are a couple of things that could be improved. I would like finer-grained control over when my VMs are powered on - I'd like to be able to specify dependencies between VMs. It seems like VMware is bent on specifying everything as "High", "Medium" and "Low". What if I want six groupings instead of just three? There are also a number of folks complaining about the lack of fail-back. Yes, there's no "big red button" to press to perform a fail-back but most storage OEMs including EMC are providing documentation describing how to get this done. Finally, I'd like VMware to consider non-array-based replication capabilities. I don't think you'll replicate 20 VMs this way, but it sure would be nice for those one or two one-offs for which you don't want to replicate an entire LUN. I can also imagine customers with smaller implementations or those with non-supported back-end storage using this feature.&lt;br /&gt;
&lt;p /&gt;
Because the POC exercise was a success it was easy to convince management to purchase the product. I think purchasing Site Recovery Manager is the best endorsement I can give it and VMware. Now I can't wait to see what the next version brings!&lt;br /&gt;
&lt;p /&gt;
&lt;b&gt;Additional Resources&lt;/b&gt;&lt;br /&gt;
&lt;p /&gt;
SRM Product Site: &lt;a class="jive-link-external" href="http://www.vmware.com/products/srm"&gt;http://www.vmware.com/products/srm&lt;/a&gt;&lt;br /&gt;
SRM Product Documentation: &lt;a class="jive-link-external" href="http://www.vmware.com/support/pubs/srm_pubs.html"&gt;http://www.vmware.com/support/pubs/srm_pubs.html&lt;/a&gt; (The Getting Started PDF is particularly useful and pay attention to the compatibility matrix.)&lt;br /&gt;
SRM VMTN Forum: &lt;a class="jive-link-community" href="http://communities.vmware.com/community/vmtn/mgmt/srm" title="VMware vCenter Site Recovery Manager"&gt;http://communities.vmware.com/community/vmtn/mgmt/srm&lt;/a&gt;&lt;br /&gt;
SRM Book: &lt;a class="jive-link-external" href="http://www.rtfm-ed.co.uk/?p=584"&gt;http://www.rtfm-ed.co.uk/?p=584&lt;/a&gt; (Mike's blog is also a good one to watch.) &lt;br /&gt;
Storage OEM Docs: The EMC documentation can be obtained by registering on their Powerlink (&lt;a class="jive-link-external" href="http://powerlink.emc.com/"&gt;http://powerlink.emc.com/&lt;/a&gt;) site and searching for "Site Recovery Manager". For other OEMs, contact your sales representative, search their web site or call support.</description>
      <category domain="http://communities.vmware.com/blogs/ManualAutomation/tags">celerra</category>
      <category domain="http://communities.vmware.com/blogs/ManualAutomation/tags">dr</category>
      <category domain="http://communities.vmware.com/blogs/ManualAutomation/tags">emc</category>
      <category domain="http://communities.vmware.com/blogs/ManualAutomation/tags">infrastructure</category>
      <category domain="http://communities.vmware.com/blogs/ManualAutomation/tags">replication</category>
      <category domain="http://communities.vmware.com/blogs/ManualAutomation/tags">srm</category>
      <category domain="http://communities.vmware.com/blogs/ManualAutomation/tags">vmware</category>
      <pubDate>Wed, 08 Oct 2008 04:25:00 GMT</pubDate>
      <author>Virtual_JTW</author>
      <guid>http://communities.vmware.com/blogs/ManualAutomation/2008/10/07/site-recovery-manager-is-a-hit</guid>
      <dc:date>2008-10-08T04:25:00Z</dc:date>
      <clearspace:dateToText>3 months, 2 days ago</clearspace:dateToText>
      <clearspace:replyCount>3</clearspace:replyCount>
      <wfw:comment>http://communities.vmware.com/blogs/ManualAutomation/comment/site-recovery-manager-is-a-hit</wfw:comment>
      <wfw:commentRss>http://communities.vmware.com/blogs/ManualAutomation/feeds/comments?blogPostID=2222</wfw:commentRss>
    </item>
    <item>
      <title>Break Like the Wind</title>
      <link>http://communities.vmware.com/blogs/ManualAutomation/2008/10/06/break-like-the-wind</link>
      <description>&lt;br /&gt;
References to Spinal Tap's great album aside, it's ironic: I'm working on VMware Site Recovery Manager product setup and configuration and the day I'm scheduled to fly out to Las Vegas for VMworld a mini-disaster strikes! It's Sunday, September 14th around noon and all is normal. However, the remnants of Hurricane Ike are heading this way. No big deal - a little rain, maybe a strong thunderstorm but nothing we haven't seen before. &lt;br /&gt;
&lt;p /&gt;
I'm packing for a week at VMworld and need to hit the road by 3:30PM. Around 2:00PM we start to hear the whirling sound of wind racing across the roof. At about 3:15PM I'm packing up the car and debris is getting blown down the street. Before I leave I have to remove a large piece of cardboard from the front of my car. I've never seen anything like this! &lt;br /&gt;
&lt;p /&gt;
Despite the high winds, I make it to the airport safely and notice planes are still taking off and landing. Listening to the radio on the way there I learned winds were reaching in excess of 80 MPH and knocking down trees and power lines all across the state of Ohio. Dayton was impacted especially hard. I'm not sure how or why, but my plane took off successfully and it was a smooth ride once we were above the storm. &lt;br /&gt;
&lt;p /&gt;
My house was without power for 4 days. Others had it worse with the outage lasting over 9 days. This kind of weather event hasn't happened in 200 years. Some very special, one-in-a-million chance conditions came together, thanks in part to Hurricane Ike, to cause extraordinarily high winds in our region that none of us had seen before. &lt;br /&gt;
&lt;p /&gt;
So something that will never happen happened - a disaster occurred to our data center causing a multi-day loss in power. We have a natural-gas generator to cover a power outage. It kicked in and life is good, right? Wrong! We also have redundant AC units but only one works with the generator and the automatic fail-over didn't work due to a bug in the system (which has since been corrected). The room starts heating up and servers start shutting off as the temperature reaches 90 degrees Fahrenheit. We reached 95-96F before a co-worker showed up and manually switched the AC units over (I can't do it - I'm on a plane, remember?). It took him twice as long to get there because of downed power lines and trees that closed roads. &lt;br /&gt;
&lt;p /&gt;
He then starts powering up servers again. Luckily the outage for most systems is an hour or less on a Sunday when most of our users don't care or are being distracted by the tree that's landed in their living room. The ESX hosts and virtual machines all power-up successfully thanks in part to the hardware sensors on the servers that powered them off before the CPU, memory or I/O components fried in the heat. &lt;br /&gt;
&lt;p /&gt;
While the outage was bad, it brought to light several interesting points: &lt;br /&gt;
&lt;br /&gt;
&lt;ol&gt;
&lt;li&gt;Test the equipment, but also test the fail-over of the equipment. &lt;br clear="all" /&gt; 	Testing the actual fail-over is the hardest part of disaster recovery because it impacts production. However, regardless of whether it's AC units or virtual machines, this is the only way to be 100% certain your DR plan will work as designed and implemented.&lt;/li&gt;
&lt;li&gt;The quality of built-in server hardware sensors has increased dramatically in the last 7 years. &lt;br clear="all" /&gt; 	This is the third time I've had servers in a room that overheated due to an AC outage. The previous two events were lab servers that did not recover very well. The hardware didn't shut down cleanly. Many systems had blue-screened if they were still running. When AC service was restored, some servers wouldn't power back up; others threw strange hardware-related errors months after the fact. Heat does bad things to electronics and I've seen too much of this first hand.&lt;/li&gt;
&lt;li&gt;Additional data center environmental monitoring and sensor devices are critically important. &lt;br clear="all" /&gt; 	I have the good fortune of working for a data center manager who had the foresight to install a Sensaphone remote monitoring device (&lt;a class="jive-link-external" href="http://www.sensaphone.com/"&gt;http://www.sensaphone.com/&lt;/a&gt;). I'm sure there are other products on the market but this one works very well for us. It can call a list of numbers and speak the alert condition over the phone. The admin can then enter a code to stop it from calling the next number. It can monitor various conditions but in this case it called us to warn about the temperature. We also have an ADT monitoring unit but it doesn't seem to work as well.&lt;/li&gt;
&lt;li&gt;Data center protection is important in a disaster but also consider supporting non-data center work-related processes. &lt;br clear="all" /&gt; 	This "mini-disaster" put us without power for days, yet the business needed to continue to function. We needed to process sales orders, purchase raw materials, process payroll, etc. Have you ever worked for a company that couldn't meet payroll for any reason? To say that employees get upset is an understatement. So when no one has power, where does the accounting staff go to get their job done? Plan to provide facilities for personnel to process these kinds of essential functions. After all, what good is making sure the payroll system is running when nobody can access it anyway?&lt;/li&gt;
&lt;li&gt;Consider specific disaster scenarios and plan accordingly. &lt;br clear="all" /&gt; 	This may be the hardest thing to accomplish when planning for a disaster. Put two people in a room and they will have very different opinions on which scenario is more important than the other. The bottom line is you'll have to choose some number, say the top three, and plan for those. You should plan for something - define it but don't let it stall the progress of the project.&lt;/li&gt;
&lt;/ol&gt;
&lt;br /&gt;
The power outage lasted around 72 hours; the service outage lasted less than an hour. Not bad overall! Now I'd better get VMware Site Recovery Manager working - had that generator stopped running...</description>
      <category domain="http://communities.vmware.com/blogs/ManualAutomation/tags">infrastructure</category>
      <category domain="http://communities.vmware.com/blogs/ManualAutomation/tags">srm</category>
      <category domain="http://communities.vmware.com/blogs/ManualAutomation/tags">vi</category>
      <category domain="http://communities.vmware.com/blogs/ManualAutomation/tags">vi3</category>
      <category domain="http://communities.vmware.com/blogs/ManualAutomation/tags">virtualization</category>
      <category domain="http://communities.vmware.com/blogs/ManualAutomation/tags">vmware</category>
      <category domain="http://communities.vmware.com/blogs/ManualAutomation/tags">dr</category>
      <pubDate>Tue, 07 Oct 2008 01:09:00 GMT</pubDate>
      <author>Virtual_JTW</author>
      <guid>http://communities.vmware.com/blogs/ManualAutomation/2008/10/06/break-like-the-wind</guid>
      <dc:date>2008-10-07T01:09:00Z</dc:date>
      <clearspace:dateToText>3 months, 3 days ago</clearspace:dateToText>
      <wfw:comment>http://communities.vmware.com/blogs/ManualAutomation/comment/break-like-the-wind</wfw:comment>
      <wfw:commentRss>http://communities.vmware.com/blogs/ManualAutomation/feeds/comments?blogPostID=2221</wfw:commentRss>
    </item>
    <item>
      <title>The Big Plan: Business Continuity</title>
      <link>http://communities.vmware.com/blogs/ManualAutomation/2008/05/15/the-big-plan-business-continuity</link>
      <description>For my employer, this is the year of disaster recovery.  Almost all of our major projects tie-in to the goal of performing a successful DR test by the end of the year.  Besides the standard IT things that have to get done on a regular basis (asset management, corporate application TLC, etc.), this goal is really driving the work we&amp;rsquo;re doing.&lt;br /&gt;
&lt;p /&gt;
Sometime before I was hired, the company purchased two EMC Celerra NS352s for NAS/IP storage and two EMC Centeras for file and email archiving.  We&amp;rsquo;re an HP shop so we&amp;rsquo;re using DL380 G5s with the little USB key inside running ESX 3i or ESXi or whatever it&amp;rsquo;s called today.  We mostly use Cisco gear for networking and have dedicated switches for iSCSI and VMotion traffic.  We have two of everything &amp;ndash; our company&amp;rsquo;s IT services are split across two datacenters.  Each datacenter will be a hot recovery site for the other.&lt;br /&gt;
&lt;p /&gt;
&lt;p /&gt;
So what does our DR solution entail?  Well, some fairly advanced technologies:&lt;br /&gt;
Virtualization: VMware VI3, release 3.5&lt;br /&gt;
DR Automation: VMware Site Recovery Manager&lt;br /&gt;
Replication: EMC Celerra Replicator V2&lt;br /&gt;
Snapshot consistency: EMC Replicator&lt;br /&gt;
&lt;br /&gt;
Oddly enough, looking at the list, only one of the technologies is shipping and in my possession today (VMware VI3).  Hmmm&amp;hellip; can you say, &amp;ldquo;Project risk&amp;rdquo;?&lt;br /&gt;
&lt;p /&gt;
&lt;p /&gt;
I first saw VMware Site Recovery Manager at a VMworld 2007 presentation.  If it works, it will be impressive.  Automating the steps to configure and power on VMs and a central place to store the DR &amp;ldquo;run book&amp;rdquo; will be sweet, to say the least.&lt;br /&gt;
&lt;p /&gt;
&lt;p /&gt;
One of the advantages of having EMC storage equipment is that they own VMware (or at least the controlling interest).  This means there&amp;rsquo;s a pretty good chance that their storage platforms will be among the first certified to work with VMware products.  Sure enough, my Celerras will work upon release of SRM with a firmware/code upgrade.  The code is shipping on new product; however, EMC has a policy that delays certification by 90 days for installing/upgrading to current product.  That puts us in the June time-frame.&lt;br /&gt;
&lt;p /&gt;
&lt;p /&gt;
The Celerra code upgrade provides a new version of Celerra Replicator that replicates iSCSI LUNs.  To ensure application consistency for applications such as Exchange and SQL Server, EMC Replicator must be used.&lt;br /&gt;
&lt;p /&gt;
&lt;p /&gt;
This is the high-level plan: application-consistent snapshots, SAN/IP storage-based replication and SRM to run it all at the end.  Yes, we have some physical servers (HP-UX, AIX, etc.); too bad their story won&amp;rsquo;t be interesting.&lt;br /&gt;
&lt;p /&gt;
&lt;p /&gt;
So while I&amp;rsquo;m waiting for product to GA, I&amp;rsquo;m trying to get our VI3 platform up and stable.  Stay tuned for progress on that front.</description>
      <category domain="http://communities.vmware.com/blogs/ManualAutomation/tags">vi3</category>
      <category domain="http://communities.vmware.com/blogs/ManualAutomation/tags">virtualization</category>
      <category domain="http://communities.vmware.com/blogs/ManualAutomation/tags">vmware</category>
      <category domain="http://communities.vmware.com/blogs/ManualAutomation/tags">emc</category>
      <category domain="http://communities.vmware.com/blogs/ManualAutomation/tags">celerra</category>
      <category domain="http://communities.vmware.com/blogs/ManualAutomation/tags">srm</category>
      <category domain="http://communities.vmware.com/blogs/ManualAutomation/tags">replication</category>
      <pubDate>Fri, 16 May 2008 03:04:00 GMT</pubDate>
      <author>Virtual_JTW</author>
      <guid>http://communities.vmware.com/blogs/ManualAutomation/2008/05/15/the-big-plan-business-continuity</guid>
      <dc:date>2008-05-16T03:04:00Z</dc:date>
      <clearspace:dateToText>7 months, 3 weeks ago</clearspace:dateToText>
      <clearspace:replyCount>11</clearspace:replyCount>
      <wfw:comment>http://communities.vmware.com/blogs/ManualAutomation/comment/the-big-plan-business-continuity</wfw:comment>
      <wfw:commentRss>http://communities.vmware.com/blogs/ManualAutomation/feeds/comments?blogPostID=1753</wfw:commentRss>
    </item>
  </channel>
</rss>

