Proven Practice: ROBO - Managing Remote ESX Hosts Over WAN with VirtualCenter

Version 1

    Introduction

    Many VMware customers have virtualized their ROBO (Remote Office/Branch Office) sites in order to reap the benefits provided by VMware Infrastructure, such as hardware cost savings, business continuity and high availability, and lower maintenance costs. Because of ROI considerations and the desire to keep management of ESX hosts and virtual machines centralized, many of these customers choose to keep one VirtualCenter server and configure it to manage ESX hosts over the WAN.

     

    This practice has been used by large enterprises with ESX Server hosts distributed over large geographical distances.

     

    Intended Audience

     

    VCPs and IT Architects planning to manage a distributed VMware Infrastructure in which VirtualCenter manages ESX hosts at remote locations over a wide area network.

     

    Outline

     

    1.1 Challenge
    1.1.1 WAN Performance of VC commands
    1.1.2 Recommendations
    2 Customer Scenarios
    2.1 ABC International
    2.1.1 VirtualCenter Server Configuration
    2.1.2 Workloads
    2.1.3 Proven Practices
    2.1.3.1 Administrative Practice
    2.1.3.2 Remote Console Access Practice
    2.1.3.3 Storage Practice
    2.1.3.4 Remote Patching Practice
    2.1.3.5 Remote Provisioning Practice
    2.1.3.6 Remote Monitoring Practice
    2.1.3.7 Business Continuity Practice
    2.1.4 Summary

     


     

    Note: we use 'site' and 'datacenter' interchangeably in this document.

     

    1.1 Challenge

     

    WAN links are characterized by low bandwidth and high latency. Using a VirtualCenter (VC) server to manage ESX hosts over the WAN therefore poses some challenges in sustaining practical response times. Some VC commands are carried out primarily on the host side, while others require moderate to heavy interaction between the ESX hosts and the VC Server. The latter are the commands that produce a visible impact on response times for VMware Infrastructure (VI) administrators.

     

    VMware has gathered WAN performance data for VC commands and presents it in the next section.

     

    1.1.1 WAN Performance of VC commands

     

    In the performance benchmarks, we examined the VC commands below:

    • Add ESX host

    • Remove ESX host

    • Browse datastore

    • Add virtual machine

    • Clone virtual machine

    • Power on virtual machine

    • Activate virtual machine console

    • Power off virtual machine

    • Remove virtual machine

    • Power on multiple virtual machines

    • Power off multiple virtual machines

    • Change focus to ESX host

     

    The performance of VC commands listed above is examined with the following network configurations:

     

    Bandwidth     Delay (RTT)     Error Rate      Other Traffic
    56 kbps       200-300 ms      0.05%, 0.1%     Up to 60%
    128 kbps      200-300 ms      0.05%, 0.1%     Up to 60%
    192 kbps      500-650 ms      0.05%           Up to 50%
    256 kbps      150-250 ms      0.05%           Up to 50%

     

    From our performance studies, we have divided the VC commands according to the extent to which low-speed, high-latency links impact them. See the table below:

     

    Low to moderate impact                  High Impact
    Remove ESX host                         Add ESX host
    Clone virtual machine                   Activate virtual machine console
    Power on virtual machine                Change focus to an ESX host
    Power off virtual machine
    Browse datastore
    Add virtual machine
    Remove virtual machine
    Power on multiple virtual machines
    Power off multiple virtual machines

     

    Low impact: minor added delay and a small cosmetic effect on the progress bar.

    Moderate impact: added delay, but usability is only slightly affected.

    High impact: problems such as command time-outs, the VC Client freezing for a few seconds, and severely degraded usability.

     

    VC commands that show low to moderate impact can be used fairly liberally, regardless of the network properties. VC commands that show high impact, however, place requirements on the network properties if usability and response times are to remain acceptable. The remainder of this section examines each of the high impact VC commands in detail and suggests known workarounds.

     

    High impact VC command: Add ESX host

    Network conditions that cause high impact: <= 128 kbps

    Symptoms:

    • Download of the vpxa package from the VC Server to the ESX host takes more than 30 minutes.

    • The SOAP session times out after 30 minutes.

    Known workarounds:

    • Manually upgrade the vCenter Agent (vpxa) on the VMware ESX host before adding the host to VC (a command-line sketch follows at the end of this section):

      • The vCenter Agent (vpxa) upgrade scripts are located in the upgrade folder of vpxd. Information about which bundle corresponds to which version of VMware ESX is in the file bundleversion.xml in the same folder. (e.g. vpx-upgrade-esx-7-linux-<buildNumber> applies to VMware ESX 3.5 and later; vpx-upgrade-eesx-1-linux-<buildNumber> applies to VMware ESXi 3.5 and later.)

      • Copy the upgrade script to the host and run it as: sh vpx-upgrade-esx-7-linux-<buildNumber>

      • The script is a self-extracting shell script that uninstalls the previous version of vpxa (if needed) and installs the new one.

    • Set the heartbeat timeout to a higher value to reduce host disconnects:

      • Edit 'C:/Documents and Settings/All Users/Application Data/VMware/VMware VirtualCenter/vpxd.cfg'

      • Add the following inside the <vpxd> tags:

     <heartbeat>
     <notRespondingTimeout>60</notRespondingTimeout>
     </heartbeat>

      • Restart the VirtualCenter Server service. The value inside the notRespondingTimeout tag is in seconds.

    • Increase the SOAP timeout on all affected ESX hosts by editing '/etc/vmware/hostd/config.xml' and adding:

     <!-- Set SOAP session timeout to 4 hours (default 30 min) -->
     <soap>
     <sessionTimeout>240</sessionTimeout>
     </soap>

     

    High impact VC command: Activate Virtual Machine (VM) Console

    Network conditions that cause high impact: <= 256 kbps, delay > 100 ms

    Symptoms:

    • Difficult to keep track of the mouse pointer

    • Difficult to highlight text

    • Difficult to see what is typed

    Known workarounds:

    • Reduce the size and color depth (number of colors) of all affected VM consoles to smaller values, e.g. 800x600/16-bit.

    • Use RDP to connect to Windows VMs.

    • Use ssh or PuTTY to connect to Unix VMs.

     

    High impact VC command: Change focus to an ESX host

    Network conditions that cause high impact: <= 128 kbps, delay < 150 ms

    Symptoms:

    • The VC Client freezes for several seconds.

    Known workarounds:

    • The cause of this problem may be related to inline license checks. The freeze time might be reduced by relocating the licenses.
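
     

    The following is a minimal command-line sketch of the manual vpxa upgrade workaround above, assuming ssh access to the service console is enabled and that the appropriate upgrade bundle has already been copied from the vpxd upgrade folder on the VC Server to the administrator's workstation; the host name (esx01.branch.example.com) and build number (12345) are placeholders, not values from this document:

     # Copy the upgrade bundle to the remote ESX host (the bundle-to-version
     # mapping is listed in bundleversion.xml in the vpxd upgrade folder).
     scp vpx-upgrade-esx-7-linux-12345 root@esx01.branch.example.com:/tmp/

     # Run the self-extracting upgrade script in the host's service console;
     # it removes the old vpxa (if present) and installs the new one.
     ssh root@esx01.branch.example.com "cd /tmp && sh vpx-upgrade-esx-7-linux-12345"

    With the agent already at the correct version, adding the host to VC should no longer require the large vpxa download over the WAN.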

     

    1.1.2 Recommendations

    From the benchmark results presented above, we make the following recommendations:

    • Do not use links <= 56 kbps; 128 kbps is generally usable.

    • Manually upgrade the vCenter Agent (vpxa) on the VMware ESX host before adding the host to VC.

    • Set the heartbeat timeout to a higher value to reduce host disconnects.

    • Reduce the size and color depth (number of colors) of all affected VM consoles to smaller values, e.g. 800x600/16-bit.

    • Use RDP to connect to Windows VMs.

    • Use ssh or PuTTY to connect to Unix VMs.

     

    2 Customer Scenarios

     

    VMware customers have deployed VMware Infrastructure in distributed environments and sustained practical VirtualCenter Server management of ESX hosts over the WAN. In this section, we present a customer scenario for your reference. Note that we will continue to add customer scenarios to this document, as well as more details on existing scenarios; updates can be expected on a monthly basis.

    If you are interested in providing some data to help us understand your ROBO practices, you can fill out a survey at

    http://www.surveymethods.com/EndUser.aspx?AF8BE7FFABE8FCF8.

     

    2.1 ABC International

     

    ABC International is a company with branches that are distributed around the world. Its IT environment has the following characteristics:

    1. Datacenters are distributed in 4 main geographical areas (U.S., Europe, Asia, and Middle East).

    2. Each geographical area is linked to the site hosting VirtualCenter server via network links of similar characteristics. See ABC International Network Architecture diagram below. There are 4 types of network links:

    • OC12 (622 Mbps)

    • T3 (44.7 Mbps)

    • T1 (1.5 Mbps), average latency ~ 169 ms

    • Satellite (1-6 Mbps), latency between 650 and 700 ms

    3. Minimal or no IT coverage at remote sites.

    4. Maximum number of ESX hosts per site is 32; minimum number of ESX hosts per site is 2. The majority of remote sites have only 2 ESX hosts each.

    5. The growth rate of ESX hosts is 5% per year.

    6. The growth rate of virtual machines is 5 to 10% per year.

    [ABC International Network Architecture diagram]

     

    2.1.1 VirtualCenter Server Configuration

     

    ABC International has installed one VirtualCenter Server in its California datacenter. (Refer to ABC International Network Architecture diagram.)

     

    The VC Server coexists on the same hardware as the VC database server. The hardware specification is as follows:

    • DL360 G5

    • 2-way, dual-core CPUs

    • 6 GB of memory

    The software specification is as follows:

    • Operating system: Windows Server 2003

    • VC Server 2.5

    • Database server: SQL Server 2005 with the full recovery model. The database is dumped nightly to a CIFS share that is replicated to the Massachusetts datacenter (a sketch of such a dump follows).
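
    As an illustration only, a nightly dump of this kind could be scheduled with sqlcmd (included with SQL Server 2005); the database name (VCDB) and the CIFS share path below are assumptions, not ABC International's actual values:

     :: Run nightly (e.g. via Windows Task Scheduler) on the VC/database server.
     :: Backs up the VirtualCenter database to a CIFS share that is replicated off-site.
     sqlcmd -S localhost -E -Q "BACKUP DATABASE VCDB TO DISK = N'\\fileserver\vcbackup\VCDB_full.bak' WITH INIT"

     :: With the full recovery model, periodic BACKUP LOG runs are also needed to
     :: keep the transaction log from growing without bound.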

     

    2.1.2 Workloads

     

    ABC International runs the following workloads in their remote sites:

    • Domain Controllers

    • File/Print Servers

    • Others

     

    2.1.3 Proven Practices

    In this section, we present the VC management practices that have been proven at ABC International. These practices are divided into several categories: Administrative Practice, Remote Console Access Practice, Storage Practice, Remote Patching Practice, Remote Provisioning Practice, Remote Monitoring Practice, and Business Continuity Practice.

     

    2.1.3.1 Administrative Practice

     

    Small number of administrators. ABC International has 15 IT administrators. Of these 15 VI administrators, 4 are core VI administrators, all based in the United States (California and Massachusetts); the others are application administrators who log on to monitor application performance.

     

    Privileges. ABC International sets VC permissions at the datacenter object or host object level. Other customers should consider also setting permissions at the virtual machine level.

     

    2.1.3.2 Remote Console Access Practice

     

    Use RDP. VI administrators at ABC International use RDP clients to access RDP-capable virtual machines hosted at remote sites. Alternatively, consider providing an RDP destination machine with the VI Client installed that sits on the same LAN as the virtual machines you are trying to reach. This way you can use the VI Client and the virtual machine console to access both RDP-accessible and non-RDP-accessible virtual machines effectively (a brief sketch follows).
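
    A minimal sketch of this practice from an administrator's workstation, assuming hypothetical host names (win-vm01, jump-host, and linux-vm01 at branch.example.com) with RDP and ssh enabled on the targets:

     :: RDP directly to a Windows VM at the branch, or to a branch-local jump host
     :: that has the VI Client installed (for console access to non-RDP VMs).
     mstsc /v:win-vm01.branch.example.com
     mstsc /v:jump-host.branch.example.com

     # From a Unix workstation (or PuTTY on Windows), ssh to a Unix VM at the branch.
     ssh admin@linux-vm01.branch.example.com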

     

    2.1.3.3 Storage Practice

     

    Image storage replication. ABC International stores virtual machine templates, installation (iso) files, host images, and virtual machine images in an NFS share on their storage appliance, which has the capability to do remote incremental replication (only the deltas are replicated). This NFS share is located in California and is replicated to all other sites. This practice is useful for remote provisioning, for example (a sketch of mounting the replicated share on a remote host follows).
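
    A minimal sketch of making the branch-local replica of this share visible to a remote ESX 3.x host as an NFS datastore; the appliance host name (nas01.branch.example.com), export path (/vol/vmimages), and datastore label (images-replica) are assumptions:

     # On the remote ESX host's service console: mount the locally replicated
     # image share as an NFS datastore.
     esxcfg-nas -a -o nas01.branch.example.com -s /vol/vmimages images-replica

     # Verify that the datastore is mounted.
     esxcfg-nas -l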

     

    2.1.3.4 Remote Patching Practice

     

    Stage locally, then replicate. When patches are available for ESX Server or virtual machines (e.g. guest OS updates, application software updates), the patches are applied to the host image(s) or virtual machine image(s) located in California. The patches are tested before being placed in the NFS share mentioned in section 2.1.3.3 for replication to the other sites. Since the replication frequency is every 24 hours, the image lag at any remote site is at most 24 hours.

    Using the replicated images, the VI administrators then 'refresh' the ESX host(s) or virtual machine(s) (a sketch of the host-side step follows).
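
    As an illustration of the host-side refresh, assuming the replicated share from section 2.1.3.3 is mounted on the remote ESX 3.x host and contains an already-tested, extracted patch bundle (the datastore and bundle names below are placeholders):

     # On the remote ESX host: list the patches/bundles already installed.
     esxupdate query

     # Apply a staged patch bundle from the replicated share (run from within
     # the extracted bundle directory).
     cd /vmfs/volumes/images-replica/patches/ESX350-200xxxxxx-BG
     esxupdate update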

     

    2.1.3.5 Remote Provisioning Practice

     

    Stage locally, then replicate. ABC International has a remote provisioning practice that is similar to their Remote Patching Practice. The images are created locally in California and replicated via storage replication to the remote sites for provisioning. See sections 2.1.3.3 and 2.1.3.4 for more details.

     

    2.1.3.6 Remote Monitoring Practice

     

    Leverage VC for performance and uptime monitoring. ABC International leverages VirtualCenter for the performance data associated with ESX host(s) and virtual machine(s). Graph rendering can be slow, especially when the ESX hosts are connected via the 1-6 Mbps satellite link, yet the overall experience is still practical for ABC International.

    Besides VirtualCenter, ABC International also leverages vCharterPro by Vizioncore for virtual machine performance monitoring.

     

    2.1.3.7 Business Continuity Practice

     

    Leverage VMware DRS and HA. ABC International leverages VMware Infrastructure services such as DRS and HA to improve the availability of their virtual machines (and workloads). The availability of 1 GbE network links at each site allows ABC International to leverage full DRS and HA automation at every site.

     

    2.1.4 Summary

     

    ABC International has an IT environment that is distributed geographically around the world, including the United States, Europe, Asia, and the Middle East. The company is a relatively large VMware shop, with more than 100 ESX hosts and around 900 virtual machines deployed in its datacenters. When it comes to managing these ESX hosts and virtual machines with a single VirtualCenter Server, the IT administrators face a challenge with the relatively slow links to the remote sites (the slowest being 1 Mbps with 650 ms latency).

    This challenge is met by the IT administrators through the following proven practices:

    • Administrative Practice:

      • Small number of administrators: 4 core VI administrators and 11 application administrators

    • Remote Console Access Practice:

      • Use RDP instead of virtual machine console in VMware Infrastructure Client.

    • Storage Practice

      • Image storage replication: replicate the NFS share, which is used to store virtual machine templates, installation (iso) files, host images and virtual machine images, to all remote sites.

      • Nightly dump the VirtualCenter database to a CIFS share that is replicated from California to Massachusetts.

    • Remote Patching Practice

      • Stage patched virtual machines or ESX hosts locally and have the images replicated to remote sites for refreshing.

    • Remote Provisioning Practice

      • Stage virtual machine and ESX host images locally and have the images replicated to remote sites for provisioning.

    • Business Continuity Practice

      • Leverage DRS and HA clusters within each datacenter.

     

    Resources

     

    Author

    Desmond Chan, dchan@vmware.com


     

    Disclaimer

    You use this proven practice at your discretion. VMware and the author do not guarantee any results from the use of this proven practice. This proven practice is provided on an as-is basis and is for demonstration purposes only.