Our environment has 43 host clusters and we use around 10 different templates. The problem we are experiencing is that when we deploy a new VM from a template to a cluster DIFFERENT from the cluster the template lives on, the deployment takes MUCH longer than deploying to the SAME cluster the template lives on.
Now I don’t know which factor in the environment is causing the deployments to run slower to other clusters, but I would like to eliminate it.
So my question is, how should we alter our environment so that templates can be deployed to ALL clusters at that accelerated speed without having to maintain a group of templates on every cluster?
It’s probably worth noting that each host cluster gets dedicated datastores. So my initial thought was putting all of the templates on a single datastore that every cluster can see. But I didn’t know whether that would actually resolve our issue.
What is your storage type? iSCSI? NFS? FC?
We have two clusters and 32 hosts; the hosts are connected to storage via FC. We see no problem when we deploy virtual machines from templates to a different cluster.
Are all the hosts in both clusters connected to a single storage array, or to multiple arrays?
If the hosts in cluster 1 are connected to the storage where the template resides, and you are deploying the template to cluster 2, which is connected to a different array, the elongated time to deploy virtual machines from the template is expected.
Depending on the size of your template, this is likely to still take time, as it is a full copy. I would suggest making your templates as small as possible and expanding the vdisk as needed to meet application requirements. VAAI will help if your SAN supports it. How is your environment currently configured? And how long is it typically taking to deploy from a template?
I checked again, we actually have 301 hosts, almost 8000 VMs, 48 clusters and 1200 datastores. And all of this is just one of a few different vCenter servers.
Deployment times are less than a minute when deploying to the same cluster and can go as long as 10 minutes when deploying to other clusters. I’m assuming that when we deploy to a host that can’t see the disk that the template is on, a migration of the template prior to the deployment is actually occurring, extending those deployment times. But I’m sure I could be wrong.
I’m not 100% certain on the connection type between the host frames and the storage frames, but I want to assume it is NFS since those storage frames are used for things other than VMs.
EDIT: I was told we use FC.
At that scale, templates are probably not the best solution. I would look into automated VM deployment using open-source tools, e.g. Chef or Puppet, or SCCM for Windows.
I ran some tests to get deployment times…
When deploying from a template to the same cluster the template is housed in, deployment takes about 2.5 minutes. When deploying to a cluster the template is not housed in (different hosts and datastore), the deployment takes about 9.5 minutes. So the deployment time is almost four times as long to other clusters.
I would still like to find out exactly which variable in the scenario causes the increased deployment times so that I can attempt to eliminate or circumvent it.
I'd say it's 91.2% likely a storage provisioning variable. If each of those clusters has dedicated storage, then your deployment has to hit the host HBA, then the fibre switch, then the controller, maybe the cores, then back through the other controller, the fibre switch, and the other host's HBA... at least I think that's right. When you deploy locally, the data has less to traverse. Have your SAN guys carve out a TB or two to host those templates in each cluster. With an environment that big, I bet you have interns who can do the mule work of moving the templates around to the different clusters. :smileylaugh:
Out of curiosity, do you have sDRS enabled on all your datastore clusters?
Maybe I'm way off base here...just a quick note
are there any storage configuration differences? for example, all the luns presented have the same block size and so on, differences in block size can also slow down svmotion.
We don't actually use Datastore Clusters. Each datastore is assigned directly to the host cluster.
And we want to avoid having to put clones of every template in each cluster. That sounds like a maintenance nightmare not to mention a disk devouring proposition.
As for the LUN configuration, I'd have to ask the storage guys. Since our environment is so big, the departments are highly compartmentalized so we have a team for almost everything.
I ran another test to see if my theory would hold.
We have a single datastore that has been made available to most clusters to facilitate live migrations.
I put one of our templates on that datastore and tested deploying to multiple clusters, but the result was the same: no increased speed from a datastore that all the clusters can see, while the template's local cluster still saw faster deployment times. So it seems the bottleneck is more in the realm of what WessexFan is thinking, with the host.
I have to ask:
What's the geographical dispersion, and what are the network conditions, pipe sizes, etc.? With an environment that large, it sounds like you may be using a vCenter hub / DC spoke / field-office pod scenario. If you're deploying a template to a field office or the like, 10 minutes is good.
The largest environment I worked on was 6000 VMs / 400 hosts, multi-regional, with some offices on 128k pipes doing local deployments. Deploying a template to a regional office (AD, P, F) was not done. We backed up a common sysprepped image directory to the office file server at the spoke, trickle-replicated from a master build server at corporate.
Done on the cheap
We divided the environment by the Mississippi and the Mason-Dixon line, i.e. East/West/South. Datacenters had storage pods (EMC FC with some NetApp FC). Templates were housed in each physical datacenter on a common LUN shared between clusters, which was also used to move guests from one segmented environment to another... including the secure DMZ. We kept that transfer/ISO area limited to one LUN, because there was nothing worse than a transient storage issue taking out 100+ hosts. Lesson learned... sadly, the bane of our existence was always a storage team in constant transition.
Eventually we found that WDS with deployment nodes and advanced scripting with silent app installs at regional build servers was a good deployment strategy. We catered to 9 flavors of Windows OS and 20+ post-install scripts.
Take it for what its worth
In short:
To decrease your template deployment times, present shared SAN storage with VAAI support enabled to all of the clusters, if possible.
With this approach you offload all data operations to the array... the VAAI primitives (HardwareAcceleratedMove) do this work for you better than the host's software datamover can.
When a VM is deployed (cloned) between clusters with non-shared datastores, all the data goes through the management network, so the speed of the copy is bound by the network throughput dedicated to that network.
So I suggest that going for a shared DS/LUN is the best way to speed up your deployments (sharing a DS across clusters is OK).
The clone operation then executes all data operations inside the array, without touching the hypervisor (kernel) layer.
At the hypervisor level this is invoked by the vmkernel datamover component, specifically the FS3DM hardware offload; this is the most efficient and fastest method.
If you clone the template between datastores on different storage (e.g. DAS vs. FC SAN), all the data must traverse a higher level, outside the array, which in that case is the vmkernel and its software datamover FS3DM... through your SAN fabric (assuming the hosts can access both the source and destination DS).
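As a sanity check on whether the array can take the offloaded clone path at all, you can list per-device VAAI support from an ESXi shell. This is a sketch assuming ESXi 5.x esxcli syntax; the device identifiers and exact field values will vary per host:

```shell
# List VAAI primitive support for every storage device seen by this host.
# "Clone Status: supported" means the full copy (XCOPY) can be offloaded
# to the array instead of being pumped through the vmkernel datamover.
esxcli storage core device vaai status get

# Typical per-device output on a VAAI-capable FC LUN (naa.* will differ):
#   ATS Status: supported
#   Clone Status: supported
#   Zero Status: supported
#   Delete Status: supported
```

If Clone Status shows "unsupported" on either the source or destination LUN, the clone falls back to the software datamover regardless of the advanced settings on the host.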
For best results I would recommend:
- Use the same storage array vendor, and keep both the source and destination LUNs within the same array.
- Ideally, carve out the LUNs across spindles with the same performance parameters.
- Beware of mixed VMFS block sizes across your datastores; only with equal block sizes will you achieve the best results with Storage vMotion.
Apart from the other issues that can occur when you mix different VMFS block sizes in your clusters, the main performance hit is that when you migrate data between a source and destination with different block sizes, the vmkernel falls back to the legacy (FSDM) datamover, which is the slowest method because you hit the higher layers of the stack.
So check all of your datastores and their block sizes carefully, especially if you upgraded datastores from VMFS3 to VMFS5, where the old block size is kept.
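A quick way to audit block sizes from an ESXi shell is vmkfstools. The loop below is a sketch that assumes the standard /vmfs/volumes layout; run it per host and compare the results across clusters:

```shell
# Print the VMFS version and file block size of each mounted datastore.
# In-place VMFS3-to-VMFS5 upgrades keep their original 1/2/4/8 MB block
# size, while freshly created VMFS5 volumes always use 1 MB.
for vol in /vmfs/volumes/*/; do
  echo "== $vol"
  vmkfstools -Ph "$vol" | grep -iE 'VMFS-|block size'
done
```

Any mismatch between the block size of the template's datastore and the deployment target is a candidate for the FSDM fallback described above.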
Also check that the VAAI XCOPY primitive is enabled on all of your hosts:
Under ESXi Configuration, on the Software tab, click Advanced Settings, point to DataMover and find:
DataMover.HardwareAcceleratedMove, which should be set to 1.
As another option, if you cannot afford the scenario above and want to preserve your current placement, you can leverage 10GbE networking in your datacenter... :)
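For reference, the same check (and fix) can be done from the ESXi command line instead of the vSphere Client; this sketch assumes ESXi 5.x esxcli syntax:

```shell
# Query the XCOPY (full-copy offload) setting; "Int Value: 1" = enabled.
esxcli system settings advanced list -o /DataMover/HardwareAcceleratedMove

# Re-enable it if someone turned it off (it defaults to 1 on VAAI-capable builds).
esxcli system settings advanced set -o /DataMover/HardwareAcceleratedMove -i 1
```

The setting is per host, so it needs to match on every host that might run the clone.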
If you found this or any other answer helpful, please consider awarding points (use the Helpful or Correct buttons).
Best Practices for Templates
(From the VMware white paper "VMware VirtualCenter Templates," ESX Server 3 / VirtualCenter 2)
Virtual machine templates are very powerful and versatile. The following best practices, culled from many different areas of IT infrastructure management, will enable you to derive the most value from templates and avoid starting ineffective habits.
Install antivirus software and keep it up to date: In today's world of viruses that are hyper-efficient at exploitation and replication, an OS installation routine has merely to initialize the network subsystem to be vulnerable to attack. By deploying virtual machines with up-to-date antivirus protection, this exposure is limited. Keep the antivirus software current every month by converting the templates to VMs, powering on, and updating the signature files.
Install the latest operating system patches, and stay current with the latest releases: Operating system vulnerabilities and out-of-date antivirus software can increase exposure to exploitation significantly, and current antivirus software isn't enough to keep exposure to a minimum. When updating a template's antivirus software, apply any relevant OS patches and hotfixes.
Use the template Notes field to store update records: A good habit to get into is to keep information about the maintenance of the template in the template itself, and the Notes field is a great place to keep informal update records.
Plan for ESX Server capacity for template management: The act of converting a template to a virtual machine, powering it on, accessing the network to obtain updates, shutting down, and converting back to a template requires available ESX Server resources. Make sure there are ample resources for this very important activity.
Use a quarantined network connection for updating templates: The whole point of keeping antivirus and operating systems up to date is to avoid exploitation, so leverage the ability of ESX Server to segregate different kinds of network traffic, and apply updates in a quarantined network.
Use the same datastore for storing templates and for powered-on templates: During the process of converting templates to virtual machines, do not deploy the template to another datastore. It is faster and more efficient to keep the template's files in the same place before and after the conversion.
Install the VMware Tools in the template: The VMware Tools include optimized drivers for the virtualized hardware components that use fewer physical host resources. Installing the VMware Tools in the template saves time and reduces the chance that a sub-optimally configured virtual machine will be deployed to your production ESX Server infrastructure.
Use a standardized naming convention for templates: Some inventory panel views do not offer you the opportunity to sort by type, so create a standard prefix for templates to help you intuitively identify them by sorting by name. Also, be sure to include enough descriptive information in the template name to know what is contained in the template.
Defragment the guest OS filesystem before converting to template: Most operating system installation programs create a highly fragmented filesystem even before the system begins its useful life. Defragment the OS and convert to template, and that way you won't have to worry about it again until the system has been in production for a while.
Remove nonpresent hidden devices from templates: This problem will likely occur only if you convert existing physical images to templates. Windows will store configuration information about certain devices, notably network devices, even after they are removed from the system. Refer to Microsoft TechNet article 269155 for removal instructions.
Use folders to organize and manage templates: Folders can be both an organizational and a security container. Use them to keep templates organized and secure.
Create Active Directory groups that map to VirtualCenter roles: Rather than assign VirtualCenter roles to individual user accounts, create dedicated Active Directory groups, and place user accounts in those groups.
Wouldn't presenting a template datastore to all hosts run into the maximum number of hosts allowed per volume?
As per the configuration maximum guide: Hosts per volume 64
The templates will have to be registered somewhere, on a management cluster for instance.
Then you deploy the templates from your shared datastore to a datastore that is probably dedicated to the target cluster.
So the source host (where the template is registered) does not have direct access to the target datastore.
Only the receiving host has access to the source and target datastore.
In this case, I still do not think that VAAI will help you, and data still travels across your network interface.