It’s probably worth noting that each host cluster gets dedicated datastores. So my initial thought was putting all of the templates on a single datastore which every cluster has access to. But I didn’t know if that would actually resolve our issue or not.
What is your storage type? iSCSI? NFS? FC?
We have two clusters and 32 hosts; the hosts are connected to storage via FC. There is no problem when we deploy virtual machines from templates on a different cluster.
Are all the hosts in both clusters connected to a single storage array, or to multiple arrays?
If the hosts in cluster 1 are connected to a different storage array than the one where the template resides, and you are deploying the template to cluster 2, which is connected to different storage, then the elongated time to deploy virtual machines from the template is expected.
Depending on the size of your template, this is likely to still take time, as this is a full copy. I would suggest making your templates as small as possible and expanding the vdisk as needed to meet application requirements. VAAI will help if your SAN supports it. How is your environment currently configured? And how long is it typically taking to deploy from template?
I checked again, we actually have 301 hosts, almost 8000 VMs, 48 clusters and 1200 datastores. And all of this is just one of a few different vCenter servers.
Deployment times are less than a minute when deploying to the same cluster and can go as long as 10 minutes when deploying to other clusters. I’m assuming that when we deploy to a host that can’t see the disk that the template is on, a migration of the template prior to the deployment is actually occurring, extending those deployment times. But I’m sure I could be wrong.
I’m not 100% certain on the connection type between the host frames and the storage frames, but I want to assume it is NFS since those storage frames are used for things other than VMs.
EDIT: I was told we use FC.
And PJ, yes, each host cluster is assigned dedicated storage for that cluster only.
At that scale, templates are probably not the best solution. I would look into automated VM deployment using open source tools, e.g. Chef or Puppet, or SCCM for Windows.
We use both PVS and PowerCLI scripts for deployments, each of which uses templates and customization specs. Works great.
I ran some tests to get deployment times…
When deploying from a template to the same cluster where the template is housed, deployment takes about 2.5 minutes. When deploying to a cluster where the template is not housed (different hosts and datastore), the deployment takes about 9.5 minutes. So the deployment time is nearly four times as long to other clusters.
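For what it's worth, the 7-minute delta between those two times can be sanity-checked against what a full network-bound copy would look like. The template size below is a hypothetical figure, since the thread doesn't state one:

```python
# Back-of-the-envelope check: does the extra cross-cluster time look like
# a network-bound full copy? The 40 GB template size is a hypothetical
# assumption; the two deployment times are the ones observed above.
template_gb = 40            # hypothetical template size
same_cluster_s = 2.5 * 60   # observed: deploy within the template's cluster
cross_cluster_s = 9.5 * 60  # observed: deploy to a different cluster
extra_s = cross_cluster_s - same_cluster_s

mb_per_s = template_gb * 1024 / extra_s
gbit_per_s = mb_per_s * 8 / 1000
print(f"Implied copy throughput: {mb_per_s:.0f} MB/s (~{gbit_per_s:.2f} Gbit/s)")
```

With these assumed numbers it works out to roughly 98 MB/s, suspiciously close to 1GbE line rate, which is what you'd expect if the copy were traversing a 1Gb network path rather than being offloaded to the array.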
I would still like to find out exactly which variable in the scenario causes the increased deployment times so that I can attempt to eliminate or circumvent it.
I'm almost 91.2% sure it's a storage provisioning variable. If each of those clusters has dedicated storage, then your deployment has to hit the host HBA, then the fibre switch, then the controller, maybe the cores, then back through the other controller, fibre switch, and on to the other host HBA... at least I think that's right. But when you deploy locally, it has less to traverse. Have your SAN guys carve out a TB or two to host those templates in each cluster. With an environment that big, I bet you have interns who can do the mule work of moving the templates around to the different clusters.
Out of curiosity, do you have sDRS enabled on all your datastore clusters?
Maybe I'm way off base here...just a quick note
Are there any storage configuration differences? For example, do all the LUNs presented have the same block size, and so on? Differences in block size can also slow down svMotion.
We don't actually use Datastore Clusters. Each datastore is assigned directly to the host cluster.
And we want to avoid having to put clones of every template in each cluster. That sounds like a maintenance nightmare, not to mention a disk-devouring proposition.
As for the LUN configuration, I'd have to ask the storage guys. Since our environment is so big, the departments are highly compartmentalized so we have a team for almost everything.
I ran another test to see if my theory would work.
We have a single datastore that has been made available to most clusters to facilitate live migrations.
I put one of our templates on that datastore and tested deploying to multiple clusters, but the result was the same: no increased speed from a datastore that all the clusters can see, while the template's local cluster still saw faster deployment times. So it seems the bottleneck is more in the realm of what WessexFan is thinking, with the host.
I have to ask:
What's the geographical dispersion and what are the network conditions, pipe sizes, etc.? With an environment that large, it sounds like you may be using a vC hub / DC spoke / field office pod scenario. If you're deploying a template to a field office or such, 10 minutes is good.
The largest environment I worked on was 6,000 VMs / 400 hosts, multi-regional, with some offices on 128k pipes doing local deployments. Deploying a template to a regional office (AD, P, F) was not done. We backed up a common sysprepped image directory to the office file server at the spoke, trickle-replicated from a master build server at corporate.
Done on the cheap
We divided the environment by the Mississippi and the Mason-Dixon line, i.e. E-W-S. Datacenters had storage pods (EMC FC with some NetApp FC). Templates were housed in each physical datacenter on a common LUN between clusters, which was also used to move guests from one segmented environment to the other, including the secure DMZ. We kept that transfer/ISO use limited to one LUN because there was nothing worse than a transient storage issue taking out 100+ hosts. Lesson learned... sadly, the bane of our existence was always a storage team in constant transition.
Eventually we found that WDS with deployment nodes, plus advanced scripting with silent app installs at regional build servers, was a good deployment strategy. We catered to 9 flavors of Windows OS and 20+ post-install scripts.
Take it for what it's worth.
In short:
To decrease your template deployment times, you should present shared SAN storage, with VAAI support enabled, to all of the clusters if possible.
With this approach you will offload all data operations to the array; the VAAI primitives (HardwareAcceleratedMove) do this work for you better than the software datamover in the vmkernel.
When the VM is deployed (cloned) between clusters with non-shared datastores, all the data goes through the management network, so the speed of the data migration is bound to the network throughput dedicated to that network.
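A rough estimate of what that network-bound path costs, using hypothetical numbers (template size, link speed, and overhead factor are all assumptions, not figures from this thread):

```python
# Rough estimate of a cross-cluster full copy over the management network.
# All three inputs are hypothetical assumptions for illustration.
template_gb = 40      # hypothetical template size
link_gbit_s = 1.0     # assume a 1GbE management network
efficiency = 0.8      # hypothetical protocol/overhead factor

seconds = template_gb * 8 / (link_gbit_s * efficiency)
print(f"~{seconds / 60:.1f} minutes for the copy alone")
```

Under these assumptions that's several minutes of pure data movement before any customization even starts, which is in the same ballpark as the cross-cluster times reported earlier in the thread.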
So I suggest that going with a shared DS/LUN is the best way to speed up your deployment (sharing a DS across clusters is OK).
The clone operation then means that all data operations are executed inside the array without touching the hypervisor (vmkernel) layer.
At the hypervisor level this is invoked by the vmkernel datamover component, specifically FS3DM hardware offload; this is the most efficient and fastest method.
If you clone the template between datastores on different storage (e.g. DAS vs. FC SAN), all the data must traverse a higher level (outside the array), which in that case is the vmkernel and its software datamover, FS3DM, going through your SAN fabric (assuming the hosts can access both the source and destination DS).
For best results I would recommend:
- Using the same storage array vendor and keeping both source and destination LUNs within the same box.
- Ideally, carving out the LUNs across spindles with the same performance parameters.
- Beware of mixed VMFS block sizes across your datastores; only with equal block sizes will you achieve the best results with Storage vMotion.
Apart from other issues that can occur when you mix different VMFS block sizes in your clusters, the main performance hit is that when you migrate data between a source and destination with different block sizes, the vmkernel will fall back to the legacy (FSDM) datamover, which is the slowest method because it runs higher in the stack.
So check all of your datastores and their block sizes carefully, especially if you have upgraded your DS from VMFS3 to VMFS5, where the old block size is kept.
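The datamover selection described above can be sketched as a simple decision function. This is an illustration of the decision logic as explained in this thread, not VMware code; the function and its inputs are hypothetical:

```python
# Sketch of how the vmkernel picks a datamover for a clone, per the
# explanation above. Hypothetical illustration, not VMware's implementation.
def pick_datamover(same_array: bool, vaai_enabled: bool,
                   src_block_kb: int, dst_block_kb: int) -> str:
    if src_block_kb != dst_block_kb:
        # Mismatched VMFS block sizes force the legacy datamover.
        return "FSDM (legacy, slowest)"
    if same_array and vaai_enabled:
        # Copy is offloaded to the array via the VAAI XCOPY primitive.
        return "FS3DM hardware offload (fastest)"
    # Same block size but no offload: software datamover in the vmkernel.
    return "FS3DM software"

print(pick_datamover(True, True, 1024, 1024))   # FS3DM hardware offload (fastest)
print(pick_datamover(False, True, 1024, 1024))  # FS3DM software
print(pick_datamover(True, True, 1024, 8192))   # FSDM (legacy, slowest)
```

The takeaway is that a single mismatched block size silently drops you to the slowest path, no matter how good the array is.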
Also check that the VAAI XCOPY primitive is enabled on all of your hosts:
Under ESXi Configuration, on the Software tab, click Advanced Settings, point to DataMover, and find `DataMover.HardwareAcceleratedMove` (a value of 1 means enabled).
Another option, if you cannot afford the scenario above and want to preserve your current placement, is to leverage 10GbE networking in your datacenter... :)
If you found this or any other answer helpful, please consider to award points. (use Helpful or Correct buttons)