VMware Cloud Community
JasonGillis
Enthusiast

Limiting the number of simultaneous storage vMotions

Hi all,

I've got a fair-sized vSphere 5.5 environment backing a vCloud Director deployment that supports a test lab. We've got a large datastore cluster that currently has nineteen 4 TB datastores in it. Because of a performance issue with vCloud Director, VMware tech support has instructed us to leave Storage DRS disabled on this cluster; otherwise we see timeouts when deploying vApps (we are WAY over-provisioned on these datastores).

Over time, some datastores get fuller than others, so our workaround is to temporarily enable Storage DRS, ask it for recommendations, apply those recommendations, and then disable Storage DRS again. At a high level, this process works well.

Where we're having trouble, though, is that vCenter is overly aggressive in the number of migrations it allows at once. By default, it essentially runs ALL of its recommendations at once, which just shatters storage performance across most of the datastores and seems to hit host resources too: we see host CPU usage spiking to 100% during the migrations. People complain, lineage is questioned, etc., etc.

I'd like to be able to limit the number of simultaneous migrations, and the only reference I've been able to find on the subject is this old article from Frank Denneman: "Limiting the number of Storage vMotions" (frankdenneman.nl). I've used vCenter Server Settings to set config.vpxd.ResourceManager.maxCostPerEsx41Ds to 40, which, if I read the datastore limits and resource costs documented for vSphere 5.5 correctly, should allow two Storage vMotions per datastore plus a few extra slots for regular vMotions that might want to run at the same time.
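The arithmetic behind choosing 40 can be sketched as below. This is a minimal illustration, assuming the per-datastore costs from the vSphere 5.5 migration limits table (16 per Storage vMotion, 1 per vMotion, default maximum 128); check those numbers against the documentation for your own build. The helper names are hypothetical and not part of any VMware API.

```python
# Assumed per-datastore resource costs from the vSphere 5.5 limits table.
# These are illustrative constants, not values read from vCenter.
SVMOTION_COST = 16      # cost charged to a datastore per Storage vMotion
VMOTION_COST = 1        # cost charged to a datastore per vMotion
DEFAULT_MAX_COST = 128  # default maximum cost per datastore


def max_concurrent(max_cost: int, op_cost: int) -> int:
    """How many operations of one kind fit under a datastore's cost budget."""
    return max_cost // op_cost


def vmotion_slots_left(max_cost: int, svmotions: int) -> int:
    """vMotion slots remaining after admitting `svmotions` Storage vMotions."""
    used = svmotions * SVMOTION_COST
    if used > max_cost:
        raise ValueError("that many Storage vMotions exceed the budget")
    return (max_cost - used) // VMOTION_COST


if __name__ == "__main__":
    for budget in (DEFAULT_MAX_COST, 40):
        n = max_concurrent(budget, SVMOTION_COST)
        print(f"max cost {budget}: {n} Storage vMotions, "
              f"{vmotion_slots_left(budget, n)} vMotion slots left")
```

Under these assumed costs, the default of 128 fits eight Storage vMotions per datastore with nothing left over, while 40 fits two (32 units) and leaves 8 units for regular vMotions.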

This works to a degree. The number of migrations is limited, but not the way I expected. I expected a maximum of two simultaneous migrations from each source datastore, but vCenter allows 38 simultaneous tasks, which works out to two per datastore across all 19 datastores in the cluster rather than two per source. It's as if the cost charged to the source datastore doesn't match the cost charged to the destination. I'd agree that reads are probably easier than writes, but 30+ svMotions off a datastore is still hard. For example, in testing last night, the SDRS recommendations planned to migrate several VMs off 3 or 4 datastores that needed some relief. What I got was an hour of alerts and notifications about performance issues because the limit didn't work the way I expected.
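One way to see what the observed numbers imply is a toy admission model. It is hypothetical: it does not reproduce vCenter's actual scheduler, and the datastore names and move list are made up. Each move charges an assumed 16 against the 40-unit budget on the datastores it is charged to; the `charge_source` flag switches between charging both ends of a move and charging only the destination, which is what the behavior above seems to suggest.

```python
from collections import Counter

SVMOTION_COST = 16  # assumed per-datastore cost of one Storage vMotion
MAX_COST = 40       # the lowered per-datastore budget from the post


def admit(moves, charge_source):
    """Greedily admit (source, destination) moves under a per-datastore budget.

    Hypothetical model: a move is admitted only if every datastore it is
    charged against stays within MAX_COST.
    """
    load = Counter()
    admitted = []
    for src, dst in moves:
        charged = [src, dst] if charge_source else [dst]
        if all(load[ds] + SVMOTION_COST <= MAX_COST for ds in charged):
            for ds in charged:
                load[ds] += SVMOTION_COST
            admitted.append((src, dst))
    return admitted


def worst_source(admitted):
    """Largest number of concurrent moves reading from any one datastore."""
    return max(Counter(src for src, _ in admitted).values())


# Made-up plan: 40 moves off 4 full datastores onto the other 15 of 19.
moves = [(f"ds{i % 4}", f"ds{4 + i % 15}") for i in range(40)]

both = admit(moves, charge_source=True)
dst_only = admit(moves, charge_source=False)
print(len(both), worst_source(both))          # 8 2
print(len(dst_only), worst_source(dst_only))  # 30 8
```

When only the destination is charged, eight moves pile onto a single source even though no datastore receives more than two. The model then caps the cluster at two moves per destination, so 19 datastores allow at most 19 × 2 = 38 concurrent tasks: the same figure seen in the post, though that agreement alone doesn't prove vCenter works this way.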

So, at the end of that long story, has anyone been able to effectively limit Storage vMotions so they don't cripple their environment? Or does anyone have suggestions that might help? I'd like to use the SDRS recommendations to relieve storage capacity bottlenecks, but if it's going to murder my environment for an hour each time I do that, it's not really a workable solution.

Thanks,

Jason

WessexFan
Hot Shot

Sounds like a catch-22: you're damned if you do and damned if you don't. This is concerning: "WAY over-provisioned on these datastores". In my opinion, the quickest way to a bad day is thin-provisioning disks. You can tweak SDRS a LOT, so finding the sweet spot takes a lot of practice. I've turned off I/O latency as a trigger and just use 5% space as my threshold. I've scheduled it for off hours, and only in my production cluster. I wish I could be more help; sounds like a right mess. :(

VCP5-DCV, CCNA Data Center
JasonGillis
Enthusiast

Regarding the over-provisioning: our lab is used primarily for tech support problem reproduction and resolution, so we don't have many long-lived vApps. There's a lot of vApp turnover, so we rarely use much of the disk space allocated to the VMs within them, but we do have large numbers of them. It's definitely a risk, though. We could get into some real trouble. :)

And if vCloud Director handled vApp deployment in our environment better with SDRS turned on, I'd stick with that and, like you, limit the times of day SDRS could take action.
