knuter
Enthusiast

Storage QoS (Limit IOPS)

Hello all,

Are any of you setting a limit on IOPS for a VM/VMDKs?
We are using it, but we are running into several issues and getting pretty much nowhere with support. I have tested it on ESXi 6.5 and several different builds of 6.0.
Below are the test results I sent to support a couple of months ago. Could any of you replicate these tests in your environment and tell me the results? Any other suggestions are welcome as well.

I have tested the Storage IO Filter driver in ESXi 6.5 and it works for IOPS, but I need to set a bandwidth (KB/s) limit, and the IO Filter driver is not able to do that. We want to set the limit on VMDKs to be able to apply bandwidth limits, but I'm running the support case using IOPS because it suffers from the same issues and is a bit easier to understand, since the bandwidth limit is not visible through the GUI.

This test was done using a VM with 3 VMDKs, all on the same datastore; each VMDK has an IOPS limit of 100.
Having all VMDKs on the same datastore should pool the limits (3 × 100 = 300 IOPS for the VM as a whole) according to https://kb.vmware.com/kb/1038241
The VM is HW11, running on ESXi 6.5 build 4564106. (HW11 is used because HW13 does something weird.)
Setting the limit to 100 IOPS for each VMDK:

[screenshot: pastedImage_14.png]
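
For anyone who would rather script this than click through the GUI: below is a minimal pyVmomi (vSphere Python SDK) sketch of setting the same per-VMDK cap. The vCenter address, credentials and VM name are placeholders, and certificate verification is disabled for lab use only.

```python
# Sketch only: set a 100 IOPS limit on every VMDK of one VM via pyVmomi.
# Host, credentials and VM name below are placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()          # lab only: skip cert checks
si = SmartConnect(host="vcenter.example.com",
                  user="administrator@vsphere.local",
                  pwd="secret", sslContext=ctx)
content = si.RetrieveContent()
vm = content.searchIndex.FindByDnsName(None, "testvm", True)  # vmSearch=True

changes = []
for dev in vm.config.hardware.device:
    if isinstance(dev, vim.vm.device.VirtualDisk):
        # StorageIOAllocationInfo.limit is the per-disk IOPS cap (-1 = unlimited)
        dev.storageIOAllocation = vim.StorageResourceManager.IOAllocationInfo(limit=100)
        changes.append(vim.vm.device.VirtualDeviceSpec(
            operation=vim.vm.device.VirtualDeviceSpec.Operation.edit,
            device=dev))

task = vm.ReconfigVM_Task(vim.vm.ConfigSpec(deviceChange=changes))
# poll task.info.state if you want to wait for completion, then:
Disconnect(si)
```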
Iometer is used to generate load with 32 outstanding I/Os; the access specification is 512 B, 100% read, 0% random (i.e. fully sequential).
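
If you don't have Iometer around, a rough stand-in for that access specification could look like the following (assuming a Linux guest; the test file path is a placeholder). Python threads only approximate outstanding I/Os, since real Iometer uses async I/O.

```python
# Rough stand-in for the Iometer profile: ~32 outstanding I/Os, 512 B,
# 100% sequential read. Assumes a Linux guest; /mnt/disk2/testfile is a
# placeholder for a file on the VMDK under test.
import os
import threading
import time

PATH = "/mnt/disk2/testfile"
BLOCK = 512
WORKERS = 32                      # approximates 32 outstanding I/Os
DEADLINE = time.time() + 60       # run for one minute

size = os.path.getsize(PATH)
fd = os.open(PATH, os.O_RDONLY)

def reader(worker_id):
    # Workers stride through the file so that together they form one
    # interleaved, fully sequential read stream.
    offset = worker_id * BLOCK
    while time.time() < DEADLINE:
        if offset + BLOCK > size:
            offset = worker_id * BLOCK    # wrap around and keep reading
        os.pread(fd, BLOCK, offset)
        offset += WORKERS * BLOCK

threads = [threading.Thread(target=reader, args=(i,)) for i in range(WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
os.close(fd)
```

Note that without O_DIRECT the guest page cache may serve most of these reads, so you need a test file much larger than guest RAM (or direct I/O, which Iometer handles for you) to actually hit the VMDK.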
The performance graphs in the vSphere client are used instead of esxtop so that the I/O for each individual VMDK is visible.
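
The same per-VMDK counters behind those graphs can also be pulled via the PerformanceManager API; here is a sketch reusing `content` and `vm` from the snippet above (the counter name and real-time interval reflect my understanding of the API).

```python
# Sketch: sample per-virtual-disk read IOPS via the PerformanceManager,
# reusing `content` and `vm` from the earlier snippet.
from pyVmomi import vim

perf = content.perfManager
names = {"%s.%s.%s" % (c.groupInfo.key, c.nameInfo.key, c.rollupType): c.key
         for c in perf.perfCounter}
cid = names["virtualDisk.numberReadAveraged.average"]   # read IOPS per disk

spec = vim.PerformanceManager.QuerySpec(
    entity=vm,
    metricId=[vim.PerformanceManager.MetricId(counterId=cid, instance="*")],
    intervalId=20,      # real-time samples (20 s)
    maxSample=15)       # last ~5 minutes

for entity_metric in perf.QueryPerf(querySpec=[spec]):
    for series in entity_metric.value:
        # instance is the disk, e.g. "scsi0:0", "scsi0:1", "scsi0:2"
        print(series.id.instance, series.value)
```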

Issue 1: CBT disabled, limits set; a snapshot does not affect the results
Each VMDK is limited to 100 IOPS individually; the limits are not pooled.

Load on the last two disks. The VM should have gotten 300 IOPS in total; the bug is visible.

[screenshot: pastedImage_1.png]

Load on all three disks. Because all disks are in use, the VM reaches the 300 IOPS total anyway, which hides the bug.

[screenshot: pastedImage_2.png]

Issue 2: CBT enabled, limits set, no snapshot taken after setting the limits
Each VMDK in use is getting 300 IOPS. When pooled, 300 IOPS should have been the limit for the VM in total, not for each individual VMDK.

Load on the last two disks. The VM should have gotten 300 IOPS in total; the bug is visible.

[screenshot: pastedImage_3.png]

Load on all three disks. Each disk is getting 300 IOPS; the bug is visible.

[screenshot: pastedImage_4.png]

Issue 3: CBT enabled, limits set, snapshot taken after setting the limits
The last VMDK has its individual limit enforced as if it were not part of the pool, while the other two VMDKs each get the total of the pool.

Load on the last two disks. The total of 400 IOPS is higher than the limit, and it is skewed between the second and last disk.
[screenshot: pastedImage_5.png]
Load on all three disks. The first and second VMDK each get about 300 IOPS. The total is now higher than it should be, but skewed against the last disk.
[screenshot: pastedImage_6.png]

Load on the last VMDK only. Its 100 IOPS is well below the pooled limit of 300 for this VM.

[screenshot: pastedImage_7.png]

3 Replies
AishR
VMware Employee

The per-datastore calculation was removed in 6.0, and Limit IOPS is now a per-VMDK setting only. If you specify a Limit IOPS on a VMDK, it is now strictly enforced on that VMDK, and the other VMDKs in the VM no longer contribute to the calculation. The information in KB 1038241 applies only to ESXi 5.5 and earlier.
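
You can confirm that the cap lives on the individual disk with a quick pyVmomi check, reusing a `vm` object obtained as in the earlier snippets:

```python
# Sketch: print the configured IOPS limit for each VMDK of the VM
# (reusing `vm` from the earlier snippets); -1 means unlimited.
from pyVmomi import vim

for dev in vm.config.hardware.device:
    if isinstance(dev, vim.vm.device.VirtualDisk):
        alloc = dev.storageIOAllocation
        print(dev.deviceInfo.label, alloc.limit if alloc else "unset")
```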

knuter
Enthusiast

Thanks, but the vSphere 6.0 version is not what matters; what matters is which disk scheduler you use. With mClock you are correct: the limit is applied per VMDK and is not aggregated across VMDKs on the same datastore. If you use the "default" scheduler (which is no longer the default), you get the behavior from my original post.
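
For anyone who wants to check which scheduler a host is running: to my understanding the advanced option Disk.SchedulerWithReservation selects mClock, and it can be queried via pyVmomi (treat the option name and its values as an assumption on my part; the host name is a placeholder, and `content` comes from a connection as in the earlier snippets).

```python
# Sketch: check which disk scheduler an ESXi host is using, via the
# advanced option that (to my understanding) selects mClock.
from pyVmomi import vim

host = content.searchIndex.FindByDnsName(None, "esxi01.example.com", False)
adv = host.configManager.advancedOption
for opt in adv.QueryOptions("Disk.SchedulerWithReservation"):
    print(opt.key, "=", opt.value)      # 1 = mClock, 0 = legacy scheduler

# Switching schedulers would be an UpdateOptions call (host reboot needed):
# adv.UpdateOptions(changedValue=[vim.option.OptionValue(
#     key="Disk.SchedulerWithReservation", value=0)])
```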

I'm not a fan of this change. In our environment we need to limit the VM as a whole, not the individual VMDKs.

knuter
Enthusiast

Update: VMware has now silently "fixed" this bug in ESXi 6.5 Update 1. The behavior is now consistent in always enforcing the limits per VMDK and never pooling the resources... which makes it useless for us, but at least it is consistent.
