I'm currently testing an IOPS limitation feature on a storage array, and I noticed that when I set a limit for an ESX host, its IOPS never reach that limit.
For example, if an ESX host was sending 33,000 IOPS to the array and I then set the limit to 25,000 IOPS, it only sends 20,000 IOPS. I also noticed that if I change the NMP path selection policy from RR (Round Robin) to Fixed, the IOPS go up a bit (to around 22,500), but no higher.
In esxtop I saw that the average latency is not that high (4 ms total: 0.5-1 ms for the kernel queue and around 3-3.5 ms for the device queue).
This did not happen with other OSes, which reached the limit or came extremely close to it. Is it possible that when ESX notices it is being limited, it somehow lowers its throughput?
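
For what it's worth, here is a rough sanity check on the numbers above using Little's Law (outstanding I/Os = IOPS x latency). This is only a back-of-envelope sketch: it assumes the 4 ms total latency was measured while the host was doing about 20,000 IOPS, and the function name is just something I made up for illustration.

```python
def outstanding_ios(iops, latency_ms):
    """Little's Law: average I/Os in flight = throughput * latency."""
    return iops * latency_ms / 1000.0

# Figures from the post; the pairing of 4 ms with 20,000 IOPS is an assumption.
observed = outstanding_ios(20_000, 4)   # I/Os in flight at the observed rate
needed = outstanding_ios(25_000, 4)     # I/Os in flight needed to hit the limit

print(f"In flight at 20,000 IOPS: {observed:.0f}")
print(f"In flight needed for 25,000 IOPS at the same latency: {needed:.0f}")
```

If that assumption holds, the host would be keeping roughly 80 I/Os in flight and would need about 100 at the same latency to reach the limit, so it may be worth checking whether a queue depth somewhere on the ESX side caps the number of outstanding I/Os.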