Hello,
I am experiencing disk I/O issues on an ESXi host with 3 VMs: one VM hogs all available I/O, leaving the other VMs' I/O almost suspended until I stop the I/O on the first.
I understand that all VMs should get a little slower, but what I see is an all-or-nothing situation.
I enclose 2 graphs that illustrate the issue. perfs_disques_1.jpg is the per-VM stacked bandwidth graph, annotated to show what is what. The second one is the (approximate) IOPS graph at the host level.
Procedure:
1- I generate some I/O on VM1.
2- I initiate a reboot of VM2 and VM3.
3- If the I/O is "intense" enough, VM1 keeps VM2 and VM3 from rebooting until I stop the I/O generation process.
The results are reproducible without fail. Since simply extracting an archive on VM1 is enough to starve VM2 and VM3 of I/O, I'm not sure it's supposed to work like this...
Is it a bug or a bad driver? If not, are there some advanced parameters (aside from disk shares, which I'm aware of) I can tune to make ESXi cope better and balance the I/O between the VMs like it should?
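To make "balance the I/O like it should" concrete, here is a minimal sketch of an ideal proportional-share split under contention. It is only an idealized model, not ESXi's actual scheduler; the function name `split_by_shares`, the 300 IOPS capacity, and the share values are all hypothetical numbers chosen for illustration.

```python
def split_by_shares(capacity_iops, shares):
    """Divide a contended I/O capacity among VMs in proportion to their shares.

    Idealized model only: it assumes every VM always has I/O queued and that
    the scheduler enforces the proportions perfectly.
    """
    total = sum(shares.values())
    return {vm: capacity_iops * s / total for vm, s in shares.items()}


# Hypothetical setup: VM1 (the noisy one) gets fewer shares than VM2 and VM3.
shares = {"VM1": 500, "VM2": 1000, "VM3": 1000}
alloc = split_by_shares(300, shares)  # 300 IOPS is an assumed capacity

for vm, iops in alloc.items():
    print(f"{vm}: {iops:.0f} IOPS")
```

In this model no VM is ever starved completely, which is the behaviour I expected instead of the all-or-nothing result in the graphs.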
NB: I understand that not having a cache on the RAID controller doesn't help performance, but as I see it raw performance isn't the issue here; the issue is bad load balancing before the I/O even gets to the disk.
Thanks for the help!
Software:
- ESXi 4 update 2 (generic version)
- firmware (BIOS and RAID controller) updated to latest version
Hardware:
- DL160 G6 1P (Xeon E5520)
- 12 GB RAM
- Smart Array P410 without cache
- 2 SAS 15k HDD (RAID 1)
Without the cache you are going to have bad performance with ESX. I would recommend getting the BBWC module and then retrying.
-MattG
If you find this information useful, please award points for "correct" or "helpful".
Do you have any resource reservations on the VMs? What server hardware is the ESXi host running on? Disk controller and drive details?
Hello,
thank you for your answer.
I'm not sure I understand correctly how it works.
As I see it, the problem is not that the disk can't provide the I/O; it is that the I/O requests do not get to the disk quickly enough, because ESXi prioritizes the other VM.
How would adding a controller cache help me?
Do you have any resource reservations on the VMs? What server hardware is the ESXi host running on? Disk controller and drive details?
No reservations, and you can find the hardware details at the end of the first post.
In addition to what others have said, ESX does NOT load-balance I/O either. You only have 2 SAS drives. The first thing to do after getting a battery-backed cache is to add more drives; 2 is hardly enough to sustain the I/O, as you can see. You need AT LEAST 6 more drives.
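For a rough sense of why the drive count matters, here is a back-of-the-envelope sketch. The figure of 175 IOPS per 15k drive and the write penalty of 2 for mirrored RAID levels are common rules of thumb, not measurements from this host, and the 50/50 read/write mix is just an assumption; `raid_effective_iops` is a hypothetical helper name.

```python
def raid_effective_iops(drives, iops_per_drive, write_penalty, read_fraction):
    """Estimate front-end IOPS an array can sustain for a given read/write mix.

    Each front-end read costs 1 back-end I/O and each front-end write costs
    `write_penalty` back-end I/Os, so:
        front_end = raw / (read_fraction + write_fraction * write_penalty)
    """
    raw = drives * iops_per_drive
    write_fraction = 1.0 - read_fraction
    return raw / (read_fraction + write_fraction * write_penalty)


# Assumed rule-of-thumb values, not measured on the DL160:
IOPS_15K = 175        # per 15k SAS drive
MIRROR_PENALTY = 2    # RAID 1 / RAID 10 write penalty

for n in (2, 8):      # current setup vs. 6 more drives
    est = raid_effective_iops(n, IOPS_15K, MIRROR_PENALTY, read_fraction=0.5)
    print(f"{n} drives: ~{est:.0f} IOPS")
```

Whatever the exact per-drive figure, the estimate scales linearly with the number of spindles, which is the point of the suggestion.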
ESX does not cache for disk writes. The BBWC will cache your writes, freeing up the VMs to move onto their next SCSI call.
Believe me, I have experienced an HP DL-380 without BBWC and the perf slowness is VERY noticeable. Installed BBWC and no more perf issues.
-MattG
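To illustrate how a write-back cache frees a VM to move on to its next SCSI call, here is a toy model, not a measurement. It assumes one guest issuing synchronous writes back to back, a disk that completes one write every `disk_ms`, and a cache that acknowledges in `cache_ms` and drains to disk in the background. The latencies and slot counts are made-up numbers and `total_ack_time_ms` is a hypothetical helper.

```python
from collections import deque


def total_ack_time_ms(n_writes, disk_ms, cache_ms=None, cache_slots=0):
    """Time until n back-to-back synchronous writes are all acknowledged.

    Without a cache (cache_ms is None) each write is acknowledged only when
    the disk has completed it. With a write-back cache a write is acknowledged
    once it sits in the cache; when the cache is full the writer waits for the
    oldest entry to be flushed. cache_slots must be >= 1 when a cache is used.
    """
    if cache_ms is None:
        return n_writes * disk_ms
    now = 0.0
    disk_free_at = 0.0      # time at which the disk finishes its backlog
    flush_done = deque()    # flush completion times of cached writes
    for _ in range(n_writes):
        # Forget cached writes that have already reached the disk.
        while flush_done and flush_done[0] <= now:
            flush_done.popleft()
        # Cache full: stall until the oldest cached write is flushed.
        if len(flush_done) >= cache_slots:
            now = flush_done.popleft()
        now += cache_ms     # write lands in cache and is acknowledged
        disk_free_at = max(disk_free_at, now) + disk_ms
        flush_done.append(disk_free_at)
    return now


# Assumed latencies: 5 ms per disk write, 0.1 ms per cached write.
no_cache = total_ack_time_ms(100, disk_ms=5.0)
big_cache = total_ack_time_ms(100, disk_ms=5.0, cache_ms=0.1, cache_slots=512)
tiny_cache = total_ack_time_ms(100, disk_ms=5.0, cache_ms=0.1, cache_slots=4)
print(f"no cache:   {no_cache:.1f} ms")
print(f"big cache:  {big_cache:.1f} ms")
print(f"tiny cache: {tiny_cache:.1f} ms")
```

In this model the cache absorbs bursts very well, but once it fills up the writer is back to disk speed, so it helps most when the sustained load is below what the drives can handle.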
OK, thank you all for your help.
As for the suggestion to buy more drives: it is not really needed. I used IOmeter to generate this I/O, but it does not represent the real workload, which will be much smaller.