Hello,
I'm experiencing disk I/O issues on an ESXi host with 3 VMs: one VM will hog all available I/O, leaving the other VMs' I/O almost suspended until I stop the I/O on the first one. I would understand all the VMs getting a little slower, but what I see is an all-or-nothing situation.
I've attached 2 graphs which illustrate the issue: perfs_disques_1.jpg is the bandwidth graph, stacked per VM, where I explain what is what. The second one is the IOPS (sort of) graph at the host level.
Steps to reproduce:
1- I generate some I/O on VM1
2- I initiate a reboot of VM2 and VM3
3- If the I/O is "intense" enough, VM1 will keep VM2 and VM3 from rebooting until I stop the I/O generation process.
The results are reproducible without fail. Since simply extracting an archive on VM1 is enough to starve VM2's and VM3's I/O, I'm not sure it's supposed to work like this...
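If it helps to reproduce, a rough load generator along these lines (assuming VM1 runs Linux with Python installed; the path and sizes are only placeholders) should produce a sustained write load comparable to the archive extraction:

#!/usr/bin/env python
# Sustained sequential write load inside VM1 (illustrative only; adjust path/size as needed).
import os

PATH = "/tmp/io_hog.bin"       # placeholder target file on VM1's virtual disk
BLOCK = b"\0" * (1024 * 1024)  # 1 MiB block of zeroes
TOTAL_MB = 4096                # write ~4 GiB in total

with open(PATH, "wb") as f:
    for i in range(TOTAL_MB):
        f.write(BLOCK)
        if i % 64 == 0:        # flush every 64 MiB so the writes actually reach the disk
            f.flush()
            os.fsync(f.fileno())

os.remove(PATH)                # clean up the test file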
Is it a bug or a bad driver? If not, are there some advanced parameters (aside from disk shares, which I'm aware of) I can tune to make ESXi cope better and balance the I/O between the VMs like it should?
NB: I understand that having no cache on the RAID controller doesn't help performance, but as I see it raw performance isn't the issue here; the issue is bad load balancing before the I/O even gets to the disks.
Thanks for the help!
Software:
- ESXi 4 Update 2 (generic version)
- firmware (BIOS and RAID controller) updated to the latest version
Hardware:
- DL160 G6 1P (Xeon E5520)
- 12 GB RAM
- Smart Array P410 without cache
- 2 x 15k SAS HDDs (RAID 1)