VMware Cloud Community
M_B_-_NS
Contributor

Slow disk I/O: ESXi does not balance I/O correctly between VMs?

Hello,

I am experiencing disk I/O issues on an ESXi host with 3 VMs: one VM hogs all available I/O, leaving the other VMs' I/O almost suspended until I stop the I/O on the first.

I understand that all VMs should get a little slow, but what I see is an all-or-nothing situation.

I enclose 2 graphs which explain the issue. perfs_disques_1.jpg is the bandwidth graph, stacked per VM, annotated to show what is what. The second one is the IOPS (sort of) graph at the host level.

Principle:

1- I generate some I/O on VM1.

2- I initiate a reboot of VM2 and VM3.

3- Given "intense" enough I/O, VM1 will keep VM2 and VM3 from rebooting until I stop the I/O generation process.
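One way to put numbers on the all-or-nothing behaviour in the steps above is to run a small latency probe inside VM2 or VM3 while VM1 generates load. The sketch below is not from the original post: the function name `probe_write_latency` and its parameters are hypothetical, and it only times small synchronous writes from inside a guest, which is a rough indicator rather than a proper benchmark.

```python
import os
import tempfile
import time


def probe_write_latency(directory, samples=20, block_size=4096):
    """Time `samples` small synchronous writes in `directory`.

    Each write is followed by os.fsync, so the guest waits until the
    storage stack acknowledges it. Returns latencies in milliseconds.
    """
    block = b"\0" * block_size
    latencies_ms = []
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        for _ in range(samples):
            start = time.perf_counter()
            os.write(fd, block)
            os.fsync(fd)  # force the write down to the (virtual) disk
            latencies_ms.append((time.perf_counter() - start) * 1000.0)
    finally:
        os.close(fd)
        os.remove(path)
    return latencies_ms


if __name__ == "__main__":
    results = probe_write_latency(tempfile.gettempdir())
    print(f"min {min(results):.2f} ms, max {max(results):.2f} ms")
```

Running it once on an idle host and once while VM1 is under load would show whether the other VMs' requests are merely slowed down or effectively stalled.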

The results are reproducible without fail. Since simply extracting an archive on VM1 is enough to starve VM2 and VM3 of I/O, I'm not sure it's supposed to work like this...

Is it a bug or a bad driver? If not, are there some advanced parameters (aside from disk shares, which I'm aware of) that I can tune to make ESXi cope better and balance the I/O between the VMs like it should?

NB: I understand that not having a cache on the RAID controller doesn't help performance, but as I see it, raw performance isn't the issue here; the issue is bad load balancing before the I/O even gets to the disk.

Thanks for the help !

Software:

- ESXi 4 update 2 (generic version)
- firmware (BIOS and RAID controller) updated to latest version

Hardware:

- DL160 G6 1P (Xeon E5520)
- 12 GB RAM
- Smart Array P410 without cache
- 2 SAS 15k HDD (RAID 1)


Accepted Solutions
MattG
Expert

ESX does not cache for disk writes. The BBWC will cache your writes, freeing up the VMs to move onto their next SCSI call.

Believe me, I have experienced an HP DL-380 without BBWC and the perf slowness is VERY noticeable. Installed BBWC and no more perf issues.

-MattG If you find this information useful, please award points for "correct" or "helpful".
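The effect described in this answer can be illustrated with a toy queueing model. This is only a sketch under loud assumptions: it is not how ESXi schedules I/O, it ignores disk shares, and the service times (8 ms for an uncached write, 0.2 ms for a write acknowledged from controller cache) and the queue depth of 32 are hypothetical values chosen for illustration. The function `vm2_latency_ms` is likewise made up for this example.

```python
from collections import deque


def vm2_latency_ms(service_ms, vm1_outstanding=32, vm2_requests=10):
    """Toy model: one FIFO device queue shared by two VMs.

    VM1 always keeps `vm1_outstanding` requests queued (it refills its
    slot as soon as a request completes). VM2 issues one request at a
    time, like a guest doing synchronous I/O while booting.
    Returns VM2's average per-request latency in milliseconds.
    """
    queue = deque(["vm1"] * vm1_outstanding)
    queue.append("vm2")  # VM2's first request lands behind VM1's backlog
    clock = 0.0
    issued_at = 0.0
    latencies = []
    while len(latencies) < vm2_requests:
        owner = queue.popleft()
        clock += service_ms  # the device completes one request at a time
        if owner == "vm1":
            queue.append("vm1")  # VM1 immediately queues another request
        else:
            latencies.append(clock - issued_at)
            issued_at = clock
            queue.append("vm2")  # VM2 issues its next request
    return sum(latencies) / len(latencies)


if __name__ == "__main__":
    # Hypothetical service times, in milliseconds.
    for label, service_ms in (("no write cache", 8.0), ("with BBWC", 0.2)):
        print(f"{label}: VM2 waits about {vm2_latency_ms(service_ms):.1f} ms per I/O")
```

In this model every VM2 request waits behind VM1's 32 queued requests, so its latency is 33 service times. The ordering of requests is the same in both cases; what changes is how quickly each one is acknowledged, which is why a faster acknowledgement from the cache can help the other VMs even though nothing is "balanced" differently.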

7 Replies
MattG
Expert

Without the cache you are going to have bad performance with ESX. I would recommend getting the BBWC module and then retrying.

-MattG

DSTAVERT
Immortal

Do you have any resource reservations on the VMs? What server hardware is the ESXi host running on? Disk controller and drive details?

-- David -- VMware Communities Moderator

M_B_-_NS
Contributor

MattG wrote: "Without the cache you are going to have bad performance with ESX. I would recommend getting the BBWC module and then retrying."

Hello,

Thank you for your answer.

I'm not sure I understand correctly how it works.

As I see it, the problem is not that the disk can't provide the I/O; it is that the I/O request does not get to the disk quickly enough, because ESXi prioritizes the other VM.

How would adding a controller cache help me?

M_B_-_NS
Contributor

DSTAVERT wrote: "Do you have any resource reservations on the VMs? What server hardware is the ESXi host running on? Disk controller and drive details?"

No reservations, and you can find the hardware details at the end of the first post.

RParker
Immortal

In addition to what others have said, ESX does NOT load balance I/O either. You only have 2 SAS drives; the first thing to do after getting a battery-backed cache is to add some more drives. 2 is hardly enough to sustain I/O, as you can see, so you need AT LEAST 6 more drives.


M_B_-_NS
Contributor

OK, thank you all for your help.

As for the suggestion to buy more drives: it is not really needed. I used IOmeter to generate this I/O, but it does not represent the real workload, which will be much smaller.
