VMware Cloud Community
wakesec
Contributor

Disk I/O Really Slow?

Is anyone else seeing an issue where all of the VMs on ESX 3.0 are experiencing really slow disk IO?

I have a setup like this:

I have five separate 1 TB LUNs on my SAN, used by five ESX 3.0.2 servers connected through an FC switch. Each ESX server (and LUN) hosts 20 VMs running Windows.

I have tried turning off all but one VM and I still get slow disk IO. I have also tried making a small (400 GB) LUN that stores only one VM, and there is no improvement.

Any ideas are appreciated.

10 Replies
Jimmy_Wong
Contributor

Hi,

First of all, make sure all of the hardware, including the SAN storage, is compatible with ESX Server; ideally every product should be listed in the ESX HCL.

Check whether other servers connected to this storage and SAN switch also have performance problems; it could be a SAN performance issue.

If you have vendor support, you can escalate to your vendor, especially if they have a VCP on staff who can provide on-site support and escalate the issue to VMware if needed.

Jimmy

gorto
Enthusiast

All of the VMs across 5 ESX hosts?

It sounds like a global, SAN-related problem to me as well:

- check HCL, load balance across HBA fibre paths (if available), check fibre ports for errors, check ESX logs for errors.

Lastly, I heard from a (real) VMware techie that 350 GB is the optimal size for a LUN, and that performance drops off with anything larger. Does anyone else out there know of an optimal size for VMFS?

Rumple
Virtuoso

What brand of SAN do you have?

Is write cache enabled on the SAN controller?

Are you using SATA or FC disks?

It could also be related to the SAN LUN design. Is each RAID group the same size as its LUN, or do you have a larger RAID group containing multiple LUNs, all of which are VMFS LUNs?

How many disks are in each RAID group?

wakesec
Contributor

My SAN is an IBM DS4700, and all of the config data I can find in Storage Manager shows my drives at 4 Gb/s. Write caching IS enabled. There don't seem to be any errors in the ESX logs.

My SAN disks are in a RAID 5 array across 16 disks. I have them set up as five 1 TB LUNs plus one more LUN with the leftover space.

The DS4700 is connected to the ESX servers through a Brocade Fibre Channel switch, which is also set to 4 Gb/s.

RParker
Immortal

We have the same problem at times. The deal is HOW you set up your LUNs. If you set up a 14- or 28-disk RAID and each disk is, say, 300 GB, that is roughly 4.2 TB or 8.4 TB of raw space. So if ALL of your ESX servers point to the same RAID array, it doesn't matter how many LUNs you have: they still all SHARE the SAME RAID (disk spindles). And if you didn't set up the RAID yourself, and some network administrator or whoever did it, they probably gave you space shared with some other system, like UNIX, which uses LOTS of I/O. So that could be why it's very slow.

The disks in your configuration are not only shared among your 5 ESX servers, which compete with each other; you could also be sharing I/O with other systems or LUNs. That's why it's slow.

Even 1 VM running can be very slow if you don't have exclusive access to the LUN/RAID.
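
To put rough numbers on the shared-spindle point, here is a minimal sketch in Python. It uses the 16-disk RAID 5 group, 5 hosts, and 20 VMs per host described earlier in the thread. The per-disk figure of 150 IOPS and the write penalty of 4 (a common rule of thumb for small random writes on RAID 5) are assumptions for illustration, not measurements from this SAN.

```python
def raid5_write_iops(disks, iops_per_disk, write_penalty=4):
    """Rough random-write IOPS for a whole RAID 5 group.

    write_penalty=4 is the usual rule of thumb: each small write costs
    two reads (old data, old parity) and two writes (new data, new parity).
    """
    return disks * iops_per_disk / write_penalty


def share(total_iops, consumers):
    """Even split of an IOPS budget across hosts or VMs."""
    return total_iops / consumers


if __name__ == "__main__":
    DISKS = 16            # from the thread: RAID 5 over 16 disks
    HOSTS = 5             # from the thread: five ESX servers
    VMS_PER_HOST = 20     # from the thread: 20 VMs per server
    IOPS_PER_DISK = 150   # ASSUMPTION: hypothetical per-spindle figure

    group = raid5_write_iops(DISKS, IOPS_PER_DISK)
    print(f"whole RAID group : {group:.0f} write IOPS")
    print(f"per ESX host     : {share(group, HOSTS):.0f} write IOPS")
    print(f"per VM           : {share(group, HOSTS * VMS_PER_HOST):.1f} write IOPS")
```

However the LUNs are carved up, every host draws on the same budget, so adding LUNs on the same RAID group does not add spindles.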

wakesec
Contributor

OK, that makes sense...

What is curious is that when I shut down all the other ESX servers and have only one ESX server and one VM running, I still get slow disk IO.

moneill
Enthusiast

How are you measuring your disk I/O?

Are your disk LUNs aligned?

Are your Windows disk partitions aligned?

http://www.vmware.com/pdf/esx3/partition_align.pdf
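
As a quick illustration of what "aligned" means here, the sketch below checks whether a partition's starting offset falls on a chosen boundary. The 512-byte sector size, the 64 KiB boundary, and the two example start sectors (63, a common legacy default, and 128) are assumptions for illustration; use the boundary your storage vendor or the alignment document recommends.

```python
SECTOR_BYTES = 512  # ASSUMPTION: classic 512-byte sectors


def is_aligned(start_sector, boundary_bytes=64 * 1024, sector_bytes=SECTOR_BYTES):
    """True if the partition's first byte sits exactly on a boundary."""
    return (start_sector * sector_bytes) % boundary_bytes == 0


if __name__ == "__main__":
    # ASSUMPTION: 63 and 128 are example start sectors, not values read from this system.
    for start in (63, 128):
        offset = start * SECTOR_BYTES
        print(f"start sector {start}: byte offset {offset}, aligned: {is_aligned(start)}")
```

A partition that starts off-boundary can make a single guest I/O straddle two stripe segments on the array, which is the extra work alignment is meant to avoid.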

Mike

wakesec
Contributor

I would guess that my LUNs are not aligned, since we do not have VirtualCenter, only the ESX servers.

I had been measuring disk IO with 'drivespeedchecker'. It gives times for reads and writes of different sizes of files and directories.

Also, here's the address if anyone else is interested: http://www.vmware.com/pdf/esx3_partition_align.pdf

It seems as though I might have to blow my VMs away in order to align my current system, so if anyone else has data showing that this will help it would be appreciated.

moneill
Enthusiast

Before going that far, it might be worth checking with Iometer, to rule out any issues with the test tool.

Search these forums and you will see a lot of posts about measuring with Iometer.
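
For a crude cross-check of any tool's numbers, a sequential write and read-back can also be timed by hand. This is only a minimal sketch, not a substitute for Iometer: the function name, file name, and sizes are hypothetical, and the read-back will usually be served from the operating system's cache unless the file is much larger than RAM, so treat the read figure as optimistic.

```python
import os
import tempfile
import time


def sequential_throughput(path, total_mb=32, block_kb=64):
    """Return (write_mb_per_s, read_mb_per_s) for one sequential pass."""
    block = os.urandom(block_kb * 1024)
    blocks = (total_mb * 1024) // block_kb

    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(blocks):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())  # push data to the device before stopping the clock
    write_s = time.perf_counter() - start

    start = time.perf_counter()
    with open(path, "rb") as f:
        while f.read(block_kb * 1024):  # likely cached by the OS, see note above
            pass
    read_s = time.perf_counter() - start

    return total_mb / write_s, total_mb / read_s


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        w, r = sequential_throughput(os.path.join(tmp, "io_test.bin"))
        print(f"write: {w:.1f} MB/s, read: {r:.1f} MB/s")
```

Run it inside the VM and on a physical machine with the same parameters; if the ratio matches what the other tool reports, the tool is probably not the problem.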

Mike

wakesec
Contributor

Latest:

As a test, I made a separate 300 GB LUN and used the fdisk commands specified in the alignment document. Then I made a Windows VM and gave it two separate virtual hard disks. On one of them I used the 'diskpart' utility to give it the 32K block size. I tested the IO speeds again and I still got about 11 MB/sec writing and 12 MB/sec reading.

On a normal machine (no virtual anything) I am seeing around 15 MB/sec writing and 50 MB/sec reading.
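
For perspective, the figures above can be set against what one Fibre Channel link could carry. This is a back-of-the-envelope sketch, assuming the links are 4 Gbit/s Fibre Channel (4GFC, 4.25 Gbaud line rate with 8b/10b encoding) and that the measurements are in megabytes per second; it ignores framing overhead, so the real ceiling is somewhat lower (nominally about 400 MB/s).

```python
def fc_payload_mb_per_s(line_rate_gbaud=4.25, encoding_efficiency=8 / 10):
    """Upper bound on one link direction in MB/s (10**6 bytes), before framing."""
    bits_per_s = line_rate_gbaud * 1e9 * encoding_efficiency
    return bits_per_s / 8 / 1e6


# Figures quoted in the thread, ASSUMED to be MB/s.
MEASURED = {
    "VM write": 11,
    "VM read": 12,
    "physical write": 15,
    "physical read": 50,
}

if __name__ == "__main__":
    ceiling = fc_payload_mb_per_s()
    print(f"link ceiling: about {ceiling:.0f} MB/s")
    for name, mb_s in MEASURED.items():
        print(f"{name:15s} {mb_s:3d} MB/s = {100 * mb_s / ceiling:.1f}% of ceiling")
    print(f"VM/physical write ratio: {MEASURED['VM write'] / MEASURED['physical write']:.2f}")
    print(f"VM/physical read ratio : {MEASURED['VM read'] / MEASURED['physical read']:.2f}")
```

Under these assumptions every figure is a small fraction of the link ceiling, which suggests the link speed itself is not the limit; note that the physical machine's write figure is also low, so the benchmark itself may be part of the story.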
