VMware Cloud Community
MC_jeffhoward00
Contributor

RAID Performance Options

Hello -

I'll try to keep this as short and sweet as possible.  We have 2x hypervisors, each hosting 4x virtual SQL Servers, so:

VMHOST01 - SERVER01, SERVER02, SERVER03, SERVER04

VMHOST02 - SERVER05, SERVER06, SERVER07, SERVER08

We chose to partition our MD3200 storage array similar to how we would have partitioned it if these were physical servers, so we have 48 disks in our MD3200, which yields 8 RAID-5's of 6 disks apiece.  Pretty simple so far.  But we're seeing some weird performance, and I wonder if it's due to the high number of small RAID volumes.  Keep in mind that the MD3200 is a pretty entry-level dual-controller SAS 2.0 SAN.

Using simpleton logic, one would think the RAID controllers in the SAN wouldn't care whether it was 48 disks split 8 ways or 48 disks split 4 ways, but I'm beginning to think there's a performance implication. Does anyone have enough experience with Dell storage to know if we might see better performance by just cutting 4x RAID-5 volumes, then having the virtual disks "share" the same VMFS volume?

We thought the RAID controller would perform better if it was aware of the "sharing" (i.e. because it was managing each individual LUN), but now I'm starting to think that the RAID controller is getting overwhelmed by the number of small volumes, and maybe VMware would do a better job managing the sharing of the VMFS volumes between VMs.

Keep in mind these are SQL Servers with heavy I/O.  I would test it out, but with the volumes actively being used and more than 50% full, it's next to impossible to rejigger everything around to free up two full RAID volumes to combine.

Thanks,

- Jeff

6 Replies
AlexsYB
Enthusiast

Let me preface by saying I am no storage expert.  I dabble a bit, but...

You have 8 x 6-disk RAID-5 groups.

So you have 8 x 1 RAID-5 parity disks (roughly); you can lose only 1 disk from each array before you are close to trouble ...

Why not move to

1 big RAID-6 (with 1 or 2 hot spares)?

That leaves you with 2 parity disks and spare disks (2 of which you can assign as hot spares).

This way you have 48 - 2 - 2 (total spindles - 2 for parity - 2 for hot spares) = 44

44 spindles to handle all the I/O for all the SQL servers.

Whereas you had

6 - 1 (6 disks in LUN - 1 for parity) = 5 spindles for each SQL server.

BUT you might just have too much I/O for the setup you have :)  I'm not sure.

There is more risk: if you lose 3 drives within the recovery period, you're hosed ... on all the SQL servers.
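The spindle arithmetic above can be sketched as a quick calculation. This is a simplification that counts only parity and hot-spare overhead (the disk counts are the ones from this thread) and ignores controller, cache, and stripe-width effects:

```python
# Rough spindle arithmetic for the two layouts discussed above.
# "Effective spindles" here just means disks left after parity and
# hot spares -- a deliberate simplification.

def raid5_layout(groups: int, disks_per_group: int) -> dict:
    """Independent RAID-5 groups: 1 parity disk per group."""
    data = disks_per_group - 1
    return {"spindles_per_server": data, "total_data_disks": groups * data}

def raid6_layout(total_disks: int, hot_spares: int) -> dict:
    """One big RAID-6 group: 2 parity disks, minus hot spares."""
    data = total_disks - hot_spares - 2
    return {"spindles_per_server": data, "total_data_disks": data}

current = raid5_layout(groups=8, disks_per_group=6)
proposed = raid6_layout(total_disks=48, hot_spares=2)

print(current)   # each SQL server sees 5 spindles
print(proposed)  # every SQL server can reach 44 spindles
```

The trade-off, as noted, is blast radius: losing a third drive during a rebuild takes out every SQL server instead of one.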

Sreec
VMware Employee

Hello Jeff,

"We're seeing some weird performance." Can you please explain what sort of performance issues you are having?

Cheers,
Sree | VCIX-5X| VCAP-5X| VExpert 6x|Cisco Certified Specialist
Please KUDO helpful posts and mark the thread as solved if answered
MC_jeffhoward00
Contributor

Here's one example of weird performance.  I have another, but I'll limit it to this one to start off:

SAN Controller 01 - RAID DATA-01 - 100% reads from this volume. It's performing well (600MB/sec reads). Remember this is an independent RAID-5 volume on the SAN mapped to an independent VMFS vol.

SAN Controller 02 - RAID DATA-02 - 100% reads from this volume. It's performing well (600MB/sec reads).  Independent RAID and VMFS vol, same as above.

SAN Controller 01 - RAID DATA-03 - Assume we're doing 100% reads from this volume. This traffic starts @ 300MB/sec, and DATA-01 drops from 600MB/sec -> 300MB/sec. Independent RAID and VMFS vol, same as above.

SAN Controller 02 - RAID DATA-04 - Assume we're doing 100% reads from this volume. This traffic starts @ 300MB/sec, and DATA-02 drops from 600MB/sec -> 300MB/sec. Independent RAID and VMFS vol, same as above.

So it's clear that when they're "sharing" the same controller on the SAN, the 600MB/sec throughput is basically being "shared".  According to Dell, the MD3200 is supposed to be capable of 2000MB/sec total aggregate sequential reads, so I would assume that means roughly 1000MB/sec per controller.
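The numbers above are consistent with a fixed per-controller ceiling being split evenly among active volumes. A minimal sketch of that model (the 600MB/sec ceiling is the figure observed in this test, not Dell's roughly 1000MB/sec-per-controller spec):

```python
# Toy model of a fixed per-controller bandwidth ceiling shared
# equally among active volumes. The 600 MB/s ceiling comes from the
# observed numbers above, not from the spec sheet.

def per_volume_throughput(controller_ceiling_mb: float, active_volumes: int) -> float:
    """Equal split of a controller's sequential-read ceiling."""
    return controller_ceiling_mb / active_volumes

# One volume on controller 1: DATA-01 alone.
print(per_volume_throughput(600, 1))  # 600.0 MB/s
# Two volumes on controller 1: DATA-01 and DATA-03 together.
print(per_volume_throughput(600, 2))  # 300.0 MB/s
```

If this model holds, consolidating LUNs wouldn't raise the ceiling; it would only change how the split happens.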

Do you think it's possible that the controller having to deal with the parity from multiple RAID volumes is what's killing our performance?  As AlexsYB suggested, has anyone seen performance on lower-end RAID controllers go UP by consolidating LUNs?  For example:

DATA-01 and DATA-02 are each independent 6-disk RAID-5's.  If we did a 12-disk RAID-6 with two parity drives instead of two separate RAID-5's, do you think that would perform better? My only hesitation there is whether the RAID controller really splits the traffic -exactly- 1:1.  In pre-production, we've seen a couple of examples of VMs sharing the same LUN where one VM will push 250MB/sec while the other only pushes 150MB/sec, with the exact same SQLIO traffic.

I'm not sure if that was a fluke, but I'd like to hear people's opinions on sharing VMware VMFS volumes across multiple VMs that have heavy data I/O.  The "old school" storage engineer in me wants to segment all the heavy I/O at the SAN level, but maybe that's not the best way using hypervisors.  I'm definitely open to other configurations.

Thanks,

- Jeff

Josh26
Virtuoso

The golden rule is "more spindles = faster". Larger RAID groups will always perform better.

You are also quite correct about the "entry level" of the MD3200 SAS.

What type/speed of disks have you got there?

Sreec
VMware Employee

Hello Jeff,

Appreciate your detailed explanation. I won't be able to comment too much on the MD3200; however, there are certain areas where I can share some updates.

"If we did a 12-disk RAID-6 with two parity drives instead of two separate RAID-5's, do you think that would perform better? My only hesitation there is that the RAID controller definitely splits the traffic -exactly- 1:1 between the volumes".

1. How good is the data transfer from controller to cache and cache to disk? Is there a delay no matter what RAID level we are using?

"I'd like to hear peoples opinions on sharing VMWare VMFS volumes across multiple VMs that have heavy data I/O"

I'm pretty sure you have checked and configured the MD3200 as per the best practices. Sharing VMFS volumes across multiple VMs that have heavy data I/O is not a good method. I would request that you check the points below:

a) How big to make LUNs for a given VMFS volume?

b) Should we isolate storage for VMs or share a consolidated pool of storage?

c) Should we use RDMs or VMFS volumes?

d) Should we use disk spanning and, if so, any concerns or suggestions?

e) How much of a clustered volume manager is VMFS in terms of discovery?

Please refer to http://www.vmware.com/pdf/vmfs-best-practices-wp.pdf

Also check the configuration maximums guide for your ESX/ESXi version (assuming you are at 5.0 or 5.1 :) ):

https://www.vmware.com/pdf/vsphere5/r50/vsphere-50-configuration-maximums.pdf

https://www.vmware.com/pdf/vsphere5/r51/vsphere-51-configuration-maximums.pdf

Cheers,
Sree | VCIX-5X| VCAP-5X| VExpert 6x|Cisco Certified Specialist
MC_jeffhoward00
Contributor

Hi guys -

Thank you for the replies, and sorry for going dark on this last week.  After reading the links to best practices (thank you to those who posted), it looks like it's not a good idea to pipe heavy SQL I/O through a single VMFS volume -however- there are advantages in having more spindles available for a single virtual server's access.

So the configuration we plan to deploy and test with our next storage purchase will be the following:

Instead of splitting final VMFS volumes all the way down to RAID-5 disk groups at the SAN level, I'm going to take Sreec's recommendation (confirmed by several other docs from VMware and Dell) to use one large "disk group" on the SAN of 24 disks, then use the disk group partitioning tools built into the SAN to present the single 24-disk RAID-6 group as different VMFS volumes for each virtual SQL Server.

Here's the advantage to this configuration:

You leave the management of multiple concurrent I/Os to the SAN hardware - everything I've read says to off-load as much of this work to the SAN hardware as possible, as opposed to VMware managing it via multiple VMDKs on the same VMFS volume.

The other -huge- advantage is that when there aren't a ton of concurrent I/Os, a single SQL virtual server can pull from all 24 spindles.  In terms of overall performance, the worst-case concurrent-I/O scenario should be roughly the same as the previous configuration, but when a single virtual server has exclusive access, there should be a big performance gain.
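That trade-off can be sanity-checked with back-of-envelope numbers, assuming reads scale with spindle count and ignoring the RAID-6 write penalty (disk counts are per 24-disk half of the array, with 4 SQL servers per half):

```python
# Back-of-envelope comparison of the old layout (dedicated 6-disk
# RAID-5 per server) vs. the proposed 24-disk RAID-6 pool.
# Assumes reads scale with spindle count; ignores RAID-6 write penalty.

def spindles_available(pool_data_disks: int, active_servers: int) -> float:
    """Data spindles each active server can draw on, split evenly."""
    return pool_data_disks / active_servers

old_per_server = 6 - 1   # dedicated RAID-5: 5 spindles, no matter what
pool_data = 24 - 2       # 24-disk RAID-6: 22 data spindles

print(spindles_available(pool_data, 4))  # 5.5 -- fully contended, ~same as before
print(spindles_available(pool_data, 1))  # 22.0 -- lone busy server, big win
```

The pooled layout never does worse than the dedicated one in this simplified read model, and does much better when only one server is busy.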

Thanks again for all the input.

- Jeff
