VMware Cloud Community
petzu
Contributor

ESXi 4 installation, RAID best practices, stripe size, VDs, partitions?

Hi all,

I have a Dell PowerEdge 2970 server with PERC 6/i and PERC 6/E RAID controllers,

and a Dell PowerVault MD1000 storage array.

The PERC 6/i drives 6 x 150 GB, 10,000 rpm SATA disks (560 GB RAID 6), and

the PERC 6/E drives 15 x 1 TB, 5,400 rpm SATA disks (12 TB RAID 6).

This combination is used to provide iSCSI and NFS services for a music and movie production environment.

I plan to create 3 x 100 GB, 1 x 200 GB, and 1 x 60 GB virtual disks from the 560 GB RAID 6 array (a quick arithmetic check follows the list below):

60 GB for the VMware ESXi 4 and StorMagic SvSAN installations,

100 GB for virtual machines (Linux, Windows, NFS, AD, backup, etc. servers),

100 GB for iSCSI Audio (Pro Tools working disk),

100 GB for iSCSI Video (Pro Tools working disk),

200 GB for iSCSI Virtual Instruments (used by Pro Tools),

and 6 x 2 TB on the MD1000 for storage, backups, etc.
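As a quick arithmetic check (plain Python, nothing vendor-specific), the planned VDs fill the 560 GB array exactly:

# Planned VD sizes in GB, as listed above.
planned_vds = {
    "ESXi 4 + SvSAN": 60,
    "Virtual machines": 100,
    "iSCSI Audio": 100,
    "iSCSI Video": 100,
    "iSCSI Virtual Instruments": 200,
}

total = sum(planned_vds.values())
print(f"planned total: {total} GB of 560 GB available")  # 560 GB -> fits exactly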

How should I create those virtual disks when I create the RAID arrays?

What stripe size should I use?

What about the 60 GB "system" VD for ESXi 4 and SvSAN, and the 100 GB "virtual machine" VD for the other servers?

Should I do it like this, or should I create one 160 GB VD for all the servers and the ESXi installation,

or should I create a separate VD for each?

I mean something like a 1 GB VD for ESXi 4, a 25 GB VD (two partitions, 5 GB and 20 GB) for SvSAN, an 80 GB VD (two 40 GB partitions) for the Windows server,

a 5 GB VD for the Linux NFS server, a 100 GB VD for iSCSI Audio, and so on. With that approach I could choose a different stripe size for each VD.

I know this is not the best possible solution; in the future I could replace all the 10,000 rpm disks with fast 32 GB SSDs (128 GB RAID 6)

for the system and servers, and add a second MD1000 array of dedicated 10,000 rpm disks for iSCSI. But for now, this is what I have to work with.

Any suggestions and advice are very welcome.

Regards

Petzu

2 Replies
LucasAlbers
Expert

We create a 5120 MB VD for the ESXi install (5121 MB actually, as the PERC BIOS rounds down).

Then we can recreate just the ESXi install without touching anything else.

The VMs are going to be limited in size by their maximum VMDK, so you could just create the minimum number of datastores.

Keep it simple and straightforward unless you have a specific reason to diverge.

Let me paraphrase what a VMware dev mentioned (this was in regard to changing the VMFS default block size, and I like to think it also applies to the VMware scheduler, a super awesome piece of programming): "We optimize it so you can just go with the default and know it will do the right thing."
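For reference, the VMFS-3 block size chosen when the datastore is formatted caps the largest single VMDK you can put on it; the commonly quoted limits, as a quick lookup sketch (each limit is strictly 512 bytes less):

# VMFS-3 block size (chosen at format time) -> approximate maximum single VMDK size.
vmfs3_max_vmdk_gb = {
    1: 256,    # 1 MB block size -> ~256 GB
    2: 512,    # 2 MB -> ~512 GB
    4: 1024,   # 4 MB -> ~1 TB
    8: 2048,   # 8 MB -> ~2 TB
}

for block_mb, max_gb in vmfs3_max_vmdk_gb.items():
    print(f"{block_mb} MB block size -> max VMDK ~{max_gb} GB")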

The default stripe size is a good compromise optimized to work under most workloads; different stripe sizes can have radically different performance characteristics depending on the workload. The default works well, and newer Windows VMs (2008 and later) will align properly on 64 KB.
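Checking a guest's partition alignment is just modulo arithmetic on the partition start offset; a minimal sketch, using the usual default offsets as examples:

# A partition start offset is aligned if it is an even multiple of the
# stripe element / allocation unit (64 KB here).
def is_aligned(offset_bytes: int, unit_kb: int = 64) -> bool:
    return offset_bytes % (unit_kb * 1024) == 0

print(is_aligned(1_048_576))  # 1 MB offset used by Windows 2008+ -> True
print(is_aligned(32_256))     # legacy 63-sector (31.5 KB) offset -> False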

Dell has a ton of technical documents comparing RAID performance levels. Every few months we talk about comparing RAID performance, and I groan.

If it needs to be super fast I pick RAID 10; if it needs lots of space, RAID 5; if it needs to be more reliable than RAID 5 but with more space than RAID 10, RAID 6.

The Dell tech said most people do RAID 5 because disks are so reliable.

We use RAID 6 for reliability on volumes larger than 12 TB, because of the unrecoverable error rate during a 12 TB RAID rebuild.

http://m.zdnet.com/blog/storage/why-raid-6-stops-working-in-2019/805 (it assumes a URE rate of 1 in 10^14 bits, and I think enterprise drives are rated at 1 in 10^15).
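The back-of-the-envelope math behind that article, as a sketch assuming independent bit errors at the quoted rates:

# Chance of hitting at least one unrecoverable read error (URE) while
# reading back 12 TB of surviving data during a rebuild.
def rebuild_ure_probability(capacity_tb: float, ure_rate_bits: float) -> float:
    bits_read = capacity_tb * 1e12 * 8           # decimal TB -> bits
    return 1.0 - (1.0 - 1.0 / ure_rate_bits) ** bits_read

for rate in (1e14, 1e15):
    p = rebuild_ure_probability(12, rate)
    print(f"URE 1 in {rate:.0e} bits: ~{p:.0%} chance during a 12 TB rebuild")
# prints roughly 62% at 10^14 and roughly 9% at 10^15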

The battery-backed RAID controller cache mitigates some of the supposed performance hit of RAID 6 versus RAID 5.

In your case I would use RAID 5 for the operational datastores for performance, and RAID 6 for the backup datastore.
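To put capacity numbers on that trade-off for your two arrays (simple parity arithmetic; performance is a separate question):

# Usable capacity: RAID 5 gives up one disk to parity, RAID 6 gives up two.
def usable_gb(disks: int, disk_gb: int, raid_level: int) -> int:
    parity_disks = {5: 1, 6: 2}[raid_level]
    return (disks - parity_disks) * disk_gb

print(usable_gb(6, 150, 5), usable_gb(6, 150, 6))      # 750 vs 600 GB (PERC 6/i array)
print(usable_gb(15, 1000, 5), usable_gb(15, 1000, 6))  # 14000 vs 13000 GB (MD1000)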

In addition, a synthetic benchmark does not always tell you the performance you will get with a real application in an OS.

When we first virtualized MySQL, our Iometer benchmarks suggested performance would be an order of magnitude worse. In practice it was good enough that we went hog wild and virtualized many more. You should still be aware of the performance characteristics of your app.

For example, we have two separate MySQL replication pairs, and each gets its own 5-disk volume on the same MD1000.

Mixing disparate workloads on the same volume, for example database servers with lots of random file I/O alongside VMs doing sustained sequential access, will hurt database performance. The ESXi 4.1 Storage I/O Control feature is designed to mitigate this.

The funny thing about spindles and RAID controllers is that sometimes a lot of slower spindles will outperform fewer higher-speed spindles.

You might find that your aggregate read throughput from the MD1000 outperforms the faster, smaller volume.
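Rough spindle math makes the point; the per-disk IOPS figures below are rule-of-thumb estimates, not measurements:

# Rule-of-thumb random IOPS per spindle (estimates only):
# ~75 for a 5,400 rpm SATA disk, ~125 for a 10,000 rpm disk.
md1000_spindles = 15 * 75   # fifteen slower spindles
perc6i_spindles = 6 * 125   # six faster spindles
print(md1000_spindles, perc6i_spindles)  # 1125 vs 750 aggregate back-end IOPS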

The battery learn cycle runs every 90 days or so and turns off the write-back cache while it runs, which hurts performance. It needs to run because the cache battery degrades over time, and the controller needs to know when the battery can no longer last 24 hours; it determines this by measuring how long the battery takes to charge.

We've never noticed it or needed to change the OpenManage default on our farm of servers; I just thought I'd mention it because it's a closely related subject.

Install OpenManage for ESXi, and turn off the cache on the individual disks, as that cache is not battery backed.

Remember to document your config, as you won't remember what you did when you need to do some disaster recovery.

AndreTheGiant
Immortal

Welcome to the community.

You can find other hints in this document:

Andre

Andrew | http://about.me/amauro | http://vinfrastructure.it/ | @Andrea_Mauro