VMware Cloud Community
kpinkpin
Contributor

RAID 5 OK or Should I Always Go With RAID 10?

I have two Dell R710 host servers running ESXi 6. The first server was configured with RAID 5 because I needed the drive space. I now have six 900GB 10K SAS disks to go into the 2nd host server. Is there really a noticeable performance difference with RAID 10, which will leave me with 2.7TB of disk space? RAID 5 gives me 4.5TB, which is a pretty big difference.
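(The capacity numbers quoted follow from the standard usable-capacity formulas for each RAID level; here is the arithmetic as a quick Python sketch, using the disk count and size from the question:)

# Usable capacity for a 6 x 900 GB array under RAID 5 and RAID 10.
disks = 6
size_gb = 900

raid5_usable = (disks - 1) * size_gb      # one disk's worth of capacity goes to parity
raid10_usable = (disks // 2) * size_gb    # every disk is mirrored: 3 striped pairs

print(f"RAID 5 : {raid5_usable / 1000:.1f} TB")   # 4.5 TB
print(f"RAID 10: {raid10_usable / 1000:.1f} TB")  # 2.7 TB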

4 Replies
vcallaway
Enthusiast

Really depends on your use case. Generally, RAID 10 has better read/write performance than RAID 5 since it doesn't need to manage parity. However, in my opinion that performance gain isn't enough to warrant leaving almost 2TB on the table.

Basically: if IOPS are a concern, go RAID 10. If storage space is a concern, go RAID 5.
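To put rough numbers on that trade-off, here is a back-of-the-envelope Python sketch using the classic RAID write-penalty model (2 back-end I/Os per front-end write for RAID 10, 4 for RAID 5). The ~140 IOPS per 10K SAS spindle and the 70/30 read/write mix are illustrative assumptions, not measurements:

# Rough front-end IOPS for 6 spindles under RAID 5 vs RAID 10.
# Assumptions (illustrative only): ~140 IOPS per 10K SAS disk, 70/30 read/write mix.
disks, iops_per_disk = 6, 140
read_ratio, write_ratio = 0.70, 0.30
raw = disks * iops_per_disk

for level, penalty in (("RAID 5", 4), ("RAID 10", 2)):
    # Each front-end write costs `penalty` back-end I/Os; reads cost one.
    effective = raw / (read_ratio + write_ratio * penalty)
    print(f"{level}: ~{effective:.0f} front-end IOPS")

With these assumptions RAID 10 comes out roughly 45% ahead on a mixed workload; whether that matters more than the 1.8TB of capacity is exactly the judgment call above.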

GreatWhiteTec
VMware Employee

If you can afford it, I would go with RAID 10. Good mix of performance and redundancy. RAID 5 suffers heavy performance degradation during partial failures, since every read from the failed disk has to be reconstructed from parity. But it all depends on what you are looking for: if you need high redundancy, then RAID 10; if you need space, then you already know the answer.

It also depends on your workloads. If you are running critical applications, you may want to consider RAID 10.
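A toy Python sketch of why degraded RAID 5 reads are expensive: the missing block can only be recovered by XOR-ing the corresponding blocks of every surviving disk (the byte values here are made up):

from functools import reduce
from operator import xor

# Toy 4-disk RAID 5 stripe: three data blocks plus one parity block.
data = [0b10110010, 0b01101100, 0b11100001]
parity = reduce(xor, data)                    # parity = d0 ^ d1 ^ d2

# The disk holding data[1] fails; rebuild its block from everything else.
rebuilt = reduce(xor, [data[0], data[2], parity])
assert rebuilt == data[1]                     # every degraded read costs N-1 reads plus XOR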

continuum
Immortal

For about six years now I have offered VMFS recovery services, which means I see a lot of vSphere environments of various sizes.
Let me sum up the most important lessons I have learned from those environments.
The chance of running into serious problems with vSphere is quite small in very, very small environments; environments that are large enough to implement replication and daily automated backups also run a very small risk.
The highest risk of serious problems goes to environments with a handful of ESXi hosts that use local storage.
So environments like yours run the highest risk.
Why is that?
My theory: neither previous experience with running traditional Windows servers nor VMware best-practice documentation really applies to typical small setups. Tips for selecting a good RAID level usually consider only two parameters: effective storage capacity versus performance.
So far I have not seen a RAID-level discussion that also considers the parameter "survival rate". Discussions and blogs also rarely use survival rate in the context of the VMFS filesystem; VMFS is considered rock-stable and "enterprise-class".
Most new VMware admins transfer their experience with NTFS to VMFS and are very surprised by the tips I have to offer after a recovery...
When admins read best-practice tips or VMware design suggestions, they assume those tips apply to vSphere environments of all scales.
This is a very dangerous assumption if you run a small environment with local disks - better be prepared for some big surprises. Let me sum up my experiences after five years of doing VMware recovery:
"VMFS is rock-stable": correct only if you configure your VMs exclusively with eagerzeroedthick-provisioned VMDKs and do not use snapshots.
"VMFS is enterprise-class software": this only applies to environments that are large enough to implement regular automatic backups plus replication.
"Thin-provisioned VMDKs and snapshots are as safe as normal VMDKs": a very dangerous assumption. VMFS datastores with a large number of VMDKs that change their allocation during normal use do not handle unexpected power failures well.
A datastore with static VMDKs will survive a power failure about as well as a Windows server running NTFS on bare metal.
A datastore with lots of thin-provisioned VMDKs and snapshots can lose all of its content after a single power failure - something completely unexpected by 99% of the admins of small environments.
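If you want to check how exposed an existing datastore is, the provisioning type is recorded in each VMDK descriptor file: thin disks on VMFS carry a ddb.thinProvisioned = "1" entry, and snapshot deltas are usually named *-00000N.vmdk. A minimal Python sketch along those lines - the /vmfs/volumes/datastore1 path is a placeholder, and the descriptor check is a simplification, not an exhaustive parser:

import os

DATASTORE = "/vmfs/volumes/datastore1"   # placeholder - point this at your datastore

for root, _, files in os.walk(DATASTORE):
    for name in files:
        # Descriptors are the small text *.vmdk files; skip the extent files.
        if not name.endswith(".vmdk") or any(t in name for t in ("-flat", "-delta", "-sesparse")):
            continue
        path = os.path.join(root, name)
        if "-0000" in name:                  # snapshot delta descriptor naming convention
            print("SNAPSHOT:", path)
        try:
            with open(path, "r", errors="ignore") as fh:
                text = fh.read(4096)         # descriptors are tiny; 4 KB is plenty
        except OSError:
            continue
        for line in text.splitlines():
            # thin disks carry ddb.thinProvisioned = "1" in the descriptor
            if line.strip().startswith("ddb.thinProvisioned") and '"1"' in line:
                print("THIN:", path)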
"RAID 5 is a good compromise of space versus performance, and surely better than single disks without RAID": almost everybody would agree with that, I think - but I learned the hard way that it does not hold for the combination of small environment + RAID 5 + VMFS.
Instead, I tell all my customers that run a small environment:
Never even consider using RAID 5 for a VMFS datastore!
I would even claim that thick-provisioned VMDKs on single disks are a safer choice.
RAID 5 is only acceptable if your environment is large enough to have replication.
In 2016 I saw roughly one small environment per week with severe RAID 5 related problems. Typical problems are:

- the datastore appears blank after a power failure
- after replacing a disk marked as faulty, the RAID rebuild was a complete disaster and all data seems lost
- even the service crews of RAID controller vendors misconfigure the RAID 5 setup after problems that looked harmless
I have more customers with severe data loss on RAID 5 than with all other possible configurations combined.
To add to that... the type of corruption you get after a failed RAID 5 rebuild can be really, really ugly.

So let me sum it up: IMHO, running a small environment with local disks + VMFS + vSphere 5 or higher + thin-provisioned disks + RAID 5 + no emergency power supply is a completely unacceptable risk.
Do the math yourself - if you add "survival rate" to the space-versus-performance question, then RAID 10 or RAID 1 is the only option that makes sense.
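"Do the math" can be taken literally. Here is a crude Python sketch of the survival comparison for this 6-disk array; the 3% annual disk failure rate, the 24-hour rebuild window, and the independence assumption are all illustrative, and unrecoverable read errors during the rebuild (which hurt RAID 5 further) are ignored:

# Crude "survival rate" comparison for a 6-disk array.
# Assumptions (illustrative, not vendor data): 3% annual failure rate per
# disk, 24 h rebuild window, independent failures, URE risk ignored.
n, afr, rebuild_hours = 6, 0.03, 24
p_fail_rebuild = afr * rebuild_hours / (24 * 365)   # per-disk failure prob. during a rebuild

p_first = 1 - (1 - afr) ** n                        # some disk fails within a year
raid5_loss = p_first * (1 - (1 - p_fail_rebuild) ** (n - 1))  # any 2nd failure kills RAID 5
raid10_loss = p_first * p_fail_rebuild              # only the mirror partner kills RAID 10

print(f"P(array loss/yr) RAID 5 : {raid5_loss:.2e}")
print(f"P(array loss/yr) RAID 10: {raid10_loss:.2e}")
print(f"RAID 5 is ~{raid5_loss / raid10_loss:.0f}x more likely to lose the array")

Even this disk-only model puts RAID 5 about (n-1) times more likely to lose the whole array during a rebuild, before you account for the VMFS corruption modes described above.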
Ulli
My view is biased - if you run a medium or large environment, these tips don't apply to you.


________________________________________________
Do you need support with a VMFS recovery problem? Send a message via Skype: "sanbarrow"
I do not support Workstation 16 at this time ...

kpinkpin
Contributor

Thank you all for your input.

continuum I'm glad you took the time to write that response and go into detail. It pretty much confirmed a few things I was never quite sure about.

We are a small business, but I have everything virtualized on these 2 ESXi hosts. I run nightly backups of every VM, and the hardware is on a hefty battery backup.


I went with a 6x900GB RAID 10. I don't know if it's just in my head, but the servers up on this 2nd host seem noticeably snappier - I didn't think I would notice a difference from something that simple. I will probably go the same route on our 1st ESXi host as well: I think I can migrate the couple of VMs that must be up 24/7 to the second host and then rebuild the disks using RAID 10. I've read some not-so-good things over the last couple of days regarding RAID 5, and I do not want to go down that road. I feel crazy for even considering it now that I've done my research.
