VMware Cloud Community
RParker
Immortal

Larger LUN = More Disks = More Performance

OK, I know this has been beaten to death on other forums, and everyone is probably tired of this discussion, but I do have one point to bring up.

I was reading on a NetApp thread that adding more disks to a Volume / LUN improves performance. This is simple math, and others have concluded in Christian Z's thread about IOPS that IOPS is directly related to the number of disks in an array.

14 drives > 6 drives (which is why most local RAID arrays are not as good in performance: they lack the spindle breadth of a SAN).

Now I understand all of this, and I have been testing over these past few months, and as I said, I saw on a NetApp thread that they have seen a 20% increase in IOPS going from 14 drives to 56 in an aggregate. Why would NetApp make such a claim if it didn't promote their product? My favorite line: "Numbers don't lie".

So if this is true, and we are all looking for that holy grail of performance versus reliability (and therefore probability of failure), would it not be more efficient to use more disks?

Yes, adding more disks increases the chance that there WILL be a bad drive in your LUN: if your SAN has 400 disks and you have 8 aggregates of 50 drives each, that's a 1 in 8 chance your array will be affected. OK. However, it also reduces the stress placed on each drive, because each one carries less load. Utilizing DP (Double Parity) and better technology, and given that we aren't backing up the LUN itself, only individual files, where is the risk of putting ALL the VMs on a single LUN?

Not to mention the performance you would get.

There must be some formula or graph that would show a continuing slope, up or down, of disk speed + IOPS = better performance on large LUNs.

Here is an example. 6 drives = some performance. It's limited by overall IOPS; increasing spindle speed to 15K adds roughly another 150 IOPS per drive (check my numbers).

Now 14 drives = some performance, but each drive is less stressed and therefore able to handle requests faster and more easily. Even though that 14-drive array ultimately carries more load because there are more VMs, the load is spread across more drives. There must be a benefit to using more drives in the long run, and it would outweigh the risk factor of losing a LUN. There has to be. I am not convinced this is a risky proposition at all. In fact, in all my years working with computers, more drives = better reliability, not less, especially when you consider that, depending on the size of the drives, it doesn't take that long to rebuild a drive, maybe 15 minutes.
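A rough back-of-the-envelope sketch of that spindle math (the per-drive IOPS figure is my own assumption for a 15K drive, not a vendor number, and RAID write penalties are ignored):

```python
# Naive spindle-count math: every extra drive adds its full random IOPS.
# The per-drive figure is an assumption (~150-180 for a 15K FC drive).
PER_DRIVE_IOPS_15K = 180

def raw_array_iops(drive_count, per_drive_iops=PER_DRIVE_IOPS_15K):
    """Theoretical aggregate IOPS before any RAID or controller overhead."""
    return drive_count * per_drive_iops

for drives in (6, 14, 56):
    print(f"{drives} drives -> ~{raw_array_iops(drives)} raw IOPS")
```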

And if we backup regularly, the VM's can be restored.

OR is the larger problem that, in the case the LUN *IS* destroyed, restoring those VMs (perhaps hundreds) would take an eternity? Even if that were to happen, you can restore the higher-priority VMs first... you don't have to wait for the entire LUN to be restored.

Let's be realistic: has this EVER happened to anyone? How many people have actually lost a LUN due to mechanical fault? Not human error, drive failure? I am willing to bet that anyone out there with a brand-name, well-respected SAN solution has had minimal (if any) problems.

We all seem to be living in the dark ages, still in that mainframe mindset, when drives were small and we were required to partition them because of FS limitations, and we aren't progressing with the technology. That's my view.

Are we *SURE* that utilizing a LARGER LUN across more drives, and ultimately less dependence on each drive, is not the better option?

Just to clarify, I am not talking about simply rebuilding a LUN on 500 gig drives. I am saying leverage MORE drives in the Volume / LUN to get the space, not bigger individual drives.

Surely there is a paper out there proving mathematically (because that's all this is) that this isn't better? I am guessing there isn't, which supports my claim that a larger, more economical LUN = VM utopia.

pauliew1978
Enthusiast

This is a question that I am particularly interested in. I have chopped and changed in my test environment between having one large RAID set and one large LUN (10 disks). OK, that may not be huge to some people, but it is way bigger than I am used to! I definitely agree that performance increases fairly radically going from a 4-disk RAID 10 to a 10-disk RAID 10.

However, there are other reasons which might make you build your array differently. It is my understanding that you might want to split your intensive VMs onto different RAID sets to help the overall performance of the VMs running on your array. Another reason, like you say, is backups. If you use esXpress to do your backups, it helps to have two or more datastores so you can back up LUNs to LUNs (and restoring a VM from one LUN to another takes the load off restoring it back to the same LUN).

I have created my setup so that one LUN is equal to one RAID set. I believe you can have multiple LUNs on one RAID set, no? But during my setting up and learning process I read that some apps can hit the LUN a lot and make other VMs' access to that LUN slow. In order to combat this I decided to create different RAID sets for certain VMs. Also, when you think of rebuilding your RAID sets or, for instance, replication between two SAN arrays, you only have to recover one RAID set and affect the VMs on that RAID set, rather than affect your entire infrastructure while recovery between the SANs takes place.

just my two cents.

Am I right in thinking you can have multiple VMFS volumes per LUN? It might well have been another option for me; however, things seem to be working nicely the way I have set it up. That said, it's not like we are using large numbers of VMs or intensive DB apps. One app we have is a time and attendance SQL DB (we have a lot of manual workers who swipe in and out 24 hours a day), so it will constantly be hitting the SAN all day, every day. My thoughts were to try and segregate this VM on a separate RAID set to ensure it doesn't affect other VMs too much. However, if I had split the LUN into multiple VMFS volumes, I don't think this would have been an issue?

davidbarclay
Virtuoso

How many people have actually lost a LUN due to mechanical fault? Not human error, drive failure?

A few times actually. Bad batches of HDDs do occur from time to time. I have seen 10+ disks in an array fail within DAYS. Not pretty :(

because of FS limitations

Don't we still have these? :)

Dave

davidbarclay
Virtuoso

Larger, more economical LUN = VM utopia.

What exactly are we talking about when we say large? In most implementations I work on, we stick with between 300-600GB per VMFS (per LUN).

From the mathematical view, I agree with what you are saying. In practice though, what exactly are you expecting? A 2TB VMFS over lots of spindles? What would that achieve? What about the overheads, LUN locking, etc?

I too got caught up in the maths, then realised I was splitting hairs. Now I just measure what's needed - then ensure I provision for that and expected growth.

Dave

TomP1
Contributor

I have experienced multiple drive failures in a brand-name disk array. It was not pretty. I don't think it happens very often, but I never want to be there again.

As for the overall question of larger LUN size leading to more performance, it's a good question. It's one I ask my Storage Engineers all the time. Basically, they tell me that in our SAN we get the best performance at around 250GB LUNs. Anything above 600GB is going to give us the lowest IOPS, but it is still screaming fast because they are so good and buy the best stuff (at least that's what they tell me). And I haven't seen any evidence to the contrary... yet.

RParker
Immortal

< Storage Engineers all the time. Basically, they tell me that in our SAN we get the best performance at around 250GB LUNs. >

That's based upon what? Where are the empirical numbers for this?

< Anything above 600GB is going to give us the lowest IOPS >

Ask them to SHOW you the data; I want to see it and so do you. I think they are just "guessing". NetApp contradicts this info by *PROVING* that 56 drives in an array (73GB 15K) are 20% *FASTER* than the same drives in a 14-drive array. This is set up as a single RAID 4 across ALL 56 disks.

So I ask you, where do they believe that adding more drives to the array actually becomes a diminishing return? They don't know... that's my point.

I think it's high time we rethink this strategy, because I believe they are using *OLD* data, not current technology, and they are too lazy to actually TEST it.

Tell them that, and you can tell them *I* said so. You want my email? I will go toe to toe with them and *MAKE* them put their money where their mouth is.

RParker
Immortal

Large as in many disk drives. Size isn't important. We can take a series of 73GB / 15K drives and put them in an array. LARGE as in many drives = more IOPS = better performance. That's LARGE.

What is your current setup? How do you arrive at your 600GB LUN? These are questions you should be asking, because if they are simply 146GB drives, we are only talking about 9 of them. Give me 73GB drives (20 of them) and I will bet a dollar to a donut the speed (because of more IOPS) will be even faster, utilizing the SAME SPACE.
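Here's the quick arithmetic behind that bet (the per-drive IOPS figure is my assumption again, and RAID overhead is ignored):

```python
# Same rough usable space, different spindle counts (per-drive IOPS is an assumption).
PER_DRIVE_IOPS_15K = 180

configs = {
    "9 x 146GB": (9, 146),
    "20 x 73GB": (20, 73),
}

for name, (drives, size_gb) in configs.items():
    print(f"{name}: ~{drives * size_gb}GB raw, ~{drives * PER_DRIVE_IOPS_15K} raw IOPS")
```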

Maybe they simply set you up on 3 x 300GB drives... that's not good. Maybe the performance is just adequate, and compared to local drives it's fast, but do you *REALLY* know if it's the fastest you can get? Probably not.

SPACE we couldn't care less about; we are after performance. You can give me a bunch of 36GB drives, I don't care, more drives is better. That's what I intend to prove.

RParker
Immortal

because of FS limitations

Don't we still have these

Yes we do, but they *FAR* surpass 600GB. WTF is that? I can buy a 600GB drive, big dippy do.

NTFS, LUNs, and VMFS can exceed MULTIPLE TERABYTES. That's what I am talking about: 10, 20, 30 TB of space, all comprising hundreds of disks. Now that's speed.

bastop
Contributor

We have a netapp environment and have recently re-evaluated how to best carve up some new storage to present luns in a fast and efficient manner.

It is true that LUNs can exceed multiple terabytes, and thus require more disks, and that may improve speed. However, we carved up a 1TB LUN early on, and now both NetApp and VMware are saying smaller LUNs are better. This is documented with a VM health check.

Your point about having more disks is valid. The more disks, the better the speed. However making your LUN size equivalent to the total capacity of the disks will not yield the fastest speed.

Say you want a 300 gig LUN presented to your VM server. That is two 146 gig drives, and in a NetApp world it would be a four-disk aggregate (2 disks burned for parity). If you wanted to improve the IOPS potential of that LUN, you could add more disks to the aggregate, without growing the LUN or adding more LUNs into the aggregate. So if you wanted a really fast 300 gig LUN, you could carve up a 10-disk (or more) aggregate (arranged into RAID sets however you like).

For our purposes, we wanted to serve up about 1.5 TB of vmdks. We carved up a 2.5 TB aggregate, and the max LUN size is 250GB, so we will present 6 LUNs. The backend is 146GB drives, and the RAID group size is 16 (NetApp's "sweet spot").

There are three main reasons behind why we carved the storage this way:

1. With NetApp's aggregate technology, you have the benefit of a very large aggregate containing all your data, and of being able to spread your I/O across many, many disks, weighed against using smaller aggregates (and thus fewer spindles) to compartmentalize the data and protect against highly intensive VMs. Also, you burn two disks for parity for every aggregate you create. NetApp's suggestion was to leverage the number of spindles in the larger aggregate.

2. NetApp recommended 300-600GB LUNs, and VMware recommended 250-500GB LUNs. We decided to go with the smaller value. VMware's contention was that the larger LUN sizes, i.e. our 1TB LUN, actually caused higher latency in our environment. When there are 60 guest servers hosted on a single LUN, the read latency is increased. If one server is seeking data on the LUN, there is latency when another server wants to seek or write data at the same time. Maybe minimal latency, but more than if the LUNs were split up.

3. Finally, with multiple LUNs you can optimize your use of multiple paths to the SAN. Say you have two single-port HBAs: I could assign LUNs 1-3 to one path, and LUNs 4-6 to the other path.
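If it helps anyone sanity-check the layout above, here is a rough sketch of the arithmetic (the parity-per-group figure is my assumption about RAID-DP raid groups, not output from a NetApp sizing tool):

```python
# Back-of-the-envelope check of the carve-up described above (assumptions, not a NetApp sizer).
import math

vmdk_capacity_gb = 1500   # "about 1.5 TB" of vmdks to present
max_lun_gb       = 250    # the smaller of the NetApp / VMware recommendations
drive_gb         = 146    # nominal drive size (usable capacity is somewhat less)
raid_group_size  = 16     # NetApp's suggested raid group size
parity_per_group = 2      # assumed: RAID-DP burns two parity disks per raid group

luns_to_present = math.ceil(vmdk_capacity_gb / max_lun_gb)
usable_per_group_gb = (raid_group_size - parity_per_group) * drive_gb

print("LUNs to present:", luns_to_present)                      # -> 6
print("Usable GB per 16-disk raid group:", usable_per_group_gb)
```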

RParker
Immortal

Now see that's what I am talking about, real data to back up these claims.

So latency is the problem. I knew there had to be something, but no one knew the answer. I grow tired of people and their "...but we have always done it this way, and we have not had problems...". That's lovely, but does it mean it can't go beyond that? Testing is the key to this answer.

I am glad you posted this, this is good info. Thanks.

I too run benchmarks, almost daily. Everyone thinks I am like the VMware geek now because I spend so much time tweaking, testing, and moving things; if I am obsessed, so be it. But I am never satisfied. I won't quit until I find my "sweet spot" either.

16 drives is the sweet spot, eh? So when you create a volume, it still stripes the data across *ALL* the drives in the aggregate? That's what I need to know.

If this is true, then I will leave it be.

I will still run my benchmarks to prove this, of course, but when I get my answer I will be sure to include you. Thanks very much for your input.

bastop
Contributor

The LUN latency is only one portion of the issue.

We carved a 1TB LUN, with a 1TB datastore. Having all 60 virtual machines send their I/O requests to the one LUN is inefficient. The same can be said of serving one large LUN to a Windows client. Multiple LUNs are more efficient in that scenario as well.

Essentially, the point I'm trying to drive home is that there are two general ways to carve storage: capacity-centric and performance-centric.

The capacity-centric view says that if you have an 11 TB aggregate (NetApp's max aggregate size), you could use almost all the space and have a large amount of capacity, with mediocre performance.

The performance-centric view says that if you have an 11 TB aggregate, you could use only a few TB of it and see a performance improvement.

There are many NetApp reports that document the practice of what I call "throwing away spindles" in exchange for better performance. Most of these documents pertain to Exchange and SQL sizing, though.

davidbarclay
Virtuoso

>VMFS can exceed MULTIPLE TERABYTES

If you are talking about extents, that's another thread in itself. 2TB is the maximum size of a (single) VMFS.

Dave

davidbarclay
Virtuoso

What is your current setup? How do you arrive at your 600GB LUN? These are questions you should be asking,

I don't need to ask, I provision it :)

It always depends on the storage vendor. We mostly sell HP EVA and IBM DS4/6/8XXX. Each vendor/array has their own recommendations around capacity/performance/redundancy, and I don't think they should be ignored (perhaps not 100% followed, but certainly not ignored).

Even on the HP EVA, which has "virtual" RAID, they still don't recommend extremely large disk groups (arrays, in other vendors' terms). Is that purely a risk decision? Who knows, but I can't ignore the advice of a vendor who has tens of thousands of installations (compared to hundreds of ours).

Dave

julianwood
Enthusiast

How does this all work with being able to manage multiple VMs and recover them in a remote site?

We are using NetApp storage over iSCSI for our SAN infrastructure.

We have all our disks in one aggregate (2 shelves of 144GB 15K disks, with the associated parity disks and 2 spares).

So all I/O will be spread across 26 disks.

We are thinking of creating a volume and qtree per VM, with a LUN per VM disk file (so C: and D: as separate virtual disks and separate LUNs within one volume).

This allows the VM to be treated as a single unit for snapshots and SnapMirrors, and still spreads the IOPS across all 26 disks.

Why are the suggestions to have LUNs so big when you can take advantage of aggregates? If you have 20 VMs in, say, a 250GB LUN, you can only snapshot and SnapMirror the whole LUN and can't recover a single VM.

Or am I confused????

http://WoodITWork.com
bastop
Contributor

Although we haven't implemented it here, snapshotting a large LUN for recovery purposes would involve mounting the snapshot (probably by LUN cloning it) to a VMware Player machine and copying the vmdk over from the mounted snapshot. Messy, but doable.

joe_cruz
Contributor

From a pure performance perspective, you are spot-on: more disks = more performance.

In our shop, we routinely deploy ~400GB LUNs spread across 73GB disks (I'd do the math to figure out how many spindles that is, but I've had a long day).

The reason we don't go larger (especially with our 146GB HDD-laden arrays) is that if there is a drive failure, we have to weigh how long it takes for the controllers to bring a hot spare online. If you have a very large array, this could take a considerable amount of time (unless the NetApp appliances somehow do it differently than the IBM DS4800 we're running).

So if a 2TB array takes, say, 48 hours to bring a hot spare online (what IBM calls a CopyBack operation), that's 48 hours during which your array is vulnerable to complete failure if another drive fails. For us, that's too much risk.
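To put rough numbers on that exposure window (the MTBF figure and the constant, independent failure rate are assumptions for illustration; a bad batch of drives, as mentioned earlier in the thread, would make the real risk worse):

```python
# Crude estimate of the chance that another drive fails before a long copy-back finishes.
# MTBF and rebuild time are illustrative assumptions; failures are treated as independent.
import math

def second_failure_probability(surviving_drives, rebuild_hours, mtbf_hours=500_000):
    """P(at least one more failure during the rebuild) under a constant failure rate."""
    expected_failures = surviving_drives * rebuild_hours / mtbf_hours
    return 1 - math.exp(-expected_failures)

for drives in (6, 14, 56):
    p = second_failure_probability(drives - 1, rebuild_hours=48)
    print(f"{drives}-drive array, 48h copy-back: ~{p:.2%} chance of another failure")
```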

WestNab
Contributor

This thread confuses LUN size with the number of spindles in the RAID array the LUN is based on. You could have a small, high-performance LUN spanning scores of disks. Yes, the more spindles in your striped RAID, the faster it is. But you can then create lots of small LUNs on that array.
