VMware Cloud Community
iguy
Enthusiast

New architecture design idea - 1 LUN per VM

In discussions around our VI3 deployment and architecture, an idea has been put forward as a "better" architecture design. We have HDS frames and use 2 Emulex HBAs per host.

The idea is to have 1 VM per LUN.

Some of the design assumptions around this:

a) Everything for that VM lives on that LUN, accessed over a Fibre Channel connection.

b) The LUN is VMFS formatted.

c) To get us the higher density plus DRS/HA functionality, we would have a host group of, say, 4 hosts that share 128 LUNs, which is the VMware limit: a single host with 2 HBAs can handle no more than 128 LUNs. That works out to roughly 31 Virtual Machines per host on average (the math is sanity-checked in the sketch after this list).

d) LUN queue depth would be set to 2, per HDS standards for physical hosts. In theory, based on what we've read and been told, the Disk.SchedNumReqOutstanding parameter in the VMkernel wouldn't cause a performance slowdown, because different VMs never contend for the same LUN.

e) Our storage processor Fibre Channel adapter (FA) port has a queue depth of 1024. As such, we would put 4 hosts on that specific FA port. The math:

LUN queue depth of 2 * 128 LUNs = 256 outstanding requests per host

256 requests per host * 4 hosts = 1024 queue requests maximum generated
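
To sanity-check the arithmetic in (c) and (e), here is a quick sketch. The constants are our design assumptions from the list above, not vendor-verified figures:

# Back-of-the-envelope check of the queueing and density assumptions.
LUN_QUEUE_DEPTH = 2         # per-LUN queue depth, per HDS guidance (d)
LUNS_PER_HOST = 128         # VMware per-host LUN limit with 2 HBAs (c)
HOSTS_PER_FA_PORT = 4       # hosts sharing one storage processor FA port (e)
FA_PORT_QUEUE_DEPTH = 1024  # queue depth of the HDS FA port (e)

per_host = LUN_QUEUE_DEPTH * LUNS_PER_HOST    # 2 * 128 = 256
cluster_total = per_host * HOSTS_PER_FA_PORT  # 256 * 4 = 1024

print("Outstanding requests per host (worst case): %d" % per_host)
print("Worst case at the FA port: %d of %d" % (cluster_total, FA_PORT_QUEUE_DEPTH))
assert cluster_total <= FA_PORT_QUEUE_DEPTH, "FA port queue oversubscribed"

# Density: 128 single-VM LUNs shared across the 4-host DRS/HA cluster.
# 128 / 4 = 32; we budget "about 31" to leave a little headroom.
print("Average VMs per host: %d" % (LUNS_PER_HOST // HOSTS_PER_FA_PORT))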

Some questions I have about this approach.

1) Is anyone out there doing something similar at the same high density? I understand that some companies are doing this, but they have nowhere near our density: their design ideal was to put 1 VM per physical CPU (for all single-CPU VMs) on dual-processor boxes, so the number of LUNs seen by a single host or DRS cluster isn't an issue. They have something like 8 hosts with 2 LUNs each going into a single storage processor FA port with a queue depth of 1024.

2) Are any of these design assumptions flat-out wrong, or are we viewing something incorrectly?

3) Are there any Best Practices that outline items like this?

19 Replies
kix1979
Immortal

Why? You are going to spend more time managing LUNs than you will managing your VMs. In almost every case where people have done this, they revert to larger volumes. Also keep in mind that you will waste a TON of space: you need swap/snapshot headroom on each LUN, so you lose the benefit of sharing that free space across VMs.
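
To put rough numbers on the space argument, here is a minimal sketch. Every figure in it (disk, swap, and snapshot sizes, and the 25% concurrency assumption) is hypothetical:

# Hypothetical illustration of the per-LUN headroom cost.
N_VMS = 100
DISK_GB = 20   # virtual disk per VM
SWAP_GB = 2    # VM swap file per VM
SNAP_GB = 5    # snapshot headroom per VM

# One LUN per VM: every LUN must carry its own swap/snapshot headroom.
dedicated_total = N_VMS * (DISK_GB + SWAP_GB + SNAP_GB)  # 2700 GB

# Shared volumes: headroom is pooled; assume only 25% of VMs swap
# heavily or hold snapshots at any one time.
pooled_total = N_VMS * DISK_GB + int(0.25 * N_VMS) * (SWAP_GB + SNAP_GB)

print("Dedicated LUNs: %d GB" % dedicated_total)  # 2700 GB
print("Shared volumes: %d GB" % pooled_total)     # 2175 GB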

Thomas H. Bryant III
mreferre
Champion

I echo Thomas' response. Why would you want to do that? What is the advantage? Some people are doing things like this with RDMs (and they have a point, although I don't like it either), but why would you want to create a 1 LUN = 1 VM scenario with VMFS-formatted LUNs?

My opinion is here:

http://it20.info/blogs/main/archive/2007/03/11/4.aspx

Massimo.

Massimo Re Ferre' VMware vCloud Architect twitter.com/mreferre www.it20.info
iguy
Enthusiast

One of the main advantages is the ability to replicate a given LUN using hardware replication between two data centers - in our case, HDS TrueCopy.

Now, if we replicate a LUN that has multiple VMs on it, then in order to fail that LUN over to the other data center we need to fail over every VM on it. If instead we take a 1 VM to 1 LUN approach, we can replicate all of the LUNs but fail over only specific VMs at a time.

There is a lot of discussion around this and other ways to potentially deliver the same solution, but for the moment I'd like to keep this thread focused on this idea: how it works or doesn't, thoughts, experiences, etc.

doubleH
Expert

I was actually thinking of doing a similar thing: separating my critical and non-critical VMs onto separate LUNs so that I could replicate my critical LUN to an offsite location. I will be using an EqualLogic PS100E.

If you found this or any other post helpful please consider the use of the Helpful/Correct buttons to award points
mreferre
Champion

I see your point, but I don't get why you would need to fail over one VM at a time to a different datacenter.

Back to the point: I guess what I would say is that it would work, although it would be like driving from Milan to Paris using just first gear (out of 6). You will get there, just not in the most efficient way.

Massimo.

Massimo Re Ferre' VMware vCloud Architect twitter.com/mreferre www.it20.info
kix1979
Immortal

While hardware replication is an advantage, what is the other data center? Is it just a DR site? If so, there is a lot of money tied up in a solution that only pays off in the event of a disaster. A lot of people do a mix-and-match solution of RDMs + hardware replication, so they stay application-aware while using the SAN to replicate. The other thing is: do you really need to replicate every VM? You don't need all of your AD VMs, web servers, etc., so you have to consider per-VM SLAs. Again, you could put your mission-critical VMs on single volumes/RDMs, and everything else on large volumes.

Thomas H. Bryant III
dpomeroy
Champion

I believe this is something NetApp recommends, and I have heard of people doing it, but to me the additional management cost and complexity isn't worth the DR options you gain. I think there are better overall DR solutions.

If you were only going to have a small number of VMs, this would be more manageable, but for a company like mine, with 250+ VMs and 18 ESX servers, I think it would just be too complex.

Ken_Cline
Champion

I'm going to jump on board here with Thomas and Massimo, but I'll say it in stronger words: I think it's a BAD IDEA. You gain almost nothing, and you lose the simplicity that virtualization is supposed to bring. If you need to replicate individual VMs for DR purposes, look at solutions like esxReplicator or DoubleTake's new VM replication solution (don't remember the name) and replicate just the VMs you need to.

Creating one LUN per VM introduces way too much complexity... and IMHO, I've NEVER liked pushing VMware's configuration maximums (darned horse flies - watch out, there's another one! ;-) )

Ken Cline VMware vExpert 2009 VMware Communities User Moderator Blogging at: http://KensVirtualReality.wordpress.com/
doubleH
Expert

Agreed, but if you have bought a SAN that comes with hardware-based replication, you definitely don't want to spend more $$$ on a separate product. I may be oversimplifying here, but maybe he could look at creating 2 LUNs: one housing the replicated VMs, and the other for VMs that do not need to be replicated.

If you found this or any other post helpful please consider the use of the Helpful/Correct buttons to award points
sbeaver
Leadership

I have 2 DMXs 1,000 miles apart with an OC-12 connection between them. I have several VMFS LUNs with multiple VMs per LUN, and I replicate all of it with native SRDF.

Steve Beaver
VMware Communities User Moderator
VMware vExpert 2009 - 2020
VMware NSX vExpert - 2019 - 2020
====
Co-Author of "VMware ESX Essentials in the Virtual Data Center"
(ISBN:1420070274) from Auerbach
Come check out my blog: http://www.virtualizationpractice.com/blog/
Come follow me on twitter http://www.twitter.com/sbeaver

**The Cloud is a journey, not a project.**
iguy
Enthusiast

The thought behind this is the ability to perform a disaster recovery test by executing the exact steps one would take if a given data center actually failed. That way we would be testing the network configuration, SAN configuration, and VMware host configuration, along with all of the hardware involved, simply by "failing over" to the other data center.

Then, in the case of a real event, we would have recently validated that entire setup by having run production there, say, 30 days earlier.

It also allows us to have two production-level data centers running at best performance all the time, because VMs, network usage, and SAN hardware usage are split between the data centers. We turn them into "active-active" data centers.

iguy
Enthusiast

How do you fail over, or do a test failover to that secondary site, sbeaver?

Is your replication data center just a failover site, or is it live with production machines?

iguy
Enthusiast

I am part of a team that manages over 100 ESX hosts with well over 1000 VMs. We have significant complexity on our end and are looking at whether this is a manageable solution or not.

My view is that this idea is more than just a DR solution: it makes optimal use of the hardware we have in both locations all the time, turning VMware Virtual Infrastructure from a passive failover scenario into something more akin to an active-active architecture.

Granted, as I stated above, I'm looking for what the forum folks believe is good or bad about this approach - anything you have run into on the pro or con side. I do appreciate all the comments and discussion so far. Is there anything else, from a technical perspective, that might be good or bad about doing this?

mreferre
Champion

There is nothing wrong in what you are saying, and even this active-active idea makes sense. What doesn't make sense (to me at least), especially at your big numbers, is the granularity. If you want to create an active-active scenario you could do 1000 LUNs (500 active per site) or two HUGE LUNs (one active per site). Well, not really... but I am sure you follow me.

Perhaps you could configure, say, 50 LUNs hosting around 20 VMs each, so your "failover unit" would be 20 VMs, which out of 1000 is granular enough (in my opinion). I understand that a failover unit of 1 VM would be better, but consider all the burden in storage management... not to mention that in your scenario new deployments are bound to the SAN layout: if I need a new VM, I need another LUN from the storage group. NO WAY! As I said in my blog post, virtualization is all about decoupling things; this, to me, is tightly coupled subsystems. A bad thing.
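
The same trade-off in numbers (a trivial sketch using the thread's figure of roughly 1000 VMs):

# Failover granularity vs. number of LUNs to manage.
TOTAL_VMS = 1000

for luns in (2, 50, 1000):
    print("%4d LUNs -> failover unit of %d VM(s)" % (luns, TOTAL_VMS // luns))

#    2 LUNs -> failover unit of 500 VMs (coarse, trivial to manage)
#   50 LUNs -> failover unit of 20 VMs  (the middle ground suggested above)
# 1000 LUNs -> failover unit of 1 VM    (finest, heaviest to manage)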

My thought.

Massimo.

Massimo Re Ferre' VMware vCloud Architect twitter.com/mreferre www.it20.info
sbeaver
Leadership

This is an active/passive setup, with the passive side located at SunGard. In the event of a disaster, or for a test, they manually cut the OC-12 and present the LUNs to the standby ESX servers waiting at SunGard. Once the LUNs are presented, I run a script to find all the .vmx files, register them, and start them (roughly along the lines of the sketch below).
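
For illustration, a minimal sketch of such a script. It assumes the ESX 3.x service console and its vmware-cmd utility; the walk over /vmfs/volumes and the absence of error handling are simplifications:

#!/usr/bin/env python
# Find every .vmx on the presented VMFS volumes, register it with this
# host, and power it on.
import os

VMFS_ROOT = "/vmfs/volumes"  # standard mount point for VMFS volumes

for dirpath, dirnames, filenames in os.walk(VMFS_ROOT):
    for name in filenames:
        if name.endswith(".vmx"):
            vmx = os.path.join(dirpath, name)
            # Register the VM with this host, then power it on.
            os.system('vmware-cmd -s register "%s"' % vmx)
            os.system('vmware-cmd "%s" start' % vmx)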

I am going to SunGard to test this next week, so I can report back. I guess the real question is how much money you are willing to spend, and what distance between sites you can tolerate, to get something like an active/active setup.

The company has to decide how much downtime it finds acceptable, and then spend and work from there.

Steve Beaver
VMware Communities User Moderator
VMware vExpert 2009 - 2020
VMware NSX vExpert - 2019 - 2020
====
Co-Author of "VMware ESX Essentials in the Virtual Data Center"
(ISBN:1420070274) from Auerbach
Come check out my blog: http://www.virtualizationpractice.com/blog/
Come follow me on twitter http://www.twitter.com/sbeaver

**The Cloud is a journey, not a project.**
dpomeroy
Champion

Why not have several DR LUNs and put the VMs that need to be replicated on those? To me, fewer, larger LUNs would still be better, especially for the size of environment you have.

Keep us posted on what you decide; it's always interesting to read about large-scale deployments, especially if you go with one LUN per VM.

williambishop
Expert

Holy crap, Batman... this would be a nightmare to configure from the storage side, at least if any significant numbers are involved.

For the two full-time storage admins' salaries this will cost you, you could buy another solution.

--"Non Temetis Messor."
iguy
Enthusiast

Thank you all for your feedback and discussion.

Based on this feedback and on some other customers we talked to, we have decided to stick with the regular approach of roughly 400 GB LUNs (maybe smaller, based on recent conversations) with multiple VMs per LUN.

As part of this we will also create one VMware cluster/host group running 1 VM per LUN, along with testing NFS on a NetApp filer. These pilots will be done to validate the approaches and see if they are really all they are cracked up to be.

EshuunDara
Contributor

Sorry to bring up an old thread, but I was just thinking about doing the same thing. It didn't make sense when I first got into VMware a few years ago, but now that you can thin-provision VMDK files... if you have a bunch of thin-provisioned, VMFS-formatted LUNs, each holding a single thin-provisioned VM, there is no wasted space. Given that the space limitation isn't a problem anymore, can anyone offer some good reasons not to go ahead with this under vSphere 4?
