VMware Cloud Community
charron
Contributor

Question - Best Practices for "Number of vmdk per datastore"

I have just been assigned to a new client as a VMware consultant, and right from the get-go I ran into a potentially confrontational issue with a member of the team. I have followed a "20-25 VMDKs per 800 GB datastore" policy since 3.5 (with some exceptions for high disk I/O VMs), but my new client has an architect who is very paranoid about possible datastore corruption, and he is adamant about sticking with their current "one LUN / one datastore per VMDK" policy, including creating separate LUNs, zones and datastores for every VMDK in the environment.

My opinion is that this is complete overkill and will cause management nightmares down the road, from both a VMware and a SAN admin perspective (for example when the app teams suddenly ask for more disk space, more memory, etc.), and I don't see any major advantage in terms of protection or performance of the datastores. The SAN guys are of little help, as they are "tell us what you need and we'll give it to you" folks. The architect is a very senior person, has been with the company for a while and has a lot of clout/influence, so I would like to arm myself with as much information as possible before going into the battle zone. I would really appreciate it if you could address some of the following questions for a vSphere 4.1 environment:

1) What are the major issues (current or future) that you see with a "one LUN, one VMDK" policy?

2) What are the advantages of this configuration, if any (apart from isolating "datastore corruption" to one VMDK)?

3) Is there any disk I/O performance gain or loss from following this configuration?

4) What is the industry-accepted "sweet spot" in terms of number of VMDKs per datastore? And what is the accepted size of a standard datastore for VMs with normal disk I/O?

10 Replies
kjb007
Immortal

With one LUN per VMDK, you place several artificial limitations on your cluster and/or host size. A host can see at most 256 LUNs, so if each VM has only one VMDK you have a limit of 256 virtual machines. That may not sound that bad, but it caps your cluster size at 256 VMs as well, since all LUNs have to be presented to every host in the cluster. So you end up with many more, smaller clusters, which can be less efficient in terms of failover capacity. If your VMs have more than one disk, that VM count gets even smaller.
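If it helps to make that concrete, here's a quick back-of-the-envelope sketch. The 256 figure is the real per-host LUN maximum; the per-VM disk counts and the 10-VMs-per-datastore density are just illustrative assumptions:

# Back-of-the-envelope math for the 256-LUN-per-host limit.
# 256 is the vSphere per-host LUN maximum; the per-VM disk counts and the
# 10-VMs-per-datastore density below are illustrative assumptions only.

MAX_LUNS_PER_HOST = 256  # and every LUN must be presented to every host in the cluster

def max_vms_one_lun_per_vmdk(vmdks_per_vm):
    # Ceiling on cluster VM count when every VMDK gets its own LUN/datastore.
    return MAX_LUNS_PER_HOST // vmdks_per_vm

def max_vms_shared_datastores(vms_per_datastore):
    # Ceiling on cluster VM count when VMs share one-LUN datastores.
    return MAX_LUNS_PER_HOST * vms_per_datastore

for disks in (1, 2, 4):
    print(f"{disks} VMDK(s) per VM at 1 LUN per VMDK -> "
          f"at most {max_vms_one_lun_per_vmdk(disks)} VMs per cluster")

print(f"10 VMs per datastore -> room for {max_vms_shared_datastores(10)} VMs "
      f"before the LUN limit bites (other limits kick in long before that)")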

As you also said, more clusters means many, many more datastores to manage, and it gets very difficult to keep track of them all when (or if) your environment gets large.

Consider that your storage is coming from a SAN, and all LUNs that are part of the same RAID group or aggregate can potentially fall victim to the same "corruption". Are you expected to account for that as well, and run only one VM per RAID group, per array?

-KjB

vExpert/VCP/VCAP vmwise.com / @vmwise -KjB
charron
Contributor

Hi KjB,

Thanks for your response. Currently they are not concerned about isolating VMs by array (thank god), but their current strategy was largely formulated after a particularly bad "datastore corruption" incident a few years ago, when they got caught in the open with a very poor backup/restore strategy in place. They have since put a reasonably sound DR strategy in place. However, since that incident they have amassed a total of 640 datastores in the environment, with one cluster alone showing 146 datastores.


My plan is to steer the architecture team towards a more reasonable "10 VMs (about 20 VMDKs) per ~800 GB datastore", and then if we come across any VMs with heavy IOPS, we'll move them to their own dedicated datastores. In my opinion this strategy will provide a good balance of performance and efficiency in terms of managing the environment.
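Just to put rough numbers on the consolidation, assuming those 640 datastores really do map 1:1 to VMDKs (something I still need to verify on site):

# Rough consolidation math. The 640-datastore count and the ~800 GB /
# 10-VM (20 VMDK) density come from this thread; treat everything here as a
# planning estimate, not a survey of the real environment.
import math

current_datastores = 640           # today: one VMDK per datastore
target_vmdks_per_datastore = 20    # ~10 VMs x 2 VMDKs each

target_datastores = math.ceil(current_datastores / target_vmdks_per_datastore)
print(f"{current_datastores} datastores today -> roughly {target_datastores} "
      f"at {target_vmdks_per_datastore} VMDKs apiece")
# 640 -> roughly 32, before carving out dedicated datastores for heavy-IOPS VMs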

gravesg
Enthusiast

Wait a sec, 1:1 VM to datastore? So if he has, say, a 40 GB DHCP server as a VM, it's on its own datastore?

I too have the very same concerns about VMFS corruption, or even admin error. One mistake can blow away two dozen or more VMs if you're really dense. But that is what hot/online/CDP backup solutions are here to provide: protection. If he does not have one in place, I would also agree to limit the density per datastore, but it has to be more than 1:1; otherwise you just keep stranding free space on disk, or end up with complexity through the nose in your datastore configuration.

kjb007
Immortal

Sorry, the array comment was a poor attempt at sarcasm. If they do have a good DR strategy in place, then I would use that and suggest, instead of thinking in terms of VMDKs or VMs per LUN, sizing the LUNs around recoverability instead.

How quickly can they restore something with their DR strategy, and how much data can they restore in that time? You can use that to figure out a size, and use it for your datastores. It may not be very large, possibly the 800 GB you prefer, but it will certainly be better than one VM per datastore.
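As a quick illustration of that recoverability math, with made-up figures for restore throughput and the acceptable restore window:

# Sizing a datastore around recoverability instead of VM/VMDK counts.
# Both inputs are illustrative assumptions; plug in the throughput their
# backup team can actually demonstrate.

restore_rate_gb_per_hour = 200   # assumed sustained restore throughput
restore_window_hours = 4         # assumed acceptable time to rebuild one datastore

max_datastore_size_gb = restore_rate_gb_per_hour * restore_window_hours
print(f"Largest datastore restorable in {restore_window_hours} h: "
      f"{max_datastore_size_gb} GB")
# With these particular numbers you land right around the 800 GB figure anyway.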

-KjB

vExpert/VCP/VCAP vmwise.com / @vmwise -KjB
AndreTheGiant
Immortal

With vSphere 5.0 you really can have just a few LUNs hosting all of your VMs.

The number is no longer limited by SCSI reservations or similar issues.

But also consider other aspects, for example a site DR solution. If you plan to have storage replication, the LUN design will depend on the type of replication and the type of VMs that you need to protect.

Andrew | http://about.me/amauro | http://vinfrastructure.it/ | @Andrea_Mauro
DavidKlee
Enthusiast

A true one-to-one datastore-to-VMDK mapping is a nightmare to manage. vSphere was designed to handle multiple VMDK files on a single VMFS file system. Other, less advanced hypervisors still do one-to-one relationships; Google around to find out how bad they can be to manage.

1) Management of that large a number of datastores is going to take considerably longer than if you had a reasonable number of VMDKs per datastore. Some of the issues might not look like major issues, but at some of my clients they are. For example, think about the wasted space in the gap between the end of the VMDK file and the end of the datastore. At 1000 datastores all configured like this, you could be wasting quite a bit of space (unless you thin provision the LUNs, which is bad for performance in most cases); there's a rough sketch of that math after this list. Doing this also effectively cuts down on your use of tooling such as Storage vMotion (assuming the datastores are the same size as the VMDK files). You simply lose a lot of the flexibility you gain by appropriately sizing the datastores.

2) A previous post was right on. Datastore corruption is not necessarily mitigated by having a 1:1 policy. If the disk group underneath has a problem, all of the datastores could be corrupted. Outside of that, I really cannot think of any advantages of this configuration - UNLESS you are in such a high I/O environment that every VMDK file must be in its own disk pool for performance reasons. I've only run across this in one situation, and the I/O requirements were just plain ridiculous.

3) I don't believe that any performance gain or penalty would exist from this configuration in normal day to day activities. A datastore rescan on a host would probably take a lot longer though.

4) The industry standard for number of VMDKs per datastore is more in the 10-20 range (and your mileage may vary - this is just my experience), depending on the SAN, the workload, the size of the VMDKs, and the concurrent I/O for each of them.
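To put rough numbers on the wasted-space point in item 1 (the per-datastore headroom figure is purely an assumption for illustration, since it varies from shop to shop):

# Rough estimate of slack space under a 1:1 VMDK-to-datastore layout.
# The 1000-datastore count is the example from item 1; the headroom per
# datastore is an assumed figure, since it varies from shop to shop.

datastores = 1000
avg_headroom_gb = 20   # assumed gap between end of the VMDK and end of the LUN

wasted_tb = datastores * avg_headroom_gb / 1024
print(f"~{wasted_tb:.1f} TB of stranded free space across {datastores} datastores")
# ~19.5 TB that could be pooled if those VMDKs shared right-sized datastores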

Hope this helps!

David Klee | Founder and Chief Architect | Heraflux Technologies | dklee@heraflux.com
weinstein5
Immortal

1) What are the major issues (current or future) that you see with a "one LUN, one VMDK" policy?

Depending on the size of your environment, you can run into the LUN maximum per host of 256.

2) What are the advantages of this configuration, if any (apart from isolating "datastore corruption" to one VMDK)?


None really

3) Is there any disk I/O performance gain or loss from following this configuration?

Not really, other than that you have eliminated the possibility of saturating the path to the LUN.

4) What is the industry-accepted "sweet spot" in terms of number of VMDKs per datastore? And what is the accepted size of a standard datastore for VMs with normal disk I/O?

That is difficult to say since it depends on the I/O to the virtual disks. My experience has been 4-6 VMDKs per LUN. Sizing really depends on the needs of the virtual machines; typically I have been going with about 500 GB per LUN, but I am considering increasing that to 750 GB or 1 TB.

If you find this or any other answer useful please consider awarding points by marking the answer correct or helpful
Josh26
Virtuoso

charron wrote:

Hi KjB,

but their current strategy was largely formulated after a particularly bad "datastore corruption" incident a few years ago, when they got caught in the open with a very poor backup/restore strategy in place.


I run into this strategy everywhere for some reason, and it always relates to some legacy of "we had a problem under ESX 3.0" or "the firmware on my keyboard wouldn't support multiple VMDKs per LUN" sort of nonsense.

The best reason to use a sensible configuration is that there is no reason to ever not use one.

I cannot imagine the administrative effort involved in managing 640 data stores.

I just deployed a single 22TB LUN for an entire cluster and I don't feel I missed anything.

gravesg
Enthusiast

Single 22TB LUN?

Even with VAAI and the reduction in SCSI reservation conflicts, there is still a finite queue for each SCSI device. There has to be a balance for good design.
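Rough math on the queue point; 32 is a typical per-device queue depth default on ESXi HBAs, and the VM count and per-VM outstanding I/O figures are just assumptions for illustration:

# Why LUN count still matters even with VAAI: outstanding I/O per device.
# 32 is a common default device queue depth on ESXi HBAs; the VM count and
# per-VM outstanding I/O below are assumptions for illustration only.

device_queue_depth = 32        # typical per-LUN default; verify on your HBAs
vms_on_lun = 40                # assumed VM count on the single big LUN
outstanding_io_per_vm = 2      # assumed average I/Os in flight per VM

demand = vms_on_lun * outstanding_io_per_vm
print(f"1 LUN: {demand} outstanding I/Os contending for a queue of {device_queue_depth}")

luns = 4
print(f"{luns} LUNs: aggregate queue of {luns * device_queue_depth}, "
      f"about {demand // luns} outstanding I/Os per LUN on average")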

Josh26
Virtuoso

Giovanni.Gravesande wrote:

Single 22TB LUN?

Even with VAAI and the reduction in SCSI reservation conflicts, there is still a finite queue for each SCSI device. There has to be a balance for good design.

Unless you have bandwidth concerns around a single path serving a single LUN at a time (we don't), it's all the same array; two LUNs or three LUNs won't be appreciably different.
