Re: DRS scenario help required.

anthonypoh · ‎05-19-2011

My company has just completed a virtualisation exercise, but at the moment I feel that the company is not using virtualisation to its full potential....

When virtualisation was first mooted to the board, it was approved based on a certain condition:

"Every VM will be mapped 1:1 with a physical CPU core"

This meant that for a dual CPU server with 6 cores per CPU, we could only support 12 VMs as there are only 12 cores.....

At present this 1:1 mapping will stay in place as the virtualisation piece is assessed over the next 6-8 months...... a total waste of resources in my opinion....

Anyways, because of this issue of 1:1 mapping I have had to either disable DRS or set it to the lowest level in order to stop the automatic re-balancing of the ESX cluster.

Correct me if I'm wrong, but DRS works by considering a host's resource utilisation of CPU, memory, etc......

Is there a way of configuring DRS to work based on the condition that it can only move a VM if there is a 'core' available on the target ESX host?

(From what I understand, this level of granularity isn't possible)

At the moment if a host in my cluster fails, then we usually manually power the VMs up on hosts that have spare 'cores' available as DRS doesn't seem to do this!

This is obviously a waste of man-power, and ideally I would like to have a solution where I can enable DRS but it will keep to the 1:1 mapping rule when moving VMs about.....

Anyways, I was having a good think the other day and came up with a scenario which I hope may work, and was wondering if I could run it past some people who may know more about VMware than me.....

Here goes:

Each VM can have reservations, shares and limits set to CPU and memory resources.....
Each ESX host obviously has a finite amount of resources, for example my server with 2x hex-core 2.66GHz processors will possibly have 32GHz of processing resource. (This is me guessing, is this roughly how VMware calculates CPU resource per host?)
The board wishes for a 1:1 mapping – 1 vCPU per core – which means my host will have 12x vCPUs. So if we divide the total CPU resource of the host by the number of vCPUs required, we get roughly 2.66GHz per vCPU.
Next I will set each VM to have a CPU reservation and limit of 2.66GHz - this should mean that the VM will only power on if that ESX host meets the VM's reserved requirements (in this case 2.66GHz) and will never request more than the limit (2.66GHz). In a way, binding the VM to 1 core…. (I'm assuming this is how reservations and limits work..... again, please correct me if I'm wrong!)
Now I believe VMware and DRS work by only moving VMs onto ESX hosts that have free resources available for the VM to use. If a VM has a reserved requirement, then VMware looks for an ESX host that can meet that reserved amount. So if the VM has been set with a CPU reservation of 2.66GHz, then DRS will only move that VM onto an ESX host which has 2.66GHz of CPU resource available. Correct?

In theory will this stop DRS over-subscribing CPU resources on an ESX host…….. because once 12x VMs are already running on a host, there will be zero CPU resource available... and hence keep the 1:1 mapping rule…….?!?
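The arithmetic behind this idea can be sketched in a few lines of Python. This is only a back-of-the-envelope model, not how vCenter actually does admission control: the function names and the `overhead_mhz` parameter are made up for illustration. The overhead value in particular is an assumed figure; it is there because a host normally keeps some CPU back for the hypervisor itself, so the capacity available for reservations may be lower than the raw cores x clock total.

```python
def host_cpu_capacity_mhz(sockets: int, cores_per_socket: int, core_mhz: int) -> int:
    """Raw CPU capacity of a host: every core counted at its full clock speed."""
    return sockets * cores_per_socket * core_mhz


def max_vms_by_reservation(capacity_mhz: int, reservation_mhz: int,
                           overhead_mhz: int = 0) -> int:
    """How many VMs with a fixed CPU reservation fit into the capacity.

    overhead_mhz is a hypothetical amount held back for the hypervisor;
    the real figure depends on the host and is not taken from this thread.
    """
    usable = max(capacity_mhz - overhead_mhz, 0)
    return usable // reservation_mhz


# 2 sockets x 6 cores x 2.66GHz, as in the post above.
capacity = host_cpu_capacity_mhz(sockets=2, cores_per_socket=6, core_mhz=2660)
print(capacity)                                                    # 31920 MHz, roughly 32GHz
print(max_vms_by_reservation(capacity, 2660))                      # 12 with no overhead
print(max_vms_by_reservation(capacity, 2660, overhead_mhz=500))    # 11 with 500 MHz held back
```

If the host really does hold back capacity, reserving a full 2.66GHz per VM could mean the twelfth VM fails to power on, so a PoC would show whether the reservation needs to be set slightly below one full core.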

Or is this something that's better done using the CPU reservations in Resource Pools?

If anyone else has a solution that could work, I would be very grateful for suggestions and ideas...... 😃

Many thanks,

Anthony

http://thevirtualunknown.co.uk/

bilalhashmi · ‎05-19-2011

What you are doing is very interesting and, in my opinion, a waste of time and resources, but you already agree with that so let's not go there..

I noticed you said you manually power on VMs on other hosts if a host goes down? Why is HA not being leveraged here instead?

If you create reservations for the vCPUs and memory for every VM, then you shouldn't have issues with DRS. Keep in mind DRS will only move VMs based on the aggressiveness that has been set. You can also set it to manual (still a bad idea, but this is all a bad idea to begin with), in which case DRS will not automatically move VMs around... Let's say DRS is at the lowest setting. DRS will only come into play when a host is in contention and moving the VM will guarantee a performance gain. In your case this will not happen, as the VM will be happy where it is since it has the resources reserved. Also, even if moving it would make things better, with the manual/semi-automatic setting DRS will not move your VMs around itself. I am not sure if DRS checks for reservations prior to moving a VM.

Now let's come to HA. If you have reservations set, HA will not restart the VM on a different host in case of a failure unless the host can guarantee the reserved resources for that particular VM. So you won't be able to overcommit, which meets the goal you have in mind. Best part is that you won't have to power on the VM manually, HA will do it for u. Just keep in mind you really need a cluster where you can afford to lose a host and still maintain a 1:1 relationship (again kinda silly but oh well)... so let's say u need 8 hosts in ur cluster to run ur VMs and maintain the 1:1 relationship, make sure u have at least 9 hosts in that cluster so that in case of a failure ur 1:1 relationship is maintained... makes sense?
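The host count for that n+1 setup is simple arithmetic, sketched here in Python. The function name and the example VM count (96 single-vCPU VMs on 12-core hosts) are assumptions picked to match the "8 hosts plus 1 spare" figure above, not numbers from the actual environment.

```python
import math


def hosts_required(vm_count: int, cores_per_host: int, spare_hosts: int = 1) -> int:
    """Hosts needed to keep 1 vCPU per core, plus spare hosts for failover."""
    working_hosts = math.ceil(vm_count / cores_per_host)
    return working_hosts + spare_hosts


# Hypothetical example: 96 single-vCPU VMs on 12-core hosts.
print(hosts_required(96, 12))                 # 8 working hosts + 1 spare = 9
print(hosts_required(96, 12, spare_hosts=2))  # tolerate two host failures = 10
```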

Lastly, RPs are spread across multiple hosts in a cluster. So if you set limits/reservations there instead, you could end up not having the 1:1 relationship, as now these reservations are not tied to the VMs but to the RPs they reside in..

Sorry what I have told u above is probably against all best practices... but keeping in mind what you are trying to do here... i think this will work... the good thing is that u realize how silly this really is.. good luck

Follow me @ Cloud-Buddy.com

Blog: www.Cloud-Buddy.com | Follow me @hashmibilal

anthonypoh · ‎05-19-2011

thanks..... and i do agree it's a waste of resources/effort/money/life..... but my hands are tied by the non-technical people higher up! 😢

Sorry, forgot to mention that HA is also enabled...... but thanks for your thoughts on HA and how it deals with VM reservations!

It was more of the case of where I'm trying to restrict VMware moving/powering on VMs on ESX hosts that may break this 1:1 mapping.....

This problem and thought process has stemmed from trying to implement Update Manager.... From what I understand VUM will put ESX hosts into maintenance mode when remediating hosts, and will use vMotion to move VMs off from the host.... but that's another story.... 😃

Many thanks for your insight....


bilalhashmi · ‎05-19-2011

Hope it all works out.. good luck..


DSTAVERT · ‎05-19-2011

If you have a mandate to preserve a 1 to 1 mapping, I would impress upon those making that decision that it is imperative to have an idle host specifically for HA and maintenance.

-- David -- VMware Communities Moderator

anthonypoh · ‎05-20-2011

thanks, yup we have several idle hosts doing nothing but chewing up power...... 😃

what i'm after is a way of automating the movement of VMs around the HA cluster but still ensure that it doesn't break the 1:1 mapping rule.....


bilalhashmi · ‎05-20-2011

Try this. Again not best practice but in ur case it will get u where u want to be. lets say u have a 4 host cluster and each host has 24GB mem and a single quad core proc... lets assume you really need 3 hosts but you have one as an extra for ur n-1 config.. You only have a total of 12 VMs each with 1vCPU

Host 1 = 4 VMs (5GB reserved for each VM)

Host 2 = 4 VMs (5GB reserved for each VM)

Host 3 = 4 VMs (5GB reserved for each VM)

Host 4 = Nothing - spare host in the cluster for ur n-1 config

Now let's say host 1 dies, you have 4 VMs that have a 5GB reservation each, where do you think they will start? They have no option but to start on host 4 and you will still be able to have the 1:1 relationship that you want... Hosts 2 and 3 already have memory dedicated to the 4 VMs that are running and only have about 4GB free each, which means they won't meet the requirements of the VMs that are looking for a host.

Lets say your load was distributed differently:

Host 1 = 2 VMs (5GB reserved for each VM)

Host 2 = 4 VMs (5GB reserved for each VM)

Host 3 = 3 VMs (5GB reserved for each VM)

Host 4 = 3 VMs (5GB reserved for each VM)

Let's say Host 1 dies again, where will the 2 VMs that were running start now? 1 on host 3 and 1 on host 4, as that is the only way the reservations of the VMs can be met..
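Both layouts above can be checked with a small Python simulation. This is a simplified first-fit model written for illustration only: it is not the placement logic HA really uses, it ignores per-VM memory overhead, and the function name and defaults (24GB hosts, 5GB reservation per VM) simply mirror the example numbers in this post.

```python
def restart_after_failure(vms_per_host: dict, failed_host: str,
                          host_mem_gb: int = 24, vm_reservation_gb: int = 5):
    """Place the failed host's VMs on surviving hosts that can still honour
    the memory reservation. Returns (placements, number_of_unplaced_vms)."""
    survivors = {h: n for h, n in vms_per_host.items() if h != failed_host}
    placed = {}
    unplaced = 0
    for _ in range(vms_per_host[failed_host]):
        for host, running in survivors.items():
            free_gb = host_mem_gb - running * vm_reservation_gb
            if free_gb >= vm_reservation_gb:
                survivors[host] = running + 1
                placed[host] = placed.get(host, 0) + 1
                break
        else:
            # No surviving host can guarantee the reservation.
            unplaced += 1
    return placed, unplaced


# First layout: 4/4/4 VMs with host 4 kept empty as the spare.
print(restart_after_failure({"host1": 4, "host2": 4, "host3": 4, "host4": 0}, "host1"))
# Second layout: 2/4/3/3 VMs.
print(restart_after_failure({"host1": 2, "host2": 4, "host3": 3, "host4": 3}, "host1"))
```

In this model the first layout restarts all 4 VMs on host 4, and the second restarts one VM each on host 3 and host 4. Because 5GB x 4 is the most that fits in 24GB, the memory reservation doubles as a cap of 4 VMs per quad-core host, which is what keeps the 1:1 mapping.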

You will just have to play around with the numbers and come up with ur own reservations, but I think this strategy should work for you to keep HA and still maintain your 1:1 relationship..

I can't believe I spent so much time thinking up such a bad idea... lol.. hopefully the mgmt will soon set u free from the 1:1 relationship good luck and hope this helps.


SyApps · ‎05-22-2011

Wow, these are incredible limitations that are not only pointless but defeat the value of HA, DRS, and shared resource usage. I hope there is an end to all of your red tape issues somewhere. I applaud your effort to find a way to appease whoever is putting such policies in place. I wouldn't be so easy to tame.

This is honestly easy enough. Tally up the number of physical CPUs you have and change the reservation for each VM that you bring up. DRS won't break the reservation rules; neither will HA. If a host goes down and a functional host already has a full load of VMs, with no room for more VMs that require a 1-CPU reservation, then the VM won't be moved there. It will just sit there, down, in an off state, even though it could be up if management allowed it to come up. Ridiculous... I skimmed a few of the comments that were already posted here, but I think someone already mentioned that you could always just put DRS into manual mode. That way you can do all the work and it won't distribute for itself.

Virtual Servers with such heavy physical restrictions aren't really virtual servers at that point.

Good Luck

Always a big thanks to the community in advance! Dan Lee

anthonypoh · ‎05-23-2011

thanks for your help guys....... i'm probably going to have to whip together a PoC to test the reservation/limits......

