I never thought I'd be involved in deploying a 28 vCPU VM, as I've always strived to educate our user community about overprovisioning VMs. However, last month we migrated our Oracle database server from HP-UX on RISC to our main production cluster, where it now runs on Oracle Linux. If you've been involved in such fun, you know that as the systems guy your input on the end result will get shot down by somebody higher up the chain. Long story short, during the migration weekend we had the new VM up and running with 8 vCPUs to start with. It was slammed. I wanted to know what was running and whether this was unusual behavior. It turned out after the fact that hundreds of batch jobs had queued up during the time the migration took and were now all running at once. So yes, this was unusual behavior.
Come Monday we had the Linux VM running with 28 vCPUs, happy as a clam, and performance exceeded expectations by a long shot, going from a fridge-sized computer to a wee VM. Our 5.5 Enterprise Plus cluster runs BL460 G8s with 2 sockets, 8 cores per socket, and HT enabled. In essence this VM could need nearly all of the 32 logical CPUs on the host. To play it safe I chose to disable automated DRS on the cluster and dedicate a host to this VM. The idea being that this pig could starve other VMs or be impeded by other VMs running on the host.
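For anyone following the host math, here is a minimal Python sketch of it. The function name and dictionary keys are made up for illustration; the inputs are the figures from this post (2 sockets, 8 cores per socket, HT enabled, 28 vCPUs).

```python
def host_cpu_summary(sockets, cores_per_socket, hyperthreading, vm_vcpus):
    """Compare a VM's vCPU count against one host's CPU resources."""
    physical_cores = sockets * cores_per_socket
    # Hyperthreading presents two logical CPUs per physical core.
    logical_cpus = physical_cores * (2 if hyperthreading else 1)
    return {
        "physical_cores": physical_cores,
        "logical_cpus": logical_cpus,
        # Fraction of the host's logical CPUs the VM could occupy.
        "vcpu_share_of_logical": vm_vcpus / logical_cpus,
        # True when the VM has more vCPUs than the host has real cores.
        "exceeds_physical_cores": vm_vcpus > physical_cores,
    }


summary = host_cpu_summary(sockets=2, cores_per_socket=8,
                           hyperthreading=True, vm_vcpus=28)
print(summary)
```

With these inputs the VM is sized above the host's physical core count and could occupy most of its logical CPUs, which is why it got a host to itself.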
A month later we have found via Linux performance monitoring, vCenter, and vROps that this thing is actively using about half or less of its vCPUs at any given time. I have lobbied to move the vCPU count down to a reasonable number so it will play better in the cluster. We are moving it down to 20 vCPUs this weekend (not as much as I wanted, but some is better than nothing!).
Anybody have an opinion on whether or not to just run this thing in automated DRS mode (~50% on the slider) and let vCenter determine resource needs? My thought is to keep the host dedicated and DRS manual, but it is kind of a PITA to continue running that way with 200+ VMs in the cluster. I thought about maybe setting a CPU reservation on the VM, but I've always tried to stay away from reservations.
Thanks for any opinions.
The maximum recommended CPU count for a VM running on hosts with 2x 8-core CPUs is 16 vCPUs. These articles can give you some ammo to convince those who are suggesting higher.
With regard to your question, it depends. Some environments do not overcommit the hardware resources on their Tier 1 applications. If this is not a Tier 1 application, or you do not have that sort of policy, gradually add VMs to the host until the %RDY or %Contention reaches the maximum acceptable value (per your policy). This link provides an example capacity management policy. The entire series will help you monitor based on those policies.
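As a rough illustration of checking ready time against a policy, here is a minimal Python sketch. It assumes you are reading the CPU ready summation counter from the vCenter performance charts, which is reported in milliseconds per sample interval (20 seconds in the real-time view), and that the VM-level value is the sum across its vCPUs. The function names and the sample numbers are hypothetical, and the threshold is whatever your own policy says.

```python
def cpu_ready_percent(ready_ms, interval_s=20, vcpus=1):
    """Convert a CPU ready summation (ms) into an average percent per vCPU.

    ready_ms:   summed ready time reported for the sample interval
    interval_s: length of the sample interval in seconds (20 for real-time)
    vcpus:      number of vCPUs the summation was added up across
    """
    interval_ms = interval_s * 1000.0
    return ready_ms * 100.0 / (interval_ms * vcpus)


def within_policy(ready_ms, max_ready_pct, interval_s=20, vcpus=1):
    """True if the per-vCPU ready percentage is at or below the policy limit."""
    return cpu_ready_percent(ready_ms, interval_s, vcpus) <= max_ready_pct


# Hypothetical sample: 4000 ms of ready time in a 20 s interval on a 20 vCPU VM.
sample_pct = cpu_ready_percent(4000, interval_s=20, vcpus=20)
print(sample_pct, within_policy(4000, max_ready_pct=5.0, vcpus=20))
```

Keep adding VMs to the host while `within_policy` stays true for the monster VM, and stop or back off once it does not.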
If it's Tier 1, keep it somewhat dedicated or create a Tier 1 cluster that does not overcommit. In any case, DRS will likely be smart enough to keep other VMs off the host on which the monster VM runs. You could always try it, and adjust later.
Yep, this is as Tier 1 as we get, which is why I have it on a dedicated host. My preference would be to have a two-node HA cluster to sit this thing on with its development siblings. However, the cost of VMware licensing is far from free, so it has to live in Gen Pop. We are now down to 20 vCPUs after last weekend. We've noted a drop in Ready time on the VM since the change.
Hopefully we'll get down below 16 in the near future.
Thanks for your input.