Cluster with mix of large and small servers - scale hosts up or scale out?

tickermcse76 · 06-25-2015

We have an environment with a large mix of lightweight servers (1-2 CPUs, 3-8 GB RAM) and many database servers that range anywhere from 2-12 CPUs and 8-96 GB RAM. From time to time we've noticed that the large database servers show extremely wide variance in how long their scheduled overnight jobs take to complete. Before, when they were physical or on well-isolated VM hosts, completion times always fell within a consistent, tight window. We've also had some DRS vMotions of large servers fail to complete, leaving the guest VM needing a hard boot (perhaps the destination host did not have the resources to receive it?). We never see this happen with any of the lightweight guests.

I was wondering if anyone has supported a similar environment and what type of design works: fewer, scaled-up VM hosts, or more hosts with lower resources? Or perhaps a mix of both, using affinity rules to keep the high-resource DB servers as isolated as possible? The latter option is a challenge because there are so many DB servers, not just a few outliers.

Alistar · 06-25-2015

Hi there,

A general rule of thumb is for your ESXi hosts to have double the resources of your largest VM. With a largest VM of 12 vCPUs and 96 GB RAM, that means hosts ideally scaled to 24 physical cores (2 x 12 cores or 4 x 6 cores, though each layout has NUMA and performance implications) and 192 GB RAM. Of course, budget and other factors mean this won't always be possible, but hosts should still have at least 1.5 times the resources of the largest VM to prevent bottlenecks.
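The sizing rule above is simple arithmetic, so here is a minimal sketch of it in Python. It assumes each VM is described as a hypothetical `(vcpus, ram_gb)` tuple and scales the largest CPU and RAM figures independently (a conservative choice, since they may come from different VMs). It ignores hypervisor memory overhead and vCPU overcommit, so treat the output as a starting point rather than a sizing tool.

```python
import math

# Rule-of-thumb factors from the post: 2x is the ideal, 1.5x the minimum.
IDEAL_FACTOR = 2.0
MINIMUM_FACTOR = 1.5


def host_size_targets(vms):
    """Return ideal and minimum host sizes as (cores, ram_gb) tuples.

    `vms` is a list of (vcpus, ram_gb) tuples (hypothetical input format).
    CPU and RAM are scaled from the largest value in each dimension, which
    may come from different VMs; this errs on the side of larger hosts.
    Hypervisor overhead is NOT included.
    """
    max_cpu = max(cpu for cpu, _ in vms)
    max_ram = max(ram for _, ram in vms)

    def scale(factor):
        # Round up: you cannot buy a fraction of a core or a GB.
        return (math.ceil(max_cpu * factor), math.ceil(max_ram * factor))

    return {"ideal": scale(IDEAL_FACTOR), "minimum": scale(MINIMUM_FACTOR)}


if __name__ == "__main__":
    # Mix from the question: small app servers plus a 12 vCPU / 96 GB database.
    print(host_size_targets([(2, 8), (12, 96), (1, 3)]))
```

For the largest VM in the question (12 vCPUs, 96 GB), this gives an ideal of (24, 192) and a minimum of (18, 144), matching the 24-core / 192 GB figure above.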

You can create an anti-affinity DRS rule (the "Keep Virtual Machines Apart" rule type) for the larger VMs so that no two of them land on the same host and choke the whole ESXi host, along with the smaller VMs running alongside them. If DRS vMotions are failing, I suggest beefing up your ESXi hosts with dedicated NICs for vMotion. Mixing disproportionately sized ESXi hosts in one cluster won't help, though. One option might be to create one cluster for "Monster VMs" (powerhouse hosts) and another for "Standard DBs" (less powerful ESXi hosts).
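To illustrate what a "keep apart" rule constrains, here is a hypothetical greedy placement sketch in Python. It is not how DRS actually places VMs; the `Host`/`VM` classes, the `large` flag and the `place` function are invented for illustration. It assumes no vCPU overcommit (a VM's vCPUs must fit in a host's free physical cores), which is stricter than real ESXi. It places the biggest VMs first, never puts two "large" VMs on the same host, and raises an error when the rule can't be satisfied, which is roughly the situation where a move has nowhere valid to go.

```python
from dataclasses import dataclass, field


@dataclass
class VM:
    name: str
    vcpus: int
    ram_gb: int
    large: bool = False  # hypothetical flag: member of the "keep apart" group


@dataclass
class Host:
    name: str
    cores: int
    ram_gb: int
    vms: list = field(default_factory=list)

    def free(self):
        """Return (free_cores, free_ram_gb); assumes no CPU overcommit."""
        used_cpu = sum(vm.vcpus for vm in self.vms)
        used_ram = sum(vm.ram_gb for vm in self.vms)
        return self.cores - used_cpu, self.ram_gb - used_ram

    def has_large_vm(self):
        return any(vm.large for vm in self.vms)


def place(vms, hosts):
    """Greedy placement: biggest VMs first; 'large' VMs never share a host."""
    for vm in sorted(vms, key=lambda v: (v.ram_gb, v.vcpus), reverse=True):
        candidates = [
            h for h in hosts
            if h.free()[0] >= vm.vcpus
            and h.free()[1] >= vm.ram_gb
            and not (vm.large and h.has_large_vm())  # anti-affinity check
        ]
        if not candidates:
            raise RuntimeError(f"no host can take {vm.name} under the rules")
        # Prefer the host with the most free RAM to spread the load.
        target = max(candidates, key=lambda h: h.free()[1])
        target.vms.append(vm)
    return {h.name: [vm.name for vm in h.vms] for h in hosts}


if __name__ == "__main__":
    hosts = [Host("esx01", 24, 192), Host("esx02", 24, 192)]
    vms = [VM("db1", 12, 96, True), VM("db2", 8, 64, True),
           VM("app1", 2, 8), VM("app2", 1, 4)]
    print(place(vms, hosts))
```

With two 24-core / 192 GB hosts, `db1` and `db2` end up on different hosts and the small VMs fill in around them. Adding a third large VM to a two-host cluster makes `place` raise, which also shows why the original poster's case (many DB servers, not a few outliers) makes a pure anti-affinity approach hard.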

As for scaling up versus scaling out, keep in mind that these are database servers, so it comes down to what you need to avoid when an ESXi host fails: having many VMs down at once, or only a few, with a quicker fallback?
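The scale-up versus scale-out trade-off can be put into rough numbers. The sketch below is a hypothetical back-of-the-envelope model: it assumes VMs are spread evenly across hosts and that RAM is the only limiting resource, and it ignores HA admission control settings. The cluster sizes in the example are made-up figures chosen to have the same total RAM, not measurements from the poster's environment.

```python
def host_failure_impact(total_vms, ram_demand_gb, hosts, host_ram_gb):
    """Estimate the impact of losing one host (hypothetical simplified model).

    Assumes VMs are spread evenly and RAM is the limiting resource.
    """
    if hosts < 2:
        raise ValueError("need at least two hosts to survive a host failure")
    surviving_ram = (hosts - 1) * host_ram_gb
    return {
        "vms_down": total_vms / hosts,       # VMs that must restart elsewhere
        "capacity_lost": 1 / hosts,          # fraction of cluster capacity lost
        # Negative headroom means not every VM can be restarted.
        "headroom_gb": surviving_ram - ram_demand_gb,
    }


if __name__ == "__main__":
    # Same 1536 GB total either way: 4 big hosts vs 8 smaller ones.
    print("scale up: ", host_failure_impact(200, 1000, hosts=4, host_ram_gb=384))
    print("scale out:", host_failure_impact(200, 1000, hosts=8, host_ram_gb=192))
```

For the same total capacity, the four-host design loses a quarter of the cluster and about 50 VMs per failure with 152 GB of headroom left, while the eight-host design loses an eighth and about 25 VMs with 344 GB left. The flip side is that each smaller host has less room for a single large database VM, which is where the 2x/1.5x rule of thumb comes back in.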

Stop by my blog if you'd like 🙂 I dabble in vSphere troubleshooting, PowerCLI scripting and NetApp storage - and I share my journeys at http://vmxp.wordpress.com/
