I was recently copied on an internal thread discussing a performance tweak for VMware vSphere. The thread discussed gains that can be derived from an adjustment to the CPU scheduler. In ESX 3.5, the scheduler's cell construct limited vCPU mobility between sockets. ESX 4.0 has no such limitation, and its aggressive migrations are suboptimal in some cases.


This thread details the application of this change in ESX 4 and provides some insight into its impact. This scheduler modification is going to be baked into the first update to ESX 4.



On a four-socket (or larger) Dunnington (or any other non-NUMA) platform, the VMmark score can be further improved by enabling CoschedHandoffLLC. In the console OS, it can be enabled via vsish (available from VMwaredebug-tools.rpm):


vsish -e set /config/Cpu/intOpts/CoschedHandoffLLC 1
I believe that config parameter is also tunable through VC or the VI Client (I haven't confirmed this myself).
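
For anyone who wants to see the option's state before and after changing it, here is a minimal sketch that wraps the command above. It assumes vsish is already installed in the console OS and that its get command works on the same node path used with set. The function names are my own, not part of any VMware tool, and I am not showing vsish's output format because it is not given in the thread.

```shell
#!/bin/sh
# Sketch only: a thin wrapper around the vsish call quoted above.
# The node path comes from the thread; the function names are hypothetical.

OPT=/config/Cpu/intOpts/CoschedHandoffLLC

# Print whatever vsish reports for the option node.
show_handoff_llc() {
    vsish -e get "$OPT"
}

# Enable (1) or disable (0) the option, printing its state before and after.
set_handoff_llc() {
    value=$1
    case "$value" in
        0|1) ;;
        *) echo "usage: set_handoff_llc 0|1" >&2; return 2 ;;
    esac
    # Bail out early if vsish is not installed on this host.
    if ! command -v vsish >/dev/null 2>&1; then
        echo "vsish not found; install the debug tools package first" >&2
        return 1
    fi
    echo "before:"
    show_handoff_llc
    vsish -e set "$OPT" "$value" || return 1
    echo "after:"
    show_handoff_llc
}
```

After sourcing the file, set_handoff_llc 1 enables the option and set_handoff_llc 0 restores the ESX 4.0 GA default, which makes it easy to run a before-and-after comparison on the same host.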


The degree of improvement depends on the configuration, but in one case it was about 10-20%.


With the default setting, VMmark may suffer many inter-package vCPU migrations, which degrade performance. Setting CoschedHandoffLLC reduces the number of inter-package vCPU migrations and recovers the lost performance.


The fix is disabled by default in ESX 4.0 GA but will be enabled by default in ESX 4.0 Update 1.



Try this out and let me know if you see a significant change on any of your workloads.