DRS problem

jrr001 · 09-30-2010

We have two oddball IBM HS22 blades (model 7870-AC1, E5540 processors) in a cluster with HS21 blades, running vSphere 4.1.

These blades have hyperthreading, whereas the HS21 blades do not.

*Our main issue is that vSphere DRS appears to pick on these blades and load them to dangerously high RAM usage, 92%-95%.*

As a test I turned off hyperthreading to make them more like the HS21 blades, but I am not seeing that resolve anything. I have noticed another thing, though.

These two IBM blades are HS22 7870-AC1 models. Their RAM in vCenter shows as 36852.31 MB, while ALL other IBM blades show a whole-number MB value (32766 MB, 49150 MB, or 40958 MB).
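To put the difference in numbers, here is a minimal Python sketch that compares each reported size with the nearest whole multiple of 1024 MB. The helper name `gap_to_whole_gb` is made up for illustration, and treating the nearest whole GB as the installed size is an assumption, not something vCenter reports.

```python
# Memory sizes (MB) as quoted in the post above.
reported_mb = [36852.31, 32766, 49150, 40958]

def gap_to_whole_gb(mb):
    """Return (nearest whole-GB count, MB short of that size).

    Assumption: the installed RAM is the nearest multiple of 1024 MB.
    """
    nearest_gb = round(mb / 1024)
    return nearest_gb, nearest_gb * 1024 - mb

for mb in reported_mb:
    gb, gap = gap_to_whole_gb(mb)
    print(f"{mb:>9} MB -> nearest {gb} GB, short by {gap:.2f} MB")
```

The three whole-number hosts each come out exactly 2 MB short of 32, 48, and 40 GB, while the HS22 value is about 11.69 MB short of 36 GB. That only quantifies the oddity; it does not show that the fractional value is what DRS is reacting to.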

I think this might be causing the issue, but I'm not really sure how to fix it.

Has anyone seen problems like this? Of course best practice would be not to have variation in models and RAM like this, but you know how life works sometimes.

Thanks...points will be awarded.

jgaddi · 09-30-2010

What is the BIOS version? Also check the HCL to see whether it is supported: http://www.vmware.com/resources/compatibility/search.php

jrr001 · 10-04-2010

The BIOS version exceeds what the HCL requires. I went ahead and updated the IMM and UEFI BIOS to current levels anyway. Not much improvement.

jgaddi · 10-07-2010

When you run esxtop and press "m" for memory, what value do you see for PMEM? Is it a round number, or does it have decimals?

jrr001 · 10-07-2010

PMEM /MB: 36852 total: 800 cos, 846 vmk, 28493 other, 6712 free
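For reference, a small Python sketch that pulls the figures out of the PMEM line above and works out how much of the host's memory was in use at that moment. The function name `parse_pmem` and the regular expressions are assumptions written against this one pasted line, not an official esxtop format description.

```python
import re

# PMEM line as pasted from esxtop in the post above.
line = "PMEM /MB: 36852 total: 800 cos, 846 vmk, 28493 other, 6712 free"

def parse_pmem(text):
    """Extract the MB figures from an esxtop PMEM line into a dict."""
    total = int(re.search(r"(\d+)\s+total", text).group(1))
    parts = {
        name: int(value)
        for value, name in re.findall(r"(\d+)\s+(cos|vmk|other|free)", text)
    }
    parts["total"] = total
    return parts

pmem = parse_pmem(line)
used_mb = pmem["total"] - pmem["free"]      # everything that is not free
used_pct = 100 * used_mb / pmem["total"]
print(f"used: {used_mb} MB of {pmem['total']} MB ({used_pct:.1f}%)")
```

With these numbers, 30140 MB of 36852 MB is in use, roughly 81.8% at the time of the capture. Note that esxtop reports the total as a whole number (36852), without the .31 that vCenter displays.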

I manually balanced the cluster, then turned automatic DRS back on but dialed it back to level 2 (conservative). It still puts this server into the RED.

If I put another server in the cluster into maintenance mode, this one gets killed with RAM load.

jgaddi · 10-07-2010

The values in esxtop show that the host sees a different value. Try this:

If you log in directly to the ESX host with the VI Client, do you see the same decimal memory value?

1. Disconnect the host from vCenter (do not remove it).

2. Follow http://kb.vmware.com/kb/1003490 to restart the management agents.

3. Reconnect and see what is displayed for memory under Summary.

If that doesn't help, can you check whether disabling DRS or moving the host out of the cluster makes any difference?
