colin_graham
Contributor

CPU performance difference between HP BL460c Blade Servers

After a recent bout of firmware updates we started to experience some frustrating performance issues when we vMotion a VM between a couple of HP BL460c blade servers running ESXi 3.5 (supposedly identical servers - same hardware & software configuration). Our issue is that a VM running on ESX01 (the good server) will sit at, say, 10% CPU, and when vMotioned to ESX02 this will increase to 20% CPU utilisation. If we then vMotion it back to ESX01, the CPU usage drops back to 10%. Effectively ESX01 is carrying about 17 VMs while ESX02 can only carry about 9. Even though ESX01 is carrying more load and working harder, VMs perform better on this server.

The history goes something like this - we started with ESXi updates from Update 1 to the latest build (on both servers; VirtualCenter was also updated), and then, while applying the firmware to one of the blade servers (ESX02), the system board got toasted and wouldn't respond. HP duly replaced the system board, we re-applied the firmware updates, updated the BIOS settings to enable VT and carried on.

VirtualCenter version 2.5.0 Build 174768, ESXi version 3.5.0 Build 184236 (on both servers).

The HP BL460c G1 blade servers (dual quad-core E5450 CPUs) are now both running the 05/12/2009 ROMPaq and iLO 1.79, and the OA in the enclosure is at 2.60. There is a later ROMPaq release, but we wanted both servers to be identical. Firmware on all the components (onboard NICs, QLogic cards etc.) is at the same level on both servers. It's a c3000 blade enclosure connecting to an HP EVA4400 SAN.

We have tried replacing the system board again, formatting the drives, re-installing a fresh copy of ESXi and then patching it back up to the same level. We have also tried moving the blade to a different slot in the enclosure (a change of fibre switch connections). Same results.

Any ideas where to go from here? Anyone else had a similar problem?

Thanks, Colin

5 Replies
DSTAVERT
Immortal

Are all the settings in the BIOS identical? I get lost in there sometimes. Any possibility that there are different CPU steppings? Does the VM have a single processor or multiple processors? Any processor reservations on the bad server (I assume that's what you call it ;) ) that could cause a pair of vCPUs to be split across two pCPUs?

-- David -- VMware Communities Moderator
colin_graham
Contributor

Our documentation shows that only the VT setting was changed in the BIOS when these were first set up. I can't take the ESX01 server down as some of the VMs need 24/7 uptime and ESX02 can't support the full load (ESX01 can - so much for HA redundancy ;) ). We are trying to schedule a full outage to confirm the BIOS settings. ESX02 had the BIOS reset to factory defaults after the system board was replaced, and then VT was re-enabled.

Both blade servers and all 4 CPUs were purchased and built at the same time. How do I check that the CPU steppings are the same? Is this something I can find from the iLO or the OA on the enclosure? Is it shown somewhere in the VI Console?
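The stepping is part of the processor identification the host itself reports through the VI API, so one way to compare the two hosts without an outage is to query it remotely. Below is a rough sketch using pyVmomi against vCenter; the vCenter address, host names and credentials are placeholders, and it assumes a current Python/pyVmomi install rather than anything that shipped with VI 2.5. CPUID leaf 1 EAX encodes the family/model/stepping, so if that raw value differs between the two hosts, the steppings differ.

# Rough sketch (pyVmomi): compare CPU identification on the two hosts.
# The vCenter address, host names and credentials are placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect

def cpu_report(host):
    # CPU model string and per-package descriptions as reported by the host.
    return {
        "model": host.summary.hardware.cpuModel,
        "packages": [pkg.description for pkg in host.hardware.cpuPkg],
        # CPUID leaf 1 EAX encodes family/model/stepping; a different raw
        # value on the two hosts means the steppings are not identical.
        "cpuid_leaf1_eax": [f.eax for f in host.hardware.cpuFeature if f.level == 1],
    }

si = SmartConnect(host="vcenter.example.local", user="administrator",
                  pwd="password",
                  sslContext=ssl._create_unverified_context())
try:
    for name in ("esx01.example.local", "esx02.example.local"):
        host = si.content.searchIndex.FindByDnsName(dnsName=name, vmSearch=False)
        if host is None:
            continue
        print(name, cpu_report(host))
finally:
    Disconnect(si)

The processor type string is also visible in the VI Client on each host's Summary tab, but the stepping itself generally isn't spelled out there, which is why the sketch pulls the raw CPUID data.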

The problem is evident for any VM we vMotion onto the "bad" host, whether it is a single-processor or multiprocessor VM. We have reduced the VMs that had 4 vCPUs down to 2 vCPUs (they perform better now, but CPU use still doubles when they are on the bad host). There are no processor reservations on the bad host.

DSTAVERT
Immortal

If the CPUs came as part of the package from a reputable source, you shouldn't have a problem there. I just asked to spark questions on your end.

Any difference in the host configuration regarding EVC masks?

-- David -- VMware Communities Moderator
colin_graham
Contributor

Any questions that prompt us to think of something else are appreciated. The CPUs all came from HP. I've passed a couple of questions on to the HP tech investigating this issue.

We do not have EVC enabled on the cluster, and the BIOS setting (I think it is Execute Disable, from memory) is set to disabled, so the blades will not show as EVC compatible. This is the same on both hosts - no EVC masks configured.
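For what it's worth, per-VM CPU identity masks (the kind of thing EVC would otherwise manage) live in each VM's configuration rather than on the host, so they can be checked remotely as well. A minimal sketch along the same lines as the one above, again with a placeholder vCenter address and credentials:

# Minimal sketch (pyVmomi): list any per-VM CPUID feature masks.
# The vCenter address and credentials are placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

si = SmartConnect(host="vcenter.example.local", user="administrator",
                  pwd="password",
                  sslContext=ssl._create_unverified_context())
try:
    content = si.content
    view = content.viewManager.CreateContainerView(
        container=content.rootFolder, type=[vim.VirtualMachine], recursive=True)
    for vm in view.view:
        if vm.config is None:
            continue
        masks = vm.config.cpuFeatureMask or []
        if masks:
            # Any entry here means the VM carries its own CPUID mask,
            # independent of EVC being enabled on the cluster.
            print(vm.name, [(m.level, m.eax) for m in masks])
finally:
    Disconnect(si)

If that prints nothing, no VM is carrying its own mask, which matches what we expect with EVC disabled.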

colin_graham
Contributor

For those who are interested, this has been resolved. After various software rebuilds (as advised by VMware), supplying log and performance data, upgrading to vSphere 4, and hardware replacement (new CPUs, system board, mezzanine cards for the NIC & FC), the fault was traced to the RAM. Although all the logs indicated no problem with the RAM, once it was replaced the issue went away.
