VMware Cloud Community
pops106
Enthusiast

VMware on a DL585 Purple Screening

Kelp, I'm drowning in seaweed.

I have 8 * HP DL585s, all quad CPU, 16GB RAM and QL2342 HBAs, and all are running ESX 3.5 off a NetApp SAN.

One of them purple screens if it gets even remotely busy, claiming RAM failures, mostly DIMMs 3 and 4 on CPU3.

The thing is, this only happens in VMware. I had MemTest running for a couple of days without failure. I installed Server 2003 64-bit edition and left Sandra doing a burn-in test for a couple of days, and it too failed to fail. So, like I said, it only happens in VMware.

HP have swapped out the RAM for me twice and it's not made a difference. I tried swapping CPU3 and CPU2 boards (CPUs and all) around and now I have DIMM3 on CPU2 and DIMMs 3 and 4 on CPU4 failing. Why CPU4 now?
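
To keep the swap straight, it helps to track which physical board each failure report lands on rather than which CPU position. A minimal sketch of that bookkeeping, assuming made-up board labels A to D (not HP's naming) and assuming the CPU2/CPU3 exchange was the only change between the two sets of reports:

```python
# Hypothetical bookkeeping: the board labels "A".."D" are invented for
# illustration; the failure reports are the ones described in this post.

# Which physical CPU/memory board sat in which CPU position before the swap.
before_swap = {1: "A", 2: "B", 3: "C", 4: "D"}


def swap_positions(layout, pos_x, pos_y):
    """Return a new layout with the boards in two positions exchanged."""
    swapped = dict(layout)
    swapped[pos_x], swapped[pos_y] = layout[pos_y], layout[pos_x]
    return swapped


def boards_blamed(layout, failures):
    """Map each reported (CPU position, DIMM) failure to a physical board."""
    blamed = {}
    for cpu_pos, dimm in failures:
        blamed.setdefault(layout[cpu_pos], []).append(dimm)
    return blamed


after_swap = swap_positions(before_swap, 2, 3)

# (CPU position, DIMM number) pairs reported as failing.
failures_before = [(3, 3), (3, 4)]
failures_after = [(2, 3), (4, 3), (4, 4)]

blamed_before = boards_blamed(before_swap, failures_before)
blamed_after = boards_blamed(after_swap, failures_after)

print("before swap:", blamed_before)
print("after swap: ", blamed_after)
```

Under those assumptions, the DIMM3 report follows the board that moved from CPU3 to CPU2, while the CPU4 reports land on a board that was never touched.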

To make matters worse, my predecessor, in an attempt to narrow the problem down, swapped bits out with one of the other 585s, and now it too has a strange problem: it spontaneously resets itself every now and then.

So I'm stuck on what to try next. Are there any logs I should be looking for? Has anyone else experienced anything this weird before?

Ta in advance

Mark

7 Replies
dominic7
Virtuoso

I had a pile of DL585 G1s that failed in the exact same manner. The only way to keep the errors from recurring was to have HP replace all the RAM in the hosts. They (HP) don't seem to be able to reliably diagnose which DIMM or DIMMs are causing the problems. You may have better luck since you only have 16GiB of RAM, but they don't make parts for the 585 G1 anymore and are very reluctant to just send you a new pile of RAM. Then again, I may have single-handedly depleted them. I think I still have a couple hundred GiB of 2GiB DIMMs lying around here... lol.

azn2kew
Champion

Seriously speaking, HP DL series servers have a tendency to hit the Purple Screen of Death. Look at this collection of PSOD issues; the majority of them involve HP DL series servers.

They might help with your problems!

If you found this information useful, please consider awarding points for "Correct" or "Helpful". Thanks!!!

Regards,
Stefan Nguyen
VMware vExpert 2009
iGeek Systems Inc.
VMware vExpert, VCP 3 & 4, VSP, VTSP, CCA, CCEA, CCNA, MCSA, EMCSE, EMCISA

jhanekom
Virtuoso

Earlier firmware revisions for the DL585 had a problem with either misdiagnosing failed memory or reporting it incorrectly. Make sure you're on the latest firmware release. That won't make the problem go away in itself, but it will give you more accurate information to work with.

Also, what are you using to tell you where the failed memory is? The PSOD info, or the HP Integrated Management Log?

pops106
Enthusiast

The LEDs on top of the server indicate memory failures.

And I'm pretty sure the firmware is up to date. This was one of the first things I tried when I took over late last year.

It's a bit annoying when, out of 8 servers, this is the only one doing it. It has never actually been used successfully for any extended period of time because of this, and we've had it for ages.

Luis_F
Enthusiast

Motherboard failure?

pdrace
Hot Shot

I have 8 * HP DL585s, all quad CPU, 16GB RAM and QL2342 HBAs, and all are running ESX 3.5 off a NetApp SAN.

What model are these? I assume you mean quad socket not quad core.

Dave_Mishchenko
Immortal

Your post has been moved to the VI: ESX 3.5 forum

Dave Mishchenko

VMware Communities User Moderator
