VMware Cloud Community
friedchicken
Contributor

Spurious hardware memory 'errors' being generated on HP DL380 G5s

Hi All,

I've got 2 new HP DL380 G5 servers running HP's ESXi 4 image, patched to build 219382. Both have been updated to the latest HP firmware levels (Firmware CD 8.7).

Both are running 32 GB (4 x 8 GB sticks).

Both servers show the same symptoms: the memory LEDs on the front of the server light up for two (random) DIMMs and the health LED turns red. Sometimes the server stays up and keeps running with no problems. Other times it locks up completely and only a power reset brings it back.

There is nothing in the iLO logs and nothing in the VMware hardware monitoring. The memory has already been replaced.

I've got other customers who have been running DL380 G5s without any issues on ESX 3.5; this is our first vSphere deployment with them.

These are meant to go live soon, so any advice would be great.

Thanks in advance.

friedchicken
Contributor

Hi,

Yes, they are new. The customer already has a number of these systems, all of the same spec, and we are going to use them to create the new cluster. Again, the memory is new from HP.

The first thing I did when I got the servers was use the firmware bundle CD to update the firmware, so yes, they're on the latest BIOS and the iLO is on 1.81.

I have logged the call with HP. They say the iLO version is OK at 1.81 (I know the HP download page for VMware links to 1.78). They have suggested that I:

1) take the memory down to 16 GB and try again

2) leave the SmartStart diagnostics running in a loop (which I have started tonight)

3) flip a switch on the motherboard to clear the NVRAM

I'll let you know how I get on!

Thanks

Exwork
Enthusiast

If the Insight Diagnostics don't turn anything up, try Memtest86.

You can boot the ISO via iLO and test with that.

friedchicken
Contributor

The Insight Diagnostics loops have also failed. I thought HP-branded memory would be good quality, which is why I was suspicious that two servers would fail.

So I got hold of my remaining 24 sticks of 8 GB, which I was going to upgrade other servers with, and so far my testing has found 5 faulty!! These sticks fail in both servers, which rules out a dodgy motherboard in my test rig.

Perhaps this is just a problem with the bigger 8 GB sticks. Up until now I've only used 4 GB sticks in all my installs and never ran into a problem.

I guess it pays to soak-test everything before going live...
