VMware Cloud Community
fletch00
Enthusiast

Dell R900 memory errors

We are seeing a lot of these - anyone else?

omreport chassis memory index=25

Memory Device Information

Health : Non-Critical

Memory Device on Connector: DIMM B7

Attributes : Status
Values : Non-Critical

Attributes : Device Name
Values : DIMM B7

Attributes : Size
Values : 4096 MB

Attributes : Type
Values : DDR2 FB-DIMM-SYNCHRONOUS

Attributes : Speed
Values : 1.50 ns

Attributes : Failures
Values : Single-bit warning error rate exceeded
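
For anyone who wants to check many DIMMs at once, here is a minimal Python sketch that pairs each "Attributes" line with the "Values" line after it and flags anything whose status is not healthy. The function names are made up for illustration, the sample text is just the report above, and treating "Ok" as the healthy status string is an assumption about omreport's output, so verify it against a good DIMM on your own hosts.

```python
def parse_omreport_memory(text):
    """Turn 'Attributes : X' / 'Values : Y' line pairs into a dict."""
    record = {}
    pending = None  # attribute name waiting for its value
    for raw in text.splitlines():
        line = raw.strip()
        if ":" not in line:
            continue  # headings and blank lines carry no key/value pair
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "Attributes":
            pending = value
        elif key == "Values" and pending is not None:
            record[pending] = value
            pending = None
        elif key in ("Health", "Memory Device on Connector"):
            record[key] = value
    return record


def needs_attention(record):
    # Assumption: a healthy DIMM reports Status "Ok"; anything else is flagged.
    return record.get("Status", "Unknown") != "Ok"


# Sample text copied from the report above (some attributes omitted).
SAMPLE = """\
Memory Device Information
Health : Non-Critical
Memory Device on Connector: DIMM B7
Attributes : Status
Values : Non-Critical
Attributes : Device Name
Values : DIMM B7
Attributes : Size
Values : 4096 MB
Attributes : Failures
Values : Single-bit warning error rate exceeded
"""

record = parse_omreport_memory(SAMPLE)
if needs_attention(record):
    print(record["Device Name"], "->", record.get("Failures", "no failure text"))
```

In practice you would feed it the captured output of omreport for each memory index rather than a pasted string.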

VCP5 VSP5 VTSP5 vExpert http://vmadmin.info
10 Replies
mcowger
Immortal

You have a failing DIMM. That's totally normal; call Dell and get it replaced.

--Matt

VCP, vExpert, Unix Geek

--Matt VCDX #52 blog.cowger.us
fletch00
Enthusiast

Yes, we have opened cases with Dell.

We've had 5 of these in two weeks.

Seems like too many...

VCP5 VSP5 VTSP5 vExpert http://vmadmin.info
Lightbulb
Virtuoso

You may have caught a bad lot. I have seen failures like this "cluster" around a set of systems deployed from a lot that was purchased at the same time. Dell should be able to square you away.

mcowger
Immortal

While we don't use R900s for virtualization, we do use them for other stuff, and have also seen much higher than average failure rates.

--Matt

VCP, vExpert, Unix Geek

--Matt VCDX #52 blog.cowger.us
fletch00
Enthusiast

We have logged 24 memory events on 4 R900s in the last 3 months. We escalated to a Dell technical account manager and were told today that this issue has to do with the memory brand itself, which is “Hynix”.

Stay tuned!

VCP5 VSP5 VTSP5 vExpert http://vmadmin.info
sr_eu_lic_mgmt
Contributor

Just had an MPmemory error in one of our brand new PE R900 servers.

MPmemory error: Southbridge CRC Error (XMATS32 FAIL).

We are also using 4 GB Hynix memory modules (HYMP151F72CP4N3-Y5).

Escalated to our Dell technical account manager.

malaysiavm
Expert

Maybe these problems only apply to a certain batch of R900s. I have had 3 R900s running for a year now and have never experienced the issues reported here.

Craig

vExpert 2009

Malaysia VMware Communities -

Craig vExpert 2009 & 2010 Netapp NCIE, NCDA 8.0.1 Malaysia VMware Communities - http://www.malaysiavm.com
fletch00
Enthusiast

According to Dell, the plan is to replace any memory in the servers that falls within a certain manufacturing date window. On some of the servers, that means a complete memory swap.

dmidecode will give you the spec on the memory:

Handle 0x1101
DMI type 17, 28 bytes.
Memory Device
    Array Handle: 0x1000
    Error Information Handle: Not Provided
    Total Width: 72 bits
    Data Width: 64 bits
    Size: 4096 MB
    Form Factor: <OUT OF SPEC>
    Set: 1
    Locator: DIMM B1
    Bank Locator: Not Specified
    Type: <OUT OF SPEC>
    Type Detail: Synchronous
    Speed: 667 MHz (1.5 ns)
    Manufacturer: 80AD808980AD
    Serial Number: 49631004
    Asset Tag: 010815
    Part Number: HYMP151F72CP4N3-Y5
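
If you need to inventory a lot of hosts, a rough Python sketch like the one below can pull the "Memory Device" records out of saved dmidecode output and count modules by part number. The function names are hypothetical, the sample is only the single record shown above, and the parser simply assumes each record starts with a "Handle" line and that fields look like "Key: Value", so check it against the dmidecode version on your own systems.

```python
from collections import Counter


def parse_memory_devices(text):
    """Return one dict per 'Memory Device' record found in dmidecode output."""
    devices = []
    current = None  # dict for the record being read, or None outside one
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Handle "):
            current = None  # a new DMI record begins; its type is not known yet
        elif line == "Memory Device":
            current = {}
            devices.append(current)
        elif current is not None and ":" in line:
            key, _, value = line.partition(":")
            current[key.strip()] = value.strip()
    return devices


def count_by_part_number(devices):
    """Count how many modules carry each part number."""
    return Counter(d.get("Part Number", "unknown") for d in devices)


# Sample text copied from the record above (some fields omitted).
SAMPLE = """\
Handle 0x1101
DMI type 17, 28 bytes.
Memory Device
Array Handle: 0x1000
Size: 4096 MB
Locator: DIMM B1
Speed: 667 MHz (1.5 ns)
Manufacturer: 80AD808980AD
Serial Number: 49631004
Part Number: HYMP151F72CP4N3-Y5
"""

devices = parse_memory_devices(SAMPLE)
for part, count in count_by_part_number(devices).items():
    print(part, count)
```

Collecting the serial numbers alongside the part numbers should make it easier to match modules against whatever date window Dell gives you.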

VCP5 VSP5 VTSP5 vExpert http://vmadmin.info
fletch00
Enthusiast

FYI, the Dell rep is recommending we replace 237 4 GB Hynix DIMMs, based on the dmidecode analysis.

I think we'll be getting that platinum uplift for free now. This is a big problem: even with zero-downtime VMotion, it will take days.

Fletch - VCP

VCP5 VSP5 VTSP5 vExpert http://vmadmin.info
malaysiavm
Expert

You need a 10 Gb connection for your VMotion; that will really help :D

Craig

vExpert 2009

Malaysia VMware Communities -

Craig vExpert 2009 & 2010 Netapp NCIE, NCDA 8.0.1 Malaysia VMware Communities - http://www.malaysiavm.com