We are seeing a lot of these - anyone else?
omreport chassis memory index=25
Memory Device Information
Health : Non-Critical
Memory Device on Connector: DIMM B7
Attributes : Status
Values : Non-Critical
Attributes : Device Name
Values : DIMM B7
Attributes : Size
Values : 4096 MB
Attributes : Type
Values : DDR2 FB-DIMM-SYNCHRONOUS
Attributes : Speed
Values : 1.50 ns
Attributes : Failures
Values : Single-bit warning error rate exceeded
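If you have a lot of boxes to check, the output above is easy to scan with a script. Below is a minimal sketch, assuming the report arrives as alternating "Attributes : X" / "Values : Y" lines exactly as pasted in this thread (real omreport spacing may differ) and assuming a healthy DIMM reports a status of "Ok". The function name and the sample text are made up for illustration.

```python
def parse_omreport_pairs(text):
    """Turn alternating 'Attributes : X' / 'Values : Y' lines into a dict."""
    result = {}
    pending = None  # attribute name waiting for its value line
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # no colon on this line, nothing to parse
        key, value = key.strip(), value.strip()
        if key == "Attributes":
            pending = value
        elif key == "Values" and pending is not None:
            result[pending] = value
            pending = None
    return result


# Sample trimmed from the report pasted above.
SAMPLE = """\
Memory Device on Connector: DIMM B7
Attributes : Status
Values : Non-Critical
Attributes : Device Name
Values : DIMM B7
Attributes : Failures
Values : Single-bit warning error rate exceeded
"""

info = parse_omreport_pairs(SAMPLE)
# Assumption: anything other than "Ok" is worth a look.
if info.get("Status") != "Ok":
    print(info["Device Name"], info["Status"], info.get("Failures", ""))
```

Lines that are not an Attributes/Values pair (such as the Health and connector lines) are simply skipped.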
You have a failing DIMM - totally normal... call Dell and get it replaced.
--Matt
VCP, vExpert, Unix Geek
Yes, we have opened cases with Dell.
We've had 5 of these in two weeks.
Seems like too many...
You may have caught a bad lot. I have seen this kind of thing before, where failures "cluster" around a set of systems deployed from a lot that was purchased at the same time. Dell should be able to square you away.
While we don't use R900s for virtualization, we do use them for other stuff, and have also seen much higher than average failure rates.
--Matt
VCP, vExpert, Unix Geek
We have logged 24 memory events on 4 R900s in the last 3 months. We escalated to a Dell technical account manager and were told today that this issue has to do with the memory brand itself, which is "Hynix".
Stay tuned!
Just had an MPmemory error in one of our brand new PE R900 servers.
MPmemory error: Southbridge CRC Error (XMATS32 FAIL).
We are also using 4 GB Hynix memory modules (HYMP151F72CP4N3 - Y5)
Escalated to our Dell technical account manager.
Maybe these problems only apply to a certain batch of R900s. I have had 3 R900s running for a year now and never experienced the issues reported here.
Craig
vExpert 2009
According to Dell the plan is to replace any of the memory in the servers that falls within a certain manufacturing date window and on some of the servers it involves a complete memory swap.
dmidecode will give you the spec on the memory:
Handle 0x1101
DMI type 17, 28 bytes.
Memory Device
Array Handle: 0x1000
Error Information Handle: Not Provided
Total Width: 72 bits
Data Width: 64 bits
Size: 4096 MB
Form Factor: <OUT OF SPEC>
Set: 1
Locator: DIMM B1
Bank Locator: Not Specified
Type: <OUT OF SPEC>
Type Detail: Synchronous
Speed: 667 MHz (1.5 ns)
Manufacturer: 80AD808980AD
Serial Number: 49631004
Asset Tag: 010815
Part Number: HYMP151F72CP4N3-Y5
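To get a list of affected slots out of that, here is a rough sketch that pulls the Locator and Part Number from each "Memory Device" record and flags the ones matching the Hynix part number quoted in this thread. It only matches on part number; it knows nothing about the manufacturing date window Dell mentioned. The function name is hypothetical, and the second DIMM in the sample (including its part number) is invented purely to show a non-match.

```python
SUSPECT_PREFIX = "HYMP151F72CP4N3"  # part number quoted in this thread


def memory_devices(dmidecode_text):
    """Return one dict of fields per 'Memory Device' record."""
    devices = []
    current = None
    for raw in dmidecode_text.splitlines():
        line = raw.strip()
        if line.startswith("Handle "):
            current = None  # a new record starts; type not known yet
        elif line == "Memory Device":
            current = {}
            devices.append(current)
        elif current is not None and ":" in line:
            key, _, value = line.partition(":")
            current[key.strip()] = value.strip()
    return devices


# Sample: the first record is trimmed from the output above; the second
# record is made up for illustration.
SAMPLE = """\
Handle 0x1101
DMI type 17, 28 bytes.
Memory Device
Size: 4096 MB
Locator: DIMM B1
Part Number: HYMP151F72CP4N3-Y5
Handle 0x1102
DMI type 17, 28 bytes.
Memory Device
Size: 4096 MB
Locator: DIMM B2
Part Number: EXAMPLE-OTHER-PART
"""

devices = memory_devices(SAMPLE)
suspect = [
    d["Locator"]
    for d in devices
    if d.get("Part Number", "").startswith(SUSPECT_PREFIX)
]
print(len(suspect), "of", len(devices), "DIMMs match:", suspect)
```

On a real host you would feed it the output of `dmidecode --type 17` (run as root) instead of the sample string.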
FYI,
The Dell rep is recommending we replace 237 four-gigabyte (Hynix) DIMMs based on the dmidecode analysis.
I think we'll be getting that Platinum uplift for free now. This is a big problem: even with VMotion giving us zero downtime, it will take days.
Fletch - VCP