e_espinel
Virtuoso

Hello.
According to the hardware log you sent, the problem could lie with the memory DIMMs, the CPU, and/or the mainboard.
The log contains many events like these:
read 1 correctable ECC errors on CPU1 DIMM B1
Processor P_CATERR #0x50
In this case I recommend:
1. Perform a thorough internal physical cleaning of the blade server (memory slot connectors, CPU heatsink, and replacement of the CPU thermal paste). This should be done by experienced technical personnel.
2. Update all internal firmware of the blade server (BIOS, board controller, CIMC, SAS, VIC) and clear the internal hardware log.
In any case, both activities are good for your blade server.

Run the blade server and monitor the hardware log for several days. If the events related to DIMM B1 occur again, that DIMM should be replaced.
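If you export the hardware log as plain text, a small script can make the monitoring easier. The following is only a rough sketch: it assumes the events appear exactly as in the lines quoted above ("read 1 correctable ECC errors on CPU1 DIMM B1"), and the function name count_ecc_errors and the sample list are illustrative, not part of any vendor tool.

```python
import re
from collections import Counter

# Pattern based on the event text quoted in this thread, e.g.
# "read 1 correctable ECC errors on CPU1 DIMM B1".
# Assumption: an exported log uses this same wording.
ECC_PATTERN = re.compile(
    r"read (\d+) correctable ECC errors? on (CPU\d+) DIMM (\w+)"
)


def count_ecc_errors(lines):
    """Sum correctable ECC errors per (CPU, DIMM) across log lines."""
    totals = Counter()
    for line in lines:
        match = ECC_PATTERN.search(line)
        if match:
            count, cpu, dimm = match.groups()
            totals[(cpu, dimm)] += int(count)
    return totals


# Hypothetical sample; replace with the lines of your exported log.
sample = [
    "read 1 correctable ECC errors on CPU1 DIMM B1",
    "Processor P_CATERR #0x50",
    "read 2 correctable ECC errors on CPU1 DIMM B1",
]

for (cpu, dimm), total in count_ecc_errors(sample).items():
    print(f"{cpu} DIMM {dimm}: {total} correctable ECC errors")
```

Running it once a day against a fresh export shows whether the count for DIMM B1 keeps growing after the cleaning and firmware update.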

As a test, you could also directly replace DIMM B1 with a DIMM from another, less important blade server.

Enrique Espinel
Senior Technical Support IBM, Lenovo, Veeam Backup and VMware vSphere.
VSP-SV, VTSP-SV, VTSP-HCI, VTSP
Please mark my comment as Correct Answer or assign Kudos if my answer was helpful to you, Thank you.