VMware Cloud Community
metrogeekythink
Contributor

PSOD: CPU issue or LSI RAID issue?

I've been racking my brain trying to find what is causing my ESXi machine to PSOD. Initially there was a faulty CPU, so I removed it, and the server is currently running on one processor. It still works because I moved the RAM and PCIe cards to the slots that CPU1 supports. The PSOD still happens when I transfer files from my Windows Server VM to another network disk. It does not happen immediately; it happens after 30 minutes or so, while I am transferring around 32 GB of files. Could anyone help me decipher the log? I am running the latest lsi_mr3 driver, by the way.

12 Replies
metrogeekythink
Contributor

Another note to add: I am running RAID 6 on SSDs for datastore1 and a 1TB HDD for datastore2.

daphnissov
Immortal

Your log shows MCEs (machine check exceptions), which mean hardware faults. You're also running on the GA release of ESXi 6.7 and need to get to Update 1. I'm assuming this is also unsupported hardware, so what is it?

metrogeekythink
Contributor

I cannot download 6.7u1 as a free user, right? Or should I use 6.5u2 instead?

daphnissov
Immortal

Depends. What is this hardware?

metrogeekythink
Contributor

Asus Z10PE-D16 WS (already updated to the latest BIOS and firmware)

2 x Xeon E5-2620 v4 (1 CPU removed as it is faulty; the working CPU has already been tested with IPDT)

4 x 8GB 2400 Kingston ECC RDIMM (but showing as 2133) (only 1 DIMM is installed, and that DIMM has already been tested with memtest)

6 x Micron 1TB SSD

1 x WD Blue 1TB HDD

1 x NVIDIA GTX 750

LSI 9271-8i

daphnissov
Immortal

Ok, so fully unsupported whitebox gear here.

metrogeekythink
Contributor

It seems that when I have my entire VM on a SATA HDD, there is no issue. So I am pretty sure the lsi_mr3 driver is causing the problem. Anyway, I noticed there is a newer version of the driver; I will try it and report back.

metrogeekythink
Contributor

Sorry, I was wrong; it seems it still happens. But the logs before the error look a bit different.

metrogeekythink
Contributor

Additional information: no PSOD occurred when I transferred the VM, which is over 900 GB, from my RAID array to the HDD. That transfer was from datastore1 to datastore2 with no VM powered on.

metrogeekythink
Contributor

Did a fresh install of 6.7 and updated everything to 6.7u1, but I am still having the issue. Going to try 6.5u2 later. Attached is the log.

metrogeekythink
Contributor

Tried 6.5u2 and I am still having the same issue.

ALERT: MCA: 201: UC Excp G5 B1 Sbb80000000000174 A4180138fe6e3 M86 Cache Hierarchy: Level 0 Data Cache Eviction Error.

Does this mean the CPU is faulty or just incompatible?
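
For anyone reading the MCA line above, here is a rough Python sketch of how the status word can be decoded by hand. It assumes the `B`, `S`, `A` and `M` fields in the ESXi alert are the bank number, `IA32_MCi_STATUS`, `IA32_MCi_ADDR` and `IA32_MCi_MISC` in hex; that is my reading of the log line, not something confirmed in this thread. The bit layout follows the machine-check chapter of the Intel SDM, and the function names (`parse_mca_line`, `decode_status`) are made up for illustration.

```python
import re

# Architectural flag bits of IA32_MCi_STATUS (Intel SDM, machine-check chapter).
# Bits 56 (S) and 55 (AR) are only defined on CPUs that support software
# error recovery; this sketch assumes that is the case here.
STATUS_FLAGS = [(63, "VAL"), (62, "OVER"), (61, "UC"), (60, "EN"),
                (59, "MISCV"), (58, "ADDRV"), (57, "PCC"), (56, "S"), (55, "AR")]

# Sub-fields of a cache hierarchy error code: 000F 0001 RRRR TTLL
REQUEST = {0: "ERR", 1: "RD", 2: "WR", 3: "DRD", 4: "DWR",
           5: "IRD", 6: "PREFETCH", 7: "EVICT", 8: "SNOOP"}
TXN = {0: "Instruction", 1: "Data", 2: "Generic"}
LEVEL = {0: "Level 0", 1: "Level 1", 2: "Level 2", 3: "Generic level"}


def parse_mca_line(line):
    """Pull bank/status/addr/misc out of an ESXi 'ALERT: MCA' line (assumed layout)."""
    m = re.search(r"B(\d+) S([0-9a-fA-F]+) A([0-9a-fA-F]+) M([0-9a-fA-F]+)", line)
    if m is None:
        return None
    return {"bank": int(m.group(1)),
            "status": int(m.group(2), 16),
            "addr": int(m.group(3), 16),
            "misc": int(m.group(4), 16)}


def decode_status(status):
    """Decode the flag bits and, for cache hierarchy errors, the low 16-bit code."""
    flags = [name for bit, name in STATUS_FLAGS if (status >> bit) & 1]
    code = status & 0xFFFF
    description = None
    if (code & 0xEF00) == 0x0100:  # cache hierarchy pattern, ignoring the F bit
        rrrr = (code >> 4) & 0xF
        tt = (code >> 2) & 0x3
        ll = code & 0x3
        description = "{} {} cache {}".format(
            LEVEL[ll], TXN.get(tt, "Reserved"), REQUEST.get(rrrr, "Reserved"))
    return {"flags": flags, "mca_code": code, "description": description}


line = ("ALERT: MCA: 201: UC Excp G5 B1 Sbb80000000000174 "
        "A4180138fe6e3 M86 Cache Hierarchy: Level 0 Data Cache Eviction Error.")
fields = parse_mca_line(line)
print(decode_status(fields["status"]))
```

For the status in my log this gives the flags VAL, UC, EN, MISCV, PCC, S and AR and the code 0x0174, which decodes to a Level 0 data cache eviction, the same thing ESXi printed. UC and PCC say the error was uncorrected and the processor context is corrupt, which would explain the PSOD, but the decode only shows where the error was detected, not what is causing it.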

metrogeekythink
Contributor

It seems it could be PSU related. I changed to a new one and so far there has been no PSOD. Will update again.
