VMware Cloud Community
Evgenus
Contributor

ESXi PSOD and random freezes

Hi folks,

I am new to ESXi, and my English is far from good, but here is the problem.

I have 3 hosts running ESXi, and the one I thought had the best hardware keeps freezing from time to time. Once it even went to a PSOD.

Its hardware is a Supermicro SYS-6018R-WTR (BIOS version 2.0b) with 2 x Xeon E5-2620 v4, 64 GB DDR4 RAM, an LSI 9260-4i RAID controller with battery, and 4 Toshiba server disks.

See the picture below with the error message:

esxi_error.png

Is it something related to the processor?

Thank God we are on holiday until the 9th of January, but can somebody please tell me what can be done to make this host stable?

Thank you!

31 Replies
Evgenus
Contributor

No, it's not on the HCL.

I'm going to try the PassMark (commercial) version of memtest too.

Evgenus
Contributor

But the PSOD is not the main issue. It appeared only once.

Most of the time the server just freezes and becomes unresponsive.

ganeshgv
Enthusiast

Hi Evgenus,

I got the same PSOD error on one of my ESXi hosts. It is entirely an error with the physical processor (CPU).

Just send the ESXi kernel log to VMware. They will analyse it and confirm which CPU is the problematic one, and that finding can then be shared with the hardware vendor.

They will resolve the issue.

- Ganesh GV

SmokinJoe59
Enthusiast

Bummer, I assumed that they were done with firmware on those cards and were just making lots of changes on the newer, faster 32 gig ones.

nadupalliramesh
Contributor

I am not sure how helpful this will be, but VMware has recently released a patch. The build number is 7388607. According to the release notes, they have fixed some of the bugs that could trigger a PSOD. We experienced a PSOD caused by networking, and after applying the patches the ESXi host seems to be stable.
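
To check whether a host is already at or above that build, the build number can be pulled out of the host's version string. Here is a minimal Python sketch, assuming the string looks like what `vmware -v` prints on the ESXi shell (something like `VMware ESXi 6.5.0 build-7388607`; the version part in that example is only illustrative). The helper names `parse_build` and `is_at_least_build` are hypothetical, and the comparison only makes sense within the same ESXi release line.

```python
import re

# Build number quoted in the post above.
PATCHED_BUILD = 7388607


def parse_build(version_line: str) -> int:
    """Extract the numeric build from a string such as 'VMware ESXi 6.5.0 build-7388607'."""
    match = re.search(r"build-(\d+)", version_line)
    if match is None:
        raise ValueError(f"no build number found in: {version_line!r}")
    return int(match.group(1))


def is_at_least_build(version_line: str, minimum: int = PATCHED_BUILD) -> bool:
    """Return True if the host's build is equal to or newer than `minimum`."""
    return parse_build(version_line) >= minimum


if __name__ == "__main__":
    # Example input; paste the real output of `vmware -v` from your host here.
    line = "VMware ESXi 6.5.0 build-7388607"
    print(parse_build(line), is_at_least_build(line))
```

A host that reports a lower build has not had that patch applied yet.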

Evgenus
Contributor

Hi ganeshgv,

Thanks. Pretty sad news.

Evgenus
Contributor

memtest2.png

So it's definitely not a RAM problem.

All memtests showed that the memory is fine.

Evgenus
Contributor

Answer from Supermicro support:

Hello,

Most of the time this is a bad connection between the CPU and the CPU socket. The CPU has the memory controller embedded. The weak link between the memory modules and the memory controller is the CPU socket. We did not test Kingston memory, therefore issues with this type of memory are unknown to us. But if the same modules work normally in a second system, it does not seem to be a memory module issue.

Evgenus
Contributor

I tried. Everything is the same.

Evgenus
Contributor

This looks the most realistic.

Evgenus
Contributor

Good morning folks.

Well, the problem was in the processors.

I tried all possible setups: different memory, different configs, 1 CPU/2 CPUs. But once I changed the Xeon E5 v4 processors to Xeon E5 v3, everything started to work stably.

But Supermicro declares support for v4 processors on the X10 platform with the correct BIOS. It seems that's not entirely true.

Urs_ak
Contributor

Same problem.

2x E5-2603 v4
