VMware Cloud Community
av1123
Contributor

ESXi 5.1 PSOD - need help to identify the problem.

Good day!

I have a recurring issue on one of my ESXi 5.1 hosts that causes it to fall into a PSOD (screenshot attached). The stack trace is always the same from one PSOD to the next. I have never faced it before, and on other hosts everything works fine.

I would be glad of any help you can provide to identify whether it is a hardware or software issue.

The server model is Supermicro SYS-6018R-MTR with an X10DRL-i motherboard, two Xeon 2630 v4 CPUs and 4x Kingston KVR21R15D4/32 on board. The memory was tested for about 3 days using memtest86 with no errors.

As far as I can tell, this PSOD doesn't depend on VM activity: it can occur under high load or at midnight.

I came across this KB while looking for a solution: ESXi host fails with PSOD when using Intel Xeon Processor E5 v4, E7 v4, and D-1500 families (2146388... but it is about ESXi 5.5 and 6.0. Has anybody met this problem using ESXi 5.1 with E5 v4 Xeons and resolved it this way?

Thanks in advance, Alex.

9 Replies
TheBobkin
Champion

Hello Alex,

While it *could* be a classic v4 PSOD, in my experience these tend to be a bit more empty and with varying backtraces (but then again, this depends).

This smells like this one:

https://kb.vmware.com/kb/2045017

Note the world "WWWW:vmmM:VirtualMachineName" syntax, which is not in your screenshot. I am pretty sure backtraces haven't appeared like this since 4.x, so don't mind that; the CpuSchedVcpuSwitch + CpuSchedDispatch is the relevant part here.

Also, don't mind the fact that only Dell and HPE are referenced in the 'Resolution' of the KB article: same Intels = same BIOS = same problems.

Check how old the BIOS on this host is. As per that document, it only got fixed on Dells by their release of 9/25/2013; if the BIOS in use is much later than that, then sure, maybe start thinking classic v4.

Also, any particular reason you are running a host build from 2014? I advise updating to ESXi 5.1 Patch 9 (2016-05-24, build 3872664) to avoid other known issues that were fixed in the interim.

Bob

-o- If you found this comment useful please click the 'Helpful' button and/or select as 'Answer' if you consider it so, please ask follow-up questions if you have any -o-

av1123
Contributor

Hello TheBobkin,

I checked my BIOS version: it is 2.0a, released 08.25.2016, which is later than the date you mentioned. But according to Supermicro's website, it is the latest BIOS version for this motherboard. Anyway, I will try to work with the vendor to resolve this issue, and I will upgrade ESXi 5.1 to the latest release.

I hope those actions will help.

Thanks a lot!
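
For anyone following along, the two dates in this exchange are written in different styles, which makes the comparison easy to misread. A minimal Python sketch of the check, assuming both dates are month-first (the helper names `parse_us_date` and `bios_is_newer` are made up for illustration):

```python
from datetime import date, datetime


def parse_us_date(text: str) -> date:
    """Parse a month-first date written with '.' or '/' separators."""
    normalized = text.strip().replace(".", "/")
    return datetime.strptime(normalized, "%m/%d/%Y").date()


def bios_is_newer(bios_release: str, fix_release: str) -> bool:
    """Return True if the BIOS release date is after the fix release date."""
    return parse_us_date(bios_release) > parse_us_date(fix_release)


if __name__ == "__main__":
    # Dates as quoted in this thread: BIOS 2.0a vs. the Dell fix release.
    print(bios_is_newer("08.25.2016", "9/25/2013"))  # prints True
```

A newer date only says the BIOS postdates that particular fix; it does not by itself rule out other BIOS-related causes.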

TheBobkin
Champion

Hello Alex,

Strange, I'm not sure why I didn't infer this from the other components in use, but those SYS-6018R-MTR servers are fairly modern (the ancient host build threw me off!). I can only find this model with v4 processors certified for use with ESXi 6.0, unless the name/number doesn't match what is in the HCL DB:

https://www.vmware.com/resources/compatibility/detail.php?deviceCategory=server&productid=40944&vcl=...

Any particular reason you are still running 5.1? I understand if there are some other applications that tie in with it that are not compatible with later versions of ESXi/vCenter, but otherwise I strongly recommend upgrading to 6.0 or even just later builds of 5.5 which are very solid.

It looks like this BIOS version (2.0a) resolves v4 PSODs:

(In German but the backtraces are one of the typical v4 backtraces)

https://www.thomas-krenn.com/de/wiki/VMware_PSOD_auf_Systemen_mit_Intel_Xeon_E5-2600_v4_CPUs_und_ESX...

I will aim to look into this further to verify whether v4 PSODs affected ESXi 5. hosts; do let us know if Supermicro have decent findings.

Bob


av1123
Contributor

Hello Bob!

Thanks for your advice.

Well, Supermicro's tech support didn't tell me anything except that my platform is officially certified by VMware starting from ESXi version 5.5 U2:

OS Certification Intel | Support - Super Micro Computer, Inc.

Too bad I didn't find that link before. They also confirmed that the issue with v4 Xeons is already solved in BIOS 2.0a.

Anyway, I have upgraded the host to ESXi 5.5 build 3248547 (I don't currently have valid keys for version 6.0) and my vCenter to 5.5 as well. But something went wrong, and now some plugins can't connect to vCenter on port 8443: vCenter isn't even listening on that port. It seems some services were not installed properly, but that is another story anyway :)
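
A quick way to confirm from another machine whether anything is accepting connections on that port is a plain TCP connect. This is only a rough sketch in Python; `port_is_listening` is a hypothetical helper, not a VMware tool, and the host name in the note below is a placeholder:

```python
import socket


def port_is_listening(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the name and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

For example, `port_is_listening("vcenter.example.local", 8443)` returning `False` would be consistent with the service not running, although a firewall dropping the traffic would look the same from the client side.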

I will watch whether the PSOD problem is gone and will post here if it persists.

Thanks!

av1123
Contributor

Well, upgrading to ESXi 5.5 didn't solve the problem. I got another PSOD today.

It is different from the previous PSODs, so the issue may be something else.

So, I need help to identify the problem again. :)

Thanks!

TheBobkin
Champion

Do you have a screenshot of the backtrace?

av1123
Contributor

I don't have it currently, but if it is stored on the host, I can get it. Can you just tell me where it is?

Thanks!

TheBobkin
Champion

The screenshot you attached today is the same image file that you added previously.

Taking a screenshot of any PSOD screens is the best thing to do before rebooting.

If you didn't take a screenshot, did you write down any of the error codes? (the strings between ']' and '@')

Alternatively, the other way to get information as to why it PSODed is to contact VMware support so they can analyze the coredump, which should be located in /var/core (provided the dump was written successfully).
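
If the PSOD text was copied down or typed out from a photo, the strings between ']' and '@' can be pulled out mechanically. A minimal Python sketch, assuming backtrace lines look roughly like `address:[address]Function@module+offset`; the sample lines use zeroed placeholder addresses and an assumed module suffix, and only the two function names come from this thread:

```python
import re
from typing import List

# Capture whatever sits between a closing ']' and the next '@' on a line.
SYMBOL_RE = re.compile(r"\]([^\]@\s]+)@")


def extract_symbols(backtrace_text: str) -> List[str]:
    """Return the function names found between ']' and '@', in order."""
    return SYMBOL_RE.findall(backtrace_text)


# Hypothetical sample: addresses and the '@vmkernel#nover' suffix are
# placeholders, not values taken from the screenshots in this thread.
SAMPLE = """\
0x0000000000000000:[0x0000000000000000]CpuSchedVcpuSwitch@vmkernel#nover+0x0
0x0000000000000000:[0x0000000000000000]CpuSchedDispatch@vmkernel#nover+0x0
"""

if __name__ == "__main__":
    print(extract_symbols(SAMPLE))
    # prints ['CpuSchedVcpuSwitch', 'CpuSchedDispatch']
```

Having the names as a plain list makes it easier to compare one PSOD against the next and to search the KB for a matching backtrace.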

av1123
Contributor

Oh God, that was my fault: I uploaded an old screenshot, sorry.

Sure, I've got a new screenshot and have uploaded it with this message.

Sorry for the misunderstanding :)
