VMware Cloud Community
pzebracki
Contributor

3 times Purple Screen of death

Hi

I'm looking for help. Over the last few days I have been getting PSODs.

What is the cause?

My machine configuration:

ESXi host is installed on USB stick.

Supermicro X10SLH-F with Intel Xeon E3-1270 v3 on board

2 x 8GB GOODRAM ECC UNBUFFERED DDR3 1600MHz PC3-12800E UDIMM | W-MEM1600E38G

Be Quiet! Dark Power PRO 10 650W 80PLUS Gold

LSI MegaRAID SAS 9271-8i

6 x Seagate SV35 Series (3TB, 64MB, SATA III-600) (ST3000VX000)

Please take a look at the screenshots below.

Best regards. I'm hoping for a quick reply.

http://s29.postimg.org/vdx717mo6/ESXi_Purple_Screen_of_death_4_1_2014.jpg

http://s29.postimg.org/fw9oxig7a/ESXi_Purple_Screen_of_death_2_1_2014.jpg



http://s29.postimg.org/5jnebfmo6/esxi_purple_screen.jpg

12 Replies
GaneshNetworks

It looks like the screenshots you uploaded are broken. Please attach them again.

In the meantime, have a look at this: VMware KB: Interpreting an ESX/ESXi host purple diagnostic screen

~GaneshNetworks™~ If you found this or other information useful, please consider awarding points for "Correct" or "Helpful".
pzebracki
Contributor

I can see the screenshots. What is wrong?

GaneshNetworks

Capture.JPG

NuggetGTR
VMware Employee

Exception 14 is a page fault, meaning the host tried to access a page of memory and the access failed.

I have seen this when storage drops off.

Looking at your logs, there is a heap of storage errors just before the PSOD, like the ones below, which would account for the failed pages:

2013-12-19T20:28:43.051Z cpu5:34045)ScsiDeviceIO: 2337: Cmd(0x412e80824e00) 0x85, CmdSN 0x9f from world 34427 to dev "naa.600605b005d40d601a29058659cdb9ce" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0.

2013-12-19T20:28:43.051Z cpu5:34045)ScsiDeviceIO: 2337: Cmd(0x412e80824e00) 0x4d, CmdSN 0xa0 from world 34427 to dev "naa.600605b005d40d601a29058659cdb9ce" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0.

2013-12-19T20:28:43.051Z cpu5:34045)ScsiDeviceIO: 2337: Cmd(0x412e80824e00) 0x1a, CmdSN 0xa1 from world 34427 to dev "naa.600605b005d40d601a29058659cdb9ce" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.

2013-12-19T20:58:43.055Z cpu5:32794)NMP: nmp_ThrottleLogForDevice:2321: Cmd 0x85 (0x412e80849840, 34427) to dev "naa.600605b005d40d601a29058659cdb9ce" on path "vmhba1:C2:T0:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0.
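To make the status fields in lines like these easier to read, here is a minimal Python sketch that pulls out the command opcode, the H/D/P status codes, and the sense data, then maps them to their standard SCSI names. The regular expression is derived only from the log lines quoted in this post, the function name `decode_scsi_line` is made up for illustration, and the lookup tables cover just the codes that appear here plus a few common ones. Decoded this way, sense key 0x5 reads as ILLEGAL REQUEST, which would suggest the device rejected these particular commands rather than reported a media or hardware fault, so treat it as one more data point, not a diagnosis.

```python
import re

# Layout inferred from the vmkernel lines quoted above (an assumption,
# not an official format description).
LINE_RE = re.compile(
    r'Cmd(?:\(0x[0-9a-fA-F]+\))?\s+(?P<opcode>0x[0-9a-fA-F]+).*?'
    r'H:(?P<host>0x[0-9a-fA-F]+) D:(?P<device>0x[0-9a-fA-F]+) '
    r'P:(?P<plugin>0x[0-9a-fA-F]+)'
    r'.*?sense data: (?P<key>0x[0-9a-fA-F]+) (?P<asc>0x[0-9a-fA-F]+) '
    r'(?P<ascq>0x[0-9a-fA-F]+)'
)

# Partial lookup tables: standard SCSI names for the codes seen in this thread.
OPCODES = {0x85: "ATA PASS-THROUGH (16)", 0x4D: "LOG SENSE", 0x1A: "MODE SENSE (6)"}
DEVICE_STATUS = {0x0: "GOOD", 0x2: "CHECK CONDITION", 0x8: "BUSY"}
SENSE_KEYS = {0x3: "MEDIUM ERROR", 0x4: "HARDWARE ERROR", 0x5: "ILLEGAL REQUEST"}
ASC_ASCQ = {
    (0x20, 0x00): "INVALID COMMAND OPERATION CODE",
    (0x24, 0x00): "INVALID FIELD IN CDB",
}


def decode_scsi_line(line):
    """Return a dict of decoded fields, or None if the line does not match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    f = {name: int(value, 16) for name, value in m.groupdict().items()}
    return {
        "opcode": OPCODES.get(f["opcode"], hex(f["opcode"])),
        "host_status": f["host"],
        "device_status": DEVICE_STATUS.get(f["device"], hex(f["device"])),
        "plugin_status": f["plugin"],
        "sense_key": SENSE_KEYS.get(f["key"], hex(f["key"])),
        "additional_sense": ASC_ASCQ.get((f["asc"], f["ascq"]), "unknown"),
    }


if __name__ == "__main__":
    sample = (
        '2013-12-19T20:28:43.051Z cpu5:34045)ScsiDeviceIO: 2337: '
        'Cmd(0x412e80824e00) 0x1a, CmdSN 0xa1 from world 34427 to dev '
        '"naa.600605b005d40d601a29058659cdb9ce" failed H:0x0 D:0x2 P:0x0 '
        'Valid sense data: 0x5 0x24 0x0.'
    )
    print(decode_scsi_line(sample))
```

Feeding each line of a saved vmkernel log through `decode_scsi_line` and counting the results per device would show whether the same commands fail every time or whether other sense keys show up near the crash.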

Does the server have a power-saving option?

I have noticed that a low-power CPU C-state can cause the storage to drop. I recently came across identical issues on HP blades that had the HP dynamic power mode set, and changing that setting removed the PSODs you are seeing here. So give that a go. I'm not familiar with Supermicro, but I'm sure it has power-saving features, which do not mix well with ESXi.

________________________________________ Blog: http://virtualiseme.net.au VCDX #201 Author of Mastering vRealize Operations Manager
pzebracki
Contributor

I don't have a UPS at the moment.

Any other hints?

NuggetGTR
VMware Employee

Not a UPS. I mean the CPU power-saving mode, which is normally set in the BIOS.

pzebracki
Contributor

Tomorrow I will check this via IPMI and provide screenshots.

I have also seen that option in the BIOS.

lloydm618
Contributor

This is actually a known issue with the Intel E1000 NIC that is used by the guest VM. Power down the VM, then change the network adapter to VMXNET3 (for all the VMs on the host), and the host will stop PSODing.

It can be worked around by following this KB:

VMware KB: ESXi 5.x host experiences a purple diagnostic screen with errors for E1000PollRxRing and ...
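To find which VMs still use the E1000 adapter before changing them, here is a rough Python sketch that scans the text of a `.vmx` configuration file for `ethernetN.virtualDev = "e1000"` entries. The function name `find_e1000_adapters` and the sample text are made up for illustration. It also assumes the adapter type is written out explicitly in the file; an adapter with no `virtualDev` line falls back to a guest-OS default and would not be detected by this check.

```python
import re

# Matches lines such as: ethernet0.virtualDev = "e1000"
ADAPTER_RE = re.compile(
    r'^\s*(ethernet\d+)\.virtualDev\s*=\s*"([^"]*)"',
    re.MULTILINE | re.IGNORECASE,
)


def find_e1000_adapters(vmx_text):
    """Return the adapter names (e.g. 'ethernet0') explicitly set to e1000."""
    return [
        name
        for name, device in ADAPTER_RE.findall(vmx_text)
        if device.lower() == "e1000"
    ]


if __name__ == "__main__":
    # Hypothetical .vmx fragment, not taken from the original poster's host.
    sample_vmx = (
        'ethernet0.present = "TRUE"\n'
        'ethernet0.virtualDev = "e1000"\n'
        'ethernet1.present = "TRUE"\n'
        'ethernet1.virtualDev = "vmxnet3"\n'
    )
    print(find_e1000_adapters(sample_vmx))
```

The sketch only reports adapters; the actual change to VMXNET3 is still made with the VM powered down, as described above.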

Mike Lloyd TS Engineer II
jramsier1
Contributor

I think this was my issue. I had this yesterday and again today. I just P2Ved a machine and it had an E1000 NIC on it. I will find out if it crashes again, although I have already moved all the production servers off this VM.

Thanks,

-Jeff

lloydm618
Contributor

Okay, make sure to change the VM's NIC to VMXNET3 and then you shouldn't have those problems anymore.
