VMware Cloud Community
LUPSHuolto
Contributor

After upgrading to ESXi 5.0 (build 1311175), host keeps crashing

Hi everyone,

We upgraded two ESXi hosts to 5.0 (build 1311175): one is an HP 460c G1 and the other an HP 460c G7. vCenter Server is 5.1 (build 1123961). HA and DRS are enabled.

When we start new VMs on the cluster, the other host (HP 460c G7) crashes. vmkwarning.log shows the following:

2014-03-22T09:39:42.381Z cpu6:67888)WARNING: CBT: 982: Unsupported ioctl 43

2014-03-22T09:39:46.464Z cpu7:2055)WARNING: LinScsi: SCSILinuxQueueCommand:1175:queuecommand failed with status = 0x1055 Host Busy vmhba1:0:0:0 (driver name: qla4xxx) - Message repeated 394 times

TSC: 5171154609 cpu0:0)WARNING: SVGAConsole: 266: Extended TTY not supported. Ignoring on tty 0

TSC: 5171371860 cpu0:0)WARNING: SVGAConsole: 266: Extended TTY not supported. Ignoring on tty 2

TSC: 5171527458 cpu0:0)WARNING: SVGAConsole: 266: Extended TTY not supported. Ignoring on tty 3

TSC: 8844832437 cpu0:0)WARNING: MemMap: 2637: Reducing number of colors from 192 to 64

0:00:00:03.496 cpu0:2048)WARNING: CacheSched: 801: Already disabled : Cache aware scheduling already disabled

0:00:00:03.501 cpu0:2048)WARNING: SVGAConsole: 266: Extended TTY not supported. Ignoring on tty 4

0:00:00:03.502 cpu0:2048)WARNING: SVGAConsole: 266: Extended TTY not supported. Ignoring on tty 5

2014-03-22T09:56:29.139Z cpu6:2661)WARNING: LinuxSignal: 761: ignored unexpected signal flags 0x2 (sig 17)

2014-03-22T09:59:14.559Z cpu3:2670)WARNING: LinuxSignal: 761: ignored unexpected signal flags 0x2 (sig 17)

2014-03-22T09:59:18.618Z cpu4:2662)WARNING: UserObj: 3232: Unimplemented operation on 0x410013d85af0/RPC

2014-03-22T09:59:18.618Z cpu4:2662)WARNING: UserObj: 675: Failed to crossdup fd 9, cnxId: 0x80000000 type RPC: Not implemented

2014-03-22T09:59:23.717Z cpu5:2679)WARNING: Tcpip: 806: Failed to unset the ip address (error = 49)
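To see at a glance which subsystems produce the warnings in an excerpt like the one above, the log can be summarised per module. This is a minimal Python sketch, assuming every relevant line contains the `cpuN:worldID)WARNING: Module:` pattern visible in these entries (the leading timestamp varies between ISO 8601, `TSC:` and uptime formats, so the pattern is not anchored to it). The function name `count_warnings_by_module` is hypothetical and not part of any VMware tool; it also counts each line once and does not expand "Message repeated N times".

```python
import re
from collections import Counter

# Matches e.g. "cpu7:2055)WARNING: LinScsi:" and captures the module name.
# The timestamp prefix differs between entries, so the regex is searched, not anchored.
WARNING_RE = re.compile(r"cpu\d+:\d+\)WARNING: (\w+):")


def count_warnings_by_module(lines):
    """Return a Counter mapping module name (e.g. 'LinScsi') to number of matching lines."""
    counts = Counter()
    for line in lines:
        match = WARNING_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "2014-03-22T09:39:42.381Z cpu6:67888)WARNING: CBT: 982: Unsupported ioctl 43",
        "TSC: 5171154609 cpu0:0)WARNING: SVGAConsole: 266: Extended TTY not supported. Ignoring on tty 0",
        "TSC: 5171371860 cpu0:0)WARNING: SVGAConsole: 266: Extended TTY not supported. Ignoring on tty 2",
        "2014-03-22T09:56:29.139Z cpu6:2661)WARNING: LinuxSignal: 761: ignored unexpected signal flags 0x2 (sig 17)",
    ]
    for module, n in count_warnings_by_module(sample).most_common():
        print(module, n)
```

Run against a full vmkwarning.log (for example by passing an open file object), this gives a rough picture of which modules are noisiest before a crash; it says nothing about causation on its own.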

The host then freezes and we have to force a reset:

(Screenshot attached: HA.jpg)

Any ideas what is causing this?

5 Replies
khaliqamar
Enthusiast

A page fault (#PF, Exception 14) occurs when the CPU accesses a memory page that is not currently mapped into memory, or accesses it in a way its protection bits do not allow.


I hope this thread answers your question:

ESXi 5.1 Update 1 crashing randomly with purple screen #PF Exception 14



LUPSHuolto
Contributor

Hi VirtualRay,

We have a similar situation to the one described in post #12 of that thread:

My guess would be that you have recently introduced Microsoft Windows Server 2012 to your ESXi environment. That was my issue; I am certain of it.

I recently deployed about 10 Windows Server 2012 virtual servers and started having problems. I found another recommendation to disable Receive Side Scaling (RSS) on the E1000E NIC that is installed by default with that operating system. Since that change, I have not had any recurrence.

As recommended by VMware, I installed one firmware update for the disk controller, as well as two NIC driver updates and a disk controller driver update in the OS. None of those fixed the problem and I kept getting the PSOD. Once I made the RSS change on the 2012 servers, the problem went away.

That said, I have had only four clean days, but I have high hopes.

Justin

We deployed two new Windows Server 2012 R2 VMs and started to get page faults (Exception 14).

We'll disable RSS and let you know whether this helps in our environment.

Thank you for helping us!

Rubeck
Virtuoso

Apply patch release ESXi500-201401001  ... This updates it to build 1489271.

It includes a fix for your issue:

  • The ESXi host experiences a purple diagnostic screen with errors for E1000PollRxRing and E1000DevRx when Receive Side Scaling (RSS) is enabled and the maximum number of RSS Rx queues is set to 2. The purple diagnostic screen or backtrace contains entries similar to:
0x412409add548:[0x418010610c57]E1000PollRxRing@vmkernel#nover+0xb73 stack: 0x412409add5c8
0x412409add5b8:[0x418010613bb5]E1000DevRx@vmkernel#nover+0x3a9 stack: 0x412409add668
0x412409add658:[0x418010592164]IOChain_Resume@vmkernel#nover+0x174 stack: 0x412409add6b8
0x412409add6a8:[0x418010579e22]PortOutput@vmkernel#nover+0x136 stack: 0x4108d2d9c780
0x412409add708:[0x418010b2ff58]EtherswitchForwardLeafPortsQuick@< None >#< None >+0x4c stack: 0x412409a
0x412409add928:[0x418010b30f51]EtherswitchPortDispatch@< None >#< None >+0xe25 stack: 0x412400000015
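To check whether a given ESXi 5.0 host already includes this fix, its build number can be compared against 1489271, the build that ESXi500-201401001 brings the host to. This is a minimal Python sketch under stated assumptions: the version string is assumed to look like the output of `vmware -v` (e.g. "VMware ESXi 5.0.0 build-1311175"), later 5.0 patch builds are assumed to be cumulative, and the function name `has_e1000_rss_fix` is hypothetical. Build numbers are only compared within the 5.0 line, because they are not meaningful across major releases.

```python
import re

# Build reached by patch release ESXi500-201401001, as stated in this thread.
FIXED_BUILD_5_0 = 1489271

# Assumed format of the version string, e.g. "VMware ESXi 5.0.0 build-1311175".
VERSION_RE = re.compile(r"VMware ESXi (\d+\.\d+)\.\d+ build-(\d+)")


def has_e1000_rss_fix(version_string):
    """Return True if an ESXi 5.0 host is at or above the patched build.

    Raises ValueError if the string does not parse or is not a 5.0 release,
    since build numbers are only compared within the 5.0 line here.
    """
    match = VERSION_RE.search(version_string)
    if match is None:
        raise ValueError(f"unrecognised version string: {version_string!r}")
    release, build = match.group(1), int(match.group(2))
    if release != "5.0":
        raise ValueError(f"only ESXi 5.0 builds are handled, got {release}")
    # Assumes patch builds are cumulative, so any later 5.0 build also has the fix.
    return build >= FIXED_BUILD_5_0


if __name__ == "__main__":
    print(has_e1000_rss_fix("VMware ESXi 5.0.0 build-1311175"))  # build from the original post
    print(has_e1000_rss_fix("VMware ESXi 5.0.0 build-1489271"))  # build after the patch
```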

I have a 13-host cluster (all HP ProLiant) running this build, with a bunch of 2012 R2 VMs running without issue.

/Rubeck

khaliqamar
Enthusiast

Hi LUPSHuolto, so how's the situation now? Did you try Rubeck's suggestion?

LUPSHuolto
Contributor

Hi,

Thank you for your help. We disabled RSS, and since then the hosts have been running fine for over a week without a purple screen.

We still have to apply that update as well, since it contains major fixes. I'll turn RSS back on after the update and see whether it has fixed this bug.
