VMware Cloud Community
ryanv
Contributor

Kernel stack fault - SBS 2008 on PowerEdge 2950

One of the VMs on a client's Dell PowerEdge 2950 is crashing and/or rebooting randomly. The guest OS is Small Business Server 2008 (which is 64-bit), and both the guest OS and ESXi have the latest updates. I have allocated about 7 GB of RAM to the SBS VM, and its guest OS type is set to Microsoft Windows - Microsoft Windows Server 2008 (64-bit).

When the crashes happen, the guest OS logs show no problems. However, in the ESXi logs I sometimes get *** Virtual machine kernel stack fault (hardware reset) ***

Any ideas why this may be happening? I have included part of the logs below in case that helps narrow things down:

GUEST OS LOG:

Apr 22 15:28:54.958: vcpu-1| CDROM: Emulate GET CONFIGURATION RT 1 start feature 0

Apr 22 15:28:54.958: vcpu-1| CDROM: Unknown command 0xAC.

Apr 22 15:28:54.958: vcpu-1| CDROM: Unknown command 0xAC.

Apr 22 15:33:54.968: vcpu-2| CDROM: Emulate GET CONFIGURATION RT 1 start feature 0

Apr 22 15:33:54.968: vcpu-2| CDROM: Unknown command 0xAC.

Apr 22 15:33:54.968: vcpu-2| CDROM: Unknown command 0xAC.

Apr 22 15:38:54.979: vcpu-1| CDROM: Emulate GET CONFIGURATION RT 1 start feature 0

Apr 22 15:38:54.979: vcpu-1| CDROM: Unknown command 0xAC.

Apr 22 15:38:54.979: vcpu-1| CDROM: Unknown command 0xAC.

Apr 22 15:43:54.995: vcpu-1| CDROM: Emulate GET CONFIGURATION RT 1 start feature 0

Apr 22 15:43:54.997: vcpu-1| CDROM: Unknown command 0xAC.

Apr 22 15:43:54.997: vcpu-1| CDROM: Unknown command 0xAC.

Apr 22 15:46:58.961: vcpu-0| Triple fault.

Apr 22 15:46:58.961: vcpu-0| Msg_Hint: msg.monitorEvent.cpl0SS (sent)

Apr 22 15:46:58.961: vcpu-0| **** Virtual machine kernel stack fault (hardware reset) ****

Apr 22 15:46:58.961: vcpu-0| The virtual machine just suffered a stack fault in kernel mode. On a real computer, this would amount to a reset of the processor. It can be caused by an incorrect configuration of the virtual machine, a bug in the operating system, or a problem in the VMware ESX Server software. Press OK to reboot virtual machine or Cancel to shut it down.

Apr 22 15:46:58.961: vcpu-0|

Apr 22 15:46:58.961: vcpu-0| -


Apr 22 15:46:58.963: vcpu-0| Triple fault.

Apr 22 15:46:58.963: vcpu-0| Msg_Hint: msg.monitorEvent.cpl0SS (sent)

Apr 22 15:46:58.963: vcpu-0| *** Virtual machine kernel stack fault (hardware reset) ***

Apr 22 15:46:58.963: vcpu-0| The virtual machine just suffered a stack fault in kernel mode. On a real computer, this would amount to a reset of the processor. It can be caused by an incorrect configuration of the virtual machine, a bug in the operating system, or a problem in the VMware ESX Server software. Press OK to reboot virtual machine or Cancel to shut it down.

Apr 22 15:46:58.963: vcpu-0|

Apr 22 15:46:58.963: vcpu-0| -


Apr 22 15:46:58.963: vmx| POLL device deleted

Apr 22 15:46:58.964: vcpu-0| CPU reset: hard

Apr 22 15:46:59.239: vcpu-1| CPU reset: soft

Apr 22 15:47:08.192: vcpu-2| CPU reset: soft

Apr 22 15:47:17.150: vcpu-3| CPU reset: soft

Apr 22 15:47:26.120: vcpu-0| SVGA: Unregistering IOSpace at 0x1060

Apr 22 15:47:26.120: vcpu-0| SVGA: Unregistering MemSpace at 0xf8000000(0xf8000000) and 0xf4000000(0xf4000000)

Apr 22 15:47:26.247: vcpu-0| SVGA: Registering MemSpace at 0xf8000000(0xf8000000) and 0xf4000000(0xf4000000)

Apr 22 15:47:26.284: vcpu-0| SVGA: Unregistering MemSpace at 0xf8000000(0xf8000000) and 0xf4000000(0xf4000000)

Apr 22 15:47:26.479: vcpu-0| SVGA: Registering MemSpace at 0xf8000000(0xf8000000) and 0xf4000000(0xf4000000)

Apr 22 15:47:26.607: vcpu-0| SVGA: Unregistering MemSpace at 0xf8000000(0xf8000000) and 0xf4000000(0xf4000000)

Apr 22 15:47:26.612: vcpu-0| SVGA: Registering IOSpace at 0x1060

Apr 22 15:47:26.612: vcpu-0| SVGA: Registering MemSpace at 0xf8000000(0xf8000000) and 0xf4000000(0xf4000000)

Apr 22 15:47:26.650: vcpu-1| CPU reset: soft

Apr 22 15:47:26.654: vcpu-2| CPU reset: soft

Apr 22 15:47:53.527: vcpu-3| CPU reset: soft

Apr 22 15:47:53.775: vcpu-0| DISKUTIL: scsi0:0 : geometry=13054/255/63

Apr 22 15:47:53.775: vcpu-0| DISKUTIL: scsi0:1 : geometry=78325/255/63

Apr 22 15:47:54.188: vcpu-1| CPU reset: soft

Apr 22 15:47:54.192: vcpu-2| CPU reset: soft

Apr 22 15:47:54.194: vcpu-3| CPU reset: soft

Apr 22 15:47:54.216: vcpu-0| BIOS-UUID is 56 4d c9 76 1a 90 67 da-55 bd e7 35 e2 e4 39 9e

Apr 22 15:47:54.669: vcpu-0| DISKUTIL: scsi0:1 : toolsVersion = 7303

Apr 22 15:47:54.669: vcpu-0| DISKUTIL: scsi0:0 : toolsVersion = 7303

Apr 22 15:47:54.669: vcpu-0| DISKUTIL: scsi0:1 : toolsVersion = 7303

Apr 22 15:47:54.669: vcpu-0| DISKUTIL: scsi0:0 : toolsVersion = 7303

Apr 22 15:47:55.422: mks| HostOps hideCursor before defineCursor!

Apr 22 15:48:33.093: mks| HostOps hideCursor before defineCursor!

Apr 22 15:48:33.289: vcpu-1| CPU reset: soft

Apr 22 15:48:33.602: vcpu-2| CPU reset: soft

Apr 22 15:48:33.658: vcpu-3| CPU reset: soft

Apr 22 15:48:35.357: vcpu-0| SVGA: Unregistering IOSpace at 0x1060

Apr 22 15:48:35.357: vcpu-0| SVGA: Unregistering MemSpace at 0xf8000000(0xf8000000) and 0xf4000000(0xf4000000)

Apr 22 15:48:35.363: vcpu-0| SVGA: Registering IOSpace at 0x1060

Apr 22 15:48:35.365: vcpu-0| SVGA: Registering MemSpace at 0xf8000000(0xf8000000) and 0xf4000000(0xf4000000)

Apr 22 15:48:36.226: vcpu-3| CDROM: Mode Sense for Unsupported Page 0x1B

Apr 22 15:51:07.481: vcpu-0| CDROM: Emulate GET CONFIGURATION RT 0 start feature 0

Apr 22 15:51:07.481: vcpu-0| CDROM: Emulate GET CONFIGURATION RT 0 start feature 0

Apr 22 15:51:07.481: vcpu-0| CDROM: Emulate GET CONFIGURATION RT 0 start feature 0

Apr 22 15:51:07.481: vcpu-0| CDROM: Emulate GET CONFIGURATION RT 0 start feature 0

Apr 22 15:51:07.482: vcpu-0| CDROM: Emulate GET CONFIGURATION RT 0 start feature 0

Apr 22 15:51:07.483: vcpu-1| CDROM: Emulate GET CONFIGURATION RT 1 start feature 0

HOST LOG:

Failed to send response to the client: Broken pipe

Activation : Invoke done on

Throw vmodl.fault.RequestCanceled

Result:

(vmodl.fault.RequestCanceled) {

dynamicType = <unset>,

msg = ""

}

Failed to send response to the client: Broken pipe

Disconnect check in progress: /vmfs/volumes/49673842-8b8607e2-8bcf-0022191e63d4/Small Business Server 2008/Small Business Server 2008.vmx

Question info: *** Virtual machine kernel stack fault (hardware reset) ***

The virtual machine just suffered a stack fault in kernel mode. On a real computer, this would amount to a reset of the processor. It can be caused by an incorrect configuration of the virtual machine, a bug in the operating system, or a problem in the VMware ESX Server software. Press OK to reboot virtual machine or Cancel to shut it down.

, Id: 0 : Type : 5, Default: 0, Number of options: 2

Received a duplicate transition from foundry: 1

Failed to find activation record, event user unknown.

Disconnect check in progress: /vmfs/volumes/49673842-8b8607e2-8bcf-0022191e63d4/Small Business Server 2008/Small Business Server 2008.vmx

Disconnect check in progress: /vmfs/volumes/49673842-8b8607e2-8bcf-0022191e63d4/Small Business Server 2008/Small Business Server 2008.vmx

Question info: *** Virtual machine kernel stack fault (hardware reset) ***

The virtual machine just suffered a stack fault in kernel mode. On a real computer, this would amount to a reset of the processor. It can be caused by an incorrect configuration of the virtual machine, a bug in the operating system, or a problem in the VMware ESX Server software. Press OK to reboot virtual machine or Cancel to shut it down.

, Id: 1 : Type : 5, Default: 0, Number of options: 2

Failed to find activation record, event user unknown.

Disconnect check in progress: /vmfs/volumes/49673842-8b8607e2-8bcf-0022191e63d4/Small Business Server 2008/Small Business Server 2008.vmx

Event 18 : Message on Small Business Server 2008 on ESXi.systems.local in ha-datacenter: *** Virtual machine kernel stack fault (hardware reset) ***

The virtual machine just suffered a stack fault in kernel mode. On a real computer, this would amount to a reset of the processor. It can be caused by an incorrect configuration of the virtual machine, a bug in the operating system, or a problem in the VMware ESX Server software. Press OK to reboot virtual machine or Cancel to shut it down.

Event 19 : Message on Small Business Server 2008 on ESXi.systems.local in ha-datacenter: *** Virtual machine kernel stack fault (hardware reset) ***

The virtual machine just suffered a stack fault in kernel mode. On a real computer, this would amount to a reset of the processor. It can be caused by an incorrect configuration of the virtual machine, a bug in the operating system, or a problem in the VMware ESX Server software. Press OK to reboot virtual machine or Cancel to shut it down.

Received a duplicate transition from foundry: 1

Received a duplicate transition from foundry: 1

Received a duplicate transition from foundry: 1

Turning off heartbeat checker

Failed to validate VM IP address: unknown

Failed to validate VM IP address: unknown

Failed to validate VM IP address: unknown

Closing stream UNIX(/var/run/vmware/proxy-sdk) due to timeout

Event 20 : User root@127.0.0.1 logged in
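For anyone comparing excerpts like the ones above, here is a minimal Python sketch that pulls only the crash-related entries out of a VM log. It assumes each line follows the "timestamp: thread| message" layout seen in the excerpt; the function name and the marker list are illustrative choices, not anything defined by VMware.

```python
import re
from typing import Iterable, List, Tuple

# Matches lines such as "Apr 22 15:46:58.961: vcpu-0| Triple fault."
# (layout assumed from the excerpt in this post).
LINE_RE = re.compile(r"^(\w{3} \d{2} \d{2}:\d{2}:\d{2}\.\d{3}): ([\w-]+)\| ?(.*)$")

# Substrings from the excerpt that mark the crash itself (illustrative list).
MARKERS = ("Triple fault", "kernel stack fault", "CPU reset: hard")


def find_fault_events(lines: Iterable[str]) -> List[Tuple[str, str, str]]:
    """Return (timestamp, thread, message) for every crash-related line."""
    events = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and any(marker in match.group(3) for marker in MARKERS):
            events.append((match.group(1), match.group(2), match.group(3)))
    return events
```

Feeding it the lines of a log file (for example via open(path) on a copy of the VM's log) gives a short list of fault timestamps that can then be lined up against the host log.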

5 Replies
Dave_Mishchenko
Immortal

Your discussion has been moved to the Virtual Machine and Guest OS forum.

Dave Mishchenko

VMware Communities User Moderator

Goliath222
Contributor

Hi,

We had the same problem last night. The VM is a SharePoint server on W2K8 x64 with 4 CPUs and 6 GB of RAM. The message in the VM logs was this:

Jun 09 00:00:05.986: vcpu-2| Triple fault.

Jun 09 00:00:05.986: vcpu-2| Msg_Hint: msg.monitorEvent.cpl0SS (sent)

Jun 09 00:00:05.986: vcpu-2| *** Virtual machine kernel stack fault (hardware reset) ***

Jun 09 00:00:05.986: vcpu-2| The virtual machine just suffered a stack fault in kernel mode. On a real computer, this would amount to a reset of the processor. It can be caused by an incorrect configuration of the virtual machine, a bug in the operating system, or a problem in the VMware ESX Server software. Press OK to reboot virtual machine or Cancel to shut it down.

Jun 09 00:00:05.986: vcpu-2|

Jun 09 00:00:05.986: vcpu-2| -


Jun 09 00:00:05.989: vcpu-2| CPU reset: hard

In the ESX logs I found these messages from the same time:

vmkernel:

Jun 9 00:00:01 ASBC005VMS0L vmkernel: 40:12:24:38.333 cpu4:1463)VSCSI: 2803: Reset request on handle 8303 (0 outstanding commands)

Jun 9 00:00:01 ASBC005VMS0L vmkernel: 40:12:24:38.333 cpu15:1069)VSCSI: 3019: Resetting handle 8303

Jun 9 00:00:01 ASBC005VMS0L vmkernel: 40:12:24:38.333 cpu15:1069)VSCSI: 2871: Completing reset on handle 8303 (0 outstanding commands)

hostd :

Question info: *** Virtual machine kernel stack fault (hardware reset) ***

The virtual machine just suffered a stack fault in kernel mode. On a real computer, this would amount to a reset of the processor. It can be caused by an incorrect configuration of the virtual machine, a bug in the operating system, or a problem in the VMware ESX Server software. Press OK to reboot virtual machine or Cancel to shut it down.

Received a duplicate transition from foundry: 1

Disconnect check in progress: /vmfs/volumes/4a07ecf4-55718657-699e-00144f462079/ARGOE01WWV0W/ARGOE01WWV0W.vmx

Failed to find activation record, event user unknown.

Event 6868 : Message on ARGOE01WWV0W on asbc005vms0l.mgmt01.local in ha-datacenter: *** Virtual machine kernel stack fault (hardware reset) ***

The virtual machine just suffered a stack fault in kernel mode. On a real computer, this would amount to a reset of the processor. It can be caused by an incorrect configuration of the virtual machine, a bug in the operating system, or a problem in the VMware ESX Server software. Press OK to reboot virtual machine or Cancel to shut it down.

Received a duplicate transition from foundry: 1

Failed to send response to the client: Broken pipe

Task Created : haTask-ha-root-pool-vim.ResourcePool.updateConfig--1218740724

Task Completed : haTask-ha-root-pool-vim.ResourcePool.updateConfig--1218740724

Task Created : haTask-ha-root-pool-vim.ResourcePool.updateConfig--1218740723

Task Completed : haTask-ha-root-pool-vim.ResourcePool.updateConfig--1218740723

Turning off heartbeat checker

I am not really sure where the problem lies: is it VMware, or Windows 2008 x64 running on VMware? The ESX version is 3.5 Update 3.

Regards

Oliver
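To check how closely the VM log and vmkernel entries above line up, a small Python sketch like the following can compute the gap between two timestamps. It assumes both logs use the same clock and time zone, and because neither format carries a year, it plugs in a placeholder year; the function names are hypothetical.

```python
from datetime import datetime

# Neither log format includes a year, so a stand-in is used (assumption).
# A leap year is chosen so that a Feb 29 stamp would still parse.
ASSUMED_YEAR = 2000


def parse_vm_log_time(stamp: str, year: int = ASSUMED_YEAR) -> datetime:
    """Parse a VM log stamp such as 'Jun 09 00:00:05.986'."""
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S.%f")


def parse_syslog_time(stamp: str, year: int = ASSUMED_YEAR) -> datetime:
    """Parse a syslog-style stamp such as 'Jun 9 00:00:01'."""
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


def seconds_between(vm_stamp: str, host_stamp: str) -> float:
    """Return VM-log time minus host-log time, in seconds."""
    delta = parse_vm_log_time(vm_stamp) - parse_syslog_time(host_stamp)
    return delta.total_seconds()
```

For the entries quoted in this post, seconds_between("Jun 09 00:00:05.986", "Jun 9 00:00:01") puts the VSCSI reset roughly five seconds before the triple fault, under the same-clock assumption.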

ryanv
Contributor

SBS 2008 also runs SharePoint, so you could be onto something there. The search service was constantly logging errors, so I ended up disabling it, but the crashes still happened after that.

The PowerEdge 2950 seems to be working OK now and hasn't crashed in almost a month. I called Dell about this issue and, as usual, they wanted me to start by updating the system and running diagnostics. I ran the latest PowerEdge update DVD, which updates pretty much all firmware and drivers on the system, and then ran full hardware diagnostics overnight. I was expecting some memory problems, but all tests checked out fine.

Since the server has been stable since then, I guess I'll have to assume the updates are what fixed the problem (assuming the problem is in fact fixed).

I hope this is of some help to you.

ryanv
Contributor

As this isn't my own server but a client's, and since it happened during the night, I didn't notice that the issue remains. A few nights ago the SBS VM rebooted itself. This time no events were logged on the ESXi host or within the VM itself.

However, I have been finding some new warnings in the logs since mid-May:

VFAT: 2583: Possible FAT corruption!: offset not found: 0x400, length 8704, seekOffset 0x400. File length: 9283, status: Limit exc

ryanv
Contributor

The issue ended up being Trend Micro's firewall driver on the NIC. We were using Worry Free Advanced, which installs a firewall driver on each NIC. The Windows memory dumps showed that this driver was causing the bluescreens. Trend knows about this and said they're working on a solution. Unchecking the firewall driver in the properties of the NIC and updating Worry Free Advanced seems to have fixed the problem.
