One of the VMs on a client's Dell PowerEdge 2950 is crashing and/or rebooting randomly. The guest OS is Small Business Server 2008 (which is 64-bit), and both the guest OS and ESXi have the latest updates. I have allocated about 7 GB of RAM to the SBS VM, and the guest OS type is set to Microsoft Windows - Microsoft Windows Server 2008 (64-bit).
When the crashes happen, no problems are indicated in the guest OS logs. However, in the ESXi logs I sometimes get: *** Virtual machine kernel stack fault (hardware reset) ***
Any ideas why this may be happening? I have included part of the logs below in case that helps narrow things down:
VM LOG:
Apr 22 15:28:54.958: vcpu-1| CDROM: Emulate GET CONFIGURATION RT 1 start feature 0
Apr 22 15:28:54.958: vcpu-1| CDROM: Unknown command 0xAC.
Apr 22 15:28:54.958: vcpu-1| CDROM: Unknown command 0xAC.
Apr 22 15:33:54.968: vcpu-2| CDROM: Emulate GET CONFIGURATION RT 1 start feature 0
Apr 22 15:33:54.968: vcpu-2| CDROM: Unknown command 0xAC.
Apr 22 15:33:54.968: vcpu-2| CDROM: Unknown command 0xAC.
Apr 22 15:38:54.979: vcpu-1| CDROM: Emulate GET CONFIGURATION RT 1 start feature 0
Apr 22 15:38:54.979: vcpu-1| CDROM: Unknown command 0xAC.
Apr 22 15:38:54.979: vcpu-1| CDROM: Unknown command 0xAC.
Apr 22 15:43:54.995: vcpu-1| CDROM: Emulate GET CONFIGURATION RT 1 start feature 0
Apr 22 15:43:54.997: vcpu-1| CDROM: Unknown command 0xAC.
Apr 22 15:43:54.997: vcpu-1| CDROM: Unknown command 0xAC.
Apr 22 15:46:58.961: vcpu-0| Triple fault.
Apr 22 15:46:58.961: vcpu-0| Msg_Hint: msg.monitorEvent.cpl0SS (sent)
Apr 22 15:46:58.961: vcpu-0| **** Virtual machine kernel stack fault (hardware reset) ****
Apr 22 15:46:58.961: vcpu-0| The virtual machine just suffered a stack fault in kernel mode. On a real computer, this would amount to a reset of the processor. It can be caused by an incorrect configuration of the virtual machine, a bug in the operating system, or a problem in the VMware ESX Server software. Press OK to reboot virtual machine or Cancel to shut it down.
Apr 22 15:46:58.961: vcpu-0|
Apr 22 15:46:58.961: vcpu-0| -
Apr 22 15:46:58.963: vcpu-0| Triple fault.
Apr 22 15:46:58.963: vcpu-0| Msg_Hint: msg.monitorEvent.cpl0SS (sent)
Apr 22 15:46:58.963: vcpu-0| *** Virtual machine kernel stack fault (hardware reset) ***
Apr 22 15:46:58.963: vcpu-0| The virtual machine just suffered a stack fault in kernel mode. On a real computer, this would amount to a reset of the processor. It can be caused by an incorrect configuration of the virtual machine, a bug in the operating system, or a problem in the VMware ESX Server software. Press OK to reboot virtual machine or Cancel to shut it down.
Apr 22 15:46:58.963: vcpu-0|
Apr 22 15:46:58.963: vcpu-0| -
Apr 22 15:46:58.963: vmx| POLL device deleted
Apr 22 15:46:58.964: vcpu-0| CPU reset: hard
Apr 22 15:46:59.239: vcpu-1| CPU reset: soft
Apr 22 15:47:08.192: vcpu-2| CPU reset: soft
Apr 22 15:47:17.150: vcpu-3| CPU reset: soft
Apr 22 15:47:26.120: vcpu-0| SVGA: Unregistering IOSpace at 0x1060
Apr 22 15:47:26.120: vcpu-0| SVGA: Unregistering MemSpace at 0xf8000000(0xf8000000) and 0xf4000000(0xf4000000)
Apr 22 15:47:26.247: vcpu-0| SVGA: Registering MemSpace at 0xf8000000(0xf8000000) and 0xf4000000(0xf4000000)
Apr 22 15:47:26.284: vcpu-0| SVGA: Unregistering MemSpace at 0xf8000000(0xf8000000) and 0xf4000000(0xf4000000)
Apr 22 15:47:26.479: vcpu-0| SVGA: Registering MemSpace at 0xf8000000(0xf8000000) and 0xf4000000(0xf4000000)
Apr 22 15:47:26.607: vcpu-0| SVGA: Unregistering MemSpace at 0xf8000000(0xf8000000) and 0xf4000000(0xf4000000)
Apr 22 15:47:26.612: vcpu-0| SVGA: Registering IOSpace at 0x1060
Apr 22 15:47:26.612: vcpu-0| SVGA: Registering MemSpace at 0xf8000000(0xf8000000) and 0xf4000000(0xf4000000)
Apr 22 15:47:26.650: vcpu-1| CPU reset: soft
Apr 22 15:47:26.654: vcpu-2| CPU reset: soft
Apr 22 15:47:53.527: vcpu-3| CPU reset: soft
Apr 22 15:47:53.775: vcpu-0| DISKUTIL: scsi0:0 : geometry=13054/255/63
Apr 22 15:47:53.775: vcpu-0| DISKUTIL: scsi0:1 : geometry=78325/255/63
Apr 22 15:47:54.188: vcpu-1| CPU reset: soft
Apr 22 15:47:54.192: vcpu-2| CPU reset: soft
Apr 22 15:47:54.194: vcpu-3| CPU reset: soft
Apr 22 15:47:54.216: vcpu-0| BIOS-UUID is 56 4d c9 76 1a 90 67 da-55 bd e7 35 e2 e4 39 9e
Apr 22 15:47:54.669: vcpu-0| DISKUTIL: scsi0:1 : toolsVersion = 7303
Apr 22 15:47:54.669: vcpu-0| DISKUTIL: scsi0:0 : toolsVersion = 7303
Apr 22 15:47:54.669: vcpu-0| DISKUTIL: scsi0:1 : toolsVersion = 7303
Apr 22 15:47:54.669: vcpu-0| DISKUTIL: scsi0:0 : toolsVersion = 7303
Apr 22 15:47:55.422: mks| HostOps hideCursor before defineCursor!
Apr 22 15:48:33.093: mks| HostOps hideCursor before defineCursor!
Apr 22 15:48:33.289: vcpu-1| CPU reset: soft
Apr 22 15:48:33.602: vcpu-2| CPU reset: soft
Apr 22 15:48:33.658: vcpu-3| CPU reset: soft
Apr 22 15:48:35.357: vcpu-0| SVGA: Unregistering IOSpace at 0x1060
Apr 22 15:48:35.357: vcpu-0| SVGA: Unregistering MemSpace at 0xf8000000(0xf8000000) and 0xf4000000(0xf4000000)
Apr 22 15:48:35.363: vcpu-0| SVGA: Registering IOSpace at 0x1060
Apr 22 15:48:35.365: vcpu-0| SVGA: Registering MemSpace at 0xf8000000(0xf8000000) and 0xf4000000(0xf4000000)
Apr 22 15:48:36.226: vcpu-3| CDROM: Mode Sense for Unsupported Page 0x1B
Apr 22 15:51:07.481: vcpu-0| CDROM: Emulate GET CONFIGURATION RT 0 start feature 0
Apr 22 15:51:07.481: vcpu-0| CDROM: Emulate GET CONFIGURATION RT 0 start feature 0
Apr 22 15:51:07.481: vcpu-0| CDROM: Emulate GET CONFIGURATION RT 0 start feature 0
Apr 22 15:51:07.481: vcpu-0| CDROM: Emulate GET CONFIGURATION RT 0 start feature 0
Apr 22 15:51:07.482: vcpu-0| CDROM: Emulate GET CONFIGURATION RT 0 start feature 0
Apr 22 15:51:07.483: vcpu-1| CDROM: Emulate GET CONFIGURATION RT 1 start feature 0
HOST LOG:
Failed to send response to the client: Broken pipe
Throw vmodl.fault.RequestCanceled
(vmodl.fault.RequestCanceled) {
dynamicType = <unset>,
msg = ""
}
Failed to send response to the client: Broken pipe
Disconnect check in progress: /vmfs/volumes/49673842-8b8607e2-8bcf-0022191e63d4/Small Business Server 2008/Small Business Server 2008.vmx
Question info: *** Virtual machine kernel stack fault (hardware reset) ***
The virtual machine just suffered a stack fault in kernel mode. On a real computer, this would amount to a reset of the processor. It can be caused by an incorrect configuration of the virtual machine, a bug in the operating system, or a problem in the VMware ESX Server software. Press OK to reboot virtual machine or Cancel to shut it down.
, Id: 0 : Type : 5, Default: 0, Number of options: 2
Received a duplicate transition from foundry: 1
Failed to find activation record, event user unknown.
Disconnect check in progress: /vmfs/volumes/49673842-8b8607e2-8bcf-0022191e63d4/Small Business Server 2008/Small Business Server 2008.vmx
Disconnect check in progress: /vmfs/volumes/49673842-8b8607e2-8bcf-0022191e63d4/Small Business Server 2008/Small Business Server 2008.vmx
Question info: *** Virtual machine kernel stack fault (hardware reset) ***
The virtual machine just suffered a stack fault in kernel mode. On a real computer, this would amount to a reset of the processor. It can be caused by an incorrect configuration of the virtual machine, a bug in the operating system, or a problem in the VMware ESX Server software. Press OK to reboot virtual machine or Cancel to shut it down.
, Id: 1 : Type : 5, Default: 0, Number of options: 2
Failed to find activation record, event user unknown.
Disconnect check in progress: /vmfs/volumes/49673842-8b8607e2-8bcf-0022191e63d4/Small Business Server 2008/Small Business Server 2008.vmx
Event 18 : Message on Small Business Server 2008 on ESXi.systems.local in ha-datacenter: *** Virtual machine kernel stack fault (hardware reset) ***
The virtual machine just suffered a stack fault in kernel mode. On a real computer, this would amount to a reset of the processor. It can be caused by an incorrect configuration of the virtual machine, a bug in the operating system, or a problem in the VMware ESX Server software. Press OK to reboot virtual machine or Cancel to shut it down.
Event 19 : Message on Small Business Server 2008 on ESXi.systems.local in ha-datacenter: *** Virtual machine kernel stack fault (hardware reset) ***
The virtual machine just suffered a stack fault in kernel mode. On a real computer, this would amount to a reset of the processor. It can be caused by an incorrect configuration of the virtual machine, a bug in the operating system, or a problem in the VMware ESX Server software. Press OK to reboot virtual machine or Cancel to shut it down.
Received a duplicate transition from foundry: 1
Received a duplicate transition from foundry: 1
Received a duplicate transition from foundry: 1
Failed to validate VM IP address: unknown
Failed to validate VM IP address: unknown
Failed to validate VM IP address: unknown
Closing stream UNIX(/var/run/vmware/proxy-sdk) due to timeout
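For anyone triaging a similar log, here is a minimal Python sketch that pulls the "Triple fault" and hard CPU reset entries out of a VM log in the format quoted above, so the fault times can be lined up against guest or host events. The function name, the marker list, and the `year` parameter are assumptions made for illustration (the log lines carry no year); this is not a VMware tool.

```python
import re
from datetime import datetime

# Matches VM log lines in the format quoted above, for example:
#   "Apr 22 15:46:58.961: vcpu-0| Triple fault."
LINE_RE = re.compile(
    r"^(?P<ts>[A-Z][a-z]{2} \d{2} \d{2}:\d{2}:\d{2}\.\d{3}): "
    r"(?P<source>[\w-]+)\|\s?(?P<msg>.*)$"
)

# Substrings treated as "the VM was reset" markers (an illustrative choice).
FAULT_MARKERS = ("Triple fault", "CPU reset: hard")


def find_fault_events(lines, year=2000):
    """Return (timestamp, source, message) tuples for fault-related lines.

    The log format has no year, so `year` is a placeholder supplied by
    the caller.
    """
    events = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # wrapped or free-form lines are skipped
        msg = match.group("msg")
        if any(marker in msg for marker in FAULT_MARKERS):
            ts = datetime.strptime(
                f"{year} {match.group('ts')}", "%Y %b %d %H:%M:%S.%f"
            )
            events.append((ts, match.group("source"), msg))
    return events


if __name__ == "__main__":
    sample = [
        "Apr 22 15:43:54.997: vcpu-1| CDROM: Unknown command 0xAC.",
        "Apr 22 15:46:58.961: vcpu-0| Triple fault.",
        "Apr 22 15:46:58.964: vcpu-0| CPU reset: hard",
        "Apr 22 15:46:59.239: vcpu-1| CPU reset: soft",
    ]
    for ts, source, msg in find_fault_events(sample):
        print(ts.isoformat(), source, msg)
```

Soft CPU resets are deliberately left out of the markers because, in the log above, they also appear during the normal reboot that follows the fault.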
Hi,
We had the same problem last night. The VM is a SharePoint server on Windows Server 2008 x64 with 4 CPUs and 6 GB of RAM. The message in the VM logs was this:
Jun 09 00:00:05.986: vcpu-2| Triple fault.
Jun 09 00:00:05.986: vcpu-2| Msg_Hint: msg.monitorEvent.cpl0SS (sent)
Jun 09 00:00:05.986: vcpu-2| *** Virtual machine kernel stack fault (hardware reset) ***
Jun 09 00:00:05.986: vcpu-2| The virtual machine just suffered a stack fault in kernel mode. On a real computer, this would amount to a reset of the processor. It can be caused by an incorrect configuration of the virtual machine, a bug in the operating system, or a problem in the VMware ESX Server software. Press OK to reboot virtual machine or Cancel to shut it down.
Jun 09 00:00:05.986: vcpu-2|
Jun 09 00:00:05.986: vcpu-2| -
Jun 09 00:00:05.989: vcpu-2| CPU reset: hard
In the ESX logs I found these messages from the same time:
vmkernel:
Jun 9 00:00:01 ASBC005VMS0L vmkernel: 40:12:24:38.333 cpu4:1463)VSCSI: 2803: Reset request on handle 8303 (0 outstanding commands)
Jun 9 00:00:01 ASBC005VMS0L vmkernel: 40:12:24:38.333 cpu15:1069)VSCSI: 3019: Resetting handle 8303
Jun 9 00:00:01 ASBC005VMS0L vmkernel: 40:12:24:38.333 cpu15:1069)VSCSI: 2871: Completing reset on handle 8303 (0 outstanding commands)
hostd:
Question info: *** Virtual machine kernel stack fault (hardware reset) ***
The virtual machine just suffered a stack fault in kernel mode. On a real computer, this would amount to a reset of the processor. It can be caused by an incorrect configuration of the virtual machine, a bug in the operating system, or a problem in the VMware ESX Server software. Press OK to reboot virtual machine or Cancel to shut it down.
Received a duplicate transition from foundry: 1
Disconnect check in progress: /vmfs/volumes/4a07ecf4-55718657-699e-00144f462079/ARGOE01WWV0W/ARGOE01WWV0W.vmx
Failed to find activation record, event user unknown.
Event 6868 : Message on ARGOE01WWV0W on asbc005vms0l.mgmt01.local in ha-datacenter: *** Virtual machine kernel stack fault (hardware reset) ***
The virtual machine just suffered a stack fault in kernel mode. On a real computer, this would amount to a reset of the processor. It can be caused by an incorrect configuration of the virtual machine, a bug in the operating system, or a problem in the VMware ESX Server software. Press OK to reboot virtual machine or Cancel to shut it down.
Received a duplicate transition from foundry: 1
Failed to send response to the client: Broken pipe
Task Created : haTask-ha-root-pool-vim.ResourcePool.updateConfig--1218740724
Task Completed : haTask-ha-root-pool-vim.ResourcePool.updateConfig--1218740724
Task Created : haTask-ha-root-pool-vim.ResourcePool.updateConfig--1218740723
Task Completed : haTask-ha-root-pool-vim.ResourcePool.updateConfig--1218740723
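The vmkernel entries above are stamped a few seconds before the triple fault in the VM log (00:00:01 versus 00:00:05). A rough Python sketch for finding host-side entries close to a known fault time is shown below. The function names, the 10-second window, and the `year` parameter are assumptions for illustration, and the pattern only covers the syslog-style vmkernel lines quoted in this post.

```python
import re
from datetime import datetime, timedelta

# Matches syslog-style vmkernel lines as quoted above, for example:
#   "Jun 9 00:00:01 ASBC005VMS0L vmkernel: 40:12:24:38.333 cpu4:1463)VSCSI: ..."
VMKERNEL_RE = re.compile(
    r"^(?P<mon>[A-Z][a-z]{2})\s+(?P<day>\d{1,2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) vmkernel: (?P<msg>.*)$"
)


def parse_vmkernel_line(line, year=2000):
    """Return (timestamp, message), or None if the line does not match.

    Syslog timestamps carry no year, so `year` is a caller-supplied
    placeholder.
    """
    match = VMKERNEL_RE.match(line.strip())
    if match is None:
        return None
    stamp = f"{year} {match.group('mon')} {match.group('day')} {match.group('time')}"
    return datetime.strptime(stamp, "%Y %b %d %H:%M:%S"), match.group("msg")


def entries_near(fault_time, vmkernel_lines, window_seconds=10, year=2000):
    """Return vmkernel entries logged within window_seconds of fault_time."""
    window = timedelta(seconds=window_seconds)
    nearby = []
    for line in vmkernel_lines:
        parsed = parse_vmkernel_line(line, year)
        if parsed is None:
            continue
        ts, msg = parsed
        if abs(ts - fault_time) <= window:
            nearby.append((ts, msg))
    return nearby


if __name__ == "__main__":
    fault = datetime(2000, 6, 9, 0, 0, 5, 986000)  # from the VM log entry
    sample = [
        "Jun 9 00:00:01 ASBC005VMS0L vmkernel: 40:12:24:38.333 "
        "cpu15:1069)VSCSI: 3019: Resetting handle 8303",
    ]
    for ts, msg in entries_near(fault, sample):
        print(ts.isoformat(), msg)
```

Entries that fall inside the window only show that two things happened close together. They do not show which one caused the other, and the comparison assumes the host and VM log clocks agree.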
I am not really sure where the problem lies: is it VMware itself, or Windows 2008 x64 on VMware? The ESX version is 3.5 Update 3.
Regards
Oliver
SBS 2008 also runs SharePoint, so you could be onto something there. The search service was constantly logging errors, so I ended up disabling it, but the crashes still happened after that anyway.
The PowerEdge 2950 seems to be working OK now and hasn't crashed in almost a month. I called Dell about this issue and, as usual, they wanted me to start by updating the system and running diagnostics. I ran the latest PowerEdge update DVD, which updates pretty much all firmware and drivers on the system, and then ran full hardware diagnostics overnight. I was expecting some memory problems, but all tests checked out fine.
But since the server has been stable since then, I guess I'll have to assume the updates are what fixed the problem (assuming the problem is in fact fixed).
I hope this is of some help to you.
As this isn't my own server but a client's, and since it happened during the night, I didn't notice that the issue is still there. A few nights ago the SBS VM rebooted itself. This time there weren't any events logged on the ESXi host or within the VM itself.
I have, however, been finding some new warnings in the logs since mid-May:
VFAT: 2583: Possible FAT corruption!: offset not found: 0x400, length 8704, seekOffset 0x400. File length: 9283, status: Limit exc
The issue ended up being Trend Micro's firewall driver on the NIC. We were using Worry Free Advanced and it installs a firewall driver on each NIC. This driver was the reason for the bluescreens as indicated by the Windows memory dumps. Trend knows about this and said they're working on a solution. Unchecking the firewall driver from the properties of the NIC and updating Worry Free Advanced seems to have fixed the problem.