Anyone else notice this trend?
What service pack of Windows 2003 are you using, and what additional Microsoft hotfixes, if any, have been applied to the VMs which are crashing?
Are there any patterns in the crash output (common STOP codes across BSODs)?
What sorts of applications are running in the VMs when the crashes occur? Are these applications 32-bit or 64-bit?
It is running Service Pack 2, fully patched. The server is running SAP and SQL 2005 (Service Pack 2); both apps are 64-bit. Here is the crash dump.
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************
KMODE_EXCEPTION_NOT_HANDLED (1e)
This is a very common bugcheck. Usually the exception address pinpoints
the driver/function that caused the problem. Always note this address
as well as the link date of the driver/image that contains this address.
Arguments:
Arg1: ffffffffc000001d, The exception code that was not handled
Arg2: fffffadfa01c0cd8, The address that the exception occurred at
Arg3: 0000000000000002, Parameter 0 of the exception
Arg4: 0000000000000000, Parameter 1 of the exception
Debugging Details:
-
PEB is paged out (Peb.Ldr = 00000000`7efdf018). Type ".hh dbgerr001" for details
PEB is paged out (Peb.Ldr = 00000000`7efdf018). Type ".hh dbgerr001" for details
EXCEPTION_CODE: (NTSTATUS) 0xc000001d - Illegal Instruction An attempt was made to execute an illegal instruction.
FAULTING_IP:
+fffffadfa01c0cd8
Page 95c0 not present in the dump file. Type ".hh dbgerr004" for details
Page 95c0 not present in the dump file. Type ".hh dbgerr004" for details
fffffadf`a01c0cd8 ?? ???
EXCEPTION_PARAMETER1: 0000000000000002
EXCEPTION_PARAMETER2: 0000000000000000
DEFAULT_BUCKET_ID: DRIVER_FAULT
BUGCHECK_STR: 0x1E
PROCESS_NAME: VMwareTray.exe
CURRENT_IRQL: 2
EXCEPTION_RECORD: fffff800003cdd10 -- (.exr 0xfffff800003cdd10)
ExceptionAddress: fffffadfa01c0cd8
ExceptionCode: c000001d (Illegal instruction)
ExceptionFlags: 00000000
NumberParameters: 0
TRAP_FRAME: fffff800003cdda0 -- (.trap 0xfffff800003cdda0)
NOTE: The trap frame does not contain all registers.
Some register values may be zeroed or incorrect.
rax=0000000000000004 rbx=fffffadf96467620 rcx=0000000000000201
rdx=0000000000000002 rsi=fffffadfa2cbf7ff rdi=fffffadfa0abfa20
rip=fffffadfa01c0cd8 rsp=fffff800003cdf38 rbp=fffffadf937e7cf0
r8=00000000000c002f r9=fffffadfa2c95040 r10=0000000000000003
r11=0000000000000000 r12=0000000000000000 r13=0000000000000000
r14=0000000000000000 r15=0000000000000000
iopl=0 nv up ei pl nz na pe nc
Page 95c0 not present in the dump file. Type ".hh dbgerr004" for details
Page 95c0 not present in the dump file. Type ".hh dbgerr004" for details
fffffadf`a01c0cd8 ?? ???
Resetting default scope
LAST_CONTROL_TRANSFER: from fffff80001080da6 to fffff8000102e7d0
FAILED_INSTRUCTION_ADDRESS:
+fffffadfa01c0cd8
Page 95c0 not present in the dump file. Type ".hh dbgerr004" for details
Page 95c0 not present in the dump file. Type ".hh dbgerr004" for details
fffffadf`a01c0cd8 ?? ???
STACK_TEXT:
fffff800`003cd618 fffff800`01080da6 : 00000000`0000001e ffffffff`c000001d fffffadf`a01c0cd8 00000000`00000002 : nt!KeBugCheckEx
fffff800`003cd620 fffff800`0102e5ef : fffff800`003cdd10 fffffadf`a01c0c10 fffff800`003cdda0 00000000`00000001 : nt!KiDispatchException+0x128
fffff800`003cdc20 fffff800`0102cb83 : fffff800`003cdda0 00000000`00000002 00000000`00000000 fffffadf`00000000 : nt!KiExceptionExit
fffff800`003cdda0 fffffadf`a01c0cd8 : fffffadf`a2cbf844 00000000`7efdb000 00000000`92175b46 00000000`00000000 : nt!KiInvalidOpcodeFault+0xc3
fffff800`003cdf38 fffffadf`a2cbf844 : 00000000`7efdb000 00000000`92175b46 00000000`00000000 fffffadf`937e7cf0 : 0xfffffadf`a01c0cd8
fffff800`003cdf40 00000000`7efdb000 : 00000000`92175b46 00000000`00000000 fffffadf`937e7cf0 00000000`00000246 : 0xfffffadf`a2cbf844
fffff800`003cdf48 00000000`92175b46 : 00000000`00000000 fffffadf`937e7cf0 00000000`00000246 fffff800`010284e1 : 0x7efdb000
fffff800`003cdf50 00000000`00000000 : fffffadf`937e7cf0 00000000`00000246 fffff800`010284e1 00000000`00000246 : 0x92175b46
fffff800`003cdf58 fffffadf`937e7cf0 : 00000000`00000246 fffff800`010284e1 00000000`00000246 fffffadf`a2a14060 : 0x0
fffff800`003cdf60 00000000`00000246 : fffff800`010284e1 00000000`00000246 fffffadf`a2a14060 00000000`01187806 : 0xfffffadf`937e7cf0
fffff800`003cdf68 fffff800`010284e1 : 00000000`00000246 fffffadf`a2a14060 00000000`01187806 fffffadf`a1fe6748 : 0x246
fffff800`003cdf70 fffff800`0103109f : fffff800`011b0180 fffff800`011b0180 fffffadf`937e7cf0 fffffadf`a2a199d0 : nt!KiRetireDpcList+0x150
fffff800`003ce000 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiDispatchInterrupt+0x4f
STACK_COMMAND: kb
FOLLOWUP_IP:
nt!KiDispatchException+128
fffff800`01080da6 cc int 3
SYMBOL_STACK_INDEX: 1
SYMBOL_NAME: nt!KiDispatchException+128
FOLLOWUP_NAME: MachineOwner
MODULE_NAME: nt
IMAGE_NAME: ntkrnlmp.exe
DEBUG_FLR_IMAGE_TIMESTAMP: 46237547
FAILURE_BUCKET_ID: X64_0x1E_BAD_IP_nt!KiDispatchException+128
BUCKET_ID: X64_0x1E_BAD_IP_nt!KiDispatchException+128
Followup: MachineOwner
-
0: kd> .exr 0xfffff800003cdd10
ExceptionAddress: fffffadfa01c0cd8
ExceptionCode: c000001d (Illegal instruction)
ExceptionFlags: 00000000
NumberParameters: 0
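For comparing STOP codes across several dumps, the header of the analysis above can be pulled out with a small script. This is only a minimal sketch: `parse_bugcheck` is a hypothetical helper, not part of any debugger tooling, and it assumes the text looks like the `!analyze -v` output pasted above (a `NAME (code)` line followed by `Arg1:` to `Arg4:` lines).

```python
import re

def parse_bugcheck(text):
    """Return (name, code, args) from !analyze -v style text, or None.

    Hypothetical helper: assumes a 'NAME (hexcode)' line and 'ArgN: hex' lines.
    """
    header = re.search(r"^([A-Z0-9_]+) \(([0-9a-fA-F]+)\)\s*$", text, re.MULTILINE)
    if header is None:
        return None
    # Map argument number (1-4) to its value, parsed as hexadecimal.
    args = {
        int(m.group(1)): int(m.group(2), 16)
        for m in re.finditer(r"^Arg([1-4]): ([0-9a-fA-F]+)", text, re.MULTILINE)
    }
    return header.group(1), int(header.group(2), 16), args

# Excerpt copied from the dump in this thread.
sample = """\
KMODE_EXCEPTION_NOT_HANDLED (1e)
Arguments:
Arg1: ffffffffc000001d, The exception code that was not handled
Arg2: fffffadfa01c0cd8, The address that the exception occurred at
Arg3: 0000000000000002, Parameter 0 of the exception
Arg4: 0000000000000000, Parameter 1 of the exception
"""

name, code, args = parse_bugcheck(sample)
# Arg1 is the NTSTATUS sign-extended to 64 bits; keep the low 32 bits.
ntstatus = args[1] & 0xFFFFFFFF
print(name, hex(code), hex(ntstatus))
# prints: KMODE_EXCEPTION_NOT_HANDLED 0x1e 0xc000001d
```

Running it over each saved analysis would show quickly whether the same STOP code and exception code keep recurring.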
Yes, we are seeing the same thing: ESX 3.0.2, Windows 2003 x64 SP2, SAP ECC 5.0 and Oracle 10.0.2, both 64-bit. We have only seen this instability in the last couple of months; before that we were rock solid for a good year. I suspect ESX updates, Windows updates, or some combination thereof (they happened around the same time), but I'm not sure yet. What hardware are you running on? We are running on HP BL685c with QLogic HBAs, with an EMC CLARiiON CX3-40 on the back end and Cisco 9134 FC switches.
I'm going to try to dig into this today and see what I can find, as the crashes seem to be getting more frequent. I suspect something at the storage level; I think I remember one of the ESX updates addressed a storage driver.
I'll post if I find anything. Otherwise I plan to update firmware on the servers and apply all the latest ESX patches and cross fingers...
I'm glad it's not just me noticing it then.
Keep me posted!
BaldwinM@alxn.com
All signs point to KB932596, a security update. We installed this in our environment on 3/28/08, so the timing fits for when we first started to notice problems. From that KB:
After you install this update on a computer that is running an x64-based version of Windows Server 2003, of Windows Vista, or of Windows Server 2008, the computer may randomly restart, and then you may receive a Stop error message. The Stop error code may be 0x0000001E, 0x000000D1, or another Stop error code.
To resolve this problem, install hotfix 950772.
For more information, click the following article number to view the article in the Microsoft Knowledge Base: 950772 (http://support.microsoft.com/kb/950772/) A computer that is running an x64-based version of Windows Server 2003, of Windows Vista, or of Windows Server 2008 randomly restarts and then generates a Stop error
The symptoms fit exactly: we have seen both 0x0000001E and 0x000000D1, and the crashes seem to be random, as we have not been able to tie them to any specific application activity. I will be installing hotfix 950772 this weekend. I will give it a couple of weeks, and if we have no crashes (we have 10 Windows 2003 x64 VMs, and going by recent history we would definitely see one somewhere in 2 weeks) I will consider it a success and post again. Actually, I'll post either way, but let's hope for good news.
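To see which VMs are exposed, one option is to scan a saved list of installed updates for the two KB numbers mentioned above. A minimal sketch follows; `installed_kbs` and `needs_hotfix` are hypothetical helpers, and the only assumption about the input is that it is plain text containing identifiers of the form `KBnnnnnn` (for example, saved output from an inventory tool). The sample listing is illustrative, not real output.

```python
import re

TRIGGER_KB = "932596"  # security update named in the post above
FIX_KB = "950772"      # hotfix the KB article says resolves the Stop errors

def installed_kbs(listing):
    """Collect KB numbers from any text listing of installed updates."""
    return set(re.findall(r"KB(\d{6,7})", listing, re.IGNORECASE))

def needs_hotfix(listing):
    """True when the trigger update is present but the fix is not."""
    kbs = installed_kbs(listing)
    return TRIGGER_KB in kbs and FIX_KB not in kbs

# Illustrative listings only; KB000001 is a placeholder entry.
before = "KB000001\nKB932596\n"
after = before + "KB950772\n"
print(needs_hotfix(before), needs_hotfix(after))
# prints: True False
```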
We've had to go into the BIOS of our PowerEdge Servers and enable the x64 ability. Has this already been done on your hardware?
That is a great find! I'm downloading it now and will install it in our test environment. Thanks!
I believe so. I actually don't remember seeing the setting in our IBM LS41 blades. I'll have to check the next chance I get though.
Installed on 2 sandbox systems without incident, so we'll see what happens. If they stay solid throughout the weekend I will apply it to our DEV and QA systems, although I'll probably hold off on PRD for a week to make sure no new problems are introduced.
Good idea.
Confirmation that MS hotfix 950772 has resolved the problems in our environment. It's been 3+ weeks now in all of our x64 systems (10 VMs) and no crashes. We were having a couple systems crash per week prior to applying the hotfix.
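As a rough sanity check on how convincing three quiet weeks are, the figures above can be plugged into a simple model. This is only a back-of-the-envelope sketch: it assumes crashes arrive independently at a steady rate (a Poisson model, which is an assumption and not something measured in this thread) and reads "a couple per week" as 2.

```python
import math

def prob_no_crashes(crashes_per_week, weeks):
    """Chance of seeing zero crashes in the window under a Poisson model."""
    return math.exp(-crashes_per_week * weeks)

# Assumed inputs: about 2 crashes per week before the hotfix, 3 weeks observed.
p = prob_no_crashes(2.0, 3.0)
print(f"{p:.4f}")
# prints: 0.0025
```

Under those assumptions, three crash-free weeks would be very unlikely if nothing had changed, which fits the conclusion that the hotfix is what made the difference.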
Same here. Thanks all!
Is this just happening on AMD procs? We're seeing the same issue with Windows 2008 64-bit (running on AMD Opterons), and according to the KB article and a phone call to MS there is no patch for the 2008 x64 OS (yet). Is anyone having these crashes on Intel procs, or does anyone have a recommendation or solution for Win2008 64-bit systems?
Thanks, tom