gregsn
Enthusiast

ESXi 5.5 Update 3: Deleting Snapshot Crashes VM: Unexpected signal: 11.

After upgrading from ESXi 5.5 Update 2 to Update 3, deleting a snapshot randomly crashes virtual machines. Updating VMware Tools didn't help. We had no crashing problems with snapshots on the same server for the last ~3 years; the problem started with Update 3.

So far, this has happened with Windows XP, Server 2003, and Server 2008 R2 guests, all with updated VMware Tools.

Here is an example of one of the systems that just crashed right after a snapshot delete:

2015-09-20T07:02:51.386Z| vcpu-0| I120: SnapshotVMXConsolidateOnlineCB: nextState = 4 uid 0

2015-09-20T07:02:51.386Z| vcpu-0| I120: Closing disk scsi0:0

2015-09-20T07:02:51.388Z| vcpu-0| I120: DISKLIB-CBT   : Shutting down change tracking for untracked fid 4858240.

2015-09-20T07:02:51.388Z| vcpu-0| I120: DISKLIB-CBT   : Successfully disconnected CBT node.

2015-09-20T07:02:51.394Z| vcpu-0| I120: DISKLIB-VMFS  : "/vmfs/volumes/526bda53-1f2b17a6-2ebf-001b21a44f80/Virtual Machines/Production/es-1.company.local/es-1.company.local-000002-delta.vmdk" : closed.

2015-09-20T07:02:51.394Z| vcpu-0| I120: DISKLIB-VMFS  : "/vmfs/volumes/526bda53-1f2b17a6-2ebf-001b21a44f80/Virtual Machines/Production/es-1.company.local/es-1.company.local-000001-delta.vmdk" : closed.

2015-09-20T07:02:51.394Z| vcpu-0| I120: DISKLIB-VMFS  : "/vmfs/volumes/526bda53-1f2b17a6-2ebf-001b21a44f80/Virtual Machines/Production/es-1.company.local/es-1.company.local-flat.vmdk" : closed.

2015-09-20T07:02:51.395Z| vcpu-0| I120: DISKLIB-VMFS  : "/vmfs/volumes/526bda53-1f2b17a6-2ebf-001b21a44f80/Virtual Machines/Production/es-1.company.local/es-1.company.local-000002-delta.vmdk" : open successful (24) size = 16912384, hd = 1909127. Type 8

2015-09-20T07:02:51.395Z| vcpu-0| I120: DISKLIB-DSCPTR: Opened [0]: "es-1.company.local-000002-delta.vmdk" (0x18)

2015-09-20T07:02:51.395Z| vcpu-0| I120: DISKLIB-LINK  : Opened '/vmfs/volumes/526bda53-1f2b17a6-2ebf-001b21a44f80/Virtual Machines/Production/es-1.company.local/es-1.company.local-000002.vmdk' (0x18): vmfsSparse, 134217728 sectors / 64 GB.

2015-09-20T07:02:51.395Z| vcpu-0| I120: DISKLIB-CHAINESX : ChainESXOpenSubChain: numLinks = 1, numSubChains = 1

2015-09-20T07:02:51.395Z| vcpu-0| I120: DISKLIB-CHAINESX : ChainESXOpenSubChain:(0) fid = 1909127, extentType = 0

2015-09-20T07:02:51.396Z| vcpu-0| I120: DISKLIB-LIB   : Resuming change tracking.

2015-09-20T07:02:51.396Z| vcpu-0| I120: DISKLIB-CBT   : Initializing ESX kernel change tracking for fid 1909127.

2015-09-20T07:02:51.396Z| vcpu-0| I120: DISKLIB-CBT   : Successfuly created cbt node 16218d-cbt.

2015-09-20T07:02:51.396Z| vcpu-0| I120: DISKLIB-CBT   : Opening cbt node /vmfs/devices/cbt/16218d-cbt

2015-09-20T07:02:51.397Z| vcpu-0| I120: DISKLIB-LIB   : Opened "/vmfs/volumes/526bda53-1f2b17a6-2ebf-001b21a44f80/Virtual Machines/Production/es-1.company.local/es-1.company.local-000002.vmdk" (flags 0x18, type vmfsSparse).

2015-09-20T07:02:51.397Z| vcpu-0| I120: SnapshotVMXNeedConsolidateIteration: Size of helper disk '/vmfs/volumes/526bda53-1f2b17a6-2ebf-001b21a44f80/Virtual Machines/Production/es-1.company.local/es-1.company.local-000002.vmdk' = 17825792 bytes, approx. time required for consolidating helper disk = 0.447157 sec.

2015-09-20T07:02:51.398Z| vcpu-0| I120: DISKLIB-CBT   : Shutting down change tracking for untracked fid 1909127.

2015-09-20T07:02:51.398Z| vcpu-0| I120: DISKLIB-CBT   : Successfully disconnected CBT node.

2015-09-20T07:02:51.406Z| vcpu-0| I120: DISKLIB-VMFS  : "/vmfs/volumes/526bda53-1f2b17a6-2ebf-001b21a44f80/Virtual Machines/Production/es-1.company.local/es-1.company.local-000002-delta.vmdk" : closed.

2015-09-20T07:02:51.406Z| vcpu-0| I120: SnapshotVMXNeedConsolidateIteration: Another iteration of helper branch is not needed.

2015-09-20T07:02:51.407Z| vcpu-0| I120: DISKLIB-VMFS  : "/vmfs/volumes/526bda53-1f2b17a6-2ebf-001b21a44f80/Virtual Machines/Production/es-1.company.local/es-1.company.local-000002-delta.vmdk" : open successful (17) size = 16912384, hd = 0. Type 8

2015-09-20T07:02:51.407Z| vcpu-0| I120: DISKLIB-LIB   : Resuming change tracking.

2015-09-20T07:02:51.414Z| vcpu-0| I120: DISKLIB-VMFS  : "/vmfs/volumes/526bda53-1f2b17a6-2ebf-001b21a44f80/Virtual Machines/Production/es-1.company.local/es-1.company.local-000002-delta.vmdk" : closed.

2015-09-20T07:02:51.414Z| vcpu-0| A115: ConfigDB: Setting displayName = "es-1.company.local"

2015-09-20T07:02:51.422Z| vcpu-0| I120: DISKLIB-VMFS  : "/vmfs/volumes/526bda53-1f2b17a6-2ebf-001b21a44f80/Virtual Machines/Production/es-1.company.local/es-1.company.local-000001-delta.vmdk" : open successful (1041) size = 16912384, hd = 0. Type 8

2015-09-20T07:02:51.422Z| vcpu-0| I120: DISKLIB-LIB   : Resuming change tracking.

2015-09-20T07:02:51.433Z| vcpu-0| I120: DISKLIB-VMFS  : "/vmfs/volumes/526bda53-1f2b17a6-2ebf-001b21a44f80/Virtual Machines/Production/es-1.company.local/es-1.company.local-000001-delta.vmdk" : closed.

2015-09-20T07:02:51.450Z| vcpu-0| I120: SnapshotVMXConsolidateOnlineCB: nextState = 1 uid 0

2015-09-20T07:02:51.450Z| vcpu-0| I120: Closing all the disks of the VM.

2015-09-20T07:02:51.450Z| vcpu-0| I120: Closing disk scsi0:1

2015-09-20T07:02:51.453Z| vcpu-0| I120: DISKLIB-CBT   : Shutting down change tracking for untracked fid 1646984.

2015-09-20T07:02:51.453Z| vcpu-0| I120: DISKLIB-CBT   : Successfully disconnected CBT node.

2015-09-20T07:02:51.483Z| vcpu-0| I120: DISKLIB-VMFS  : "/vmfs/volumes/526bda53-1f2b17a6-2ebf-001b21a44f80/Virtual Machines/Production/es-1.company.local/es-1.company.local_1-000001-delta.vmdk" : closed.

2015-09-20T07:02:51.483Z| vcpu-0| I120: DISKLIB-VMFS  : "/vmfs/volumes/526bda53-1f2b17a6-2ebf-001b21a44f80/Virtual Machines/Production/es-1.company.local/es-1.company.local_1-flat.vmdk" : closed.

2015-09-20T07:02:51.483Z| vcpu-0| I120: SNAPSHOT: SnapshotCombineDisks: Consolidating from '/vmfs/volumes/526bda53-1f2b17a6-2ebf-001b21a44f80/Virtual Machines/Production/es-1.company.local/es-1.company.local-000002.vmdk' to '/vmfs/volumes/526bda53-1f2b17a6-2ebf-001b21a44f80/Virtual Machines/Production/es-1.company.local/es-1.company.local.vmdk'.

2015-09-20T07:02:51.485Z| vcpu-0| I120: DISKLIB-VMFS  : "/vmfs/volumes/526bda53-1f2b17a6-2ebf-001b21a44f80/Virtual Machines/Production/es-1.company.local/es-1.company.local-flat.vmdk" : open successful (24) size = 68719476736, hd = 1581449. Type 3

2015-09-20T07:02:51.485Z| vcpu-0| I120: DISKLIB-DSCPTR: Opened [0]: "es-1.company.local-flat.vmdk" (0x18)

2015-09-20T07:02:51.485Z| vcpu-0| I120: DISKLIB-LINK  : Opened '/vmfs/volumes/526bda53-1f2b17a6-2ebf-001b21a44f80/Virtual Machines/Production/es-1.company.local/es-1.company.local.vmdk' (0x18): vmfs, 134217728 sectors / 64 GB.

2015-09-20T07:02:51.485Z| vcpu-0| I120: DISKLIB-LIB   : Resuming change tracking.

2015-09-20T07:02:51.485Z| vcpu-0| I120: DISKLIB-CBT   : Initializing ESX kernel change tracking for fid 1581449.

2015-09-20T07:02:51.485Z| vcpu-0| I120: DISKLIB-CBT   : Successfuly created cbt node 182189-cbt.

2015-09-20T07:02:51.485Z| vcpu-0| I120: DISKLIB-CBT   : Opening cbt node /vmfs/devices/cbt/182189-cbt

2015-09-20T07:02:51.486Z| vcpu-0| I120: DISKLIB-LIB   : Opened "/vmfs/volumes/526bda53-1f2b17a6-2ebf-001b21a44f80/Virtual Machines/Production/es-1.company.local/es-1.company.local.vmdk" (flags 0x18, type vmfs).

2015-09-20T07:02:51.487Z| vcpu-0| I120: DISKLIB-VMFS  : "/vmfs/volumes/526bda53-1f2b17a6-2ebf-001b21a44f80/Virtual Machines/Production/es-1.company.local/es-1.company.local-000002-delta.vmdk" : open successful (24) size = 16912384, hd = 1843596. Type 8

2015-09-20T07:02:51.487Z| vcpu-0| I120: DISKLIB-DSCPTR: Opened [0]: "es-1.company.local-000002-delta.vmdk" (0x18)

2015-09-20T07:02:51.487Z| vcpu-0| I120: DISKLIB-LINK  : Opened '/vmfs/volumes/526bda53-1f2b17a6-2ebf-001b21a44f80/Virtual Machines/Production/es-1.company.local/es-1.company.local-000002.vmdk' (0x18): vmfsSparse, 134217728 sectors / 64 GB.

2015-09-20T07:02:51.487Z| vcpu-0| I120: DISKLIB-CHAINESX : ChainESXOpenSubChain: numLinks = 1, numSubChains = 1

2015-09-20T07:02:51.487Z| vcpu-0| I120: DISKLIB-CHAINESX : ChainESXOpenSubChain:(0) fid = 1843596, extentType = 0

2015-09-20T07:02:51.488Z| vcpu-0| I120: DISKLIB-LIB   : Resuming change tracking.

2015-09-20T07:02:51.488Z| vcpu-0| I120: DISKLIB-CBT   : Initializing ESX kernel change tracking for fid 1843596.

2015-09-20T07:02:51.488Z| vcpu-0| I120: DISKLIB-CBT   : Successfuly created cbt node 1b218d-cbt.

2015-09-20T07:02:51.488Z| vcpu-0| I120: DISKLIB-CBT   : Opening cbt node /vmfs/devices/cbt/1b218d-cbt

2015-09-20T07:02:51.488Z| vcpu-0| I120: DISKLIB-LIB   : Opened "/vmfs/volumes/526bda53-1f2b17a6-2ebf-001b21a44f80/Virtual Machines/Production/es-1.company.local/es-1.company.local-000002.vmdk" (flags 0x18, type vmfsSparse).

2015-09-20T07:02:51.490Z| vcpu-0| I120: DISKLIB-CBT   : Shutting down change tracking for untracked fid 1843596.

2015-09-20T07:02:51.490Z| vcpu-0| I120: DISKLIB-CBT   : Successfully disconnected CBT node.

2015-09-20T07:02:51.492Z| vcpu-0| I120: DISKLIB-CBT   : Shutting down change tracking for untracked fid 1581449.

2015-09-20T07:02:51.492Z| vcpu-0| I120: DISKLIB-CBT   : Successfully disconnected CBT node.

2015-09-20T07:02:51.500Z| vcpu-0| I120: DISKLIB-CHAINESX : ChainESXOpenSubChain: numLinks = 2, numSubChains = 1

2015-09-20T07:02:51.500Z| vcpu-0| I120: DISKLIB-CHAINESX : ChainESXOpenSubChain:(0) fid = 1581449, extentType = 2

2015-09-20T07:02:51.500Z| vcpu-0| I120: DISKLIB-CHAINESX : ChainESXOpenSubChain:(1) fid = 1843596, extentType = 0

2015-09-20T07:02:51.500Z| vcpu-0| I120: DISKLIB-CBT   : Initializing ESX kernel change tracking for fid 1843596.

2015-09-20T07:02:51.500Z| vcpu-0| I120: DISKLIB-CBT   : Successfuly created cbt node 1d218d-cbt.

2015-09-20T07:02:51.500Z| vcpu-0| I120: DISKLIB-CBT   : Opening cbt node /vmfs/devices/cbt/1d218d-cbt

2015-09-20T07:02:51.732Z| vcpu-0| I120: DISKLIB-LIB   : Upward Combine 2 links at 0. Need 0 MB of free space (4680409 MB available)

2015-09-20T07:02:51.736Z| vcpu-0| I120: DDB: "longContentID" = "aa8be979a63829fd10c5231db538b5bf" (was "523aed1c88135605b76574b6933eceb3")

2015-09-20T07:02:51.777Z| vcpu-0| I120: DISKLIB-CTK   : End Combine

2015-09-20T07:02:51.783Z| vcpu-0| I120: DISKLIB-CTK   : Unlinked /vmfs/volumes/526bda53-1f2b17a6-2ebf-001b21a44f80/Virtual Machines/Production/es-1.company.local/es-1.company.local-ctk.vmdk, tmp file: /vmfs/volumes/526bda53-1f2b17a6-2ebf-001b21a44f80/Virtual Machines/Production/es-1.company.local/es-1.company.local-ctk.vmdk-tmp

2015-09-20T07:02:51.849Z| vcpu-0| I120: DISKLIB-CTK   : resuming /vmfs/volumes/526bda53-1f2b17a6-2ebf-001b21a44f80/Virtual Machines/Production/es-1.company.local/es-1.company.local-ctk.vmdk-tmp

2015-09-20T07:02:51.850Z| vcpu-0| I120: DISKLIB-CTK   : Renaming: /vmfs/volumes/526bda53-1f2b17a6-2ebf-001b21a44f80/Virtual Machines/Production/es-1.company.local/es-1.company.local-ctk.vmdk-tmp -> /vmfs/volumes/526bda53-1f2b17a6-2ebf-001b21a44f80/Virtual Machines/Production/es-1.company.local/es-1.company.local-ctk.vmdk

2015-09-20T07:02:51.851Z| vcpu-0| I120: DISKLIB-CTK   : Attempting unlink of /vmfs/volumes/526bda53-1f2b17a6-2ebf-001b21a44f80/Virtual Machines/Production/es-1.company.local/es-1.company.local-ctk.vmdk-tmp

2015-09-20T07:02:51.853Z| vcpu-0| I120: DISKLIB-CBT   : Shutting down change tracking for untracked fid 1843596.

2015-09-20T07:02:51.853Z| vcpu-0| I120: DISKLIB-CBT   : Successfully disconnected CBT node.

2015-09-20T07:02:51.861Z| vcpu-0| I120: DISKLIB-VMFS  : "/vmfs/volumes/526bda53-1f2b17a6-2ebf-001b21a44f80/Virtual Machines/Production/es-1.company.local/es-1.company.local-000002-delta.vmdk" : closed.

2015-09-20T07:02:51.861Z| vcpu-0| I120: DISKLIB-VMFS  : "/vmfs/volumes/526bda53-1f2b17a6-2ebf-001b21a44f80/Virtual Machines/Production/es-1.company.local/es-1.company.local-flat.vmdk" : closed.

2015-09-20T07:02:51.861Z| vcpu-0| A115: ConfigDB: Setting displayName = "es-1.company.local"

2015-09-20T07:02:51.862Z| vcpu-0| A115: ConfigDB: Setting scsi0:0.fileName = "es-1.company.local.vmdk"

2015-09-20T07:02:51.875Z| vcpu-0| I120: DISKLIB-VMFS  : "/vmfs/volumes/526bda53-1f2b17a6-2ebf-001b21a44f80/Virtual Machines/Production/es-1.company.local/es-1.company.local-000002-delta.vmdk" : open successful (1041) size = 16912384, hd = 0. Type 8

2015-09-20T07:02:51.875Z| vcpu-0| I120: DISKLIB-LIB   : Resuming change tracking.

2015-09-20T07:02:51.885Z| vcpu-0| I120: DISKLIB-VMFS  : "/vmfs/volumes/526bda53-1f2b17a6-2ebf-001b21a44f80/Virtual Machines/Production/es-1.company.local/es-1.company.local-000002-delta.vmdk" : closed.

2015-09-20T07:02:51.890Z| vcpu-0| A115: ConfigDB: Setting displayName = "es-1.company.local"

2015-09-20T07:02:51.897Z| vcpu-0| I120: SNAPSHOT: SnapshotDiskTreeFind: Detected node change from 'scsi0:0' to ''.

2015-09-20T07:02:51Z[+0.000]| vcpu-0| W110: Caught signal 11 -- tid 35868 (addr 98)

2015-09-20T07:02:51Z[+0.000]| vcpu-0| I120: SIGNAL: rip 0x18e79357 rsp 0x3fffb14f910 rbp 0x3fffb14fa00

2015-09-20T07:02:51Z[+0.000]| vcpu-0| I120: SIGNAL: rax 0x3236cc40 rbx 0x32356140 rcx 0x50 rdx 0x32356140 rsi 0x3236cc40 rdi 0x325ec4a0

2015-09-20T07:02:51Z[+0.000]| vcpu-0| I120:         r8 0x3fffb14f66b r9 0x6f72662065676e61 r10 0x0 r11 0x0 r12 0x3236cc40 r13 0x325ec4a0 r14 0x325e58b0 r15 0x0

2015-09-20T07:02:51Z[+0.000]| vcpu-0| I120: SIGNAL: stack 3FFFB14F910 : 0x0000000000000000 0x0000000000000010

2015-09-20T07:02:51Z[+0.000]| vcpu-0| I120: SIGNAL: stack 3FFFB14F920 : 0x000003fffb14f9c0 0x0000000000000000

2015-09-20T07:02:51Z[+0.000]| vcpu-0| I120: SIGNAL: stack 3FFFB14F930 : 0x0000000000000000 0x0000000000000000

2015-09-20T07:02:51Z[+0.000]| vcpu-0| I120: SIGNAL: stack 3FFFB14F940 : 0x0000000000000000 0x00000000325a9ea0

2015-09-20T07:02:51Z[+0.000]| vcpu-0| I120: SIGNAL: stack 3FFFB14F950 : 0x000003ff00000000 0x0000000018c66278

2015-09-20T07:02:51Z[+0.000]| vcpu-0| I120: SIGNAL: stack 3FFFB14F960 : 0x000003fffb14f9a8 0x000003fffb14f9d8

2015-09-20T07:02:51Z[+0.000]| vcpu-0| I120: SIGNAL: stack 3FFFB14F970 : 0x00000000325e58b0 0x0000000032596930

2015-09-20T07:02:51Z[+0.000]| vcpu-0| I120: SIGNAL: stack 3FFFB14F980 : 0x000003fffb14f9c0 0x0000000018c66534

2015-09-20T07:02:51Z[+0.000]| vcpu-0| I120: Backtrace:

2015-09-20T07:02:51Z[+0.000]| vcpu-0| I120: Backtrace[0] 000003fffb14f430 rip=0000000018e934fe rbx=0000000018e92cd0 rbp=000003fffb14f450 r12=0000000000000000 r13=000003fffb150680 r14=000003fffb14f990 r15=000000000000000b

2015-09-20T07:02:51Z[+0.000]| vcpu-0| I120: Backtrace[1] 000003fffb14f460 rip=000000001899770c rbx=000000000000000b rbp=000003fffb14f630 r12=0000000000000003 r13=000003fffb150680 r14=000003fffb14f990 r15=000000000000000b

2015-09-20T07:02:51Z[+0.000]| vcpu-0| I120: Backtrace[2] 000003fffb14f640 rip=000000000036c00f rbx=0000000032356140 rbp=000003fffb14f880 r12=000003fffb14f6c0 r13=00000000325ec4a0 r14=00000000325e58b0 r15=0000000000000000

2015-09-20T07:02:51Z[+0.000]| vcpu-0| I120: SymBacktrace[0] 000003fffb14f430 rip=0000000018e934fe in function (null) in object /bin/vmx loaded at 000000001873f000

2015-09-20T07:02:51Z[+0.000]| vcpu-0| I120: SymBacktrace[1] 000003fffb14f460 rip=000000001899770c in function (null) in object /bin/vmx loaded at 000000001873f000

2015-09-20T07:02:51Z[+0.000]| vcpu-0| I120: SymBacktrace[2] 000003fffb14f640 rip=000000000036c00f

2015-09-20T07:02:51Z[+0.000]| vcpu-0| I120: Unexpected signal: 11.

2015-09-20T07:02:51Z[+3.088]| vcpu-0| W110: A core file is available in "/vmfs/volumes/526bda53-1f2b17a6-2ebf-001b21a44f80/Virtual Machines/Production/es-1.company.local/vmx-zdump.000"

2015-09-20T07:02:51Z[+3.088]| vcpu-0| W110: Writing monitor corefile "/vmfs/volumes/526bda53-1f2b17a6-2ebf-001b21a44f80/Virtual Machines/Production/es-1.company.local/vmmcores.gz"

2015-09-20T07:02:51Z[+3.093]| vcpu-0| I120: Counting amount of anonymous memory

2015-09-20T07:02:51Z[+3.106]| vcpu-0| I120: Total Count of Anon Pages and CR3 pages 20226

2015-09-20T07:02:51Z[+3.113]| vcpu-0| W110: Dumping core for vcpu-0

2015-09-20T07:02:51Z[+3.113]| vcpu-0| I120: CoreDump: dumping core with superuser privileges

2015-09-20T07:02:51Z[+3.113]| vcpu-0| I120: VMK Stack for vcpu 0 is at 0x4123af755000

2015-09-20T07:02:51Z[+3.113]| vcpu-0| I120: Beginning monitor coredump

2015-09-20T07:02:51Z[+3.949]| vcpu-0| I120: End monitor coredump

2015-09-20T07:02:51Z[+3.949]| vcpu-0| W110: Dumping core for vcpu-1

2015-09-20T07:02:51Z[+3.949]| vcpu-0| I120: CoreDump: dumping core with superuser privileges

2015-09-20T07:02:51Z[+3.950]| vcpu-0| I120: VMK Stack for vcpu 1 is at 0x4123afc15000

2015-09-20T07:02:51Z[+3.950]| vcpu-0| I120: Beginning monitor coredump

2015-09-20T07:02:51Z[+4.756]| vcpu-0| I120: End monitor coredump

2015-09-20T07:02:51Z[+4.756]| vcpu-0| W110: Dumping extended monitor data

2015-09-20T07:02:51Z[+9.169]| vcpu-0| I120: CoreDump: ei->size 133267456 : len = 133267456

2015-09-20T07:02:51Z[+9.172]| vcpu-0| I120: Backtrace:

2015-09-20T07:02:51Z[+9.172]| vcpu-0| I120: Backtrace[0] 000003fffb14ef30 rip=0000000018e934fe rbx=0000000018e92cd0 rbp=000003fffb14ef50 r12=0000000000000000 r13=000003fffb150680 r14=000003fffb14f990 r15=000000000000000b

2015-09-20T07:02:51Z[+9.172]| vcpu-0| I120: Backtrace[1] 000003fffb14ef60 rip=00000000188b57c5 rbx=00000000198a98a8 rbp=000003fffb14f450 r12=0000000000000001 r13=000003fffb150680 r14=000003fffb14f990 r15=000000000000000b

2015-09-20T07:02:51Z[+9.172]| vcpu-0| I120: Backtrace[2] 000003fffb14f460 rip=0000000018997766 rbx=000000000000000b rbp=000003fffb14f630 r12=0000000000000003 r13=000003fffb150680 r14=000003fffb14f990 r15=000000000000000b

2015-09-20T07:02:51Z[+9.172]| vcpu-0| I120: Backtrace[3] 000003fffb14f640 rip=000000000036c00f rbx=0000000032356140 rbp=000003fffb14f880 r12=000003fffb14f6c0 r13=00000000325ec4a0 r14=00000000325e58b0 r15=0000000000000000

2015-09-20T07:02:51Z[+9.172]| vcpu-0| I120: SymBacktrace[0] 000003fffb14ef30 rip=0000000018e934fe in function (null) in object /bin/vmx loaded at 000000001873f000

2015-09-20T07:02:51Z[+9.173]| vcpu-0| I120: SymBacktrace[1] 000003fffb14ef60 rip=00000000188b57c5 in function (null) in object /bin/vmx loaded at 000000001873f000

2015-09-20T07:02:51Z[+9.173]| vcpu-0| I120: SymBacktrace[2] 000003fffb14f460 rip=0000000018997766 in function (null) in object /bin/vmx loaded at 000000001873f000

2015-09-20T07:02:51Z[+9.173]| vcpu-0| I120: SymBacktrace[3] 000003fffb14f640 rip=000000000036c00f

2015-09-20T07:02:51Z[+9.173]| vcpu-0| I120: Msg_Post: Error

2015-09-20T07:02:51Z[+9.173]| vcpu-0| I120: [msg.log.error.unrecoverable] VMware ESX unrecoverable error: (vcpu-0)

2015-09-20T07:02:51Z[+9.173]| vcpu-0| I120+ Unexpected signal: 11.

2015-09-20T07:02:51Z[+9.173]| vcpu-0| I120: [msg.panic.haveLog] A log file is available in "/vmfs/volumes/526bda53-1f2b17a6-2ebf-001b21a44f80/Virtual Machines/Production/es-1.company.local/vmware.log".

2015-09-20T07:02:51Z[+9.173]| vcpu-0| I120: [msg.panic.requestSupport.withoutLog] You can request support.

2015-09-20T07:02:51Z[+9.173]| vcpu-0| I120: [msg.panic.requestSupport.vmSupport.vmx86]

2015-09-20T07:02:51Z[+9.173]| vcpu-0| I120+ To collect data to submit to VMware technical support, run "vm-support".

2015-09-20T07:02:51Z[+9.173]| vcpu-0| I120: [msg.panic.response] We will respond on the basis of your support entitlement.

2015-09-20T07:02:51Z[+9.173]| vcpu-0| I120: ----------------------------------------

2015-09-20T07:02:51Z[+9.179]| vcpu-0| I120: Exiting

57 Replies
Techie01
Hot Shot

Can you also share the vmkernel.log from the same period?

vNEX
Expert

Please post the following from the affected VMs and hosts, covering the time of the crash:

vmkernel.log

vmware.log

/vmfs/volumes/526bda53-1f2b17a6-2ebf-001b21a44f80/Virtual Machines/Production/es-1.company.local/vmx-zdump.000

/vmfs/volumes/526bda53-1f2b17a6-2ebf-001b21a44f80/Virtual Machines/Production/es-1.company.local/vmmcores.gz



_________________________________________________________________________________________ If you found this or any other answer helpful, please consider to award points. (use Correct or Helpful buttons) Regards, P.
gregsn
Enthusiast

Please see the "original" files attached as requested.  Let me know if you need any other info.

peetz
Leadership

Hi Gregory,

This looks like a bug.

If you have a paid ESXi license and are eligible for VMware Support, please open a support request with VMware.

Regards

Andreas

Twitter: @VFrontDe, @ESXiPatches | https://esxi-patches.v-front.de | https://vibsdepot.v-front.de
vNEX
Expert

Hi Greg,

I agree with Andreas: go ahead and file an SR with VMware Support. In the meantime, can you try the following?

1. If the issue is reproducible, try disabling CBT on the affected VMs and test again what happens when a snapshot is deleted.

For disabling CBT follow this KB:

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=103187...

2. Check whether the affected VMs have the vShield/Guest Introspection driver installed as part of the VMware Tools installation. If so, remove that driver and test snapshot deletion again.
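
As a rough illustration of what the CBT change in step 1 amounts to at the file level, the sketch below rewrites the text of a .vmx file so that every `ctkEnabled` entry reads "FALSE". It assumes the usual `ctkEnabled` and per-disk `scsiX:Y.ctkEnabled` keys and that the VM is powered off with no snapshots present; the function name is made up for illustration, and the KB remains the reference for the full procedure.

```python
import re

# Matches 'ctkEnabled = "..."' and per-disk keys such as 'scsi0:0.ctkEnabled = "..."'.
CTK_LINE = re.compile(r'^(\s*(?:\w+\d+:\d+\.)?ctkEnabled\s*=\s*)"[^"]*"\s*$',
                      re.IGNORECASE)

def disable_cbt(vmx_text):
    """Return vmx_text with every ctkEnabled entry set to "FALSE".

    Hypothetical helper: works on text only and leaves all other lines untouched.
    """
    out = []
    for line in vmx_text.splitlines():
        m = CTK_LINE.match(line)
        # Keep the key and '=' part, swap only the quoted value.
        out.append(m.group(1) + '"FALSE"' if m else line)
    return "\n".join(out) + "\n"
```

Reading the .vmx, passing its contents through `disable_cbt`, and writing the result back (after taking a copy of the original file) would be the obvious way to use it.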

gregsn
Enthusiast

I haven't yet opened a support case, but I think the problem may be due to the location of the virtual machine relative to the root path of the datastore.

When the crashes occurred, the systems were under the /Virtual Machines/Production folder. Other hosts running 5.5U3 with virtual machines under the datastore root path "/" have not yet had the crashing problem. After moving the virtual machines to the root path of the datastore, I've been unable to reproduce the crash. Previously, at least one VM would crash during every Veeam backup cycle when they were under /Virtual Machines/Production. After moving them to the root and running the backup a few times, they haven't crashed yet. I'm going to leave the system as-is for now and see what happens.

If I feel adventurous, I may move a non-critical VM back to /Virtual Machines/Production and see if I can get it to crash (I was able to crash the VM after 5-10 snapshot create/delete operations manually before).
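
For anyone wanting to repeat that manual create/delete test on a non-critical VM, a loop along these lines could drive it from the ESXi shell. This is only a sketch: it assumes `vim-cmd vmsvc/snapshot.create` and `vim-cmd vmsvc/snapshot.removeall` are available on the host, the VM ID (as listed by `vim-cmd vmsvc/getallvms`) is a placeholder, the helper names are invented, and nothing is checked beyond each command's exit status.

```python
import subprocess

def snapshot_cycle_commands(vmid, cycles):
    """Build the vim-cmd invocations for `cycles` create/delete rounds (hypothetical helper)."""
    commands = []
    for i in range(cycles):
        # Create a snapshot without memory (0) and without quiescing (0).
        commands.append(["vim-cmd", "vmsvc/snapshot.create", str(vmid),
                         "crash-test-%d" % i, "repro", "0", "0"])
        # Remove all snapshots again, the operation after which the crashes were seen.
        commands.append(["vim-cmd", "vmsvc/snapshot.removeall", str(vmid)])
    return commands

def run_cycles(vmid, cycles, runner=subprocess.check_call):
    """Run the commands in order; with the default runner, stop at the first failure."""
    for cmd in snapshot_cycle_commands(vmid, cycles):
        runner(cmd)
```

For example, `run_cycles(42, 10)` would attempt ten rounds against the VM with the (made-up) ID 42.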

Sergey_Petrushi
Contributor

Having the same issues:

ESXi 5.5 Update 3 + NetBackup

Around 5-6 VMs usually crash after a backup.

ssysadmin
Contributor

We're also affected by this. Multiple VMs have crashed and been restarted by HA. Some had issues after the restart, which meant downtime for important services.

It seems to be triggered by removing snapshots, which mostly happens after a Veeam backup.

In the VM log, just before the crash, is this (I suspect the "node change" is significant):

vcpu-0| I120: SNAPSHOT: SnapshotDiskTreeFind: Detected node change from 'scsi0:0' to ''.

vcpu-0| W110: Caught signal 11 -- tid 296089 (addr 6B13458)
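
To check other VMs for the same signature, a small scan of vmware.log along these lines may help. It is a sketch that assumes the two messages appear in the order quoted above and close together; the function name and the `window` parameter are illustrative.

```python
import re

NODE_CHANGE = re.compile(
    r"SnapshotDiskTreeFind: Detected node change from '([^']*)' to ''")
SIGNAL_11 = re.compile(r"Caught signal 11\b")

def find_crash_signature(log_lines, window=5):
    """Return (index, disk_node) for each node change to '' that is followed
    by 'Caught signal 11' within `window` non-empty lines.

    `index` counts non-empty lines only, starting at 0.
    """
    lines = [l for l in log_lines if l.strip()]
    hits = []
    for i, line in enumerate(lines):
        m = NODE_CHANGE.search(line)
        if m and any(SIGNAL_11.search(l) for l in lines[i + 1:i + 1 + window]):
            hits.append((i, m.group(1)))
    return hits
```

Passing an open file works, since iterating a file yields its lines: `find_crash_signature(open("vmware.log"))`.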

We have contacted VMware support, but there is no resolution or workaround yet.

ssysadmin
Contributor

VMware support acknowledged it's a bug in Update 3. There is no fix or workaround yet.

averling2012
Contributor

Can you point to any article they may have referenced regarding this issue?

ThorstenT
Contributor

We're affected as well. Support told me engineering is involved and there is a PR now.

The only workarounds are disabling CBT or downgrading ESXi.

Disabling CBT equals powering down VMs and editing .vmx files. Besides the potential performance impact, this is not really feasible with a large number of VMs.

[EDIT]: Support updated me and told me that disabling CBT does not seem to help.

Downgrading to 5.5U2 opens up another potential source of crashing VMs during backup: we were also affected by http://kb.vmware.com/kb/2115997, which leaves us a choice between the devil and the deep blue sea.

If you are affected and have a support contract with VMware, I'd urge you to open an SR. The more complaints, the quicker we will see an emergency patch for this.

ssysadmin
Contributor

We tried turning off CBT; it made no difference, and we reported this back to VMware support.

Powering the VM off, copying it, and starting the copy seemed to help, but we haven't tested thoroughly.

We can't keep crashing machines to test this - all our servers that are affected are in production. We're rolling back update 3.

AFAIK there's no article or other "public" acknowledgement of this bug.

gregsn
Enthusiast

So far, after moving the virtual machines from the /Virtual Machines/Production folder to the root of the datastore, I haven't had a single crash. As far as I'm aware, no changes were made other than unregistering and re-registering the virtual machines (I had tried unregistering and re-registering them before the move as well, and that didn't seem to help).

Is anyone else running their virtual machines under sub-folders on the datastore or are they all under the root path?

Also, the affected system I'm referring to is using DAS so maybe that's a contributing/mitigating factor of some sort. 

Sergey_Petrushi
Contributor

Turning off CBT doesn't work. Also, since it requires creating and deleting a snapshot, it causes a lot of VM crashes.

I hope that VMware will release a fix for this CRITICAL issue very soon.

ssysadmin
Contributor

We run iSCSI SANs, with the VM folders at the root of the datastores - so that's not it for us at least.

Maybe the act of moving/migrating the files cleared something up? Don't think we tried that.

We had issues on 2 different SANs and 2 different clusters, with both Windows and Linux guests. The host hardware is the same, but that seems unlikely to matter.

gregsn
Enthusiast

We have now experienced crashing on systems with DAS and virtual machines moved from sub-folders into the root folder.  There doesn't seem to be any pattern to the crashing but:

IT HAS BEEN NEARLY A WEEK SINCE THE ISSUE WAS REPORTED, AND THIS IS A VERY CRITICAL PROBLEM. IT WOULD BE NICE TO HEAR FROM VMWARE ON WAYS TO MITIGATE IT.

KBadmin
Contributor

We had the same problem after upgrading from ESXi 5.5 U2 to ESXi 5.5 U3. Our VMs (MS Server 2008 and MS Server 2012 R2) crashed after deleting snapshots, and Acronis couldn't back up the VMs. After installing older VMware Tools, the VMs crashed less often. We have now rolled back to ESXi 5.5 U2 and have had no snapshot problems so far.

JeffreyZhang123
Contributor

Same problem here! We've had to disable our Veeam backup jobs...

MikeStone226
Contributor

Same here... ticket open for days and not a single update.
