Enthusiast
Enthusiast

ESXi 5.5 Update 3: Deleting Snapshot Crashes VM: Unexpected signal: 11.

After upgrading from ESXi 5.5 Update 2 to Update 3, deleting snapshot randomly crashed virtual machines.  Updating VMware tools didn't help.  No crashing problems before with snapshots on the same server for the last ~3 years.  Problem started with Update 3.

So far, this has happened with Windows XP, 2003, 2008R2 OS with updated VMware Tools.

Here is an example of one of the systems that just crashed right after a snapshot delete:

2015-09-20T07:02:51.386Z| vcpu-0| I120: SnapshotVMXConsolidateOnlineCB: nextState = 4 uid 0

2015-09-20T07:02:51.386Z| vcpu-0| I120: Closing disk scsi0:0

2015-09-20T07:02:51.388Z| vcpu-0| I120: DISKLIB-CBT   : Shutting down change tracking for untracked fid 4858240.

2015-09-20T07:02:51.388Z| vcpu-0| I120: DISKLIB-CBT   : Successfully disconnected CBT node.

2015-09-20T07:02:51.394Z| vcpu-0| I120: DISKLIB-VMFS  : "/vmfs/volumes/526bda53-1f2b17a6-2ebf-001b21a44f80/Virtual Machines/Production/es-1.company.local/es-1.company.local-000002-delta.vmdk" : closed.

2015-09-20T07:02:51.394Z| vcpu-0| I120: DISKLIB-VMFS  : "/vmfs/volumes/526bda53-1f2b17a6-2ebf-001b21a44f80/Virtual Machines/Production/es-1.company.local/es-1.company.local-000001-delta.vmdk" : closed.

2015-09-20T07:02:51.394Z| vcpu-0| I120: DISKLIB-VMFS  : "/vmfs/volumes/526bda53-1f2b17a6-2ebf-001b21a44f80/Virtual Machines/Production/es-1.company.local/es-1.company.local-flat.vmdk" : closed.

2015-09-20T07:02:51.395Z| vcpu-0| I120: DISKLIB-VMFS  : "/vmfs/volumes/526bda53-1f2b17a6-2ebf-001b21a44f80/Virtual Machines/Production/es-1.company.local/es-1.company.local-000002-delta.vmdk" : open successful (24) size = 16912384, hd = 1909127. Type 8

2015-09-20T07:02:51.395Z| vcpu-0| I120: DISKLIB-DSCPTR: Opened [0]: "es-1.company.local-000002-delta.vmdk" (0x18)

2015-09-20T07:02:51.395Z| vcpu-0| I120: DISKLIB-LINK  : Opened '/vmfs/volumes/526bda53-1f2b17a6-2ebf-001b21a44f80/Virtual Machines/Production/es-1.company.local/es-1.company.local-000002.vmdk' (0x18): vmfsSparse, 134217728 sectors / 64 GB.

2015-09-20T07:02:51.395Z| vcpu-0| I120: DISKLIB-CHAINESX : ChainESXOpenSubChain: numLinks = 1, numSubChains = 1

2015-09-20T07:02:51.395Z| vcpu-0| I120: DISKLIB-CHAINESX : ChainESXOpenSubChain:(0) fid = 1909127, extentType = 0

2015-09-20T07:02:51.396Z| vcpu-0| I120: DISKLIB-LIB   : Resuming change tracking.

2015-09-20T07:02:51.396Z| vcpu-0| I120: DISKLIB-CBT   : Initializing ESX kernel change tracking for fid 1909127.

2015-09-20T07:02:51.396Z| vcpu-0| I120: DISKLIB-CBT   : Successfuly created cbt node 16218d-cbt.

2015-09-20T07:02:51.396Z| vcpu-0| I120: DISKLIB-CBT   : Opening cbt node /vmfs/devices/cbt/16218d-cbt

2015-09-20T07:02:51.397Z| vcpu-0| I120: DISKLIB-LIB   : Opened "/vmfs/volumes/526bda53-1f2b17a6-2ebf-001b21a44f80/Virtual Machines/Production/es-1.company.local/es-1.company.local-000002.vmdk" (flags 0x18, type vmfsSparse).

2015-09-20T07:02:51.397Z| vcpu-0| I120: SnapshotVMXNeedConsolidateIteration: Size of helper disk '/vmfs/volumes/526bda53-1f2b17a6-2ebf-001b21a44f80/Virtual Machines/Production/es-1.company.local/es-1.company.local-000002.vmdk' = 17825792 bytes, approx. time required for consolidating helper disk = 0.447157 sec.

2015-09-20T07:02:51.398Z| vcpu-0| I120: DISKLIB-CBT   : Shutting down change tracking for untracked fid 1909127.

2015-09-20T07:02:51.398Z| vcpu-0| I120: DISKLIB-CBT   : Successfully disconnected CBT node.

2015-09-20T07:02:51.406Z| vcpu-0| I120: DISKLIB-VMFS  : "/vmfs/volumes/526bda53-1f2b17a6-2ebf-001b21a44f80/Virtual Machines/Production/es-1.company.local/es-1.company.local-000002-delta.vmdk" : closed.

2015-09-20T07:02:51.406Z| vcpu-0| I120: SnapshotVMXNeedConsolidateIteration: Another iteration of helper branch is not needed.

2015-09-20T07:02:51.407Z| vcpu-0| I120: DISKLIB-VMFS  : "/vmfs/volumes/526bda53-1f2b17a6-2ebf-001b21a44f80/Virtual Machines/Production/es-1.company.local/es-1.company.local-000002-delta.vmdk" : open successful (17) size = 16912384, hd = 0. Type 8

2015-09-20T07:02:51.407Z| vcpu-0| I120: DISKLIB-LIB   : Resuming change tracking.

2015-09-20T07:02:51.414Z| vcpu-0| I120: DISKLIB-VMFS  : "/vmfs/volumes/526bda53-1f2b17a6-2ebf-001b21a44f80/Virtual Machines/Production/es-1.company.local/es-1.company.local-000002-delta.vmdk" : closed.

2015-09-20T07:02:51.414Z| vcpu-0| A115: ConfigDB: Setting displayName = "es-1.company.local"

2015-09-20T07:02:51.422Z| vcpu-0| I120: DISKLIB-VMFS  : "/vmfs/volumes/526bda53-1f2b17a6-2ebf-001b21a44f80/Virtual Machines/Production/es-1.company.local/es-1.company.local-000001-delta.vmdk" : open successful (1041) size = 16912384, hd = 0. Type 8

2015-09-20T07:02:51.422Z| vcpu-0| I120: DISKLIB-LIB   : Resuming change tracking.

2015-09-20T07:02:51.433Z| vcpu-0| I120: DISKLIB-VMFS  : "/vmfs/volumes/526bda53-1f2b17a6-2ebf-001b21a44f80/Virtual Machines/Production/es-1.company.local/es-1.company.local-000001-delta.vmdk" : closed.

2015-09-20T07:02:51.450Z| vcpu-0| I120: SnapshotVMXConsolidateOnlineCB: nextState = 1 uid 0

2015-09-20T07:02:51.450Z| vcpu-0| I120: Closing all the disks of the VM.

2015-09-20T07:02:51.450Z| vcpu-0| I120: Closing disk scsi0:1

2015-09-20T07:02:51.453Z| vcpu-0| I120: DISKLIB-CBT   : Shutting down change tracking for untracked fid 1646984.

2015-09-20T07:02:51.453Z| vcpu-0| I120: DISKLIB-CBT   : Successfully disconnected CBT node.

2015-09-20T07:02:51.483Z| vcpu-0| I120: DISKLIB-VMFS  : "/vmfs/volumes/526bda53-1f2b17a6-2ebf-001b21a44f80/Virtual Machines/Production/es-1.company.local/es-1.company.local_1-000001-delta.vmdk" : closed.

2015-09-20T07:02:51.483Z| vcpu-0| I120: DISKLIB-VMFS  : "/vmfs/volumes/526bda53-1f2b17a6-2ebf-001b21a44f80/Virtual Machines/Production/es-1.company.local/es-1.company.local_1-flat.vmdk" : closed.

2015-09-20T07:02:51.483Z| vcpu-0| I120: SNAPSHOT: SnapshotCombineDisks: Consolidating from '/vmfs/volumes/526bda53-1f2b17a6-2ebf-001b21a44f80/Virtual Machines/Production/es-1.company.local/es-1.company.local-000002.vmdk' to '/vmfs/volumes/526bda53-1f2b17a6-2ebf-001b21a44f80/Virtual Machines/Production/es-1.company.local/es-1.company.local.vmdk'.

2015-09-20T07:02:51.485Z| vcpu-0| I120: DISKLIB-VMFS  : "/vmfs/volumes/526bda53-1f2b17a6-2ebf-001b21a44f80/Virtual Machines/Production/es-1.company.local/es-1.company.local-flat.vmdk" : open successful (24) size = 68719476736, hd = 1581449. Type 3

2015-09-20T07:02:51.485Z| vcpu-0| I120: DISKLIB-DSCPTR: Opened [0]: "es-1.company.local-flat.vmdk" (0x18)

2015-09-20T07:02:51.485Z| vcpu-0| I120: DISKLIB-LINK  : Opened '/vmfs/volumes/526bda53-1f2b17a6-2ebf-001b21a44f80/Virtual Machines/Production/es-1.company.local/es-1.company.local.vmdk' (0x18): vmfs, 134217728 sectors / 64 GB.

2015-09-20T07:02:51.485Z| vcpu-0| I120: DISKLIB-LIB   : Resuming change tracking.

2015-09-20T07:02:51.485Z| vcpu-0| I120: DISKLIB-CBT   : Initializing ESX kernel change tracking for fid 1581449.

2015-09-20T07:02:51.485Z| vcpu-0| I120: DISKLIB-CBT   : Successfuly created cbt node 182189-cbt.

2015-09-20T07:02:51.485Z| vcpu-0| I120: DISKLIB-CBT   : Opening cbt node /vmfs/devices/cbt/182189-cbt

2015-09-20T07:02:51.486Z| vcpu-0| I120: DISKLIB-LIB   : Opened "/vmfs/volumes/526bda53-1f2b17a6-2ebf-001b21a44f80/Virtual Machines/Production/es-1.company.local/es-1.company.local.vmdk" (flags 0x18, type vmfs).

2015-09-20T07:02:51.487Z| vcpu-0| I120: DISKLIB-VMFS  : "/vmfs/volumes/526bda53-1f2b17a6-2ebf-001b21a44f80/Virtual Machines/Production/es-1.company.local/es-1.company.local-000002-delta.vmdk" : open successful (24) size = 16912384, hd = 1843596. Type 8

2015-09-20T07:02:51.487Z| vcpu-0| I120: DISKLIB-DSCPTR: Opened [0]: "es-1.company.local-000002-delta.vmdk" (0x18)

2015-09-20T07:02:51.487Z| vcpu-0| I120: DISKLIB-LINK  : Opened '/vmfs/volumes/526bda53-1f2b17a6-2ebf-001b21a44f80/Virtual Machines/Production/es-1.company.local/es-1.company.local-000002.vmdk' (0x18): vmfsSparse, 134217728 sectors / 64 GB.

2015-09-20T07:02:51.487Z| vcpu-0| I120: DISKLIB-CHAINESX : ChainESXOpenSubChain: numLinks = 1, numSubChains = 1

2015-09-20T07:02:51.487Z| vcpu-0| I120: DISKLIB-CHAINESX : ChainESXOpenSubChain:(0) fid = 1843596, extentType = 0

2015-09-20T07:02:51.488Z| vcpu-0| I120: DISKLIB-LIB   : Resuming change tracking.

2015-09-20T07:02:51.488Z| vcpu-0| I120: DISKLIB-CBT   : Initializing ESX kernel change tracking for fid 1843596.

2015-09-20T07:02:51.488Z| vcpu-0| I120: DISKLIB-CBT   : Successfuly created cbt node 1b218d-cbt.

2015-09-20T07:02:51.488Z| vcpu-0| I120: DISKLIB-CBT   : Opening cbt node /vmfs/devices/cbt/1b218d-cbt

2015-09-20T07:02:51.488Z| vcpu-0| I120: DISKLIB-LIB   : Opened "/vmfs/volumes/526bda53-1f2b17a6-2ebf-001b21a44f80/Virtual Machines/Production/es-1.company.local/es-1.company.local-000002.vmdk" (flags 0x18, type vmfsSparse).

2015-09-20T07:02:51.490Z| vcpu-0| I120: DISKLIB-CBT   : Shutting down change tracking for untracked fid 1843596.

2015-09-20T07:02:51.490Z| vcpu-0| I120: DISKLIB-CBT   : Successfully disconnected CBT node.

2015-09-20T07:02:51.492Z| vcpu-0| I120: DISKLIB-CBT   : Shutting down change tracking for untracked fid 1581449.

2015-09-20T07:02:51.492Z| vcpu-0| I120: DISKLIB-CBT   : Successfully disconnected CBT node.

2015-09-20T07:02:51.500Z| vcpu-0| I120: DISKLIB-CHAINESX : ChainESXOpenSubChain: numLinks = 2, numSubChains = 1

2015-09-20T07:02:51.500Z| vcpu-0| I120: DISKLIB-CHAINESX : ChainESXOpenSubChain:(0) fid = 1581449, extentType = 2

2015-09-20T07:02:51.500Z| vcpu-0| I120: DISKLIB-CHAINESX : ChainESXOpenSubChain:(1) fid = 1843596, extentType = 0

2015-09-20T07:02:51.500Z| vcpu-0| I120: DISKLIB-CBT   : Initializing ESX kernel change tracking for fid 1843596.

2015-09-20T07:02:51.500Z| vcpu-0| I120: DISKLIB-CBT   : Successfuly created cbt node 1d218d-cbt.

2015-09-20T07:02:51.500Z| vcpu-0| I120: DISKLIB-CBT   : Opening cbt node /vmfs/devices/cbt/1d218d-cbt

2015-09-20T07:02:51.732Z| vcpu-0| I120: DISKLIB-LIB   : Upward Combine 2 links at 0. Need 0 MB of free space (4680409 MB available)

2015-09-20T07:02:51.736Z| vcpu-0| I120: DDB: "longContentID" = "aa8be979a63829fd10c5231db538b5bf" (was "523aed1c88135605b76574b6933eceb3")

2015-09-20T07:02:51.777Z| vcpu-0| I120: DISKLIB-CTK   : End Combine

2015-09-20T07:02:51.783Z| vcpu-0| I120: DISKLIB-CTK   : Unlinked /vmfs/volumes/526bda53-1f2b17a6-2ebf-001b21a44f80/Virtual Machines/Production/es-1.company.local/es-1.company.local-ctk.vmdk, tmp file: /vmfs/volumes/526bda53-1f2b17a6-2ebf-001b21a44f80/Virtual Machines/Production/es-1.company.local/es-1.company.local-ctk.vmdk-tmp

2015-09-20T07:02:51.849Z| vcpu-0| I120: DISKLIB-CTK   : resuming /vmfs/volumes/526bda53-1f2b17a6-2ebf-001b21a44f80/Virtual Machines/Production/es-1.company.local/es-1.company.local-ctk.vmdk-tmp

2015-09-20T07:02:51.850Z| vcpu-0| I120: DISKLIB-CTK   : Renaming: /vmfs/volumes/526bda53-1f2b17a6-2ebf-001b21a44f80/Virtual Machines/Production/es-1.company.local/es-1.company.local-ctk.vmdk-tmp -> /vmfs/volumes/526bda53-1f2b17a6-2ebf-001b21a44f80/Virtual Machines/Production/es-1.company.local/es-1.company.local-ctk.vmdk

2015-09-20T07:02:51.851Z| vcpu-0| I120: DISKLIB-CTK   : Attempting unlink of /vmfs/volumes/526bda53-1f2b17a6-2ebf-001b21a44f80/Virtual Machines/Production/es-1.company.local/es-1.company.local-ctk.vmdk-tmp

2015-09-20T07:02:51.853Z| vcpu-0| I120: DISKLIB-CBT   : Shutting down change tracking for untracked fid 1843596.

2015-09-20T07:02:51.853Z| vcpu-0| I120: DISKLIB-CBT   : Successfully disconnected CBT node.

2015-09-20T07:02:51.861Z| vcpu-0| I120: DISKLIB-VMFS  : "/vmfs/volumes/526bda53-1f2b17a6-2ebf-001b21a44f80/Virtual Machines/Production/es-1.company.local/es-1.company.local-000002-delta.vmdk" : closed.

2015-09-20T07:02:51.861Z| vcpu-0| I120: DISKLIB-VMFS  : "/vmfs/volumes/526bda53-1f2b17a6-2ebf-001b21a44f80/Virtual Machines/Production/es-1.company.local/es-1.company.local-flat.vmdk" : closed.

2015-09-20T07:02:51.861Z| vcpu-0| A115: ConfigDB: Setting displayName = "es-1.company.local"

2015-09-20T07:02:51.862Z| vcpu-0| A115: ConfigDB: Setting scsi0:0.fileName = "es-1.company.local.vmdk"

2015-09-20T07:02:51.875Z| vcpu-0| I120: DISKLIB-VMFS  : "/vmfs/volumes/526bda53-1f2b17a6-2ebf-001b21a44f80/Virtual Machines/Production/es-1.company.local/es-1.company.local-000002-delta.vmdk" : open successful (1041) size = 16912384, hd = 0. Type 8

2015-09-20T07:02:51.875Z| vcpu-0| I120: DISKLIB-LIB   : Resuming change tracking.

2015-09-20T07:02:51.885Z| vcpu-0| I120: DISKLIB-VMFS  : "/vmfs/volumes/526bda53-1f2b17a6-2ebf-001b21a44f80/Virtual Machines/Production/es-1.company.local/es-1.company.local-000002-delta.vmdk" : closed.

2015-09-20T07:02:51.890Z| vcpu-0| A115: ConfigDB: Setting displayName = "es-1.company.local"

2015-09-20T07:02:51.897Z| vcpu-0| I120: SNAPSHOT: SnapshotDiskTreeFind: Detected node change from 'scsi0:0' to ''.

2015-09-20T07:02:51Z[+0.000]| vcpu-0| W110: Caught signal 11 -- tid 35868 (addr 98)

2015-09-20T07:02:51Z[+0.000]| vcpu-0| I120: SIGNAL: rip 0x18e79357 rsp 0x3fffb14f910 rbp 0x3fffb14fa00

2015-09-20T07:02:51Z[+0.000]| vcpu-0| I120: SIGNAL: rax 0x3236cc40 rbx 0x32356140 rcx 0x50 rdx 0x32356140 rsi 0x3236cc40 rdi 0x325ec4a0

2015-09-20T07:02:51Z[+0.000]| vcpu-0| I120:         r8 0x3fffb14f66b r9 0x6f72662065676e61 r10 0x0 r11 0x0 r12 0x3236cc40 r13 0x325ec4a0 r14 0x325e58b0 r15 0x0

2015-09-20T07:02:51Z[+0.000]| vcpu-0| I120: SIGNAL: stack 3FFFB14F910 : 0x0000000000000000 0x0000000000000010

2015-09-20T07:02:51Z[+0.000]| vcpu-0| I120: SIGNAL: stack 3FFFB14F920 : 0x000003fffb14f9c0 0x0000000000000000

2015-09-20T07:02:51Z[+0.000]| vcpu-0| I120: SIGNAL: stack 3FFFB14F930 : 0x0000000000000000 0x0000000000000000

2015-09-20T07:02:51Z[+0.000]| vcpu-0| I120: SIGNAL: stack 3FFFB14F940 : 0x0000000000000000 0x00000000325a9ea0

2015-09-20T07:02:51Z[+0.000]| vcpu-0| I120: SIGNAL: stack 3FFFB14F950 : 0x000003ff00000000 0x0000000018c66278

2015-09-20T07:02:51Z[+0.000]| vcpu-0| I120: SIGNAL: stack 3FFFB14F960 : 0x000003fffb14f9a8 0x000003fffb14f9d8

2015-09-20T07:02:51Z[+0.000]| vcpu-0| I120: SIGNAL: stack 3FFFB14F970 : 0x00000000325e58b0 0x0000000032596930

2015-09-20T07:02:51Z[+0.000]| vcpu-0| I120: SIGNAL: stack 3FFFB14F980 : 0x000003fffb14f9c0 0x0000000018c66534

2015-09-20T07:02:51Z[+0.000]| vcpu-0| I120: Backtrace:

2015-09-20T07:02:51Z[+0.000]| vcpu-0| I120: Backtrace[0] 000003fffb14f430 rip=0000000018e934fe rbx=0000000018e92cd0 rbp=000003fffb14f450 r12=0000000000000000 r13=000003fffb150680 r14=000003fffb14f990 r15=000000000000000b

2015-09-20T07:02:51Z[+0.000]| vcpu-0| I120: Backtrace[1] 000003fffb14f460 rip=000000001899770c rbx=000000000000000b rbp=000003fffb14f630 r12=0000000000000003 r13=000003fffb150680 r14=000003fffb14f990 r15=000000000000000b

2015-09-20T07:02:51Z[+0.000]| vcpu-0| I120: Backtrace[2] 000003fffb14f640 rip=000000000036c00f rbx=0000000032356140 rbp=000003fffb14f880 r12=000003fffb14f6c0 r13=00000000325ec4a0 r14=00000000325e58b0 r15=0000000000000000

2015-09-20T07:02:51Z[+0.000]| vcpu-0| I120: SymBacktrace[0] 000003fffb14f430 rip=0000000018e934fe in function (null) in object /bin/vmx loaded at 000000001873f000

2015-09-20T07:02:51Z[+0.000]| vcpu-0| I120: SymBacktrace[1] 000003fffb14f460 rip=000000001899770c in function (null) in object /bin/vmx loaded at 000000001873f000

2015-09-20T07:02:51Z[+0.000]| vcpu-0| I120: SymBacktrace[2] 000003fffb14f640 rip=000000000036c00f

2015-09-20T07:02:51Z[+0.000]| vcpu-0| I120: Unexpected signal: 11.

2015-09-20T07:02:51Z[+3.088]| vcpu-0| W110: A core file is available in "/vmfs/volumes/526bda53-1f2b17a6-2ebf-001b21a44f80/Virtual Machines/Production/es-1.company.local/vmx-zdump.000"

2015-09-20T07:02:51Z[+3.088]| vcpu-0| W110: Writing monitor corefile "/vmfs/volumes/526bda53-1f2b17a6-2ebf-001b21a44f80/Virtual Machines/Production/es-1.company.local/vmmcores.gz"

2015-09-20T07:02:51Z[+3.093]| vcpu-0| I120: Counting amount of anonymous memory

2015-09-20T07:02:51Z[+3.106]| vcpu-0| I120: Total Count of Anon Pages and CR3 pages 20226

2015-09-20T07:02:51Z[+3.113]| vcpu-0| W110: Dumping core for vcpu-0

2015-09-20T07:02:51Z[+3.113]| vcpu-0| I120: CoreDump: dumping core with superuser privileges

2015-09-20T07:02:51Z[+3.113]| vcpu-0| I120: VMK Stack for vcpu 0 is at 0x4123af755000

2015-09-20T07:02:51Z[+3.113]| vcpu-0| I120: Beginning monitor coredump

2015-09-20T07:02:51Z[+3.949]| vcpu-0| I120: End monitor coredump

2015-09-20T07:02:51Z[+3.949]| vcpu-0| W110: Dumping core for vcpu-1

2015-09-20T07:02:51Z[+3.949]| vcpu-0| I120: CoreDump: dumping core with superuser privileges

2015-09-20T07:02:51Z[+3.950]| vcpu-0| I120: VMK Stack for vcpu 1 is at 0x4123afc15000

2015-09-20T07:02:51Z[+3.950]| vcpu-0| I120: Beginning monitor coredump

2015-09-20T07:02:51Z[+4.756]| vcpu-0| I120: End monitor coredump

2015-09-20T07:02:51Z[+4.756]| vcpu-0| W110: Dumping extended monitor data

2015-09-20T07:02:51Z[+9.169]| vcpu-0| I120: CoreDump: ei->size 133267456 : len = 133267456

2015-09-20T07:02:51Z[+9.172]| vcpu-0| I120: Backtrace:

2015-09-20T07:02:51Z[+9.172]| vcpu-0| I120: Backtrace[0] 000003fffb14ef30 rip=0000000018e934fe rbx=0000000018e92cd0 rbp=000003fffb14ef50 r12=0000000000000000 r13=000003fffb150680 r14=000003fffb14f990 r15=000000000000000b

2015-09-20T07:02:51Z[+9.172]| vcpu-0| I120: Backtrace[1] 000003fffb14ef60 rip=00000000188b57c5 rbx=00000000198a98a8 rbp=000003fffb14f450 r12=0000000000000001 r13=000003fffb150680 r14=000003fffb14f990 r15=000000000000000b

2015-09-20T07:02:51Z[+9.172]| vcpu-0| I120: Backtrace[2] 000003fffb14f460 rip=0000000018997766 rbx=000000000000000b rbp=000003fffb14f630 r12=0000000000000003 r13=000003fffb150680 r14=000003fffb14f990 r15=000000000000000b

2015-09-20T07:02:51Z[+9.172]| vcpu-0| I120: Backtrace[3] 000003fffb14f640 rip=000000000036c00f rbx=0000000032356140 rbp=000003fffb14f880 r12=000003fffb14f6c0 r13=00000000325ec4a0 r14=00000000325e58b0 r15=0000000000000000

2015-09-20T07:02:51Z[+9.172]| vcpu-0| I120: SymBacktrace[0] 000003fffb14ef30 rip=0000000018e934fe in function (null) in object /bin/vmx loaded at 000000001873f000

2015-09-20T07:02:51Z[+9.173]| vcpu-0| I120: SymBacktrace[1] 000003fffb14ef60 rip=00000000188b57c5 in function (null) in object /bin/vmx loaded at 000000001873f000

2015-09-20T07:02:51Z[+9.173]| vcpu-0| I120: SymBacktrace[2] 000003fffb14f460 rip=0000000018997766 in function (null) in object /bin/vmx loaded at 000000001873f000

2015-09-20T07:02:51Z[+9.173]| vcpu-0| I120: SymBacktrace[3] 000003fffb14f640 rip=000000000036c00f

2015-09-20T07:02:51Z[+9.173]| vcpu-0| I120: Msg_Post: Error

2015-09-20T07:02:51Z[+9.173]| vcpu-0| I120: [msg.log.error.unrecoverable] VMware ESX unrecoverable error: (vcpu-0)

2015-09-20T07:02:51Z[+9.173]| vcpu-0| I120+ Unexpected signal: 11.

2015-09-20T07:02:51Z[+9.173]| vcpu-0| I120: [msg.panic.haveLog] A log file is available in "/vmfs/volumes/526bda53-1f2b17a6-2ebf-001b21a44f80/Virtual Machines/Production/es-1.company.local/vmware.log".

2015-09-20T07:02:51Z[+9.173]| vcpu-0| I120: [msg.panic.requestSupport.withoutLog] You can request support.

2015-09-20T07:02:51Z[+9.173]| vcpu-0| I120: [msg.panic.requestSupport.vmSupport.vmx86]

2015-09-20T07:02:51Z[+9.173]| vcpu-0| I120+ To collect data to submit to VMware technical support, run "vm-support".

2015-09-20T07:02:51Z[+9.173]| vcpu-0| I120: [msg.panic.response] We will respond on the basis of your support entitlement.

2015-09-20T07:02:51Z[+9.173]| vcpu-0| I120: ----------------------------------------

2015-09-20T07:02:51Z[+9.179]| vcpu-0| I120: Exiting

Tags (1)
57 Replies
Enthusiast
Enthusiast

I have hundreds of VMs backed up by Avamar using VM snapshots but have only seen two VMs crash similar to this issue during snapshot removal. Both on same host, 5.5.0 3029944 (Update 3). However they do not have the "SNAPSHOT: SnapshotDiskTreeFind: Detected node change" line right before the crash as in the KB article. I have opened a case and am considering rolling back Update 3 on one host and isolating affected VMs there.

2015-09-29T01:59:18.274Z| vcpu-0| I120: DISKLIB-CHAINESX : ChainESXOpenSubChain: numLinks = 2, numSubChains = 1

2015-09-29T01:59:18.274Z| vcpu-0| I120: DISKLIB-CHAINESX : ChainESXOpenSubChain:(0) fid = 46754321, extentType = 2

2015-09-29T01:59:18.274Z| vcpu-0| I120: DISKLIB-CHAINESX : ChainESXOpenSubChain:(1) fid = 52783643, extentType = 0

2015-09-29T01:59:18.275Z| vcpu-0| I120: DISKLIB-CBT   : Initializing ESX kernel change tracking for fid 52783643.

2015-09-29T01:59:18.275Z| vcpu-0| I120: DISKLIB-CBT   : Successfuly created cbt node 3166a1c-cbt.

2015-09-29T01:59:18.275Z| vcpu-0| I120: DISKLIB-CBT   : Opening cbt node /vmfs/devices/cbt/3166a1c-cbt

2015-09-29T01:59:18.645Z| vcpu-0| I120: DISKLIB-LIB   : Upward Combine 2 links at 0. Need 0 MB of free space (4098338 MB available)

2015-09-29T01:59:18.969Z| vcpu-0| I120: DISKLIB-CTK   : End Combine

2015-09-29T01:59:19.017Z| vcpu-0| I120: DISKLIB-CTK   : Unlinked /vmfs/volumes/550c3eba-5d4ab9be-eca3-5cf3fcdb053c/VM_NAME/VM_NAME_7-ctk.vmdk, tmp file: /vmfs/volumes/550c3eba-5d4ab9be-eca3-5cf3fcdb053c/VM_NAME/VM_NAME_7-ctk.vmdk-tmp

2015-09-29T01:59:19.462Z| vcpu-0| I120: DISKLIB-CTK   : resuming /vmfs/volumes/550c3eba-5d4ab9be-eca3-5cf3fcdb053c/VM_NAME/VM_NAME_7-ctk.vmdk-tmp

2015-09-29T01:59:19.469Z| vcpu-0| I120: DISKLIB-CTK   : Renaming: /vmfs/volumes/550c3eba-5d4ab9be-eca3-5cf3fcdb053c/VM_NAME/VM_NAME_7-ctk.vmdk-tmp -> /vmfs/volumes/550c3eba-5d4ab9be-eca3-5cf3fcdb053c/VM_NAME/VM_NAME_7-ctk.vmdk

2015-09-29T01:59:19.478Z| vcpu-0| I120: DISKLIB-CTK   : Attempting unlink of /vmfs/volumes/550c3eba-5d4ab9be-eca3-5cf3fcdb053c/VM_NAME/VM_NAME_7-ctk.vmdk-tmp

2015-09-29T01:59:19.482Z| vcpu-0| I120: DISKLIB-CBT   : Shutting down change tracking for untracked fid 52783643.

2015-09-29T01:59:19.482Z| vcpu-0| I120: DISKLIB-CBT   : Successfully disconnected CBT node.

2015-09-29T01:59:19.510Z| vcpu-0| I120: DISKLIB-VMFS  : "/vmfs/volumes/550c3eba-5d4ab9be-eca3-5cf3fcdb053c/VM_NAME/VM_NAME_7-000002-delta.vmdk" : closed.

2015-09-29T01:59:19.511Z| vcpu-0| I120: DISKLIB-VMFS  : "/vmfs/volumes/550c3eba-5d4ab9be-eca3-5cf3fcdb053c/VM_NAME/VM_NAME_7-flat.vmdk" : closed.

2015-09-29T01:59:19.513Z| vcpu-0| A115: ConfigDB: Setting displayName = "VM_NAME"

2015-09-29T01:59:19.514Z| vcpu-0| A115: ConfigDB: Setting scsi0:4.fileName = "VM_NAME_7.vmdk"

2015-09-29T01:59:19.570Z| vcpu-0| I120: DISKLIB-VMFS  : "/vmfs/volumes/550c3eba-5d4ab9be-eca3-5cf3fcdb053c/VM_NAME/VM_NAME_7-000002-delta.vmdk" : open successful (1041) size = 413696, hd = 0. Type 8

2015-09-29T01:59:19.571Z| vcpu-0| I120: DISKLIB-LIB   : Resuming change tracking.

2015-09-29T01:59:19.645Z| vcpu-0| I120: DISKLIB-VMFS  : "/vmfs/volumes/550c3eba-5d4ab9be-eca3-5cf3fcdb053c/VM_NAME/VM_NAME_7-000002-delta.vmdk" : closed.

2015-09-29T01:59:19.666Z| vcpu-0| A115: ConfigDB: Setting displayName = "VM_NAME"

2015-09-29T01:59:19Z[+0.041]| vcpu-0| W110: Caught signal 11 -- tid 39998 (addr A6289C6)

2015-09-29T01:59:19Z[+0.041]| vcpu-0| I120: SIGNAL: rip 0xa6289c6 rsp 0x3ffd015d9c8 rbp 0x3ffd015da00

2015-09-29T01:59:19Z[+0.041]| vcpu-0| I120: SIGNAL: rax 0x2f rbx 0x323ee7c0 rcx 0x0 rdx 0x1 rsi 0x323ee7c0 rdi 0x442042445520726f

2015-09-29T01:59:19Z[+0.041]| vcpu-0| I120:         r8 0x0 r9 0x10 r10 0x0 r11 0xa5a1656 r12 0x442042445520726f r13 0x322eefe0 r14 0x32525ea0 r15 0x3238b740

2015-09-29T01:59:19Z[+0.041]| vcpu-0| I120: SIGNAL: stack 3FFD015D9C8 : 0x000000000937be2f 0x0000000000000000

2015-09-29T01:59:19Z[+0.041]| vcpu-0| I120: SIGNAL: stack 3FFD015D9D8 : 0x00000000322eefe0 0x0000000000010101

2015-09-29T01:59:19Z[+0.041]| vcpu-0| I120: SIGNAL: stack 3FFD015D9E8 : 0x00000000323bde60 0x0000000000000000

2015-09-29T01:59:19Z[+0.041]| vcpu-0| I120: SIGNAL: stack 3FFD015D9F8 : 0x00000000323ea320 0x000003ffd015db00

2015-09-29T01:59:19Z[+0.041]| vcpu-0| I120: SIGNAL: stack 3FFD015DA08 : 0x0000000009365747 0x000003ff00000000

2015-09-29T01:59:19Z[+0.041]| vcpu-0| I120: SIGNAL: stack 3FFD015DA18 : 0x0000000000000000 0x000003ff00000000

2015-09-29T01:59:19Z[+0.041]| vcpu-0| I120: SIGNAL: stack 3FFD015DA28 : 0x0000000000000000 0x000003ffd015db00

2015-09-29T01:59:19Z[+0.041]| vcpu-0| I120: SIGNAL: stack 3FFD015DA38 : 0x000000000935aca7 0x0000000000000000

2015-09-29T01:59:19Z[+0.041]| vcpu-0| I120: Backtrace:

2015-09-29T01:59:19Z[+0.041]| vcpu-0| I120: Backtrace[0] 000003ffd015d4e0 rip=00000000093974fe rbx=0000000009396cd0 rbp=000003ffd015d500 r12=0000000000000000 r13=000003ffd015e680 r14=000003ffd015da48 r15=000000000000000b

2015-09-29T01:59:19Z[+0.041]| vcpu-0| I120: Backtrace[1] 000003ffd015d510 rip=0000000008e9b70c rbx=000000000000000b rbp=000003ffd015d6e0 r12=0000000000000003 r13=000003ffd015e680 r14=000003ffd015da48 r15=000000000000000b

2015-09-29T01:59:19Z[+0.041]| vcpu-0| I120: Backtrace[2] 000003ffd015d6f0 rip=00000000003f300f rbx=00000000323ee7c0 rbp=000003ffd015d930 r12=000003ffd015d770 r13=00000000322eefe0 r14=0000000032525ea0 r15=000000003238b740

2015-09-29T01:59:19Z[+0.041]| vcpu-0| I120: SymBacktrace[0] 000003ffd015d4e0 rip=00000000093974fe in function (null) in object /bin/vmx loaded at 0000000008c43000

2015-09-29T01:59:19Z[+0.041]| vcpu-0| I120: SymBacktrace[1] 000003ffd015d510 rip=0000000008e9b70c in function (null) in object /bin/vmx loaded at 0000000008c43000

2015-09-29T01:59:19Z[+0.041]| vcpu-0| I120: SymBacktrace[2] 000003ffd015d6f0 rip=00000000003f300f

2015-09-29T01:59:19Z[+0.041]| vcpu-0| I120: Unexpected signal: 11.

2015-09-29T01:59:19Z[+10.112]| vcpu-0| W110: A core file is available in "/vmfs/volumes/550c3eba-5d4ab9be-eca3-5cf3fcdb053c/VM_NAME/vmx-zdump.000"

2015-09-29T01:59:19Z[+10.113]| vcpu-0| W110: Writing monitor corefile "/vmfs/volumes/550c3eba-5d4ab9be-eca3-5cf3fcdb053c/VM_NAME/vmmcores.gz"

2015-09-29T01:59:19Z[+10.136]| vcpu-0| I120: Counting amount of anonymous memory

2015-09-29T01:59:19Z[+10.141]| vcpu-0| I120: Total Count of Anon Pages and CR3 pages 6736

2015-09-29T01:59:19Z[+10.144]| vcpu-0| W110: Dumping core for vcpu-0

2015-09-29T01:59:19Z[+10.144]| vcpu-0| I120: CoreDump: dumping core with superuser privileges

2015-09-29T01:59:19Z[+10.144]| vcpu-0| I120: VMK Stack for vcpu 0 is at 0x412463495000

2015-09-29T01:59:19Z[+10.144]| vcpu-0| I120: Beginning monitor coredump

2015-09-29T01:59:19Z[+11.087]| vcpu-0| I120: End monitor coredump

2015-09-29T01:59:19Z[+11.088]| vcpu-0| W110: Dumping core for vcpu-1

2015-09-29T01:59:19Z[+11.088]| vcpu-0| I120: CoreDump: dumping core with superuser privileges

2015-09-29T01:59:19Z[+11.088]| vcpu-0| I120: VMK Stack for vcpu 1 is at 0x412463515000

2015-09-29T01:59:19Z[+11.088]| vcpu-0| I120: Beginning monitor coredump

2015-09-29T01:59:19Z[+12.016]| vcpu-0| I120: End monitor coredump

2015-09-29T01:59:19Z[+12.017]| vcpu-0| W110: Dumping extended monitor data

2015-09-29T01:59:19Z[+14.098]| vcpu-0| I120: CoreDump: ei->size 44404736 : len = 44404736

2015-09-29T01:59:19Z[+14.121]| vcpu-0| I120: Backtrace:

2015-09-29T01:59:19Z[+14.121]| vcpu-0| I120: Backtrace[0] 000003ffd015cfe0 rip=00000000093974fe rbx=0000000009396cd0 rbp=000003ffd015d000 r12=0000000000000000 r13=000003ffd015e680 r14=000003ffd015da48 r15=000000000000000b

2015-09-29T01:59:19Z[+14.121]| vcpu-0| I120: Backtrace[1] 000003ffd015d010 rip=0000000008db97c5 rbx=0000000009dad8a8 rbp=000003ffd015d500 r12=0000000000000001 r13=000003ffd015e680 r14=000003ffd015da48 r15=000000000000000b

2015-09-29T01:59:19Z[+14.122]| vcpu-0| I120: Backtrace[2] 000003ffd015d510 rip=0000000008e9b766 rbx=000000000000000b rbp=000003ffd015d6e0 r12=0000000000000003 r13=000003ffd015e680 r14=000003ffd015da48 r15=000000000000000b

2015-09-29T01:59:19Z[+14.122]| vcpu-0| I120: Backtrace[3] 000003ffd015d6f0 rip=00000000003f300f rbx=00000000323ee7c0 rbp=000003ffd015d930 r12=000003ffd015d770 r13=00000000322eefe0 r14=0000000032525ea0 r15=000000003238b740

2015-09-29T01:59:19Z[+14.122]| vcpu-0| I120: SymBacktrace[0] 000003ffd015cfe0 rip=00000000093974fe in function (null) in object /bin/vmx loaded at 0000000008c43000

2015-09-29T01:59:19Z[+14.122]| vcpu-0| I120: SymBacktrace[1] 000003ffd015d010 rip=0000000008db97c5 in function (null) in object /bin/vmx loaded at 0000000008c43000

2015-09-29T01:59:19Z[+14.122]| vcpu-0| I120: SymBacktrace[2] 000003ffd015d510 rip=0000000008e9b766 in function (null) in object /bin/vmx loaded at 0000000008c43000

2015-09-29T01:59:19Z[+14.122]| vcpu-0| I120: SymBacktrace[3] 000003ffd015d6f0 rip=00000000003f300f

2015-09-29T01:59:19Z[+14.122]| vcpu-0| I120: Msg_Post: Error

2015-09-29T01:59:19Z[+14.122]| vcpu-0| I120: [msg.log.error.unrecoverable] VMware ESX unrecoverable error: (vcpu-0)

2015-09-29T01:59:19Z[+14.122]| vcpu-0| I120+ Unexpected signal: 11.

2015-09-29T01:59:19Z[+14.122]| vcpu-0| I120: [msg.panic.haveLog] A log file is available in "/vmfs/volumes/550c3eba-5d4ab9be-eca3-5cf3fcdb053c/VM_NAME/vmware.log".

2015-09-29T01:59:19Z[+14.122]| vcpu-0| I120: [msg.panic.requestSupport.withoutLog] You can request support.

2015-09-29T01:59:19Z[+14.122]| vcpu-0| I120: [msg.panic.requestSupport.vmSupport.vmx86]

2015-09-29T01:59:19Z[+14.122]| vcpu-0| I120+ To collect data to submit to VMware technical support, run "vm-support".

2015-09-29T01:59:19Z[+14.122]| vcpu-0| I120: [msg.panic.response] We will respond on the basis of your support entitlement.

2015-09-29T01:59:19Z[+14.122]| vcpu-0| I120: ----------------------------------------

2015-09-29T01:59:19Z[+14.133]| vcpu-0| I120: Exiting

0 Kudos
Contributor
Contributor

Rick,

honestly, I am not sure if this really is helping at this point any more. This situation could have been avoided and I have to blame VMware for this.

The VSS issue described in  KB2115997 has been around since March and the fix was available at least in early July. For some reason it was decided against releasing an EP but rather including the fix in U3. Support told me, this was scheduled for August and was pushed back over and over. I never got a definitive answer whether this issue was only caused by VMware Tools or also affected the hypervisor. There is always the risk of introducing more issues by upgrading VMware Tools out of order, so we decided to wait for a complete solution and live with random VM crashes.

When U3 finally was released many people rushed to the upgrade to fix this problem. The outcome is, we now had more VMs crash every night due to a different bug. It is beyond me, why it took over a week to release a KB and the corresponding alert. In my opinion this should have happened the minute the PR was opened.

There is a certain irony, that the KB was published yesterday shortly after sending out VMSA-2015-0007. That VMSA describes a remote code execution vulnerability in ESXi, basically the worst thing that can happen. The fix for 5.5 is upgrading to U3... Yes, you can only install security fixes and keep away from the other fixes, but there is no documentation which patch actually introduced the snapshot bug. For all I know, the patch fixing OpenSLPD also breaks the snapshot behavior.

From a technical perspective, this is a mess, but situations like this happen and are part of our job description. What I don't get, is how you communicate during these situations to your customers. I am pretty sure VMware is listening, but providing answers is not always VMware's strong suit. Whether it is handling the initial release of SSO in 5.1, messing up certificate management in 5.1 and 5.5, heartbleed or that whole Web Client story, at least acknowledging a problem exists before it blows up, might already help and maintain your customers' trust in VMware.

Regards / Grüße,

Thorsten

Hot Shot
Hot Shot

I'm not sure that the solution to the security flaw is to update to ESXi 5.5 U3. The advisory states that the flaw affects "VMware ESXi 5.5 without patch ESXi550-201509101", and its KB‌ states that the build number of the security patch is "VMware:esx-base:5.5.0-2.65.3029837".

This is a lower build number than 5.5 U3, which is 3029944.

There might be a middle ground here, where we can patch the security bug without getting hit by the snapshot consolidation bug.

Has anyone tested this?

0 Kudos
Enthusiast
Enthusiast

This Update 3 flaw is a major FAIL by VMware.  Inexcusable!! 

0 Kudos
Hot Shot
Hot Shot

To think that it took a year between Update 2 and Update 3, and then this..

0 Kudos
Hot Shot
Hot Shot

Yeah, we should all switch to that piece of software that doesn't have any bugs.

Oh, wait.. There is none. Correct me if I'm wrong.

0 Kudos
Contributor
Contributor

For what it's worth, I have 2 clusters running 5.5 U3 right now and (knock on wood) I have not had the issue or been able to produce the issue.  We have CommVault running daily in the environment.  I ran a bunch of tests yesterday to produce the issue and see where my environment stands and the results are the following:

On a test VM:

Test 1 – Normal snapshot with out of date VMWare Tools – No crash

Test 2 – Quiesced snapshot with out of date VMWare Tools – No crash

Test 3 – Normal snapshot with updated VMWare Tools – No crash

Test 4 – Quiesced snapshot with updated VMWare Tools – No crash


My VM's live in the root of their respective datastores which seems to be a mitigating factor in some way (not always according to this thread).  I also spot checked a few of my guest's vmware.log files greping for some key phrases in the KB that was released and everything came back clean.  I heard a lot of buzz about the utilization of Shadow Copy playing a factor which is why I ran tests with older VMWare Tools and the latest VMWare Tools.


I honestly don't know for sure what is preventing me from being susceptible, I just figured I would post my tests and findings.



Contributor
Contributor

I have faced same kind of errors in our environment

Please refer the KB published from Vmware.

VMware KB: Snapshot consolidation causes virtual machines running on VMware ESXi 5.5 Update 3 hosts ...

Cause

This issues occurs due to a segmentation fault when changing the snapshot tree data-structure. 

Resolution

This is a known issue affecting VMware ESXi 5.5 Update 3 hosts.

Currently, there is no resolution.

To work around this issue preform one of these options:

  • Downgrade the ESXi host(s) to a previous version. For more information, see Reverting to a previous version of ESXi (1033604).
  • Consolidate the snapshot(s) with the virtual machine powered off.
  • For larger environments where reverting all hosts is not feasible:
    1. Revert a subset of the ESXi hosts to a previous version. For more information, see Reverting to a previous version of ESXi (1033604).
    2. Migrate critical virtual machines to the downgraded hosts.
    3. Create DRS Affinity rules to keep the virtual machines from migrating to the affected ESXi hosts. For more information see the Using DRS Affinity Rules section of the vSphere Resource Management guide.
    4. Disable non-critical virtual machines from any backup solutions.

A major issue for the environment, where they Snapshot Based Backups.

I dont recommend for the environment with VDP/Veem/TSM4VE. 

0 Kudos
Hot Shot
Hot Shot

‌Regarding the mention of TSM, it works without issue if you are using storage-level snapshots and not VMware snapshots. 

0 Kudos
Enthusiast
Enthusiast

Calling this a "bug" is an insult to insects everywhere.  This is a catastrophic failure of epic proportions.

0 Kudos
Contributor
Contributor

We are at the start of upgrading / reinstalling our hosts and 1 has been outfitted with the VMware-ESXi-5.5.0-Update3-3029944-HP-550.9.3.26-Sep2015.iso passed thursday.

We use Veeam as backup program and i havent seen 1 vm crash on that host as mentioned by several in this post.

We use CBT.

Storage Lefthand and VSA.

0 Kudos
Contributor
Contributor

For what it's worth,

we're running 6 hosts ESXi 5.5 update 3 (3029944) since the updates became available.

100 VM's, backed nightly using Veeam & CBT. Mostly 2008R2 and 2012R2-servers.

VM's stored in separate folders on datastores.

Storage on HP Lefthand / iSCSI.

If it is of any relevance, our vCenter is rebooted weekly.

No issues what so ever with snapshots or backup so far.

0 Kudos
Hot Shot
Hot Shot

Rick,

Why on earth is VMware not informing its customers of this issue directly? I stumbled on it in somebody's blog site. Communication is the key!

Ron

0 Kudos
Expert
Expert

I manually removed snap from a VM & then did consolidation no issues found.

5.5 u3 & linux 6.7 vm

0 Kudos
Enthusiast
Enthusiast

I have hundreds of VMs backed up nightly by Avamar and only see it on the same 8-10 guests each night. However, there is nothing different about those guests compared to others. Definitely not every single VM affected.

0 Kudos
Hot Shot
Hot Shot

ESXi 5.5 Update 3a has just been released with a fix for the snapshot bug.

http://kb.vmware.com/kb/2133825

VMware KB: VMware ESXi 5.5, Patch Release ESXi550-201510001

Enthusiast
Enthusiast

Known Issue :

ESXi550-201510401-BG (ESXi5.5 Update 3a)  is released to address Snapshot issue of Update 3.

"This patch updates the esx-base VIB to resolve an issue where snapshot consolidation or deletion results in the virtual machines running on VMware ESXi 5.5 Update 3 hosts to fail with the error: Unexpected signal: 11 "

I would suggest Install said patch and upgrade to UPDATE 3a.

Regards

Pankaj Sharma

0 Kudos
Contributor
Contributor

VMware has released an update to fix this issue.

VMware KB: VMware ESXi 5.5, Patch ESXi550-201510401-BG: Updates esx-base

0 Kudos