5 Replies Latest reply on Mar 23, 2020 7:02 AM by IamTHEvilONE

    VMs Crashing on NFS Storage ... ESXi Kernel Log flodded with messages

    IamTHEvilONE Champion

      I've had a number of VMs fail to vMotion or straight up crash due to messages similar to these:

       

      Failed to process the set attributes result: Lost previously held disk lock

       

      2018-12-28T15:19:23.749Z| vcpu-0| I125: >>> Error writing log, 89 bytes discarded. Disk full?

      2018-12-28T15:19:23.749Z| vcpu-0| I125: Msg_Question:

      2018-12-28T15:19:23.749Z| vcpu-0| I125: [msg.hbacommon.locklost] The lock protecting 'vm.vmdk' has been lost, possibly due to underlying storage issues.

      2018-12-28T15:19:23.749Z| vcpu-0| I125+ If this virtual machine is configured to be highly available, ensure that the virtual machine is running on some other host before clicking OK.

      2018-12-28T15:19:23.749Z| vcpu-0| I125: ----------------------------------------

      2018-12-28T15:23:24.379Z| vcpu-0| I125: Timing out dialog 3174

      2018-12-28T15:23:24.380Z| vcpu-0| I125: MsgQuestion: msg.hbacommon.locklost reply=0

      2018-12-28T15:23:24.380Z| vcpu-0| E105: PANIC: Exiting because of failed disk operation.

      2018-12-28T15:23:25.575Z| vcpu-0| W115: A core file is available in "/vmfs/volumes/77955655-79af1105-0000-000000000000/vm_name/vm.zdump.000"

      2018-12-28T15:23:25.575Z| mks| W115: Panic in progress... ungrabbing

      2018-12-28T15:23:25.575Z| mks| I125: MKS: Release starting (Panic)

      2018-12-28T15:23:25.575Z| mks| I125: MKS: Release finished (Panic)

      2018-12-28T15:23:25.576Z| vcpu-0| I125: Writing monitor file `vmmcores.gz`

      2018-12-28T15:23:25.578Z| vcpu-0| W115: Dumping core for vcpu-0

      2018-12-28T15:23:25.578Z| vcpu-0| I125: CoreDump: dumping core with superuser privileges

      2018-12-28T15:23:25.578Z| vcpu-0| I125: VMK Stack for vcpu 0 is at 0x439337d13000

      2018-12-28T15:23:25.578Z| vcpu-0| I125: Beginning monitor coredump

      2018-12-28T15:23:26.171Z| vcpu-0| I125: End monitor coredump

      2018-12-28T15:23:26.172Z| vcpu-0| W115: Dumping core for vcpu-1

      2018-12-28T15:23:26.172Z| vcpu-0| I125: CoreDump: dumping core with superuser privileges

      2018-12-28T15:23:26.172Z| vcpu-0| I125: VMK Stack for vcpu 1 is at 0x439337f13000

      2018-12-28T15:23:26.172Z| vcpu-0| I125: Beginning monitor coredump

      2018-12-28T15:23:26.577Z| mks| W115: Panic in progress... ungrabbing

      2018-12-28T15:23:26.577Z| mks| I125: MKS: Release starting (Panic)

      2018-12-28T15:23:26.577Z| mks| I125: MKS: Release finished (Panic)

      2018-12-28T15:23:26.759Z| vcpu-0| I125: End monitor coredump

      2018-12-28T15:23:27.220Z| vcpu-0| I125: Printing loaded objects

      2018-12-28T15:23:27.220Z| vcpu-0| I125: [0x9922257000-0x9923324914): /bin/vmx

      2018-12-28T15:23:27.220Z| vcpu-0| I125: [0x996391E000-0x99639351CC): /lib64/libpthread.so.0

      2018-12-28T15:23:27.220Z| vcpu-0| I125: [0x9963B3B000-0x9963B3CF00): /lib64/libdl.so.2

      2018-12-28T15:23:27.220Z| vcpu-0| I125: [0x9963D3F000-0x9963D47D08): /lib64/librt.so.1

      2018-12-28T15:23:27.220Z| vcpu-0| I125: [0x9963F5A000-0x99641F24E4): /lib64/libcrypto.so.1.0.2

      2018-12-28T15:23:27.220Z| vcpu-0| I125: [0x9964423000-0x996448C4AC): /lib64/libssl.so.1.0.2

      2018-12-28T15:23:27.220Z| vcpu-0| I125: [0x9964697000-0x99647AB37C): /lib64/libX11.so.6

      2018-12-28T15:23:27.220Z| vcpu-0| I125: [0x99649B2000-0x99649C101C): /lib64/libXext.so.6

      2018-12-28T15:23:27.220Z| vcpu-0| I125: [0x9964BC2000-0x9964CA6341): /lib64/libstdc++.so.6

      2018-12-28T15:23:27.220Z| vcpu-0| I125: [0x9964EC5000-0x9964F4521C): /lib64/libm.so.6

      2018-12-28T15:23:27.220Z| vcpu-0| I125: [0x9965148000-0x996515CBC4): /lib64/libgcc_s.so.1

      2018-12-28T15:23:27.220Z| vcpu-0| I125: [0x996535E000-0x99654BEC74): /lib64/libc.so.6

      2018-12-28T15:23:27.220Z| vcpu-0| I125: [0x99236FD000-0x992371A7D8): /lib64/ld-linux-x86-64.so.2

      2018-12-28T15:23:27.220Z| vcpu-0| I125: [0x99656C9000-0x99656E3634): /lib64/libxcb.so.1

      2018-12-28T15:23:27.220Z| vcpu-0| I125: [0x99658E5000-0x99658E695C): /lib64/libXau.so.6

      2018-12-28T15:23:27.220Z| vcpu-0| I125: [0x9965EB5000-0x9965F4A534): /usr/lib64/vmware/plugin/objLib/upitObjBE.so

      2018-12-28T15:23:27.220Z| vcpu-0| I125: [0x9966163000-0x99662B9994): /usr/lib64/vmware/plugin/objLib/vsanObjBE.so

      2018-12-28T15:23:27.220Z| vcpu-0| I125: [0x9966550000-0x9966564F94): /lib64/libz.so.1

      2018-12-28T15:23:27.220Z| vcpu-0| I125: [0x99669AF000-0x99669BA1D0): /lib64/libnss_files.so.2

      2018-12-28T15:23:27.220Z| vcpu-0| I125: [0x9966CC5000-0x9966CC60A4): /usr/lib/vmware/nas_plugins/lib64/libvmfsNasPlugin.so

      2018-12-28T15:23:27.220Z| vcpu-0| I125: [0x9966EC8000-0x9966EDB44C): /lib64/libvmkfslib.so

      2018-12-28T15:23:27.220Z| vcpu-0| I125: [0x99670DD000-0x996733A6D4): /lib64/libvmlibs.so

      2018-12-28T15:23:27.220Z| vcpu-0| I125: [0x99675E1000-0x99678F91A4): /lib64/libvmsnapshot.so

      2018-12-28T15:23:27.220Z| vcpu-0| I125: [0x9967B13000-0x9967B20F14): /lib64/libtreestructs.so

      2018-12-28T15:23:27.220Z| vcpu-0| I125: [0x9967D23000-0x9967D245BC): /usr/lib/vmware/nas_plugins/lib64/libsvaNasPlugin.so

      2018-12-28T15:23:27.220Z| vcpu-0| I125: [0x9967F26000-0x9967F4236C): /usr/lib/vmware/nas_plugins/lib64/libnvdkTegileNasPlugin.so

      2018-12-28T15:23:27.220Z| vcpu-0| I125: End printing loaded objects

      2018-12-28T15:23:27.220Z| vcpu-0| I125: Backtrace:

      2018-12-28T15:23:27.220Z| vcpu-0| I125: Backtrace[0] 0000009969aa6460 rip=00000099228d75e7 rbx=00000099228d70e0 rbp=0000009969aa6480 r12=0000000000000000 r13=0000000000000001 r14=0000009923ba1460 r15=0000009968371020

      2018-12-28T15:23:27.220Z| vcpu-0| I125: Backtrace[1] 0000009969aa6490 rip=000000992241766c rbx=0000009969aa64b0 rbp=0000009969aa6990 r12=000000992357c2b0 r13=0000000000000001 r14=0000009923ba1460 r15=0000009968371020

      2018-12-28T15:23:27.220Z| vcpu-0| I125: Backtrace[2] 0000009969aa69a0 rip=0000009922582d41 rbx=0000009923d68a40 rbp=0000009969aa69c0 r12=0000000000000002 r13=0000009923c9eb48 r14=0000009923ba1460 r15=0000009968371020

      2018-12-28T15:23:27.220Z| vcpu-0| I125: Backtrace[3] 0000009969aa69d0 rip=000000992257588d rbx=0000009923c9b030 rbp=0000009969aa6b70 r12=00000099233e3550 r13=0000000000000165 r14=0000009923ba1460 r15=0000009968371020

      2018-12-28T15:23:27.220Z| vcpu-0| I125: Backtrace[4] 0000009969aa6b80 rip=0000009922825431 rbx=00000099236904a8 rbp=0000009969aa6bb0 r12=00000099233e3550 r13=0000000000000165 r14=0000009923ba1460 r15=0000000000000000

      2018-12-28T15:23:27.220Z| vcpu-0| I125: Backtrace[5] 0000009969aa6bc0 rip=000000992284d176 rbx=000000000000012d rbp=0000009969aa6c00 r12=000000992357c2b0 r13=0000009923687d80 r14=00000099235621a0 r15=0000000000000000

      2018-12-28T15:23:27.220Z| vcpu-0| I125: Backtrace[6] 0000009969aa6c10 rip=0000009922825518 rbx=0000000000000000 rbp=0000009969aa6c20 r12=0000009969aa6c40 r13=0000009923dbb290 r14=0000009923df18d0 r15=0000000000000003

      2018-12-28T15:23:27.220Z| vcpu-0| I125: Backtrace[7] 0000009969aa6c30 rip=00000099228c2b05 rbx=0000009923691aa0 rbp=0000009969aa6d80 r12=0000009969aa6c40 r13=0000009923dbb290 r14=0000009923df18d0 r15=0000000000000003

      2018-12-28T15:23:27.220Z| vcpu-0| I125: Backtrace[8] 0000009969aa6d90 rip=0000009963925cfc rbx=0000000000000000 rbp=0000000000000000 r12=000003805f591a00 r13=0000009969aa79c0 r14=0000009923df18d0 r15=0000000000000003

      2018-12-28T15:23:27.220Z| vcpu-0| I125: Backtrace[9] 0000009969aa6ea0 rip=000000996542fead rbx=0000000000000000 rbp=0000000000000000 r12=000003805f591a00 r13=0000009969aa79c0 r14=0000009923df18d0 r15=0000000000000003

      2018-12-28T15:23:27.220Z| vcpu-0| I125: Backtrace[10] 0000009969aa6ea8 rip=0000000000000000 rbx=0000000000000000 rbp=0000000000000000 r12=000003805f591a00 r13=0000009969aa79c0 r14=0000009923df18d0 r15=0000000000000003

      2018-12-28T15:23:27.220Z| vcpu-0| I125: SymBacktrace[0] 0000009969aa6460 rip=00000099228d75e7 in function (null) in object /bin/vmx loaded at 0000009922257000

      2018-12-28T15:23:27.220Z| vcpu-0| I125: SymBacktrace[1] 0000009969aa6490 rip=000000992241766c in function (null) in object /bin/vmx loaded at 0000009922257000

      2018-12-28T15:23:27.220Z| vcpu-0| I125: SymBacktrace[2] 0000009969aa69a0 rip=0000009922582d41 in function (null) in object /bin/vmx loaded at 0000009922257000

      2018-12-28T15:23:27.220Z| vcpu-0| I125: SymBacktrace[3] 0000009969aa69d0 rip=000000992257588d in function (null) in object /bin/vmx loaded at 0000009922257000

      2018-12-28T15:23:27.220Z| vcpu-0| I125: SymBacktrace[4] 0000009969aa6b80 rip=0000009922825431 in function (null) in object /bin/vmx loaded at 0000009922257000

      2018-12-28T15:23:27.220Z| vcpu-0| I125: SymBacktrace[5] 0000009969aa6bc0 rip=000000992284d176 in function (null) in object /bin/vmx loaded at 0000009922257000

      2018-12-28T15:23:27.220Z| vcpu-0| I125: SymBacktrace[6] 0000009969aa6c10 rip=0000009922825518 in function (null) in object /bin/vmx loaded at 0000009922257000

      2018-12-28T15:23:27.220Z| vcpu-0| I125: SymBacktrace[7] 0000009969aa6c30 rip=00000099228c2b05 in function (null) in object /bin/vmx loaded at 0000009922257000

      2018-12-28T15:23:27.220Z| vcpu-0| I125: SymBacktrace[8] 0000009969aa6d90 rip=0000009963925cfc in function (null) in object /lib64/libpthread.so.0 loaded at 000000996391e000

      2018-12-28T15:23:27.220Z| vcpu-0| I125: SymBacktrace[9] 0000009969aa6ea0 rip=000000996542fead in function clone in object /lib64/libc.so.6 loaded at 000000996535e000

      2018-12-28T15:23:27.220Z| vcpu-0| I125: SymBacktrace[10] 0000009969aa6ea8 rip=0000000000000000

      2018-12-28T15:23:27.220Z| vcpu-0| I125: Exiting

       

      Here is a sample of what's going on in vmkernel.log

      2018-12-28T15:19:23.751Z warning hostd[9A81B70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/77955655-79af1105-0000-000000000000/vm/vm.vmx] Failed to find activation record, event user unknown.

      2018-12-28T15:19:23.752Z verbose hostd[9A81B70] [Originator@6876 sub=PropertyProvider] RecordOp REMOVE: latestPage[33726], session[52610cea-e1e8-e07e-26d6-259c4517645b]52e66a9c-32b6-479e-319f-ca34434ae415. Applied change to temp map.

      2018-12-28T15:19:23.752Z verbose hostd[9A81B70] [Originator@6876 sub=PropertyProvider] RecordOp ADD: latestPage[33736], session[52610cea-e1e8-e07e-26d6-259c4517645b]52e66a9c-32b6-479e-319f-ca34434ae415. Applied change to temp map.

      2018-12-28T15:19:23.752Z verbose hostd[9A81B70] [Originator@6876 sub=PropertyProvider] RecordOp ASSIGN: latestEvent, ha-eventmgr. Applied change to temp map.

      2018-12-28T15:19:23.752Z info hostd[9A81B70] [Originator@6876 sub=Vimsvc.ha-eventmgr] Event 33736 : Message on vm on esxi in ha-datacenter: The lock protecting 'vm.vmdk' has been lost, possibly due to underlying storage issues.

      --> If this virtual machine is configured to be highly available, ensure that the virtual machine is running on some other host before clicking OK.

      -->

       

      2018-12-28T16:00:05.215Z cpu57:1437367)WARNING: NFS41: NFS41FileOpSetFileAttributes:4540: Failed to process the set attributes result: Lost previously held disk lock

      2018-12-28T16:00:05.616Z cpu48:1437367)WARNING: NFS41: NFS41FileOpSetFileAttributes:4540: Failed to process the set attributes result: Lost previously held disk lock

      2018-12-28T16:00:06.019Z cpu48:1437367)WARNING: NFS41: NFS41FileOpSetFileAttributes:4540: Failed to process the set attributes result: Lost previously held disk lock

      2018-12-28T16:00:06.414Z cpu48:1437367)WARNING: NFS41: NFS41FileOpSetFileAttributes:4540: Failed to process the set attributes result: Lost previously held disk lock

      2018-12-28T16:00:06.814Z cpu48:1437367)WARNING: NFS41: NFS41FileOpSetFileAttributes:4540: Failed to process the set attributes result: Lost previously held disk lock

      2018-12-28T16:00:10.623Z cpu7:68509)WARNING: NFS41: NFS41_NFSAttrToVolatileFSAttrs:1641: Server FILES_FREE 5779000868 exceeds ESX limit

      2018-12-28T16:00:10.623Z cpu7:68509)WARNING: NFS41: NFS41_NFSAttrToVolatileFSAttrs:1654: Server FILES_TOTAL 5779005873 exceeds ESX limit

      2018-12-28T16:00:10.629Z cpu7:68509)WARNING: NFS41: NFS41_NFSAttrToVolatileFSAttrs:1641: Server FILES_FREE 5779000868 exceeds ESX limit

      2018-12-28T16:00:10.629Z cpu7:68509)WARNING: NFS41: NFS41_NFSAttrToVolatileFSAttrs:1654: Server FILES_TOTAL 5779005873 exceeds ESX limit

      2018-12-28T16:00:10.633Z cpu7:68509)WARNING: NFS41: NFS41_NFSAttrToVolatileFSAttrs:1641: Server FILES_FREE 5779000868 exceeds ESX limit

      2018-12-28T16:00:20.596Z cpu20:67009)SunRPC: 1116: Destroying world 0x1a0157

      2018-12-28T16:00:20.598Z cpu20:67009)SunRPC: 1116: Destroying world 0x1a018e

      2018-12-28T16:00:20.628Z cpu44:66542)WARNING: NFS: 221: Got error 13 from mount call

      2018-12-28T16:00:20.628Z cpu20:67009)SunRPC: 1116: Destroying world 0x1a018f

      2018-12-28T16:00:20.630Z cpu20:67009)SunRPC: 1116: Destroying world 0x1a0190

      2018-12-28T16:00:20.649Z cpu44:66542)WARNING: NFS: 221: Got error 13 from mount call

      2018-12-28T16:00:20.649Z cpu20:67009)SunRPC: 1116: Destroying world 0x1a0191

       

      vobd shows the VM crash happening

       

      It shows an APD on the NFS storage well before the crash itself, maybe about an hour before:

       

      2018-12-28T11:17:56.077Z: [APDCorrelator] 2034244476342us: [vob.storage.apd.start] Device or filesystem with identifier [77955655-79af1105-0000-000000000000] has entered the All Paths Down state.