VMware Cloud Community
nonfatalexec
Contributor

Investigating hung VM

I'm running an Ubuntu 18 VM that hangs quite frequently, about once a week. I followed these instructions (Arcserve Support, VMware Knowledge Base) to suspend the hung VM and convert the suspended state into a memory dump, vmss.core. The VM runs kernel 4.15.0-88, so I installed linux-image-4.15.0-88-generic-dbgsym to get the debugging symbols. I'm looking for advice on how to get past the error below so I can investigate the hang further. Thanks!

$ crash vmss.core /usr/lib/debug/boot/vmlinux-4.15.0-88-generic

crash 7.2.1

  crash: /usr/lib/debug/boot/vmlinux-4.15.0-88-generic and vmss do not match!
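One way to narrow this error down, assuming crash's check compares the kernel version banner (the "Linux version ..." string) in the dump against the one in vmlinux, is to extract both banners yourself. The Python sketch below is hypothetical (the script and function names are mine, not part of crash or VMware's tools); it scans each file in chunks for "Linux version " strings that end in a newline and reports whether any banner appears in both.

```python
import re
import sys

# A kernel version banner: "Linux version <release> (...) #<build> ...\n".
# Requiring the trailing newline avoids recording a banner cut off at a chunk boundary.
BANNER_RE = re.compile(rb"Linux version [\x20-\x7e]{1,300}\n")
CHUNK = 64 * 1024 * 1024  # read 64 MiB at a time; the dump is too large to load whole
OVERLAP = 512             # longer than any possible match, so a split banner is re-scanned whole


def banners_in(buf):
    """Return the set of distinct version banners found in a bytes buffer."""
    return {m.group().decode("ascii").strip() for m in BANNER_RE.finditer(buf)}


def banners_in_file(path):
    """Scan a (possibly multi-GiB) file chunk by chunk and collect every banner."""
    found = set()
    tail = b""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(CHUNK)
            if not chunk:
                break
            buf = tail + chunk
            found |= banners_in(buf)
            tail = buf[-OVERLAP:]  # carry the end of this buffer into the next scan
    return found


def main(dump_path, vmlinux_path):
    dump = banners_in_file(dump_path)
    kernel = banners_in_file(vmlinux_path)
    for banner in sorted(kernel):
        print("vmlinux:", banner)
    for banner in sorted(dump):
        print("dump:   ", banner)
    print("common banner found" if kernel & dump else "no common banner")


if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```

Usage would be something like python3 banners.py vmss.core /usr/lib/debug/boot/vmlinux-4.15.0-88-generic (banners.py is a hypothetical file name). If the vmlinux banner does show up in the dump, the symbols are probably for the right build, and the mismatch may instead come from where the kernel was loaded: Ubuntu kernels are normally built with KASLR enabled, and the crash(8) manual documents a --kaslr option for x86_64 kernels built with CONFIG_RANDOMIZE_BASE. I can't confirm whether --kaslr=auto works for a core converted this way.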

I'm not sure if this is relevant:

$ readelf -a vmss.core

ELF Header:
  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF64
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              CORE (Core file)
  Machine:                           Advanced Micro Devices X86-64
  Version:                           0x1
  Entry point address:               0x0
  Start of program headers:          64 (bytes into file)
  Start of section headers:          0 (bytes into file)
  Flags:                             0x0
  Size of this header:               64 (bytes)
  Size of program headers:           56 (bytes)
  Number of program headers:         3
  Size of section headers:           0 (bytes)
  Number of section headers:         0
  Section header string table index: 0

There are no sections in this file.

There are no sections to group in this file.

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  NOTE           0x00000000000000e8 0x0000000000000000 0x0000000000000000
                 0x0000000000000218 0x0000000000000000         0x0
  LOAD           0x0000000000001000 0x0000000000000000 0x0000000000000000
                 0x00000000c0000000 0x00000000c0000000  RWE    0x1000
  LOAD           0x00000000c0001000 0x0000000000000000 0x0000000100000000
                 0x0000000340000000 0x0000000340000000  RWE    0x1000

There is no dynamic section in this file.

There are no relocations in this file.

The decoding of unwind sections for machine type Advanced Micro Devices X86-64 is not currently supported.

Dynamic symbol information is not available for displaying symbols.

No version information found in this file.

Displaying notes found at file offset 0x000000e8 with length 0x00000218:
  Owner                 Data size Description
  CORE                 0x00000150 NT_PRSTATUS (prstatus structure)
  CORE                 0x00000088 NT_PRPSINFO (prpsinfo structure)
  CORE                 0x00000010 NT_TASKSTRUCT (task structure)
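The two LOAD segments in the readelf output above appear to be the guest's physical memory: 0xc0000000 bytes (3 GiB) at physical address 0, and 0x340000000 bytes (13 GiB) starting at 0x100000000 (4 GiB). That is 16 GiB in total, with nothing captured between 3 GiB and 4 GiB (likely the usual PCI hole below 4 GiB). A small hypothetical Python sketch that reads only the ELF headers, using the standard ELF64 header and program header layout, and prints those ranges, so you can confirm the dump covers the VM's full configured memory:

```python
import struct
import sys

PT_LOAD = 1  # ELF program header type for a loadable segment
GIB = 1 << 30


def parse_ehdr(buf):
    """Return (e_phoff, e_phentsize, e_phnum) from a little-endian ELF64 header."""
    if buf[:4] != b"\x7fELF" or buf[4] != 2 or buf[5] != 1:
        raise ValueError("not a little-endian ELF64 file")
    # Fields after e_ident: type, machine, version, entry, phoff, shoff, flags,
    # ehsize, phentsize, phnum, shentsize, shnum, shstrndx.
    fields = struct.unpack_from("<HHIQQQIHHHHHH", buf, 16)
    return fields[4], fields[8], fields[9]


def load_segments(data):
    """Return the PT_LOAD entries, given the ELF header plus program header table."""
    phoff, phentsize, phnum = parse_ehdr(data)
    segments = []
    for i in range(phnum):
        # Elf64_Phdr: type, flags, offset, vaddr, paddr, filesz, memsz, align.
        p_type, _, p_offset, _, p_paddr, p_filesz, p_memsz, _ = struct.unpack_from(
            "<IIQQQQQQ", data, phoff + i * phentsize)
        if p_type == PT_LOAD:
            segments.append({"offset": p_offset, "paddr": p_paddr,
                             "filesz": p_filesz, "memsz": p_memsz})
    return segments


def read_headers(path):
    """Read only the ELF header and program header table, not the memory image."""
    with open(path, "rb") as f:
        phoff, phentsize, phnum = parse_ehdr(f.read(64))
        f.seek(0)
        return f.read(phoff + phentsize * phnum)


def main(path):
    total = 0
    for seg in load_segments(read_headers(path)):
        end = seg["paddr"] + seg["memsz"]
        print(f"phys {seg['paddr']:#013x}-{end:#013x}  {seg['memsz'] / GIB:6.2f} GiB"
              f"  (file offset {seg['offset']:#x})")
        total += seg["memsz"]
    print(f"total memory in dump: {total / GIB:.2f} GiB")


if __name__ == "__main__" and len(sys.argv) == 2:
    main(sys.argv[1])
```

For the headers shown above this should report 3.00 GiB at physical address 0 and 13.00 GiB starting at 4 GiB, 16 GiB in total.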

continuum
Immortal

Did you find anything useful in the affected VM's vmware.log or in vmkernel.log?

Trying to fix issues with the troubleshooting tools could easily take you on a completely unrelated track.


nonfatalexec
Contributor

I saw these entries around the time the hang occurred. The VM's datastore UUID is 5e42b030-36babddb-5e13-0cc47aa5eba8. I followed the VMware Knowledge Base article because of the "CPU has been disabled" message.

vmware.log

2020-02-24T16:45:31.308Z| vcpu-0| I125: Vix: [4638949 vmxCommands.c:7212]: VMAutomation_HandleCLIHLTEvent. Do nothing.
2020-02-24T16:45:31.308Z| vcpu-0| I125: MsgHint: msg.monitorevent.halt
2020-02-24T16:45:31.308Z| vcpu-0| I125+ The CPU has been disabled by the guest operating system. Power off or reset the virtual machine.
2020-02-24T16:45:31.308Z| vcpu-0| I125+ ---------------------------------------
2020-02-24T16:45:39.652Z| vmx| I125: GuestRpcSendTimedOut: message to toolbox timed out.
2020-02-24T16:45:52.756Z| vcpu-0| I125: Tools: Tools heartbeat timeout.
2020-02-24T16:45:52.756Z| vcpu-0| I125: Tools: Running status rpc handler: 1 => 0.
2020-02-24T16:45:52.756Z| vcpu-0| I125: Tools: Changing running status: 1 => 0.
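Since the hang recurs roughly weekly, it may help to list every halt event across the VM's rotated logs (vmware.log, vmware-1.log, and so on) along with the time between them, so the hangs can be lined up against kernel updates or workload. The name VMAutomation_HandleCLIHLTEvent suggests a vCPU executed HLT with interrupts disabled; in the excerpt above it is logged together with msg.monitorevent.halt. A hypothetical Python sketch, assuming every line starts with the UTC timestamp format shown above:

```python
import re
import sys
from datetime import datetime

# vmware.log lines begin with a UTC timestamp such as "2020-02-24T16:45:31.308Z|".
TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3})Z")
# Both of these lines were logged at the moment the guest disabled vcpu-0.
HALT_MARKERS = ("VMAutomation_HandleCLIHLTEvent", "msg.monitorevent.halt")


def parse_ts(line):
    """Return the leading timestamp as a naive UTC datetime, or None."""
    m = TS_RE.match(line)
    return datetime.strptime(m.group(1), "%Y-%m-%dT%H:%M:%S.%f") if m else None


def halt_events(lines):
    """Return the distinct timestamps, in log order, of vCPU halt events."""
    times = []
    for line in lines:
        if any(marker in line for marker in HALT_MARKERS):
            ts = parse_ts(line)
            if ts is not None and ts not in times:
                times.append(ts)
    return times


def main(paths):
    events = []
    for path in paths:
        with open(path, encoding="utf-8", errors="replace") as f:
            events.extend(halt_events(f))
    events.sort()
    previous = None
    for ts in events:
        gap = f"  ({ts - previous} since previous)" if previous is not None else ""
        print(ts.isoformat() + "Z" + gap)
        previous = ts


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1:])
```

Passing all of the VM's vmware*.log files at once gives a single sorted timeline of halts.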

vmkernel.log

2020-02-24T16:30:55.618Z cpu21:66366)ScsiDeviceIO: 2948: Cmd(0x43950ac96bc0) 0x1a, CmdSN 0x58159a from world 0 to dev "naa.600062b2011e3c40222b1996047ad1a9" failed H:0x
2020-02-24T16:45:48.403Z cpu15:66366)ScsiDeviceIO: 2948: Cmd(0x43950c7b40c0) 0x1a, CmdSN 0x581604 from world 0 to dev "naa.600062b2011e3c40222b1996047ad1a9" failed H:0x
2020-02-24T16:55:18.947Z cpu47:66367)ScsiDeviceIO: 2948: Cmd(0x439d007f8880) 0x1a, CmdSN 0x5816a6 from world 0 to dev "naa.600062b2011e3c4022feff63a7aec57f" failed H:0x
2020-02-24T16:55:18.948Z cpu47:66367)ScsiDeviceIO: 2948: Cmd(0x439d007f8880) 0x1a, CmdSN 0x5816b0 from world 0 to dev "naa.600062b2011e3c40222b1996047ad1a9" failed H:0x
2020-02-24T16:55:18.967Z cpu47:66367)ScsiDeviceIO: 2948: Cmd(0x439d007f8880) 0x1a, CmdSN 0x5816b6 from world 0 to dev "naa.600062b2011e3c40223991fa07d724e9" failed H:0x
2020-02-24T16:55:18.972Z cpu47:66367)ScsiDeviceIO: 2948: Cmd(0x439d007f8880) 0x1a, CmdSN 0x5816c6 from world 0 to dev "naa.600062b2011e3c402567bbd73caac7f5" failed H:0x
2020-02-24T17:05:55.874Z cpu1:66366)ScsiDeviceIO: 2948: Cmd(0x439500a94dc0) 0x1a, CmdSN 0x581729 from world 0 to dev "naa.600062b2011e3c40222b1996047ad1a9" failed H:0x0
2020-02-24T17:15:55.935Z cpu35:66367)ScsiDeviceIO: 2948: Cmd(0x439d05ab63c0) 0x1a, CmdSN 0x581791 from world 0 to dev "naa.600062b2011e3c40222b1996047ad1a9" failed H:0x
2020-02-24T17:25:18.982Z cpu40:66367)ScsiDeviceIO: 2948: Cmd(0x439d04431080) 0x4d, CmdSN 0x777dd from world 67953 to dev "naa.600062b2011e3c4022feff7da93c2699" failed H
2020-02-24T17:25:18.985Z cpu40:66367)ScsiDeviceIO: 2948: Cmd(0x439d04431080) 0x85, CmdSN 0x777e0 from world 67953 to dev "naa.600062b2011e3c4025d433264c2da236" failed H
2020-02-24T17:25:55.949Z cpu41:66367)ScsiDeviceIO: 2948: Cmd(0x439d04507d80) 0x1a, CmdSN 0x58184c from world 0 to dev "naa.600062b2011e3c40222b1996047ad1a9" failed H:0x
2020-02-24T17:40:48.496Z cpu9:66366)ScsiDeviceIO: 2948: Cmd(0x43950c6399c0) 0x1a, CmdSN 0x5818b6 from world 0 to dev "naa.600062b2011e3c40222b1996047ad1a9" failed H:0x0
2020-02-24T17:45:02.932Z cpu55:66207)lsi_mr3: megasas_hotplug_work:258: event code: 0x9f.
2020-02-24T17:55:19.076Z cpu46:66367)ScsiDeviceIO: 2948: Cmd(0x439d05a2e9c0) 0x4d, CmdSN 0x7780d from world 67953 to dev "naa.600062b2011e3c40259ccb27dd96f127" failed H
2020-02-24T17:55:19.085Z cpu46:66367)ScsiDeviceIO: 2948: Cmd(0x439d05a2e9c0) 0x1a, CmdSN 0x58194f from world 0 to dev "naa.600062b2011e3c4022399218099cc5c5" failed H:0x
2020-02-24T17:55:19.089Z cpu46:66367)ScsiDeviceIO: 2948: Cmd(0x439d05a2e9c0) 0x1a, CmdSN 0x58195f from world 0 to dev "naa.600062b2011e3c4023e49e29915f8bec" failed H:0x
2020-02-24T17:55:19.094Z cpu46:66367)ScsiDeviceIO: 2948: Cmd(0x439d05a2e9c0) 0x1a, CmdSN 0x58196f from world 0 to dev "naa.600062b2011e3c40222b1996047ad1a9" failed H:0x
2020-02-24T18:03:14.422Z cpu6:66366)ScsiDeviceIO: 2948: Cmd(0x43950ade1440) 0x1a, CmdSN 0x5819d9 from world 0 to dev "naa.600062b2011e3c40222b1996047ad1a9" failed H:0x0
2020-02-24T18:18:03.583Z cpu16:66366)ScsiDeviceIO: 2948: Cmd(0x43950adce040) 0x1a, CmdSN 0x581a43 from world 0 to dev "naa.600062b2011e3c40222b1996047ad1a9" failed H:0x
2020-02-24T18:25:19.136Z cpu52:66367)ScsiDeviceIO: 2948: Cmd(0x439d05a72dc0) 0x1a, CmdSN 0x581a78 from world 0 to dev "naa.600062b2011e3c402449b614e14645d2" failed H:0x
2020-02-24T18:25:19.195Z cpu52:66367)ScsiDeviceIO: 2948: Cmd(0x439d05a72dc0) 0x4d, CmdSN 0x77849 from world 67953 to dev "naa.600062b2011e3c40223991fa07d724e9" failed H
2020-02-24T18:28:03.601Z cpu29:66367)ScsiDeviceIO: 2948: Cmd(0x439d065825c0) 0x1a, CmdSN 0x581afc from world 0 to dev "naa.600062b2011e3c40222b1996047ad1a9" failed H:0x
2020-02-24T18:41:59.726Z cpu30:66367)ScsiDeviceIO: 2948: Cmd(0x439d0651ed40) 0x1a, CmdSN 0x581b66 from world 0 to dev "naa.600062b2011e3c40222b1996047ad1a9" failed H:0x
2020-02-24T18:50:35.603Z cpu16:68062 opID=4d25eb85)World: 12230: VC opID 2485eed7 maps to vmkernel opID 4d25eb85
2020-02-24T18:50:35.603Z cpu16:68062 opID=4d25eb85)FSS: 6214: Conflict between buffered and unbuffered open (file '<obfuscated>.vmdk'):flags 0x4008, requested fl
2020-02-24T18:52:49.734Z cpu23:68062 opID=17f85ff9)World: 12230: VC opID 2485ef0b maps to vmkernel opID 17f85ff9
2020-02-24T18:52:49.734Z cpu23:68062 opID=17f85ff9)FSS: 6214: Conflict between buffered and unbuffered open (file '<obfuscated>-0-flat.vmdk'):flags 0x4008, requested fl
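To see whether these storage errors cluster around the halt, you could count the failed ScsiDeviceIO commands per device and opcode within a window around the halt time from vmware.log (16:45:31Z here; both logs use UTC "Z" timestamps). This is a hypothetical Python sketch. It ignores the status codes after "failed" because they are truncated in the excerpt, and the opcode labels are the standard SCSI names for 0x1a, 0x4d and 0x85, none of which is a READ or WRITE command.

```python
import re
import sys
from collections import Counter
from datetime import datetime, timedelta

# Matches the ScsiDeviceIO failure lines shown above; the host/device status after
# "failed" is cut off in the excerpt, so it is deliberately not parsed.
SCSI_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3})Z .*?ScsiDeviceIO: \d+: "
    r"Cmd\(0x[0-9a-f]+\) (?P<op>0x[0-9a-f]+), CmdSN 0x[0-9a-f]+ "
    r'from world \d+ to dev "(?P<dev>[^"]+)" failed'
)
# Standard SCSI command names for the opcodes that appear in this log.
OPCODE_NAMES = {"0x1a": "MODE SENSE(6)", "0x4d": "LOG SENSE", "0x85": "ATA PASS-THROUGH(16)"}
TS_FMT = "%Y-%m-%dT%H:%M:%S.%f"


def scsi_failures(lines, center, window=timedelta(minutes=15)):
    """Count failed commands per (device, opcode) within +/- window of center (UTC)."""
    counts = Counter()
    for line in lines:
        m = SCSI_RE.match(line)
        if m is None:
            continue
        ts = datetime.strptime(m.group("ts"), TS_FMT)
        if abs(ts - center) <= window:
            counts[(m.group("dev"), m.group("op"))] += 1
    return counts


def main(path, center_text, minutes=15):
    center = datetime.strptime(center_text.rstrip("Z"), TS_FMT)
    with open(path, encoding="utf-8", errors="replace") as f:
        counts = scsi_failures(f, center, timedelta(minutes=minutes))
    for (dev, op), n in counts.most_common():
        print(f"{n:4d}  {op} {OPCODE_NAMES.get(op, '?'):22s} {dev}")


if __name__ == "__main__" and len(sys.argv) >= 3:
    main(sys.argv[1], sys.argv[2], int(sys.argv[3]) if len(sys.argv) > 3 else 15)
```

For example: python3 scsi_window.py vmkernel.log 2020-02-24T16:45:31.308Z 15 (scsi_window.py is a hypothetical name). Running it with a few different window sizes shows whether the failures are concentrated near the hang or spread evenly through the log.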

turkadurka
Contributor

I also have a VM that hangs with the same error message in its vmware.log file. Around the time the hangs started, I discovered the kernel had been upgraded to 4.15.0-88, which makes me suspect the upgrade is causing them.

nonfatalexec
Contributor

I have 3 VMs, including this one, which hangs once a week. Of the other two, one never hangs and the other hangs about once a month. All 3 are running 4.15.0-88. The VMs that hang do much more network and I/O activity. I'm not sure whether 4.15.0-88 is causing these hangs.
