VMware Cloud Community
ug294
Contributor

Virtual machines become unresponsive and cannot be powered off (ESXi 6.7)

Hello, everyone.

I am facing an issue where sometimes my guest machine (Windows 10) becomes unresponsive and cannot be powered off.

The affected machine cannot be shut down from either the GUI or the shell commands below.

[~]$ esxcli vm process list
[~]$ esxcli vm process kill --type=force --world-id=XXXX
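For reference, the two commands above can be combined into an escalating power-off (soft, then hard, then force). This is only a minimal sketch: the helper names `world_id_for` and `kill_vm_escalating` and the 30-second wait are assumptions for illustration, and the parser assumes that `esxcli vm process list` prints a `World ID:` line before the `Display Name:` line of each VM.

```shell
#!/bin/sh
# Hypothetical helpers; nothing here runs until a function is called.

# Print the world ID of the VM whose display name is $1.
# Reads `esxcli vm process list` output on stdin (assumed layout:
# "World ID: N" appears before "Display Name: NAME" for each VM).
world_id_for() {
    awk -v name="$1" '
        /^[ \t]*World ID:/     { wid = $3 }
        /^[ \t]*Display Name:/ {
            sub(/^[ \t]*Display Name:[ \t]*/, "")
            if ($0 == name) print wid
        }
    '
}

# Try soft, hard, then force kill for world ID $1, checking after each
# attempt whether the VM process is still listed.
kill_vm_escalating() {
    wid="$1"
    for t in soft hard force; do
        esxcli vm process kill --type="$t" --world-id="$wid"
        sleep 30   # arbitrary wait, adjust as needed
        if ! esxcli vm process list | grep -q "World ID: ${wid}\$"; then
            echo "world $wid stopped after $t kill"
            return 0
        fi
    done
    echo "world $wid is still running after force kill" >&2
    return 1
}
```

On the host it could be used as `kill_vm_escalating "$(esxcli vm process list | world_id_for test-HOGE03)"`.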

When I checked the logs for that time, I noticed the following points.

  • "Turning off heartbeat checker" in hostd.log
  • "Issuing reset" in vmkwarning.log
  • "VSCSI: XXXX: handle 8193(vscsi0:1):Reset request on FSS handle" in vmkernel.log

hostd.log

2021-05-13T11:17:01.338Z info hostd[2099735] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/5e8f6af8-ef17ef81-a40a-b4969140d48c/test-HOGE03/test-HOGE03.vmx] Turning off heartbeat checker
2021-05-13T11:17:44.334Z error hostd[2098983] [Originator@6876 sub=Default] IpmiIfcOpenIpmiOpen: open(/dev/ipmi0, RDWR) failed 2 m
2021-05-13T11:17:49.256Z error hostd[2220401] [Originator@6876 sub=Default] [LikewiseGetDomainJoinInfo:354] QueryInformation(): ERROR_FILE_NOT_FOUND (2/0):

vmkwarning.log

2021-05-13T11:17:03.193Z cpu26:2099890)WARNING: VSCSI: 3510: handle 8193(vscsi0:1):WaitForCIF: Issuing reset;  number of CIF:2
2021-05-13T11:17:03.193Z cpu26:2099890)WARNING: VSCSI: 2657: handle 8193(vscsi0:1):Ignoring double reset

vmkernel.log

2021-05-13T11:07:40.680Z cpu3:2097896)DVFilter: 5963: Checking disconnected filters for timeouts
2021-05-13T11:17:00.191Z cpu26:2099890)VSCSI: 2623: handle 8193(vscsi0:1):Reset request on FSS handle 1968762 (1 outstanding commands) from (vmm0:test-HOGE03)
2021-05-13T11:17:00.191Z cpu0:2097544)VSCSI: 2903: handle 8193(vscsi0:1):Reset [Retries: 0/0] from (vmm0:test-HOGE03)
2021-05-13T11:17:00.191Z cpu0:2097544)vmw_ahci[00000b00]: scsiTaskMgmtCommand:VMK Task: VIRT_RESET initiator=0x430824ce0f80
2021-05-13T11:17:00.191Z cpu0:2097544)vmw_ahci[00000b00]: ahciAbortIO:(curr) HWQD: 0 BusyL: 0
2021-05-13T11:17:00.193Z cpu0:2097544)vmw_ahci[00000b00]: scsiTaskMgmtCommand:VMK Task: VIRT_RESET initiator=0x430824ce0f80
2021-05-13T11:17:00.193Z cpu0:2097544)vmw_ahci[00000b00]: ahciAbortIO:(curr) HWQD: 0 BusyL: 0
2021-05-13T11:17:00.195Z cpu0:2097544)vmw_ahci[00000c00]: scsiTaskMgmtCommand:VMK Task: VIRT_RESET initiator=0x430824ce0f80
2021-05-13T11:17:00.195Z cpu0:2097544)vmw_ahci[00000c00]: ahciAbortIO:(curr) HWQD: 0 BusyL: 0
2021-05-13T11:17:03.193Z cpu26:2099890)WARNING: VSCSI: 3510: handle 8193(vscsi0:1):WaitForCIF: Issuing reset;  number of CIF:2
2021-05-13T11:17:03.193Z cpu26:2099890)WARNING: VSCSI: 2657: handle 8193(vscsi0:1):Ignoring double reset
2021-05-13T11:17:30.680Z cpu6:2097544)VSCSI: 2903: handle 8193(vscsi0:1):Reset [Retries: 1/0] from (vmm0:test-HOGE03)
2021-05-13T11:17:30.680Z cpu6:2097544)vmw_ahci[00000b00]: scsiTaskMgmtCommand:VMK Task: VIRT_RESET initiator=0x430824ce0f80
2021-05-13T11:17:30.680Z cpu6:2097544)vmw_ahci[00000b00]: ahciAbortIO:(curr) HWQD: 0 BusyL: 0
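To see how often each virtual disk was reset, lines like the ones above can be counted per VSCSI handle. The following is a rough sketch only: the function name `vscsi_reset_summary` is made up for illustration, and the pattern is derived purely from the log lines quoted in this post, so other builds may format these messages differently.

```shell
#!/bin/sh
# Count "Reset request" / "Reset [Retries: ...]" lines per VSCSI handle
# in the log file given as $1. Prints "<handle> <count>" per handle.
vscsi_reset_summary() {
    awk '
        /VSCSI: [0-9]+: handle [0-9]+\(vscsi[0-9]+:[0-9]+\):Reset / {
            # Extract e.g. "8193(vscsi0:1)" from "handle 8193(vscsi0:1)"
            if (match($0, /handle [0-9]+\(vscsi[0-9]+:[0-9]+\)/)) {
                h = substr($0, RSTART + 7, RLENGTH - 7)
                count[h]++
            }
        }
        END { for (h in count) print h, count[h] }
    ' "$1"
}
```

For example, `vscsi_reset_summary /var/log/vmkernel.log` on the host (the path is assumed to be the default log location). The "Issuing reset" and "Ignoring double reset" warnings are deliberately not counted.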


From these logs, it looks as if the datastore was being reset.

The datastore is VMFS6 on local storage (3 x 1 TB SSD).

Does anybody have a solution for this?

Thank you.

3 Replies
Virt-aid
Enthusiast

I hope these KB articles help; both describe similar symptoms.

https://kb.vmware.com/s/article/2152008

https://kb.vmware.com/s/article/2150962

 

ug294
Contributor

Thanks, Virt-aid.


https://kb.vmware.com/s/article/2152008

https://kb.vmware.com/s/article/2150962


These articles say the issue is already fixed in ESXi 6.5 (Patch 01).

So I would expect ESXi 6.7 (which I am running) not to have this problem.

Or should I use ESXi 6.5 instead of 6.7?

Thanks.

ug294
Contributor

This is just a guess, but is it possible that thin provisioning is the cause?

The Windows 10 guest where the problem occurred uses a thin-provisioned vmdk for its data drive.

  • There are several vmdk files in the datastore.
  • Each vmdk is referenced by only a single virtual machine, so simultaneous access from multiple virtual machines does not occur.
  • "Provisioned Storage" is 1.5TB.
  • "Not-Shared Storage" and "Used Storage" are both 1.3TB.
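One way to check the thin-provisioning guess is to compare how much the thin disk can still grow (provisioned minus used) with the free space left on the datastore. The sketch below does only that arithmetic; the function name `thin_headroom` is hypothetical, and the free-space figure is not given in this thread, so it has to be read from the host (for example with `df -h`) and passed in.

```shell
#!/bin/sh
# thin_headroom PROVISIONED_TB USED_TB DATASTORE_FREE_TB
# Prints how much the thin disk can still grow and whether that growth
# fits into the datastore's free space (free space is an input here,
# it is NOT taken from the post).
thin_headroom() {
    awk -v p="$1" -v u="$2" -v f="$3" 'BEGIN {
        growth = p - u
        printf "remaining growth: %.1f TB\n", growth
        if (growth > f)
            print "overcommitted: growth exceeds free space"
        else
            print "fits in free space"
    }'
}
```

With the figures above, `thin_headroom 1.5 1.3 <free TB>` reports 0.2 TB of remaining growth; whether that fits depends on the actual free space of the datastore.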

 
