Lost access to volume … due to connectivity issues

Codeman1980 · ‎11-26-2016

Hello,

I have a brand new SYS-E200-8D Micro Server equipped with 64 GB RAM and a 1 TB Samsung EVO 850 PRO SATA 6 SSD, and I installed ESXi 6.5 on it.

I have currently deployed one Windows Server 2016 RTM Virtual Machine on the SSD that acts as the Datastore.

But as soon as I do storage performance testing within the VM (I use CrystalDiskMark) I get the following warnings in the Monitor tab of the Datastore:

Successfully restored access to volume 58399b3f-53265d09-9851-0cc47aca3b52 (datastore1) following connectivity issues.	Saturday, November 26, 2016, 20:47:22 +0100	Warning
Lost access to volume 58399b3f-53265d09-9851-0cc47aca3b52 (datastore1) due to connectivity issues. Recovery attempt is in progress and outcome will be reported shortly.	Saturday, November 26, 2016, 20:47:22 +0100	Warning
Successfully restored access to volume 58399b3f-53265d09-9851-0cc47aca3b52 (datastore1) following connectivity issues.	Saturday, November 26, 2016, 20:46:59 +0100	Warning
Lost access to volume 58399b3f-53265d09-9851-0cc47aca3b52 (datastore1) due to connectivity issues. Recovery attempt is in progress and outcome will be reported shortly.	Saturday, November 26, 2016, 20:46:58 +0100	Warning
Successfully restored access to volume 58399b3f-53265d09-9851-0cc47aca3b52 (datastore1) following connectivity issues.	Saturday, November 26, 2016, 20:45:02 +0100	Warning
Lost access to volume 58399b3f-53265d09-9851-0cc47aca3b52 (datastore1) due to connectivity issues. Recovery attempt is in progress and outcome will be reported shortly.	Saturday, November 26, 2016, 20:45:01 +0100	Warning
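
To see how long each interruption in the events above actually lasted, the rows can be paired up. The following is only a minimal sketch: it assumes the tab-separated "message, timestamp, severity" layout shown above, and the helper names (`parse_events`, `outage_seconds`) are made up for illustration, not part of any VMware tool.

```python
import re
from datetime import datetime

# Rows copied from the Monitor tab listing above (newest first).
VOL = "58399b3f-53265d09-9851-0cc47aca3b52 (datastore1)"
LOST = (f"Lost access to volume {VOL} due to connectivity issues. "
        "Recovery attempt is in progress and outcome will be reported shortly.")
RESTORED = f"Successfully restored access to volume {VOL} following connectivity issues."
ROWS = [
    (RESTORED, "Saturday, November 26, 2016, 20:47:22 +0100"),
    (LOST, "Saturday, November 26, 2016, 20:47:22 +0100"),
    (RESTORED, "Saturday, November 26, 2016, 20:46:59 +0100"),
    (LOST, "Saturday, November 26, 2016, 20:46:58 +0100"),
    (RESTORED, "Saturday, November 26, 2016, 20:45:02 +0100"),
    (LOST, "Saturday, November 26, 2016, 20:45:01 +0100"),
]
EVENT_LOG = "\n".join(f"{msg}\t{ts}\tWarning" for msg, ts in ROWS)

# Assumed row layout: message <TAB> timestamp <TAB> severity.
ROW = re.compile(
    r"^(?P<kind>Lost|Successfully restored) access to volume "
    r"(?P<uuid>\S+) \((?P<name>[^)]+)\).*?\t(?P<ts>[^\t]+)\t"
)
TS_FORMAT = "%A, %B %d, %Y, %H:%M:%S %z"


def parse_events(text):
    """Return (timestamp, is_lost, datastore_name) for every matching row."""
    events = []
    for line in text.splitlines():
        m = ROW.match(line)
        if not m:
            continue
        when = datetime.strptime(m["ts"].strip(), TS_FORMAT)
        events.append((when, m["kind"] == "Lost", m["name"]))
    return events


def outage_seconds(events):
    """Pair each 'Lost' event with the next 'restored' event per datastore."""
    # Oldest first; on identical timestamps, 'Lost' sorts before 'restored'.
    ordered = sorted(events, key=lambda e: (e[0], not e[1]))
    lost_at = {}
    durations = []
    for when, is_lost, name in ordered:
        if is_lost:
            lost_at.setdefault(name, when)
        elif name in lost_at:
            durations.append((when - lost_at.pop(name)).total_seconds())
    return durations


durations = outage_seconds(parse_events(EVENT_LOG))
print(durations)
```

For the six rows above this gives gaps of one second or less (the timestamps only have one-second resolution), so the datastore drops out briefly and repeatedly rather than for one long period.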

CrystalDiskMark also reports very bad throughput for the C drive, which is physically stored through the vmdk file on the Datastore (which makes sense because of the above mentioned warnings).

Any ideas what could be happening here and why I get these warnings?

I also have a Mac Mini setup (16 GB RAM, 256 GB Samsung SATA 6 SSD, ESXi 6.5 installation) where these warnings don't occur when I do storage performance tests.

Thanks for your help & input,

-Klaus

DavoudTeimouri · ‎11-26-2016

Actually, "Lost access to volume" can have many causes, but as a first step you should check your hardware.

In a SAN environment, it's recommended to check the FC ports on the SAN switch, the HBA on the server, and the fiber cables.

In your case, you can try a different SATA port on the mainboard as a test.

Also read this KB for more information.

-------------------------------------------------------------------------------------
Davoud Teimouri - https://www.teimouri.net - Twitter: @davoud_teimouri Facebook: https://www.facebook.com/teimouri.net/

D2B2 · ‎11-27-2016

Hi Codeman1980,

I had the same problem on ESXi 6.5.

I have 2 hard drives on the same server, 1 SSD and 1 HDD. With the SSD I had this error when I was writing on it but not with the HDD.

After I disabled the vmw_ahci driver, there were no more errors.

"esxcli system module set --enabled=false --module=vmw_ahci"

More info:

http://www.nxhut.com/2016/11/fix-slow-disk-performance-vmwahci.html

Regards,

Dan

Codeman1980 · ‎11-27-2016

Hello Dan,

Thanks for your answer.

I've tried to disable the module (and restarted the ESXi host), but the problem is still there.

Interestingly enough, the storage throughput is much lower than when the driver was enabled, so in my scenario the opposite happened of what is described in the mentioned blog post.

I have also tried installing Windows Server 2012 as a bare-metal installation on the SYS-E200-8D to see if the problem depends on ESXi.

Unfortunately Windows Server 2012 also reports some problems with the storage in the Event Viewer like the following:

*) Reset to device device raidport0 was issued

*) the io operation at logical block address for disk was retried

So it seems that the problem is not directly related to ESXi, but more related to the SYS-E200-8D Server itself.

I have no idea what could be wrong in my scenario, because I run everything with the default BIOS settings, and I have also already tried a different SSD, but I still have the same problems.

I have also tried attaching the SATA SSD to different SATA ports (SATA 0, SATA 1), but this didn't help either 😞

Surely the SYS-E200-8D server itself shouldn't be causing this problem?

Thanks!

-Klaus

D2B2 · ‎11-27-2016

Hi Klaus,

It looks like a compatibility problem (drivers).

Things you can test:

- Use an HDD (different drivers are used depending on whether the disk is an HDD or an SSD).

- Try with a previous ESXi version.

Good luck.

Regards,

Dan

Bleeder · ‎02-03-2017

This still isn't fixed in the new ESXi 6.5.0a patch released this week.

prdeepkumawat · ‎03-03-2017

I am also facing a similar issue. Could anyone help?

Lost access to volume 4c8ba981-473af21b-b02e-001a64b45292 (datastore1) due to connectivity issues. Recovery attempt is in progress and outcome will be reported shortly

JoeyvdBerge · ‎05-10-2017

Hi D2B2,

Thank you for your solution. It works fine for me!

ashishsingh1508 · ‎08-28-2017

Hi Bleeder,

Please note that this is not an ESXi host bug that needs to be fixed. If you are receiving this error, it means the ESXi host is not receiving the heartbeat in a timely manner, which is then reported as the warning you are seeing.

Ashish Singh VCP-6.5, VCP-NV 6, VCIX-6,VCIX-6.5, vCAP-DCV, vCAP-DCD

RajeevVCP4 · ‎08-29-2017

Did you check your fnic/HBA firmware compatibility with 6.5? It seems to be a storage issue.

Refer to these KBs:

Host Connectivity Degraded in ESX/ESXi (1009557) | VMware KB

Understanding lost access to volume messages in ESXi 5.5/6.x (2136081) | VMware KB

Rajeev Chauhan
VCIX-DCV6.5/VSAN/VXRAIL
Please mark as helpful or correct if my answer is useful for you

dekoshal · ‎08-29-2017

You might want to start by checking vmkernel.log, vobd.log and maybe vmksummary.log as well, to get more insight into what's happening behind the curtain and find the cause of the connectivity issue.

If you found this or any other answer helpful, please consider the use of the Correct or Helpful to award points.

Best Regards,

Deepak Koshal

CNE|CLA|CWMA|VCP4|VCP5|CCAH

alphenit · ‎09-06-2020

This seems to be one of those timeless problems that keeps coming back across versions.

In my homelab errors popped up out of nowhere: Lost access to volume 5f48d732-cdc8d5db-14d9-0cc47ac9b978 (local-ssd) due to connectivity issues. Recovery attempt is in progress and outcome will be reported shortly.

I did some checks with smartctl and it looked like one of my SSDs was going bad (some pending sectors), so I replaced the SSD with a new EVO 860, but to my surprise the errors kept on coming. I tried disabling the VMFS reclaim option, which didn't help.

Was using this version of ESXi 6.7:

esxcli software vib list | grep ahci with build-16316930

sata-ahci 3.0-26vmw.670.0.0.8169922 VMW VMwareCertified 2019-06-08

vmw-ahci 1.2.8-1vmw.670.3.73.14320388 VMW VMwareCertified 2019-12-22

Grepping the vmkernel log and reading through the 2020-08 6.7 patch release notes gave me hope, because I was seeing these errors:

vmkernel.log

[root@labhost:/vmfs/volumes/5a4894b1-dd26f0cc-1182-0cc47ac9b978/patches] cat /var/log/vmkernel.log | grep vmw_ahci

2020-09-02T06:20:13.255Z cpu6:2097972)vmw_ahci[0000001f]: scsiTaskMgmtCommand:VMK Task: VIRT_RESET initiator=0x430458b8b940

2020-09-02T06:20:13.255Z cpu6:2097972)vmw_ahci[0000001f]: ahciAbortIO:(curr) HWQD: 0 BusyL: 0

2020-09-02T07:08:27.784Z cpu17:2107862)vmw_ahci[00000011]: HBAIntrHandler:new interrupts coming, PxIS = 0x8, no repeat

2020-09-02T12:50:26.982Z cpu17:2158172)vmw_ahci[0000001f]: scsiTaskMgmtCommand:VMK Task: ABORT sn=0x7475a5 initiator=0x430458b8b940

2020-09-02T12:50:26.982Z cpu17:2158172)vmw_ahci[0000001f]: ahciAbortIO:(curr) HWQD: 27 BusyL: 27

2020-09-02T12:50:27.980Z cpu6:2097972)vmw_ahci[0000001f]: scsiTaskMgmtCommand:VMK Task: VIRT_RESET initiator=0x430858ee9340

2020-09-02T12:50:27.980Z cpu6:2097972)vmw_ahci[0000001f]: ahciAbortIO:(curr) HWQD: 27 BusyL: 27

2020-09-02T12:50:31.981Z cpu8:2158172)vmw_ahci[0000001f]: LogExceptionSignal:Port 0, Signal: --|--|--|AB|--|--|--|--|--|--|--|-- (0x0008) Curr: --|--|--|--|--|--|--|--|--|--|PR|-- (0x0400)

2020-09-02T12:50:33.115Z cpu0:2097689)vmw_ahci[0000001f]: ExecInternalCommandPolled:FAIL!!: Internal command b0, 00

2020-09-02T12:50:33.115Z cpu0:2097689)vmw_ahci[0000001f]: LogExceptionProcess:Port 0, Process: --|--|--|AB|--|--|--|--|--|--|--|-- (0x0008) Curr: --|--|--|AB|--|--|--|--|--|--|--|-- (0x0008)

2020-09-02T12:50:33.115Z cpu0:2097689)vmw_ahci[0000001f]: ExceptionHandlerWorld:AHCI_SIGNAL_ABORT_REQUEST signal.

2020-09-02T12:50:33.115Z cpu0:2097689)vmw_ahci[0000001f]: ProcessAbortRequest:Aborting command tag 4 from the Busy list

2020-09-02T12:50:33.115Z cpu0:2097689)vmw_ahci[0000001f]: ProcessAbortRequest:aborted command I:0x430458b8b940 SN:0x7475a5 tag:4

2020-09-02T12:50:33.115Z cpu0:2097689)vmw_ahci[0000001f]: ExceptionHandlerWorld:Abort scan took 6 (us) to complete, 0 commands aborted.

2020-09-02T12:50:33.116Z cpu9:2097972)vmw_ahci[0000001f]: LogExceptionSignal:Port 0, Signal: --|--|--|AB|--|--|--|--|--|--|--|-- (0x0008) Curr: --|--|--|--|--|--|--|--|--|--|--|-- (0x0000)

2020-09-02T12:50:33.116Z cpu0:2097689)vmw_ahci[0000001f]: LogExceptionProcess:Port 0, Process: --|--|--|AB|--|--|--|--|--|--|--|-- (0x0008) Curr: --|--|--|AB|--|--|--|--|--|--|--|-- (0x0008)

2020-09-02T12:50:33.116Z cpu0:2097689)vmw_ahci[0000001f]: ExceptionHandlerWorld:AHCI_SIGNAL_ABORT_REQUEST signal.

2020-09-02T12:50:33.116Z cpu0:2097689)vmw_ahci[0000001f]: ProcessAbortRequest:aborted command I:0x430858ee9340 SN:0xc8718ac8 tag:5

2020-09-02T12:50:33.116Z cpu0:2097689)vmw_ahci[0000001f]: ProcessAbortRequest:aborted command I:0x430858ee9340 SN:0xc871bed0 tag:6

2020-09-02T12:50:33.116Z cpu0:2097689)vmw_ahci[0000001f]: ProcessAbortRequest:aborted command I:0x430858ee9340 SN:0xc8717d90 tag:25

2020-09-02T12:50:33.116Z cpu0:2097689)vmw_ahci[0000001f]: ExceptionHandlerWorld:Abort scan took 10 (us) to complete, 3 commands aborted.

2020-09-02T12:50:33.117Z cpu0:2097689)vmw_ahci[0000001f]: _IssueComReset:Issuing comreset...

2020-09-02T12:50:33.125Z cpu0:2097689)vmw_ahci[0000001f]: IssueCommand:tag: 20 already active during issue, reissue_flag:1

2020-09-02T12:50:33.125Z cpu0:2097689)vmw_ahci[0000001f]: IssueCommand:tag: 21 already active during issue, reissue_flag:1

2020-09-02T12:50:33.125Z cpu0:2097689)vmw_ahci[0000001f]: IssueCommand:tag: 22 already active during issue, reissue_flag:1

2020-09-02T12:50:33.125Z cpu0:2097689)vmw_ahci[0000001f]: IssueCommand:tag: 23 already active during issue, reissue_flag:1

2020-09-02T12:50:33.125Z cpu0:2097689)vmw_ahci[0000001f]: IssueCommand:tag: 24 already active during issue, reissue_flag:1

2020-09-02T12:50:33.125Z cpu0:2097689)vmw_ahci[0000001f]: IssueCommand:tag: 26 already active during issue, reissue_flag:1

2020-09-02T12:50:33.125Z cpu0:2097689)vmw_ahci[0000001f]: IssueCommand:tag: 27 already active during issue, reissue_flag:1

2020-09-02T12:50:33.125Z cpu0:2097689)vmw_ahci[0000001f]: ProcessActiveCommands:Commands completed: 0, re-issued: 7

2020-09-02T12:50:33.126Z cpu11:2097972)vmw_ahci[0000001f]: scsiTaskMgmtCommand:VMK Task: VIRT_RESET initiator=0x430858ebc800

2020-09-02T12:50:33.126Z cpu11:2097972)vmw_ahci[0000001f]: ahciAbortIO:(curr) HWQD: 0 BusyL: 0

2020-09-02T12:50:33.128Z cpu11:2097972)vmw_ahci[0000001f]: scsiTaskMgmtCommand:VMK Task: VIRT_RESET initiator=0x430858eba200

2020-09-02T12:50:33.128Z cpu11:2097972)vmw_ahci[0000001f]: ahciAbortIO:(curr) HWQD: 0 BusyL: 0

2020-09-02T12:50:33.130Z cpu11:2097972)vmw_ahci[0000001f]: scsiTaskMgmtCommand:VMK Task: VIRT_RESET initiator=0x430458b8b940

2020-09-02T12:50:33.130Z cpu11:2097972)vmw_ahci[0000001f]: ahciAbortIO:(curr) HWQD: 0 BusyL: 0

2020-09-02T12:50:33.132Z cpu9:2097972)vmw_ahci[0000001f]: scsiTaskMgmtCommand:VMK Task: VIRT_RESET initiator=0x430858ebc4c0

2020-09-02T12:50:33.132Z cpu9:2097972)vmw_ahci[0000001f]: ahciAbortIO:(curr) HWQD: 0 BusyL: 0

2020-09-02T12:50:33.134Z cpu9:2097972)vmw_ahci[0000001f]: scsiTaskMgmtCommand:VMK Task: VIRT_RESET initiator=0x430858edbdc0

2020-09-02T12:50:33.134Z cpu9:2097972)vmw_ahci[0000001f]: ahciAbortIO:(curr) HWQD: 0 BusyL: 0

2020-09-02T12:50:33.136Z cpu9:2097972)vmw_ahci[0000001f]: scsiTaskMgmtCommand:VMK Task: VIRT_RESET initiator=0x430458b8b940

2020-09-02T12:50:33.136Z cpu9:2097972)vmw_ahci[0000001f]: ahciAbortIO:(curr) HWQD: 0 BusyL: 0

2020-09-02T14:07:30.319Z cpu15:2108092)vmw_ahci[0000001f]: AHCI_EdgeIntrHandler:new interrupts coming, IS= 0x1, no repeat

2020-09-02T14:07:51.959Z cpu13:2097204)vmw_ahci[0000001f]: AHCI_EdgeIntrHandler:new interrupts coming, IS= 0x1, no repeat

2020-09-02T14:08:16.168Z cpu0:2107827)vmw_ahci[0000001f]: AHCI_EdgeIntrHandler:new interrupts coming, IS= 0x1, no repeat

2020-09-02T14:16:53.668Z cpu16:2097207)vmw_ahci[0000001f]: AHCI_EdgeIntrHandler:new interrupts coming, IS= 0x1, no repeat

2020-09-02T14:17:00.628Z cpu2:2097193)vmw_ahci[0000001f]: AHCI_EdgeIntrHandler:new interrupts coming, IS= 0x1, no repeat

2020-09-02T14:17:07.235Z cpu15:2097206)vmw_ahci[0000001f]: AHCI_EdgeIntrHandler:new interrupts coming, IS= 0x1, no repeat

2020-09-02T15:50:44.368Z cpu16:2097689)vmw_ahci[0000001f]: ExecInternalCommandPolled:FAIL!!: Internal command ec, 00

2020-09-02T23:21:35.836Z cpu0:2172046)vmw_ahci[00000011]: AHCI_EdgeIntrHandler:new interrupts coming, IS= 0x2, no repeat

2020-09-03T12:51:09.420Z cpu8:2097689)vmw_ahci[0000001f]: ExecInternalCommandPolled:FAIL!!: Internal command ec, 00

2020-09-03T14:32:24.865Z cpu7:2158172)vmw_ahci[00000011]: scsiTaskMgmtCommand:VMK Task: ABORT sn=0x652b initiator=0x430458b81bc0

2020-09-03T14:32:24.865Z cpu7:2158172)vmw_ahci[00000011]: ahciAbortIO:(curr) HWQD: 4 BusyL: 0
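
A long dump like the one above is easier to read when the lines are tallied by the driver function that logged them. This is a rough sketch only: the sample lines are copied from the log above, the `tally` helper is a made-up name, and treating the bracketed value after `vmw_ahci` as an identifier for the controller instance is my assumption, not something the log states.

```python
import re
from collections import Counter

# A few lines copied verbatim from the vmkernel.log excerpt above.
SAMPLE = """\
2020-09-02T12:50:26.982Z cpu17:2158172)vmw_ahci[0000001f]: scsiTaskMgmtCommand:VMK Task: ABORT sn=0x7475a5 initiator=0x430458b8b940
2020-09-02T12:50:27.980Z cpu6:2097972)vmw_ahci[0000001f]: scsiTaskMgmtCommand:VMK Task: VIRT_RESET initiator=0x430858ee9340
2020-09-02T12:50:33.115Z cpu0:2097689)vmw_ahci[0000001f]: ExecInternalCommandPolled:FAIL!!: Internal command b0, 00
2020-09-02T12:50:33.117Z cpu0:2097689)vmw_ahci[0000001f]: _IssueComReset:Issuing comreset...
2020-09-03T14:32:24.865Z cpu7:2158172)vmw_ahci[00000011]: scsiTaskMgmtCommand:VMK Task: ABORT sn=0x652b initiator=0x430458b81bc0
"""

# Captures the bracketed value (assumed: controller instance) and the
# function name that precedes the first colon of the message.
LINE = re.compile(r"vmw_ahci\[(?P<instance>[0-9a-f]+)\]: (?P<func>\w+):")


def tally(lines):
    """Count vmw_ahci log lines per (instance, function) pair."""
    counts = Counter()
    for line in lines:
        m = LINE.search(line)
        if m:
            counts[(m["instance"], m["func"])] += 1
    return counts


counts = tally(SAMPLE.splitlines())
for (instance, func), n in counts.most_common():
    print(f"{instance}  {func:<28} {n}")
```

To run it against a real log, pass an open file instead of the sample, for example `tally(open("vmkernel.log", errors="replace"))` on a copy of `/var/log/vmkernel.log`. A high count of `scsiTaskMgmtCommand` aborts and resets, or of `_IssueComReset`, would point at the same pattern shown above.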

Since then I have updated the host, which updated the vmw_ahci driver to this version:

esxcli software vib list | grep ahci with build-16713306

sata-ahci 3.0-26vmw.670.0.0.8169922 VMW VMwareCertified 2019-06-08

vmw-ahci 2.0.5-1vmw.670.3.116.16713306 VMW VMwareCertified 2020-09-03
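
To make the difference between the two listings explicit, the before and after output can be compared mechanically. A small sketch, assuming the whitespace-separated "name, version, vendor, acceptance level, install date" layout shown above; `parse_vibs` and `changed_vibs` are illustrative names only.

```python
# VIB listings copied from the two "esxcli software vib list | grep ahci" outputs above.
BEFORE = """\
sata-ahci 3.0-26vmw.670.0.0.8169922 VMW VMwareCertified 2019-06-08
vmw-ahci 1.2.8-1vmw.670.3.73.14320388 VMW VMwareCertified 2019-12-22
"""
AFTER = """\
sata-ahci 3.0-26vmw.670.0.0.8169922 VMW VMwareCertified 2019-06-08
vmw-ahci 2.0.5-1vmw.670.3.116.16713306 VMW VMwareCertified 2020-09-03
"""


def parse_vibs(text):
    """Map VIB name to version string (first two whitespace-separated fields)."""
    vibs = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            vibs[fields[0]] = fields[1]
    return vibs


def changed_vibs(before, after):
    """Return {name: (old_version, new_version)} for VIBs whose version differs."""
    old, new = parse_vibs(before), parse_vibs(after)
    return {
        name: (old[name], new[name])
        for name in old.keys() & new.keys()
        if old[name] != new[name]
    }


changes = changed_vibs(BEFORE, AFTER)
for name, (old_version, new_version) in changes.items():
    print(f"{name}: {old_version} -> {new_version}")
```

Only vmw-ahci changes between the two builds (1.2.8 to 2.0.5); sata-ahci stays the same.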

The errors have not come back since then (*knocks on wood*).

There is not a lot of information on the net suggesting that this error could point to a VMware driver problem, so I am sharing my experience here.

*Please consider awarding points if my response was helpful*
