Contributor

Datastore not connecting to SSD

Hello,

A couple of years ago I set up an ESXi host as a home lab, and about a month ago it started giving me problems: the datastore reports the right SSD but cannot access the data on it.

I've been unable to fix it, so I tried to update the host in case this was a bug that had already been fixed. When I try to update the host, it tells me I don't have enough space to do so.

Here are the last lines of the log regarding the SSD that's giving me problems.

2020-11-15T20:27:14.357Z cpu0:2097275)vmw_ahci[0000001f]: CompletionBottomHalf:Error port=4, PxIS=0x40000001, PxTDF=0x461,PxSERR=0x00000000, PxCI=0x00000002, PxSACT=0x00000002, ActiveTags=0x00000002
2020-11-15T20:27:14.357Z cpu0:2097275)vmw_ahci[0000001f]: CompletionBottomHalf:SCSI cmd 0x2a on slot 1 lba=0x0, lbc=0x22
2020-11-15T20:27:14.357Z cpu0:2097275)vmw_ahci[0000001f]: CompletionBottomHalf:cfis->command= 0x61
2020-11-15T20:27:14.357Z cpu0:2097275)vmw_ahci[0000001f]: LogExceptionSignal:Port 4, Signal: --|--|--|--|--|TF|--|--|--|--|--|-- (0x0020) Curr: --|--|--|--|--|--|--|--|--|--|--|-- (0x0000)
2020-11-15T20:27:14.357Z cpu0:2097590)vmw_ahci[0000001f]: LogExceptionProcess:Port 4, Process: --|--|--|--|--|TF|--|--|--|--|--|-- (0x0020) Curr: --|--|--|--|--|TF|--|--|--|--|--|-- (0x0020)
2020-11-15T20:27:14.357Z cpu0:2097590)vmw_ahci[0000001f]: ExceptionHandlerWorld:Performing device reset due to Task File Error.
2020-11-15T20:27:14.357Z cpu0:2097590)vmw_ahci[0000001f]: ExceptionHandlerWorld:hardware stop on slot 0x1, activeTags 0x00000002
2020-11-15T20:27:14.369Z cpu0:2097590)vmw_ahci[0000001f]: _IssueComReset:Issuing comreset...
2020-11-15T20:27:14.372Z cpu0:2097590)vmw_ahci[0000001f]: ExceptionHandlerWorld:fail a command on slot 1
2020-11-15T20:27:14.373Z cpu6:2097730)NMP: nmp_ThrottleLogForDevice:3689: Cmd 0x2a (0x459a409bac40, 2360224) to dev "t10.ATA_____CT1000MX500SSD1_________________________1823E141BB46________" on path "vmhba0:C0:T4:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sens$
2020-11-15T20:27:14.373Z cpu6:2097730)ScsiDeviceIO: 3029: Cmd(0x459a409bac40) 0x2a, CmdSN 0x2 from world 2360224 to dev "t10.ATA_____CT1000MX500SSD1_________________________1823E141BB46________" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x4 0x44
2020-11-15T20:27:14.373Z cpu6:2097730)0x0.
2020-11-15T20:27:37.290Z cpu0:2099188 opID=dac7c17e)World: 11942: VC opID 9874db06 maps to vmkernel opID dac7c17e
2020-11-15T20:27:37.290Z cpu0:2099188 opID=dac7c17e)NVDManagement: 1478: No nvdimms found on the system
2020-11-15T20:27:44.746Z cpu6:2097185)vmw_ahci[0000001f]: CompletionBottomHalf:Error port=4, PxIS=0x40000001, PxTDF=0x461,PxSERR=0x00000000, PxCI=0x00000002, PxSACT=0x00000002, ActiveTags=0x00000002
2020-11-15T20:27:44.746Z cpu6:2097185)vmw_ahci[0000001f]: CompletionBottomHalf:SCSI cmd 0x2a on slot 1 lba=0x1000, lbc=0x1
2020-11-15T20:27:44.746Z cpu6:2097185)vmw_ahci[0000001f]: CompletionBottomHalf:cfis->command= 0x61
2020-11-15T20:27:44.746Z cpu6:2097185)vmw_ahci[0000001f]: LogExceptionSignal:Port 4, Signal: --|--|--|--|--|TF|--|--|--|--|--|-- (0x0020) Curr: --|--|--|--|--|--|--|--|--|--|--|-- (0x0000)
2020-11-15T20:27:44.746Z cpu0:2097590)vmw_ahci[0000001f]: LogExceptionProcess:Port 4, Process: --|--|--|--|--|TF|--|--|--|--|--|-- (0x0020) Curr: --|--|--|--|--|TF|--|--|--|--|--|-- (0x0020)
2020-11-15T20:27:44.746Z cpu0:2097590)vmw_ahci[0000001f]: ExceptionHandlerWorld:Performing device reset due to Task File Error.
2020-11-15T20:27:44.746Z cpu0:2097590)vmw_ahci[0000001f]: ExceptionHandlerWorld:hardware stop on slot 0x1, activeTags 0x00000002
2020-11-15T20:27:44.758Z cpu0:2097590)vmw_ahci[0000001f]: _IssueComReset:Issuing comreset...
2020-11-15T20:27:44.761Z cpu0:2097590)vmw_ahci[0000001f]: ExceptionHandlerWorld:fail a command on slot 1
2020-11-15T20:27:44.761Z cpu6:2097185)ScsiDeviceIO: 3029: Cmd(0x459a4087f100) 0x2a, CmdSN 0x2 from world 2098532 to dev "t10.ATA_____CT1000MX500SSD1_________________________1823E141BB46________" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x4 0x44
2020-11-15T20:27:44.761Z cpu6:2097185)0x0.
2020-11-15T20:27:44.761Z cpu4:2098662 opID=97cbcde5)World: 11942: VC opID 9874db10 maps to vmkernel opID 97cbcde5
2020-11-15T20:27:44.761Z cpu4:2098662 opID=97cbcde5)LVM: 6546: Forcing APD unregistration of devID 5bf7fa17-29ceaf24-65e2-00fd45fdc780 in state 1.
2020-11-15T20:27:44.761Z cpu4:2098662 opID=97cbcde5)LVM: 15203: Failed to open device t10.ATA_____CT1000MX500SSD1_________________________1823E141BB46________:1 : I/O error
2020-11-15T20:27:50.855Z cpu6:2099188 opID=f0af3c41)World: 11942: VC opID 9874db2f maps to vmkernel opID f0af3c41
2020-11-15T20:27:50.855Z cpu6:2099188 opID=f0af3c41)vmw_ahci[0000001f]: CompletionBottomHalf:Error port=4, PxIS=0x40000001, PxTDF=0x461,PxSERR=0x00000000, PxCI=0x00000002, PxSACT=0x00000002, ActiveTags=0x00000002
2020-11-15T20:27:50.855Z cpu6:2099188 opID=f0af3c41)vmw_ahci[0000001f]: CompletionBottomHalf:SCSI cmd 0x2a on slot 1 lba=0x1000, lbc=0x1
2020-11-15T20:27:50.855Z cpu6:2099188 opID=f0af3c41)vmw_ahci[0000001f]: CompletionBottomHalf:cfis->command= 0x61
2020-11-15T20:27:50.855Z cpu6:2099188 opID=f0af3c41)vmw_ahci[0000001f]: LogExceptionSignal:Port 4, Signal: --|--|--|--|--|TF|--|--|--|--|--|-- (0x0020) Curr: --|--|--|--|--|--|--|--|--|--|--|-- (0x0000)
2020-11-15T20:27:50.855Z cpu0:2097590)vmw_ahci[0000001f]: LogExceptionProcess:Port 4, Process: --|--|--|--|--|TF|--|--|--|--|--|-- (0x0020) Curr: --|--|--|--|--|TF|--|--|--|--|--|-- (0x0020)
2020-11-15T20:27:50.855Z cpu0:2097590)vmw_ahci[0000001f]: ExceptionHandlerWorld:Performing device reset due to Task File Error.
2020-11-15T20:27:50.855Z cpu0:2097590)vmw_ahci[0000001f]: ExceptionHandlerWorld:hardware stop on slot 0x1, activeTags 0x00000002
2020-11-15T20:27:50.867Z cpu0:2097590)vmw_ahci[0000001f]: _IssueComReset:Issuing comreset...
2020-11-15T20:27:50.870Z cpu0:2097590)vmw_ahci[0000001f]: ExceptionHandlerWorld:fail a command on slot 1
2020-11-15T20:27:50.870Z cpu6:2097185)ScsiDeviceIO: 3029: Cmd(0x459a40923200) 0x2a, CmdSN 0x2 from world 2098532 to dev "t10.ATA_____CT1000MX500SSD1_________________________1823E141BB46________" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x4 0x44
2020-11-15T20:27:50.870Z cpu6:2097185)0x0.
2020-11-15T20:27:50.870Z cpu6:2099188 opID=f0af3c41)LVM: 6546: Forcing APD unregistration of devID 5bf7fa17-29ceaf24-65e2-00fd45fdc780 in state 1.
2020-11-15T20:27:50.870Z cpu6:2099188 opID=f0af3c41)LVM: 15203: Failed to open device t10.ATA_____CT1000MX500SSD1_________________________1823E141BB46________:1 : I/O error
2020-11-15T20:28:04.353Z cpu1:2097629)NMP: nmp_ResetDeviceLogThrottling:3519: last error status from device t10.ATA_____CT1000MX500SSD1_________________________1823E141BB46________ repeated 2 times
2020-11-15T20:29:16.279Z cpu0:2098543 opID=595e28b0)World: 11942: VC opID 9874db87 maps to vmkernel opID 595e28b0
2020-11-15T20:29:16.279Z cpu0:2098543 opID=595e28b0)NVDManagement: 1478: No nvdimms found on the system
2020-11-15T20:29:21.495Z cpu4:2097271)vmw_ahci[0000001f]: CompletionBottomHalf:Error port=4, PxIS=0x40000001, PxTDF=0x461,PxSERR=0x00000000, PxCI=0x00000002, PxSACT=0x00000002, ActiveTags=0x00000002
2020-11-15T20:29:21.495Z cpu4:2097271)vmw_ahci[0000001f]: CompletionBottomHalf:SCSI cmd 0x2a on slot 1 lba=0x1000, lbc=0x1
2020-11-15T20:29:21.495Z cpu4:2097271)vmw_ahci[0000001f]: CompletionBottomHalf:cfis->command= 0x61
2020-11-15T20:29:21.495Z cpu4:2097271)vmw_ahci[0000001f]: LogExceptionSignal:Port 4, Signal: --|--|--|--|--|TF|--|--|--|--|--|-- (0x0020) Curr: --|--|--|--|--|--|--|--|--|--|--|-- (0x0000)
2020-11-15T20:29:21.495Z cpu0:2097590)vmw_ahci[0000001f]: LogExceptionProcess:Port 4, Process: --|--|--|--|--|TF|--|--|--|--|--|-- (0x0020) Curr: --|--|--|--|--|TF|--|--|--|--|--|-- (0x0020)
2020-11-15T20:29:21.495Z cpu0:2097590)vmw_ahci[0000001f]: ExceptionHandlerWorld:Performing device reset due to Task File Error.
2020-11-15T20:29:21.495Z cpu0:2097590)vmw_ahci[0000001f]: ExceptionHandlerWorld:hardware stop on slot 0x1, activeTags 0x00000002
2020-11-15T20:29:21.507Z cpu0:2097590)vmw_ahci[0000001f]: _IssueComReset:Issuing comreset...
2020-11-15T20:29:21.510Z cpu0:2097590)vmw_ahci[0000001f]: ExceptionHandlerWorld:fail a command on slot 1
2020-11-15T20:29:21.510Z cpu4:2097183)NMP: nmp_ThrottleLogForDevice:3689: Cmd 0x2a (0x459a4080f440, 2098532) to dev "t10.ATA_____CT1000MX500SSD1_________________________1823E141BB46________" on path "vmhba0:C0:T4:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sens$
2020-11-15T20:29:21.510Z cpu4:2097183)ScsiDeviceIO: 3029: Cmd(0x459a4080f440) 0x2a, CmdSN 0x2 from world 2098532 to dev "t10.ATA_____CT1000MX500SSD1_________________________1823E141BB46________" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x4 0x44
2020-11-15T20:29:21.510Z cpu4:2097183)0x0.
2020-11-15T20:29:21.510Z cpu4:2098544 opID=49a97243)World: 11942: VC opID 9874db8f maps to vmkernel opID 49a97243
2020-11-15T20:29:21.510Z cpu4:2098544 opID=49a97243)LVM: 6546: Forcing APD unregistration of devID 5bf7fa17-29ceaf24-65e2-00fd45fdc780 in state 1.
2020-11-15T20:29:21.510Z cpu4:2098544 opID=49a97243)LVM: 15203: Failed to open device t10.ATA_____CT1000MX500SSD1_________________________1823E141BB46________:1 : I/O error
2020-11-15T21:06:20.347Z cpu5:2099196 opID=230ba97b)World: 11942: VC opID 9874dd1d maps to vmkernel opID 230ba97b
2020-11-15T21:06:20.347Z cpu5:2099196 opID=230ba97b)NVDManagement: 1478: No nvdimms found on the system

VMware Employee

Moderator: Please create threads in the area for the product used - moved to ESXi Discussions


Forum Usage Guidelines: https://communities.vmware.com/docs/DOC-12286
VMware Training & Certification blog: http://vmwaretraining.blogspot.com
Commander

The log snippet shows errors on write operations, along with I/O errors. It looks like there is an issue with the SSD. Run a hardware diagnostic to check the health of the SSD. You could also try formatting the SSD completely, upgrading the SSD firmware and reinstalling ESXi.
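
For reference, the fields in the log can be decoded by hand. The sketch below is not from the original posts. It assumes the usual AHCI layout for the task file data value that the driver prints as PxTDF (ATA status in bits 7:0, ATA error in bits 15:8), and the function names are made up for illustration. The values 0x461, 0x2a and 0x61 are copied from the log.

```shell
# Decode the task file data value logged as PxTDF.
# Assumed layout: bits 7:0 = ATA status register, bits 15:8 = ATA error register.
decode_tfd() {
    local tfd status error flags
    tfd=$(( $1 ))
    status=$(( tfd & 0xff ))
    error=$(( (tfd >> 8) & 0xff ))
    flags=""
    [ $(( status & 0x80 )) -ne 0 ] && flags="$flags BSY"    # device busy
    [ $(( status & 0x40 )) -ne 0 ] && flags="$flags DRDY"   # device ready
    [ $(( status & 0x20 )) -ne 0 ] && flags="$flags DF"     # device fault
    [ $(( status & 0x08 )) -ne 0 ] && flags="$flags DRQ"    # data request
    [ $(( status & 0x01 )) -ne 0 ] && flags="$flags ERR"    # error
    [ $(( error & 0x04 )) -ne 0 ] && flags="$flags ABRT"    # command aborted
    printf 'status=0x%02x error=0x%02x flags=%s\n' "$status" "$error" "${flags# }"
}

# Name the SCSI opcode shown as "SCSI cmd" / "Cmd" in the log.
describe_scsi_opcode() {
    case "$1" in
        0x28) echo "READ(10)" ;;
        0x2a) echo "WRITE(10)" ;;
        *)    echo "unknown" ;;
    esac
}

# Name the ATA command shown as "cfis->command" in the log.
describe_ata_command() {
    case "$1" in
        0x60) echo "READ FPDMA QUEUED" ;;
        0x61) echo "WRITE FPDMA QUEUED" ;;
        *)    echo "unknown" ;;
    esac
}

decode_tfd 0x461             # prints: status=0x61 error=0x04 flags=DRDY DF ERR ABRT
describe_scsi_opcode 0x2a    # prints: WRITE(10)
describe_ata_command 0x61    # prints: WRITE FPDMA QUEUED
```

Read this way, the host issued a write, the drive aborted it and flagged a device fault, and the sense data 0x4 0x44 0x0 corresponds to HARDWARE ERROR with "internal target failure". All failing commands in the snippet are writes, so the log alone does not show whether reads fail as well.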

Regards, Suresh https://vconnectit.wordpress.com/
Contributor

The SSD appears to be fine. I had to update the firmware, but it's still the same problem...

Is the only solution to lose all the VMs?

Is it necessary to reinstall ESXi? I have it installed on the internal USB port.

And thanks for the firmware idea, it didn't occur to me sooner 🙂

Commander

If ESXi is on an SD card, then reinstallation is not required.

You may just format the SSD and create a new VMFS volume.

Regarding data loss, I think the datastore is already in an inaccessible or inactive state. I don't think read operations will work well.

You may run voma to check the VMFS state. voma is a tool that comes with ESXi and performs disk and filesystem checks.
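
As a rough sketch of what such a check looks like: the helper below only builds and prints a voma command line so it can be reviewed before being run in the ESXi shell. The helper name is made up, and partition 1 is an assumption taken from the ":1" suffix in the LVM messages of the log. The datastore should not be in use while the check runs.

```shell
# Build (but do not run) a voma metadata check for a VMFS partition.
#   -m vmfs   use the VMFS module
#   -f check  check only, no repair attempt
#   -d PATH   device path including the partition number
voma_check_cmd() {
    # $1 = device name under /vmfs/devices/disks, $2 = partition number
    printf 'voma -m vmfs -f check -d /vmfs/devices/disks/%s:%s\n' "$1" "$2"
}

# Device name copied from the log; partition number is an assumption.
voma_check_cmd "t10.ATA_____CT1000MX500SSD1_________________________1823E141BB46________" 1
```

A clean voma result only says the VMFS metadata it could read is consistent; it does not rule out a drive that rejects writes.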

 

If nothing works, the last option is to recreate the volume; your VMs will be lost.

Regards, Suresh https://vconnectit.wordpress.com/
Contributor

Well, thanks for the voma tip: 0 errors, but I still can't connect.

Going to format the SSD.

Thanks for the help, SureshKumarMuth!

Immortal

Hi,
sorry, I am in a hurry and will give you a longer reply later.
Please ignore the answer you just received: "You may just format the SSD and create a new vmfs volume."

[removed by moderator]

Is the device used by that datastore listed in /dev/disks?
Please check if

hexdump -C /dev/disks/DEVICE | less

displays anything and report back.
Ulli

Do you need support with a recovery problem ? - call me via skype "sanbarrow"
Immortal

@ SureshKumarMuth 

[removed by moderator]

Ulli

Commander

@continuum 

Hi Ulli,

Sorry, that statement should have been at the bottom of my reply. However, I did suggest checking the metadata consistency with voma to look for header errors. I also mentioned that formatting is the last option when nothing else helps.

I value the importance of data, and there was no intention to cause any loss for the user. I had a similar issue in the recent past, and my replies were based on that.

Regards, Suresh https://vconnectit.wordpress.com/
Immortal

Unfortunately, many users believe that I/O errors on a device used as a datastore mean there is no chance of getting the data back.
I don't know why this belief is so widespread ... anyway, the approach I use to deal with such problems is to make a clone of the "faulty device" with ddrescue.

Contributor

Hello,

It's not too late, I haven't given up yet. The SSD does show up in /dev/disks. As for the second question, the hexadecimal table does appear, with data in it. It may not be the data I want, but it does have data.

Something like this:

00000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
000001b0 00 00 00 00 00 00 00 00 00 00 00 00 1d 9a 00 00 |................|
000001c0 01 00 ee fe ff ff 01 00 00 00 af 6d 70 74 00 00 |...........mpt..|
000001d0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
000001f0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 55 aa |..............U.|
00000200 45 46 49 20 50 41 52 54 00 00 01 00 5c 00 00 00 |EFI PART....\...|
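
The dump can be read as a protective MBR (partition type 0xee, boot signature 55 aa) followed by a GPT header ("EFI PART" at offset 0x200), which suggests the very start of the disk is still readable. A small sketch that decodes the relevant bytes, copied by hand from the dump; the helper name is made up and 512-byte sectors are an assumption:

```shell
# Convert four little-endian bytes (as printed by hexdump -C) to a decimal number.
le32() {
    echo $(( 0x$4$3$2$1 ))
}

# Bytes copied from the dump: type at 0x1c2, first LBA at 0x1c6, sector count at 0x1ca.
part_type="ee"                       # 0xee marks a GPT protective partition
first_lba=$(le32 01 00 00 00)
sector_count=$(le32 af 6d 70 74)

echo "partition type : 0x$part_type"
echo "first LBA      : $first_lba"
echo "sector count   : $sector_count"
# The protective entry starts at LBA 1, so add one sector for LBA 0.
# Assumes 512-byte sectors.
echo "disk size bytes: $(( (sector_count + 1) * 512 ))"
```

The size works out to roughly 1 TB, which matches the CT1000MX500 named in the log. This only covers the first two sectors; it says nothing yet about the VMFS partition itself.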

Immortal

Try to clone the device with ddrescue - either to another device or to a vmdk.
ddrescue will enable you to copy in spite of the I/O error - if you are lucky, the damaged area is so small that it does not affect the output.

Ulli

Contributor

Could I clone the drive onto an HDD?

A 1 TB SSD is currently out of the question.

And thanks for all the help!

Immortal

Hmmm - I don't know anything about your Linux skills ...
If we do this together, my next question would be: where can you make best use of the results? Any location on earth is an option as long as it:
has at least 1 TB of free space and
is addressable from Linux.

Fun story: last time I did this stunt the source was a 20 TB RAID array in Australia and we cloned to a Windows share in Rio.
So to answer the question: can it be an HDD? - sure.

Success rate will be way better when we do next steps together - feel free to call me via skype and we can start the clone on short notice.

Ulli

Immortal

Next steps:

A: easier and faster: boot the physical host into a Linux LiveCD
1. find the device identifier for the problematic SSD - that should be something like /dev/ID
2. decide whether you want to clone to a device or to a file - if it is a device then it would look like /dev/OUT
3. if you want to clone to a file - mount something with 1 TB of free space - this can be a local disk or a remote share - the mountpoint may look like /external
4. cd to a location with a few MB of writeable space (the log file clone.log is written there)
5. run one of:
ddrescue -f /dev/ID /dev/OUT clone.log ### if target is a device
ddrescue /dev/ID /external/recovery-flat.vmdk clone.log ### if target is a new file

B: no ESXi downtime required - WARNING: don't try this road if you need to look up any single step on Google - RISK: high for a newbie
1. create a Linux VM
2. connect to ESXi via sshfs read-only and use mountpoint /esxi-in to mount the complete ESXi filesystem
3. connect to ESXi via sshfs writeable and use mountpoint /esxi-out to mount a single writeable directory inside a large VMFS volume
4. cd to /esxi-out
5. ddrescue /esxi-in/dev/disks/ID recovery-flat.vmdk clone.log

If you are lucky, the result can then be mounted directly as a functional VMFS volume that just needs a new signature.
If the I/O error occurred in an unlucky area of the original volume, the result may still not be mountable as a VMFS volume.
Then additional recovery steps will be required ...

What's better: clone to a disk or to a file?
That depends on the location of the first I/O error.
If the first error occurs in the first 2 GB of the volume, chances of easily mounting the clone as a new datastore are low - then clone to a file.
If the first error occurs later inside the volume, chances are good that the result can be used as a VMFS volume - so clone to a disk.

If the device has several I/O errors, the best option for the clone is a file stored on an NTFS-formatted portable USB disk.
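
If a first pass has already been made, its mapfile (the clone.log argument in the commands above) records where reads failed, so the rule of thumb about the first 2 GB can be checked mechanically. The sketch below is only an illustration with made-up function names. It assumes the GNU ddrescue mapfile layout (comment lines starting with #, one status line, then "pos size status" lines where -, / and * mark areas that could not be read). It compares offsets on the cloned device rather than inside the volume, which ignores the small partition offset.

```shell
# Print the byte offset of the first area ddrescue could not read (nothing if none).
first_bad_offset() {
    # $1 = ddrescue mapfile
    local seen_status pos size status
    seen_status=0
    while read -r pos size status; do
        case "$pos" in ''|'#'*) continue ;; esac        # skip blank and comment lines
        if [ "$seen_status" -eq 0 ]; then               # first data line is the status line
            seen_status=1
            continue
        fi
        case "$status" in
            '-'|'/'|'*') echo $(( $pos )); return 0 ;;  # failed area: print offset in decimal
        esac
    done < "$1"
}

# Apply the 2 GB rule of thumb to a mapfile.
suggest_target() {
    local off limit
    off=$(first_bad_offset "$1")
    limit=$(( 2 * 1024 * 1024 * 1024 ))
    if [ -z "$off" ]; then
        echo "no unreadable areas recorded"
    elif [ "$off" -lt "$limit" ]; then
        echo "first error at byte $off (inside the first 2 GiB): prefer cloning to a file"
    else
        echo "first error at byte $off (after the first 2 GiB): cloning to a disk is reasonable"
    fi
}

# Usage, once a mapfile exists:
#   suggest_target clone.log
```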

Ulli