I've been struggling to recover a missing local VMFS 5 datastore. The LUN is visible, but after refreshing and/or rescanning, the datastore still does not appear.
I am also unable to mount it from the CLI.
The HP Smart Array P400i controller shows the disks as OK, although the boot information shows
1720 - S.M.A.R.T. Hard Drive(s) Detect Imminent Failure Port 2I: Box 1: Bay 4 - suggesting a SAS disk failure is imminent.
Is there anything I can do to recover a VM from this missing datastore?
partedUtil shows:
# partedUtil getptbl /vmfs/devices/disks/mpx.vmhba1:C0:T0:L0
gpt
71380 255 63 1146734896
1 2048 1146734591 AA31E02A400F11DB9590000C2911D1B8 vmfs 0
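As a quick sanity check on that output: a single VMFS partition spans sectors 2048 through 1146734591, which works out to roughly 547 GB - consistent with the device size reported below - so the partition table itself looks intact. The arithmetic, using the numbers from the partedUtil output above:

```shell
# Partition extent from the partedUtil output above (512-byte sectors)
START=2048
END=1146734591
BYTES=$(( (END - START + 1) * 512 ))
echo "VMFS partition: $BYTES bytes (~$(( BYTES / 1073741824 )) GiB)"
# -> VMFS partition: 587127062528 bytes (~546 GiB)
```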
/vmfs/volumes # esxcli storage core device list |grep -A27 ^mpx.vmhba1:C0:T0:L0
mpx.vmhba1:C0:T0:L0
Display Name: Local VMware Disk (mpx.vmhba1:C0:T0:L0)
Has Settable Display Name: false
Size: 559929
Device Type: Direct-Access
Multipath Plugin: NMP
Devfs Path: /vmfs/devices/disks/mpx.vmhba1:C0:T0:L0
Vendor: VMware
Model: Block device
Revision: 1.0
SCSI Level: 2
Is Pseudo: false
Status: on
Is RDM Capable: false
Is Local: true
Is Removable: false
Is SSD: false
Is Offline: false
Is Perennially Reserved: false
Queue Full Sample Size: 0
Queue Full Threshold: 0
Thin Provisioning Status: unknown
Attached Filters:
VAAI Status: unsupported
Other UIDs: vml.0000000000766d686261313a303a30
Is Local SAS Device: false
Is Boot USB Device: false
No of outstanding IOs with competing worlds: 32
offset="128 2048"
for dev in $(esxcfg-scsidevs -l | grep "Console Device:" | awk '{print $3}'); do
    disk=$dev
    echo $disk
    partedUtil getptbl $disk
    {
        for i in $offset; do
            echo "Checking offset found at $i:"
            hexdump -n4 -s $((0x100000 + (512 * $i))) $disk
            hexdump -n4 -s $((0x1300000 + (512 * $i))) $disk
            hexdump -C -n 128 -s $((0x130001d + (512 * $i))) $disk
        done
    } | grep -B 1 -A 5 d00d
    echo "---------------------"
done
Result:
---------------------
/vmfs/devices/disks/mpx.vmhba1:C0:T0:L0
gpt
71380 255 63 1146734896
1 2048 1146734591 AA31E02A400F11DB9590000C2911D1B8 vmfs 0
Checking offset found at 2048:
0200000 d00d c001
0200004
1400000 f15e 2fab
1400004
0140001d 4c 43 4c 5f 52 41 49 44 30 00 00 00 00 00 00 00 |LCL_RAID0.......|
0140002d 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
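For what it's worth, those values are good news. hexdump prints 16-bit words in little-endian order, so "d00d c001" is the on-disk byte sequence 0d d0 01 c0, i.e. the 32-bit value 0xC001D00D - the VMFS LVM header magic the script greps for - and "f15e 2fab" corresponds to 0x2FABF15E, the VMFS file-system magic (assuming the usual VMFS on-disk layout). Both magics plus the LCL_RAID0 volume label are still readable. The word-swapped rendering can be reproduced anywhere:

```shell
# Feed hexdump the raw bytes 0d d0 01 c0: it prints them as the
# little-endian 16-bit words "d00d c001", exactly as in the output above.
printf '\x0d\xd0\x01\xc0' | hexdump -n4
```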
vmkernel.log output:
2017-06-07T17:40:21.582Z cpu2:32787)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "mpx.vmhba1:C0:T0:L0" state in doubt; requested fast path state update...
2017-06-07T17:40:21.582Z cpu2:32787)ScsiDeviceIO: 2337: Cmd(0x412e8087c140) 0x28, CmdSN 0x2c5 from world 33801 to dev "mpx.vmhba1:C0:T0:L0" failed H:0x3 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
2017-06-07T17:40:24.695Z cpu2:32825)<4>cciss: cmd 0x4109904559c0 has CHECK CONDITION byte 2 = 0x3
2017-06-07T17:40:24.695Z cpu2:32787)NMP: nmp_ThrottleLogForDevice:2321: Cmd 0x28 (0x412e80859ac0, 33801) to dev "mpx.vmhba1:C0:T0:L0" on path "vmhba1:C0:T0:L0" Failed: H:0x3 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0. Act:EVAL
2017-06-07T17:40:24.695Z cpu2:32787)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "mpx.vmhba1:C0:T0:L0" state in doubt; requested fast path state update...
2017-06-07T17:40:24.695Z cpu2:32787)ScsiDeviceIO: 2337: Cmd(0x412e80859ac0) 0x28, CmdSN 0x2c7 from world 33801 to dev "mpx.vmhba1:C0:T0:L0" failed H:0x3 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
2017-06-07T17:40:27.807Z cpu2:32783)<4>cciss: cmd 0x4109904559c0 has CHECK CONDITION byte 2 = 0x3
2017-06-07T17:40:27.807Z cpu2:32787)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "mpx.vmhba1:C0:T0:L0" state in doubt; requested fast path state update...
2017-06-07T17:40:27.807Z cpu2:32787)ScsiDeviceIO: 2337: Cmd(0x412e80859200) 0x28, CmdSN 0x2c9 from world 33801 to dev "mpx.vmhba1:C0:T0:L0" failed H:0x3 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
2017-06-07T17:40:30.920Z cpu2:32825)<4>cciss: cmd 0x4109904559c0 has CHECK CONDITION byte 2 = 0x3
2017-06-07T17:40:30.920Z cpu2:32787)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "mpx.vmhba1:C0:T0:L0" state in doubt; requested fast path state update...
2017-06-07T17:40:30.920Z cpu2:32787)ScsiDeviceIO: 2337: Cmd(0x412e80858a80) 0x28, CmdSN 0x2cb from world 33801 to dev "mpx.vmhba1:C0:T0:L0" failed H:0x3 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
2017-06-07T17:40:34.032Z cpu2:32779)<4>cciss: cmd 0x4109904559c0 has CHECK CONDITION byte 2 = 0x3
2017-06-07T17:40:34.032Z cpu2:32787)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "mpx.vmhba1:C0:T0:L0" state in doubt; requested fast path state update...
2017-06-07T17:40:34.032Z cpu2:32787)ScsiDeviceIO: 2337: Cmd(0x412e80858300) 0x28, CmdSN 0x2cd from world 33801 to dev "mpx.vmhba1:C0:T0:L0" failed H:0x3 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
2017-06-07T17:40:37.144Z cpu2:32793)<4>cciss: cmd 0x4109904559c0 has CHECK CONDITION byte 2 = 0x3
2017-06-07T17:40:37.145Z cpu2:32787)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "mpx.vmhba1:C0:T0:L0" state in doubt; requested fast path state update...
2017-06-07T17:40:37.145Z cpu2:32787)ScsiDeviceIO: 2337: Cmd(0x412e808569c0) 0x28, CmdSN 0x2db from world 33801 to dev "mpx.vmhba1:C0:T0:L0" failed H:0x3 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
2017-06-07T17:40:40.255Z cpu2:32843)<4>cciss: cmd 0x4109904559c0 has CHECK CONDITION byte 2 = 0x3
2017-06-07T17:40:40.255Z cpu2:32787)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "mpx.vmhba1:C0:T0:L0" state in doubt; requested fast path state update...
2017-06-07T17:40:40.255Z cpu2:32787)ScsiDeviceIO: 2337: Cmd(0x412e80856240) 0x28, CmdSN 0x2dd from world 33801 to dev "mpx.vmhba1:C0:T0:L0" failed H:0x3 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
2017-06-07T17:40:43.367Z cpu2:32779)<4>cciss: cmd 0x4109904559c0 has CHECK CONDITION byte 2 = 0x3
2017-06-07T17:40:43.367Z cpu2:32787)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "mpx.vmhba1:C0:T0:L0" state in doubt; requested fast path state update...
2017-06-07T17:40:43.367Z cpu2:32787)ScsiDeviceIO: 2337: Cmd(0x412e80855ac0) 0x28, CmdSN 0x2df from world 33801 to dev "mpx.vmhba1:C0:T0:L0" failed H:0x3 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
2017-06-07T17:40:46.479Z cpu2:32779)<4>cciss: cmd 0x4109904559c0 has CHECK CONDITION byte 2 = 0x3
2017-06-07T17:40:46.480Z cpu2:32787)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "mpx.vmhba1:C0:T0:L0" state in doubt; requested fast path state update...
2017-06-07T17:40:46.480Z cpu2:32787)ScsiDeviceIO: 2337: Cmd(0x412e80855340) 0x28, CmdSN 0x2e1 from world 33801 to dev "mpx.vmhba1:C0:T0:L0" failed H:0x3 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
2017-06-07T17:40:46.480Z cpu3:33801)Fil3: 15338: Max timeout retries exceeded for caller Fil3_FileIO (status 'Timeout')
2017-06-07T17:40:48.804Z cpu1:33801)Config: 346: "SIOControlFlag2" = 0, Old Value: 0, (Status: 0x0)
2017-06-07T17:40:52.761Z cpu1:34271)WARNING: UserEpoll: 542: UNSUPPORTED events 0x40
2017-06-07T17:40:53.563Z cpu2:34271)WARNING: LinuxSocket: 1854: UNKNOWN/UNSUPPORTED socketcall op (whichCall=0x12, args@0xffd12d8c)
2017-06-07T17:40:54.445Z cpu3:33801)Config: 346: "VMOverheadGrowthLimit" = -1, Old Value: -1, (Status: 0x0)
2017-06-07T17:40:57.728Z cpu2:33989)Hardware: 3124: Assuming TPM is not present because trusted boot is not supported.
2017-06-07T17:41:00.176Z cpu2:34050)<4>cciss: cmd 0x4109904559c0 has CHECK CONDITION byte 2 = 0x3
2017-06-07T17:41:00.177Z cpu2:32787)NMP: nmp_ThrottleLogForDevice:2321: Cmd 0x28 (0x412e8086a4c0, 33986) to dev "mpx.vmhba1:C0:T0:L0" on path "vmhba1:C0:T0:L0" Failed: H:0x3 D:0x0 P:0x0 Possible sense data: 0x5 0x20 0x0. Act:EVAL
2017-06-07T17:41:00.177Z cpu2:32787)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "mpx.vmhba1:C0:T0:L0" state in doubt; requested fast path state update...
2017-06-07T17:41:00.177Z cpu2:32787)ScsiDeviceIO: 2337: Cmd(0x412e8086a4c0) 0x28, CmdSN 0x38e from world 33986 to dev "mpx.vmhba1:C0:T0:L0" failed H:0x3 D:0x0 P:0x0 Possible sense data: 0x5 0x20 0x0.
2017-06-07T17:41:00.247Z cpu2:34933)Boot Successful
2017-06-07T17:41:01.007Z cpu3:33804)Config: 346: "SIOControlFlag2" = 1, Old Value: 0, (Status: 0x0)
2017-06-07T17:41:01.736Z cpu1:34988)MemSched: vm 34988: 8263: extended swap to 8192 pgs
Below is the decode of the SCSI sense code:
2017-06-07T17:41:00.177Z cpu2:32787)NMP: nmp_ThrottleLogForDevice:2321: Cmd 0x28 (0x412e8086a4c0, 33986) to dev "mpx.vmhba1:C0:T0:L0" on path "vmhba1:C0:T0:L0" Failed: H:0x3 D:0x0 P:0x0 Possible sense data: 0x5 0x20 0x0. Act:EVAL
The host bit is reporting the issue. Try restarting the host agents with the command services.sh restart, or reboot the ESXi host itself.
Host Status | [0x3] | TIME_OUT | Returned when a command in flight to the array times out.
Device Status | [0x0] | GOOD | Returned when there is no error from the device or target side; in that case check the Host or Plugin status instead.
Plugin Status | [0x0] | GOOD | No error. (ESXi 5.x / 6.x only)
Sense Key | [0x5] | ILLEGAL REQUEST |
Additional Sense Data | 20/00 | INVALID COMMAND OPERATION CODE |
If you found this or any other answer helpful, please consider using the Helpful button to award points.
Best Regards,
Deepak Koshal
CNE|CLA|CWMA|VCP4|VCP5|CCAH
Hi
> 1720 - S.M.A.R.T. Hard Drive(s) Detect Imminent Failure Port 2I: Box 1: Bay 4 - suggesting a SAS disk issue is imminent.
That does not sound good.
You should create a complete disk image ASAP.
I hope you have another datastore with enough free space for a full disk clone.
Run
dd if="/dev/disks/mpx.vmhba1:C0:T0:L0" bs=1M conv=notrunc of=/vmfs/volumes/<OTHER-DATASTORE>/almost-dead.bin
Once that is done, stop using the disk and store it away.
We can extract the VMs from almost-dead.bin later.
At the moment, priority number ONE is to create a disk image before the disk dies.
If possible, improve the air flow to that disk - maybe even put a gel pad fresh out of the fridge on top of it to keep the disk as cool as possible.
Do NOT try to mount the datastore again - do not rescan, do not edit the partition table, and do not try any GUI operation against that disk!
Ulli
...
By the way - you posted in a strange section of the forum. I only found your post by accident, because I saw a message that you followed me.
Next time, rather post in the regular ESXi section or here: VMware vSphere™ Storage
Hi dekoshal,
I've restarted the host, but the issue remains. Thanks for the sense data information!
Hi continuum,
I have another datastore on the same host, although it may not be large enough to receive a full disk image, as it's currently another 4-disk array, but with RAID 5. I can prepare this as RAID 0, then use it to receive the cloned image. One issue: I would need to rescan the HBA one more time to mount the datastore.
Do you believe the following is a reasonable approach?
1. Shut down the host and pull the failing disk
2. Wipe and prepare the RAID 0 array
3. Boot to ESXi, rescan, and mount the new RAID 0
4. Shut down the host and reattach the failing disk
5. Boot ESXi and run dd if="/dev/disks/mpx.vmhba1:C0:T0:L0" bs=1M conv=notrunc of=/vmfs/volumes/<OTHER-DATASTORE>/almost-dead.bin
Thanks for your advice.
Jason
Hi Jason
In that case I would rather suggest booting the host into a Linux LiveCD and then storing the disk image on a network share or a large external USB drive.
Creating a RAID 0 array formatted with VMFS just to store one file sounds like unnecessary work (and includes the risk that the RAID 0 will be used afterwards).
RAID 0 with VMFS on a standalone ESXi host is an absolute no-go.
I completely agree with your statement regarding RAID 0; unfortunately, I inherited the system and its issues. After the recovery has been completed, the host will only be used as a lab environment, certainly not with RAID 0.
I will give a LiveCD a try; however, I am quite inexperienced with Linux environments in general and will likely be out of my depth.
Presumably I will still use the dd command, replacing
dd if="/dev/disks/mpx.vmhba1:C0:T0:L0" bs=1M conv=notrunc of=/vmfs/volumes/<OTHER-DATASTORE>/almost-dead.bin
with
dd if="/dev/disks/mpx.vmhba1:C0:T0:L0" bs=1M conv=notrunc of=/<NetworkShare>/almost-dead.bin
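One caveat with that plan: the mpx.vmhba1:C0:T0:L0 name only exists under ESXi, and a bare \\server\share path will not work from a Linux shell - the share has to be mounted first. A rough sketch of what the LiveCD session might look like (the device name, share credentials, and mount point are assumptions to adapt, not tested values; the share name is taken from later in this thread):

```shell
DEV=/dev/cciss/c0d0        # assumed name of the SmartArray logical drive under Linux
MNT=/mnt/rescue

if [ -b "$DEV" ]; then
    sudo mkdir -p "$MNT"
    # CIFS shares are mounted with forward slashes: //server/share
    sudo mount -t cifs //W4200000825/temp "$MNT" -o username=guest
    # noerror,sync keeps dd going past read errors, padding bad blocks with zeros
    sudo dd if="$DEV" bs=1M conv=noerror,sync of="$MNT/almost-dead.bin"
else
    echo "device $DEV not present on this system"
fi
```

Note that conv=noerror,sync is a deliberate change from the original command: without it, dd aborts on the first unreadable sector.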
Jason
if you need assistance feel free to call via skype - see my signature.
Ulli
Based upon your initial feedback and advice, today, after some rather clumsy trial and error, I got the following results via an Ubuntu Live CD.
Unfortunately, it does not appear that Ubuntu registered the Smart Array controller completely, perhaps related to HP drivers.
ubuntu@ubuntu:/dev/disk/by-id$ sudo dd if="/dev/disks/mpx.vmhba1:C0:T0:L0" bs=1M conv=notrunc of=\\W4200000825\temp\almost-dead.bin
returns:
dd: failed to open '/dev/disk/mpx.vmhba1:C0:T0:L0': No such file or directory
ls by-
pci-0000:00:1d.7-usb-0:4:1.0-scsi-0:0:0:0
pci-0000:00:1d.7-usb-0:4:1.0-scsi-0:0:0:0-part1
pci-0000:00:1d.7-usb-0:4:1.0-scsi-0:0:0:0-part5
pci-0000:00:1d.7-usb-0:4:1.0-scsi-0:0:0:0-part6
pci-0000:00:1d.7-usb-0:4:1.0-scsi-0:0:0:0-part7
pci-0000:00:1d.7-usb-0:4:1.0-scsi-0:0:0:0-part8
pci-0000:00:1d.7-usb-0:5:1.0-scsi-0:0:0:0
pci-0000:00:1d.7-usb-0:5:1.0-scsi-0:0:0:0-part1
pci-0000:00:1f.1-ata-1
pci-0000:06:00.0-cciss-disk0 -- I am merely guessing this is the RAID0
pci-0000:06:00.0-cciss-disk0-part1
pci-0000:06:00.0-cciss-disk1
pci-0000:06:00.0-cciss-disk1-part1
sudo lshw
description: RAID bus controller
product: Smart Array Controller
vendor: Hewlett-Packard Company
physical id: 0
bus info: pci@0000:06:00.0
A combination of sudo lshw and gparted suggested that the following might be the location of the affected RAID 0:
sudo dd if="/dev/cciss/c0d0p1" bs=1M conv=notrunc of=\\W4200000825\temp\almost-dead.bin
dd: error reading '/dev/cciss/c0d0p1': Input/output error
24+1 records in
24+1 records out
25223168 bytes (25 MB, 24 MiB) copied, 0.474883 s, 53.1 MB/s
I hope I have this wrong and there is something else I can try to get a more positive result.
> Jason
> if you need assistance feel free to call via skype - see my signature.
> Ulli
continuum, I currently only have access to this host between 9:00 and 17:30 UTC+1.
Is there a suitable time for a Skype call? Days / evenings, etc.?
Hi Jason
That does not sound good.
It looks like dd is not suitable - instead we should use ddrescue, as it can handle I/O errors.
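For reference, on Ubuntu GNU ddrescue ships in the gddrescue package (the binary is still called ddrescue). A hedged sketch, reusing the assumed device and mount point from earlier in the thread - the key difference from dd is the map file, which records which areas were read successfully, so an interrupted run can resume and retries only touch the bad spots:

```shell
# sudo apt-get install gddrescue   # Ubuntu package name for GNU ddrescue

DEV=/dev/cciss/c0d0                # assumed SmartArray logical drive
IMG=/mnt/rescue/almost-dead.bin
MAP=/mnt/rescue/almost-dead.map    # map file: lets ddrescue resume and skip good areas

if [ -b "$DEV" ]; then
    # -d: direct disc access (bypass kernel cache); -r3: retry bad sectors 3 times
    sudo ddrescue -d -r3 "$DEV" "$IMG" "$MAP"
else
    echo "device $DEV not present on this system"
fi
```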
I am located in Germany and my Skype is always on, so just send me a message and we should be able to arrange something ASAP.
Ulli