Ardo123
Contributor

ESXi 5.5: bad hard disk

I'm using VMware ESXi 5.5 U2 (free license) for testing and learning.

One of the disks is bad (SMART), and I'm trying to copy/move the VMs to a new disk, but one .vmdk gives an error.

When I try to move it, I get the following message in the vSphere Client:

Move file: Error caused by file /vmfs/volumes/57d1ab60-f3524ea3-88a4-000acd18c54e/XP_3 (initiator: root; queued 09/09/2016 08:13:55, started 09/09/2016 08:13:55, completed 09/09/2016 08:14:12)

I accessed the server via SSH, ran the voma command, and got this result:

voma -m vmfs -d /vmfs/devices/disks/t10.ATA_____WDC_WD5000AAKX2D00U6AA0_______________________WD2DWCC2E7NK5FTS:1

Checking if device is actively used by other hosts

Running VMFS Checker version 1.0 in default mode

Initializing LVM metadata, Basic Checks will be done

Phase 1: Checking VMFS header and resource files

   Detected VMFS file system (labeled:'500.1v') with UUID:56d80421-93745713-d238-000acd18c54e, Version 5:60

Phase 2: Checking VMFS heartbeat region

Phase 3: Checking all file descriptors.

   Found stale lock [type 10c00001 offset 183064576 v 310, hb offset 4055040
         gen 231, mode 1, owner 576bff08-5ee6457b-a091-6805ca37600e mtime 5938
         num 0 gblnum 0 gblgen 0 gblbrk 0]
   Found stale lock [type 10c00001 offset 183066624 v 311, hb offset 4055040
         gen 231, mode 1, owner 576bff08-5ee6457b-a091-6805ca37600e mtime 5950
         num 0 gblnum 0 gblgen 0 gblbrk 0]
   Found stale lock [type 10c00001 offset 183068672 v 312, hb offset 4055040
         gen 231, mode 1, owner 576bff08-5ee6457b-a091-6805ca37600e mtime 5961
         num 0 gblnum 0 gblgen 0 gblbrk 0]
   Found stale lock [type 10c00001 offset 183070720 v 313, hb offset 4055040
         gen 231, mode 1, owner 576bff08-5ee6457b-a091-6805ca37600e mtime 5985
         num 0 gblnum 0 gblgen 0 gblbrk 0]
   Found stale lock [type 10c00001 offset 183072768 v 314, hb offset 4055040
         gen 231, mode 1, owner 576bff08-5ee6457b-a091-6805ca37600e mtime 5997
         num 0 gblnum 0 gblgen 0 gblbrk 0]
   Found stale lock [type 10c00001 offset 183300096 v 212, hb offset 4055040
         gen 231, mode 1, owner 576bff08-5ee6457b-a091-6805ca37600e mtime 6033
         num 0 gblnum 0 gblgen 0 gblbrk 0]

Phase 4: Checking pathname and connectivity.

ON-DISK ERROR: Invalid direntry <425, 109> index 33

Phase 5: Checking resource reference counts.

ON-DISK ERROR: PB inconsistency found: (1962,2) allocated in bitmap, but never used

ON-DISK ERROR: FB inconsistency found: (164,5) allocated in bitmap, but never used

ON-DISK ERROR: FB inconsistency found: (1701,192) allocated in bitmap, but never used

ON-DISK ERROR: FB inconsistency found: (1701,193) allocated in bitmap, but never used

ON-DISK ERROR: FB inconsistency found: (1701,194) allocated in bitmap, but never used

ON-DISK ERROR: FB inconsistency found: (1701,195) allocated in bitmap, but never used

ON-DISK ERROR: FB inconsistency found: (1701,196) allocated in bitmap, but never used

ON-DISK ERROR: FB inconsistency found: (1701,197) allocated in bitmap, but never used

ON-DISK ERROR: FB inconsistency found: (1701,198) allocated in bitmap, but never used

ON-DISK ERROR: FB inconsistency found: (1701,199) allocated in bitmap, but never used

ON-DISK ERROR: SB inconsistency found: (125,30) allocated in bitmap, but never used

Total Errors Found:           12

And I ran these commands:

vmkfstools -p 0 /vmfs/volumes/56d80421-93745713-d238-000acd18c54e/XP_3/XP_3-flat.vmdk

Mapping for file /vmfs/volumes/56d80421-93745713-d238-000acd18c54e/XP_3/XP_3-flat.vmdk (10737418240 bytes in size):

[           0:   536870912] --> [VMFS -- LVID:56d80421-2cc5e41c-fe9c-000acd18c54e/56d80421-0eb53893-af39-000acd18c54e/1:( 71528611840 -->  72065482752)]

[   536870912:   536870912] --> [VMFS -- LVID:56d80421-2cc5e41c-fe9c-000acd18c54e/56d80421-0eb53893-af39-000acd18c54e/1:( 72065482752 -->  72602353664)]

[  1073741824:   536870912] --> [VMFS -- LVID:56d80421-2cc5e41c-fe9c-000acd18c54e/56d80421-0eb53893-af39-000acd18c54e/1:( 72602353664 -->  73139224576)]

[  1610612736:   536870912] --> [VMFS -- LVID:56d80421-2cc5e41c-fe9c-000acd18c54e/56d80421-0eb53893-af39-000acd18c54e/1:( 73139224576 -->  73676095488)]

[  2147483648:   536870912] --> [VMFS -- LVID:56d80421-2cc5e41c-fe9c-000acd18c54e/56d80421-0eb53893-af39-000acd18c54e/1:( 73676095488 -->  74212966400)]

[  2684354560:   536870912] --> [VMFS -- LVID:56d80421-2cc5e41c-fe9c-000acd18c54e/56d80421-0eb53893-af39-000acd18c54e/1:( 74212966400 -->  74749837312)]

[  3221225472:   536870912] --> [VMFS -- LVID:56d80421-2cc5e41c-fe9c-000acd18c54e/56d80421-0eb53893-af39-000acd18c54e/1:( 74749837312 -->  75286708224)]

[  3758096384:   536870912] --> [VMFS -- LVID:56d80421-2cc5e41c-fe9c-000acd18c54e/56d80421-0eb53893-af39-000acd18c54e/1:( 75286708224 -->  75823579136)]

[  4294967296:   536870912] --> [VMFS -- LVID:56d80421-2cc5e41c-fe9c-000acd18c54e/56d80421-0eb53893-af39-000acd18c54e/1:( 75823579136 -->  76360450048)]

[  4831838208:   536870912] --> [VMFS -- LVID:56d80421-2cc5e41c-fe9c-000acd18c54e/56d80421-0eb53893-af39-000acd18c54e/1:( 76360450048 -->  76897320960)]

[  5368709120:   536870912] --> [VMFS -- LVID:56d80421-2cc5e41c-fe9c-000acd18c54e/56d80421-0eb53893-af39-000acd18c54e/1:( 76897320960 -->  77434191872)]

[  5905580032:   536870912] --> [VMFS -- LVID:56d80421-2cc5e41c-fe9c-000acd18c54e/56d80421-0eb53893-af39-000acd18c54e/1:( 77434191872 -->  77971062784)]

[  6442450944:   536870912] --> [VMFS -- LVID:56d80421-2cc5e41c-fe9c-000acd18c54e/56d80421-0eb53893-af39-000acd18c54e/1:( 77971062784 -->  78507933696)]

[  6979321856:   536870912] --> [VMFS -- LVID:56d80421-2cc5e41c-fe9c-000acd18c54e/56d80421-0eb53893-af39-000acd18c54e/1:( 78507933696 -->  79044804608)]

[  7516192768:   536870912] --> [VMFS -- LVID:56d80421-2cc5e41c-fe9c-000acd18c54e/56d80421-0eb53893-af39-000acd18c54e/1:( 79044804608 -->  79581675520)]

[  8053063680:   536870912] --> [VMFS -- LVID:56d80421-2cc5e41c-fe9c-000acd18c54e/56d80421-0eb53893-af39-000acd18c54e/1:( 79581675520 -->  80118546432)]

[  8589934592:   536870912] --> [VMFS -- LVID:56d80421-2cc5e41c-fe9c-000acd18c54e/56d80421-0eb53893-af39-000acd18c54e/1:( 80118546432 -->  80655417344)]

[  9126805504:   536870912] --> [VMFS -- LVID:56d80421-2cc5e41c-fe9c-000acd18c54e/56d80421-0eb53893-af39-000acd18c54e/1:( 80655417344 -->  81192288256)]

[  9663676416:   536870912] --> [VMFS -- LVID:56d80421-2cc5e41c-fe9c-000acd18c54e/56d80421-0eb53893-af39-000acd18c54e/1:( 81192288256 -->  81729159168)]

[ 10200547328:   536870912] --> [VMFS -- LVID:56d80421-2cc5e41c-fe9c-000acd18c54e/56d80421-0eb53893-af39-000acd18c54e/1:( 81729159168 -->  82266030080)]

and

vmkfstools --fix repair  /vmfs/volumes/56d80421-93745713-d238-000acd18c54e/XP_3/XP_3-flat.vmdk

DiskLib_Check() failed for source disk '/vmfs/volumes/56d80421-93745713-d238-000acd18c54e/XP_3/XP_3-flat.vmdk': The file specified is not a virtual disk (15).

How can I recover these files and VMs?

gb76534567
Enthusiast

Something is locking the VMDKs; is the VM powered off?

Ardo123
Contributor

No, but we had a power loss, and the HD's SMART status is bad.

continuum
Immortal

That looks good. I assume you have some free space on a datastore that is not on the same disk.

Plan A - prognosis: 30:70 - expected problems: dd aborts with I/O errors or a bad file descriptor message.

Extract the vmdk with the system that also created the problem in the first place, so Plan A uses the locally installed ESXi:
Create a new empty file /tmp/x.sh.
Edit / inject the following lines with WinSCP:

##### snip
# input: the failing VMFS partition (raw device)
IF="/dev/disks/t10.ATA_____WDC_WD5000AAKX2D00U6AA0_______________________WD2DWCC2E7NK5FTS:1"
# output: a new flat vmdk on a healthy datastore - replace datastoreX below
OF="/vmfs/volumes/datastoreX/XP_3-flat.vmdk"
# copy 10240 MiB (the whole 10 GiB disk) starting 68215 MiB into the partition
# (68215 = 71528611840 bytes / 1 MiB, the extent start from your mapping)
dd if=$IF of=$OF bs=1M count=10240 seek=0 skip=68215 conv=notrunc
##### snip

Adjust the OF= line so it fits your needs by replacing datastoreX with a working path.
Save the script.
Make it executable and run it:

chmod 755 /tmp/x.sh
/tmp/x.sh

Sometimes that will work; if not, we do the same with Linux, which in most cases gives significantly better results but takes more time.
Plan A comes first for one reason only: you can do it yourself easily. With a bit of luck it works, but I doubt it.
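For reference, the Linux pass typically means GNU ddrescue, which retries around bad sectors instead of aborting the way dd does. A rough sketch, assuming the WD disk shows up as /dev/sdb under the LiveCD (adjust the device node; the byte values come straight from your vmkfstools -p mapping, which shows one contiguous 10 GiB run):

##### snip
# read the contiguous 10 GiB run starting at byte 71528611840 (= 68215 MiB) of the VMFS partition;
# write it into a new flat file starting at offset 0; unreadable areas are logged in the map file
ddrescue -i 71528611840 -s 10737418240 -o 0 /dev/sdb1 XP_3-flat.vmdk XP_3-rescue.map
##### snip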

Plan B - prognosis: 95:05 - expected problems: if your data is up to date, I don't anticipate any real problems other than finding a time that is acceptable for me.

For Plan B, get the latest MOA LiveCD from my homepage:

http://sanbarrow.com/livecds/moa64-nogui/MOA64-nogui-incl-src-111014-efi.iso
If Plan B is required, I expect you to call me via Skype and have TeamViewer 10 (not version 11) available.

The "file is not a virtual disk" error can be ignored: you addressed the wrong file, since -x/--fix expects the descriptor rather than the flat extent (see the example below).
Everything else is within the usual range.
Don't modify my instructions; ask if something does not work.
You may continue to use that datastore, but that will reduce the chances.
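If you want to retry the repair, point it at the small descriptor file - assuming the usual naming, that is XP_3.vmdk next to the flat extent:

vmkfstools -x check /vmfs/volumes/56d80421-93745713-d238-000acd18c54e/XP_3/XP_3.vmdk
vmkfstools -x repair /vmfs/volumes/56d80421-93745713-d238-000acd18c54e/XP_3/XP_3.vmdk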

Ulli


Do you need support with a recovery problem? Send a message via Skype: "sanbarrow"
Ardo123
Contributor

Hi!

Unfortunately, Plan A did not work. The command resulted in this message:

dd: /dev/disks/t10.ATA_____WDC_WD5000AAKX2D00U6AA0_______________________WD2DWCC2E7NK5FTS:1: Input/output error

It had copied about 372 MB and then stopped.

continuum
Immortal

OK, I did expect a failure, so I am not surprised at all.
Let's go to Plan B; don't wait too long ... waiting for your call on Skype.
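If you want one last local attempt before that, dd can pad unreadable blocks with zeros instead of aborting - assuming the ESXi busybox dd accepts these conv flags, and keeping in mind that with bs=1M every bad sector costs a whole 1 MiB block (reusing IF and OF from /tmp/x.sh):

dd if=$IF of=$OF bs=1M count=10240 seek=0 skip=68215 conv=notrunc,noerror,sync

But ddrescue under Linux handles bad sectors far better, so Plan B is still the way to go.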
Ulli

Do you need support with a recovery problem? Send a message via Skype: "sanbarrow"
downunderscouse
Contributor

Hi,

I am having the same issue as this user: a hard drive on my lab workstation showed SMART errors and overheating warnings in the logs, so I tried to Storage vMotion the shut-down VMs onto another datastore.

This failed. I attempted to clone using vmkfstools, but that failed as well, and a dd gave input/output errors. I believe I have bad blocks and need to use your restoration method, which builds a dd script from a mapping file.

I created a mapping and then tried to generate the script automatically by reading each line, assigning each field to a variable, and switching the dd input between the real device and the zero device depending on the string content, i.e. VMFS or NOMP.
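(The mapping file itself I generated with vmkfstools -p, the same command used earlier in this thread; the path here just stands in for my actual datastore:

vmkfstools -p 0 /vmfs/volumes/myDatastore/master/master-flat.vmdk > master-flat.vmdk.tmp)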

Sadly this did not work, and I do not think I understood your instructions in another forum post:

vm locked

My script is:

#!/bin/sh
SAVEIFS=$IFS
IFS=$(echo -en "\n\b")
DD_SCRIPT="restore_dd.sh"
echo "#!/bin/sh" > $DD_SCRIPT
for i in `cat master-flat.vmdk.tmp|grep -v '^mapping'`;
do
        IF="t10.ATA_____ST3750640NS_________________________________________5QD5ENBZ:1"
        STAT=`echo $i|awk '{print $5;}'|sed -e 's/^\[//'`
        SEEK=`echo $i|awk '{print $2;}'|sed -e 's/://'`
        COUNT=`echo $i|awk '{print $3;}'|sed -e 's/\]//'`
        SKIP=`echo $i|awk '{print $8;}'`
        if [[ "$STAT" = "VMFS" ]];
        then
                echo "dd if=$IF of=master-flat-fixed.vmdk bs=1M count=$COUNT skip=$SKIP seek=$SEEK CONV=notrunc" >> $DD_SCRIPT
        else
                echo "dd if=/dev/zero of=master-flat-fixed.vmdk bs=1M count=$COUNT skip=$SKIP seek=$SEEK CONV=notrunc" >> $DD_SCRIPT
        fi
done
IFS=$SAVEIFS

The result is a long script full of dd commands, but it did not create the VMDK as expected: it just sat there, and the output file was 2 TB in size according to ls, which is hard to believe given the disk is only 700 GB.

What am I doing wrong? I really don't think I have understood this process. I really do need to get these VMs off, though, as they are a customized Kubernetes cluster I built. Sadly there is not enough storage to do backups at this stage.
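Re-reading the mapping output, I now suspect my units: vmkfstools -p reports byte offsets, but with bs=1M dd treats count/skip/seek as 1 MiB blocks, which would explain the huge sparse file. So presumably each value needs dividing by 1048576 before it goes into the dd line (and CONV= should probably be lowercase conv=, and IF the full /vmfs/devices/disks/ path). An untested sketch of what I think the loop should emit, assuming the shell does 64-bit arithmetic:

        # convert the byte values from the mapping into 1 MiB dd blocks
        SEEK=$((SEEK / 1048576))
        COUNT=$((COUNT / 1048576))
        SKIP=$((SKIP / 1048576))
        echo "dd if=$IF of=master-flat-fixed.vmdk bs=1M count=$COUNT skip=$SKIP seek=$SEEK conv=notrunc" >> $DD_SCRIPT

Is that right?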

Any help appreciated

continuum
Immortal

Can you send me your mapping file and your dd command so I can check what is wrong with them?
Please contact me via Skype: "sanbarrow"
Ulli

Do you need support with a recovery problem? Send a message via Skype: "sanbarrow"