VMware Cloud Community
asmme
Contributor

VM locked

Hello,

I have ESXi 5.1 build 2323236 with a locked Linux virtual machine.

1) I cannot back it up because the backup fails (Veeam, Trilead).

2) I cannot copy it because the copy fails.

3) The VM powers on and works fine.

I have run voma:

voma -m vmfs -f check -d /vmfs/devices/disks/naa.600508b1001c26d62806fd6a9c726bd6:3

It reports 0 errors but finds stale locks. PLEASE HELP ME!!!

/dev/disks # voma -m vmfs -f check -d /vmfs/devices/disks/naa.600508b1001c26d62806fd6a9c726bd6:3
Checking if device is actively used by other hosts
Running VMFS Checker version 0.9 in check mode
Initializing LVM metadata, Basic Checks will be done
Phase 1: Checking VMFS header and resource files
   Detected file system (labeled:'datastore1') with UUID:4f4e2130-98286e20-fe3c-441ea15addd9, Version 5:54
Phase 2: Checking VMFS heartbeat region
Phase 3: Checking all file descriptors.
   Found stale lock [type 10c00001 offset 49614848 v 107391, hb offset 3244032
         gen 123, mode 1, owner 55f3f9d8-395937c9-187c-441ea15adddb mtime 714841
         num 0 gblnum 0 gblgen 0 gblbrk 0]
   Found stale lock [type 10c00001 offset 49672192 v 103008, hb offset 3244032
         gen 123, mode 1, owner 55f3f9d8-395937c9-187c-441ea15adddb mtime 1963
         num 0 gblnum 0 gblgen 0 gblbrk 0]
   Found stale lock [type 10c00001 offset 49694720 v 103011, hb offset 3244032
         gen 123, mode 1, owner 55f3f9d8-395937c9-187c-441ea15adddb mtime 2087
         num 0 gblnum 0 gblgen 0 gblbrk 0]
   Found stale lock [type 10c00001 offset 49756160 v 106439, hb offset 3244032
         gen 123, mode 1, owner 55f3f9d8-395937c9-187c-441ea15adddb mtime 153378
         num 0 gblnum 0 gblgen 0 gblbrk 0]
   Found stale lock [type 10c00001 offset 49793024 v 102993, hb offset 3244032
         gen 123, mode 1, owner 55f3f9d8-395937c9-187c-441ea15adddb mtime 1340
         num 0 gblnum 0 gblgen 0 gblbrk 0]
   Found stale lock [type 10c00001 offset 49815552 v 107387, hb offset 3244032
         gen 123, mode 1, owner 55f3f9d8-395937c9-187c-441ea15adddb mtime 591166
         num 0 gblnum 0 gblgen 0 gblbrk 0]
   Found stale lock [type 10c00001 offset 49827840 v 103014, hb offset 3244032
         gen 123, mode 1, owner 55f3f9d8-395937c9-187c-441ea15adddb mtime 2173
         num 0 gblnum 0 gblgen 0 gblbrk 0]
   Found stale lock [type 10c00001 offset 49920000 v 107389, hb offset 3244032
         gen 123, mode 1, owner 55f3f9d8-395937c9-187c-441ea15adddb mtime 670038
         num 0 gblnum 0 gblgen 0 gblbrk 0]
   Found stale lock [type 10c00001 offset 49928192 v 107384, hb offset 3244032
         gen 123, mode 1, owner 55f3f9d8-395937c9-187c-441ea15adddb mtime 591021
         num 0 gblnum 0 gblgen 0 gblbrk 0]
   Found stale lock [type 10c00001 offset 49940480 v 102999, hb offset 3244032
         gen 123, mode 1, owner 55f3f9d8-395937c9-187c-441ea15adddb mtime 1627
         num 0 gblnum 0 gblgen 0 gblbrk 0]
   Found stale lock [type 10c00001 offset 49942528 v 111584, hb offset 3244032
         gen 123, mode 1, owner 55f3f9d8-395937c9-187c-441ea15adddb mtime 994925
         num 0 gblnum 0 gblgen 0 gblbrk 0]
   Found stale lock [type 10c00001 offset 50231296 v 106954, hb offset 3244032
         gen 123, mode 1, owner 55f3f9d8-395937c9-187c-441ea15adddb mtime 192768
         num 0 gblnum 0 gblgen 0 gblbrk 0]
   Found stale lock [type 10c00001 offset 50276352 v 106967, hb offset 3244032
         gen 123, mode 1, owner 55f3f9d8-395937c9-187c-441ea15adddb mtime 193926
         num 0 gblnum 0 gblgen 0 gblbrk 0]
Phase 4: Checking pathname and connectivity.
Phase 5: Checking resource reference counts.
Total Errors Found:           0

SureshKumarMuth
Commander

Can you please provide the error message that shows the files are locked? This looks like a general VMFS locking issue, so there is no need to use voma; voma is for detecting corruption.

What error are you getting when you take a backup?

If the error points to any particular vmdk or other file, run the following command and post the output here:

vmkfstools -D <file name with full path>

Regards,
Suresh
https://vconnectit.wordpress.com/
asmme
Contributor

Hello, thanks for your reply.

The problem file is "Fatturazione001-flat.vmdk": a copy from the datastore to a local folder crashes after about 8 GB.

The error that Trilead / HPE VM Explorer returns is: HTTP 500 internal fatal error.

Migrating with vCenter gives the error: "A general system error occurred: Invalid response code: 503 Service Unavailable".

The Veeam error is generic.

These are the files in my datastore:

/vmfs/volumes/4f4e2130-98286e20-fe3c-441ea15addd9/Fatturazione001 # ls -la
drwxr-xr-x    1 root     root          2520 Apr 17 19:17 .
drwxr-xr-t    1 root     root          5880 Feb  5 18:12 ..
-rw-r--r--    1 root     root            13 Apr 17 19:16 Fatturazione001-aux.xml
-rw-------    1 root     root     37580963840 Apr 17 19:16 Fatturazione001-flat.vmdk
-rw-------    1 root     root          8684 Apr 17 19:14 Fatturazione001.nvram
-rw-------    1 root     root           525 Apr 17 19:16 Fatturazione001.vmdk
-rw-r--r--    1 root     root            46 Apr 17 19:17 Fatturazione001.vmsd
-rw-r--r--    1 root     root          3205 Apr 17 19:16 Fatturazione001.vmx
-rw-r--r--    1 root     root          2952 Oct  3  2015 Fatturazione001.vmx.orig
-rw-------    1 root     root          3444 Apr 17 18:01 Fatturazione001.vmxf
-rw-r--r--    1 root     root        137707 Apr 17 14:27 vmware-44.log
-rw-r--r--    1 root     root        166003 Apr 17 14:32 vmware-45.log
-rw-r--r--    1 root     root        136508 Apr 17 14:36 vmware-46.log
-rw-r--r--    1 root     root        137458 Apr 17 16:53 vmware-47.log
-rw-r--r--    1 root     root        175784 Apr 17 18:04 vmware-48.log
-rw-r--r--    1 root     root        136401 Apr 17 18:25 vmware-49.log
-rw-r--r--    1 root     root        137186 Apr 17 19:14 vmware.log
-rw-r--r--    1 root     root          1336 Oct  3  2015 vmxreplication.xml

Here is the output:

/vmfs/volumes/4f4e2130-98286e20-fe3c-441ea15addd9/Fatturazione001 # vmkfstools -D Fatturazione001-flat.vmdk
Lock [type 10c00001 offset 49737728 v 110341, hb offset 3870720
gen 321, mode 0, owner 00000000-00000000-0000-000000000000 mtime 3361
num 0 gblnum 0 gblgen 0 gblbrk 0]
Addr <4, 77, 86>, gen 107319, links 1, type reg, flags 0, uid 0, gid 0, mode 600
len 37580963840, nb 35840 tbz 0, cow 0, newSinceEpoch 0, zla 3, bs 1048576
/vmfs/volumes/4f4e2130-98286e20-fe3c-441ea15addd9/Fatturazione001 #

PLEASE HELP ME

SureshKumarMuth
Commander

Based on the information provided, I don't see any lock on the file at the VMFS level. However, since the file copy does not work, I suspect a network issue while copying.

Error 503 is generally caused by a Windows TCP stack issue:

vCenter Server returns 503 Service Unavailable errors (2033822) | VMware KB

You have multiple options to copy the file:

Is the VM powered off? Can you try to export the complete VM as OVA/OVF?

Or you can mount an NFS volume on the ESXi host and copy from the VMFS volume to NFS using the command line.

Or RDP to the vCenter Server and download directly to the vCenter Server machine, which may eliminate port-related issues.

Or try using WinSCP to copy the data.
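For the NFS option, a command-line sketch (the NFS server address, share path, and volume name below are placeholder examples, not values from this thread):

```shell
# Sketch with placeholder names: mount an NFS export as a datastore on the
# ESXi host, copy the VM folder onto it, then unmount. Run from an ESXi shell.

# Attach the NFS export as a datastore named "nfsbackup"
esxcli storage nfs add --host=192.168.1.50 --share=/export/backup --volume-name=nfsbackup

# Copy the whole VM directory from the VMFS datastore to the NFS datastore
cp -rp /vmfs/volumes/datastore1/Fatturazione001 /vmfs/volumes/nfsbackup/

# Detach the NFS datastore once the copy has finished
esxcli storage nfs remove --volume-name=nfsbackup
```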

Regards,
Suresh
https://vconnectit.wordpress.com/
asmme
Contributor

Hello Sureshkumar,

Thanks for your reply.

OVA export error: "Failed to export virtual machine: unable to read stream connection data, connection closed"

OVF export error: "Failed to export virtual machine: unable to read stream connection data, connection closed"

Copying via RDP on the vCenter Server: the file Fatturazione001-flat.vmdk does not get copied to the local folder.

WinSCP error: "Copying file /vmfs/volumes/datastore1/Fatturazione001/Fatturazione001-flat.vmdk failed: scp: Input/output error" at 99%.

What is left for me to try is patching Microsoft Windows 2008 R2, and mounting an NFS volume on the ESXi host and copying from the VMFS volume to NFS using the command line.

Could you please give me some more info on how to copy after mounting the NFS volume? Just a simple cp -rp?

thanks

SureshKumarMuth
Commander

Yes, once you mount the NFS volume, you can use the normal Unix cp command to copy the data from the VMFS volume to the NFS volume from the ESXi host command line.

Regards,
Suresh
https://vconnectit.wordpress.com/
continuum
Immortal

With an error message like "scp Input/output error at 99%"
I would suggest running the command
vmkfstools -p 0 /vmfs/volumes/datastore1/Fatturazione001/Fatturazione001-flat.vmdk > /tmp/flatfilemap.txt
and then using flatfilemap.txt to create a script of dd commands that copies out the file, bypassing the VMFS filesystem and accessing the physical volume directly.
The script would then use dd commands like this:
dd if=/dev/disks/device:1 of=/vmfs/volumes/datastore/directory/recovered-flat.vmdk bs=1M seek=0 count=* skip=*
Depending on the fragmentation of the original volume you would get between one and a million lines, with at least one line failing with an I/O error.
The failing area could then be replaced with zeroes.


________________________________________________
Do you need support with a VMFS recovery problem ? - send a message via skype "sanbarrow"
I do not support Workstation 16 at this time ...

asmme
Contributor

hello,

I have mounted the NFS datastore; the error with cp -rp is this:

cp -rp /vmfs/volumes/datastore1/Fatturazione001 .

cp: read error: Input/output error

Please help me.

asmme
Contributor

hello continuum,

Thank you for the reply,

But I did not understand what to do after the command "vmkfstools -p 0 /vmfs/volumes/datastore1/Fatturazione001/Fatturazione001-flat.vmdk > /tmp/flatfilemap.txt".

Thanks

continuum
Immortal

Sorry - I don't know how to monitor all the old posts I replied to - so I hope it was not too urgent.
If it was urgent - you should have called me via Skype. That's why I use that signature ....
Anyway - in case anybody else stumbles across this post .....

Question: how do you handle a large flat or delta vmdk that you can't handle with the reference procedure "vmkfstools -i bad-flat good-flat"?


Problem: if one area of a flat.vmdk is damaged beyond repair, the reference procedure rejects all operations. Result: you get nothing!
Best approach: if you can't read the vmdk, read the blocks using their offset from the start of the VMFS partition.


Very much simplified - in sunny weather conditions - the following two commands get the same result:

(bad-flat.vmdk size is 1024 MB and its offset is 387241 MB)
procedure-A:

vmkfstools -i /vmfs/volumes/datastore1/example/bad.vmdk /vmfs/volumes/healthy-other-datastore/example/fixed.vmdk -d thick

procedure-B:

dd if=/dev/disks/device:partition bs=1M skip=387241 conv=notrunc count=1024 of=/vmfs/volumes/healthy-other-datastore/example/fixed-flat.vmdk


Result - when all goes well you get a workable clone both ways. When procedure-A fails: nothing at all.
That's where you are at the moment.


Question: can we split the one big command into single steps and tolerate one or even a couple of errors?
procedure-A: no way - at least not as far as I know
procedure-B: yes - easy. If necessary we create a long script with one line for every single 1 MB block of the 1024 MB vmdk.
       You may get something that looks like this easy example:

dd if=/dev/disks/device:partition bs=1M skip=387241 conv=notrunc count=1000 seek=0 of=/vmfs/volumes/healthy-other-datastore/example/fixed-flat.vmdk

dd if=/dev/disks/device:partition bs=1M skip=388241 conv=notrunc count=1 seek=1000 of=/vmfs/volumes/healthy-other-datastore/example/fixed-flat.vmdk

dd if=/dev/disks/device:partition bs=1M skip=388242 conv=notrunc count=23 seek=1001 of=/vmfs/volumes/healthy-other-datastore/example/fixed-flat.vmdk

Result - when any of the now three commands fails, you can still clone as much as you can get!

          If you get:    

dd if=/dev/disks/device:partition bs=1M skip=387241 conv=notrunc count=1000 seek=0 of=/vmfs/volumes/healthy-other-datastore/example/fixed-flat.vmdk

dd if=/dev/disks/device:partition bs=1M skip=388241 conv=notrunc count=1 seek=1000 of=/vmfs/volumes/healthy-other-datastore/example/fixed-flat.vmdk !!!! failed

dd if=/dev/disks/device:partition bs=1M skip=388242 conv=notrunc count=23 seek=1001 of=/vmfs/volumes/healthy-other-datastore/example/fixed-flat.vmdk

          You run this instead:

dd if=/dev/disks/device:partition bs=1M skip=387241 conv=notrunc count=1000 seek=0 of=/vmfs/volumes/healthy-other-datastore/example/fixed-flat.vmdk

dd if=/dev/zero bs=1M conv=notrunc count=1 seek=1000 of=/vmfs/volumes/healthy-other-datastore/example/fixed-flat.vmdk

dd if=/dev/disks/device:partition bs=1M skip=388242 conv=notrunc count=23 seek=1001 of=/vmfs/volumes/healthy-other-datastore/example/fixed-flat.vmdk

Result - you can create a clone once you have located the bad areas and replaced them with blank space.
The original error results in nothing at all.

Once you have reduced the area that produces the error to the smallest section you can find, the result is a full image of 1024 MB with a small wiped area.
If you are lucky your guest never even reads there ....
If you are not so lucky you get lots of unreadable areas from the device node ....

Forget about details such as filenames and descriptors for now.
The idea is to replace the current command - the reference procedure, which reads the vmfs-metadata for bad-flat.vmdk - with a shell script of dd commands that read raw blocks from a device file instead.
The output from vmkfstools -p 0 bad-flat.vmdk > bad-flat.map is almost ready to use.
Each line has four decimal values and one string.
The string is either VMFS or NOMP. If it is VMFS you use if=<devicenode>, and if it is NOMP you can use /dev/zero.
A line from the mapping txt thus gives: seek=<decimal> count=<decimal> if=<string> skip=<decimal> readuntil=<decimal> (not necessary).
Here we are lucky if vmkfstools -p 0 bad-flat.vmdk > bad-flat.map still works.
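That column-to-argument mapping can be sketched as a small generator script. Everything below is an illustration, not a tested recipe: the field positions, the tiny fabricated map lines, and the device and datastore paths are assumptions to verify against your own map file. One detail that is easy to miss: the map reports byte offsets, so with bs=1M each value must be divided by 1048576 before being used as dd's skip/seek/count.

```shell
#!/bin/sh
# Illustration only: generate a dd script from a "vmkfstools -p 0" mapping.
# Assumed tokenization of a map line (check against your real map file!):
#   field 2 = file offset, field 3 = length, field 5 = VMFS or NOMP,
#   field 8 = device offset -- all values in BYTES.
# A tiny fabricated map stands in for the real one
# ("vmkfstools -p 0 bad-flat.vmdk > bad-flat.map"):
cat > bad-flat.map <<'EOF'
Mapping for file bad-flat.vmdk (3145728 bytes in 3 blocks):
[ 0: 2097152] --> [VMFS -- lvid 293601280 len]
[ 2097152: 1048576] --> [NOMP -- lvid 0 len]
EOF

DEV="/dev/disks/device:partition"                    # source device (placeholder)
OUT="/vmfs/volumes/backupstore/recovered-flat.vmdk"  # destination (placeholder)
MB=1048576                                           # bytes per 1M dd block

set -f                                               # no globbing while splitting
grep -v '^Mapping' bad-flat.map | while read -r line; do
    set -- $line                                     # split map line into fields
    seek=$(( ${2%:} / MB ))                          # target position in the vmdk
    count=$(( ${3%]} / MB ))                         # extent length
    type=${5#[}                                      # VMFS = data, NOMP = unmapped
    skip=$(( $8 / MB ))                              # source position on the device
    if [ "$type" = "VMFS" ]; then
        echo "dd if=$DEV of=$OUT bs=1M skip=$skip seek=$seek count=$count conv=notrunc"
    else
        echo "dd if=/dev/zero of=$OUT bs=1M seek=$seek count=$count conv=notrunc"
    fi
done > restore_dd.sh

cat restore_dd.sh
```

Running the generated restore_dd.sh then copies the readable extents one by one, and any line that fails with an I/O error can be rerun with if=/dev/zero as described above.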


Ok - that's the basics about the command to use.
The different ways to actually execute the dd script all have the same impact on the final results.
I get the best results if I physically boot the ESXi host into my MOA livecds. Not practicable in many cases, as it interrupts production.
Very practicable: I boot a VM into a MOA livecd and keep a low profile by accessing the datastore read-only via ssh. Only the output directory on VMFS needs write access.

Lazy and with a high failure rate: I execute the commands as root user on the active ESXi. Most of the time I get more problems doing it directly on the ESXi.

WARNING: do not copy and paste any custom commands from a forum discussion - if you do not understand what these commands do, you will make it worse.

See signature if you have urgent related problems.
Ulli


________________________________________________
Do you need support with a VMFS recovery problem ? - send a message via skype "sanbarrow"
I do not support Workstation 16 at this time ...

downunderscouse
Contributor

continuum

Hi, I tried this method and so far it's been a no-go.

I tried writing a script that reads the mapping file line by line and writes out a dd script, using the info in the post about which column maps to which part of the dd command. I'm clearly misunderstanding something, because it doesn't work.

#!/bin/sh
SAVEIFS=$IFS
IFS=$(echo -en "\n\b")
DD_SCRIPT="restore_dd.sh"
echo "#!/bin/sh" > $DD_SCRIPT
for i in `cat master-flat.vmdk.tmp|grep -v '^Mapping'`;
do
        IF="t10.ATA_____ST3750640NS_________________________________________5QD5ENBZ:1"
        STAT=`echo $i|awk '{print $5;}'|sed -e 's/^\[//'`
        SEEK=`echo $i|awk '{print $2;}'|sed -e 's/://'`
        COUNT=`echo $i|awk '{print $3;}'|sed -e 's/\]//'`
        SKIP=`echo $i|awk '{print $8;}'`
        if [[ "$STAT" = "VMFS" ]];
        then
                echo "dd if=$IF of=master-flat-fixed.vmdk bs=1M count=$COUNT skip=$SKIP seek=$SEEK CONV=notrunc" >> $DD_SCRIPT
        else
                echo "dd if=/dev/zero of=master-flat-fixed.vmdk bs=1M count=$COUNT skip=$SKIP seek=$SEEK CONV=notrunc" >> $DD_SCRIPT
        fi
done
IFS=$SAVEIFS

which gives me something like:

#!/bin/sh
dd if="/dev/disks/t10.ATA_____ST3750640NS_________________________________________5QD5ENBZ:1" of=master-flat-fixed.vmdk bs=1M count=2097152 skip=293999738880 seek=0 conv=no
dd if=/dev/zero of=master-flat-fixed.vmdk bs=1M count=1048576 skip=0 seek=2097152 conv=notrunc
dd if="/dev/disks/t10.ATA_____ST3750640NS_________________________________________5QD5ENBZ:1" of=master-flat-fixed.vmdk bs=1M count=5242880 skip=294003933184 seek=3145728 c
dd if="/dev/disks/t10.ATA_____ST3750640NS_________________________________________5QD5ENBZ:1" of=master-flat-fixed.vmdk bs=1M count=1048576 skip=348647325696 seek=8388608 c
dd if="/dev/disks/t10.ATA_____ST3750640NS_________________________________________5QD5ENBZ:1" of=master-flat-fixed.vmdk bs=1M count=1048576 skip=294009176064 seek=9437184 c
dd if="/dev/disks/t10.ATA_____ST3750640NS_________________________________________5QD5ENBZ:1" of=master-flat-fixed.vmdk bs=1M count=1048576 skip=214865805312 seek=10485760
dd if="/dev/disks/t10.ATA_____ST3750640NS_________________________________________5QD5ENBZ:1" of=master-flat-fixed.vmdk bs=1M count=1048576 skip=700675260416 seek=11534336
dd if="/dev/disks/t10.ATA_____ST3750640NS_________________________________________5QD5ENBZ:1" of=master-flat-fixed.vmdk bs=1M count=1048576 skip=749747568640 seek=12582912
dd if="/dev/disks/t10.ATA_____ST3750640NS_________________________________________5QD5ENBZ:1" of=master-flat-fixed.vmdk bs=1M count=1048576 skip=749983498240 seek=13631488
dd if="/dev/disks/t10.ATA_____ST3750640NS_________________________________________5QD5ENBZ:1" of=master-flat-fixed.vmdk bs=1M count=1048576 skip=716638781440 seek=14680064
dd if="/dev/disks/t10.ATA_____ST3750640NS_________________________________________5QD5ENBZ:1" of=master-flat-fixed.vmdk bs=1M count=1048576 skip=715905826816 seek=15728640
dd if=/dev/zero of=master-flat-fixed.vmdk bs=1M count=22020096 skip=0 seek=16777216 conv=notrunc

The first dd command writes out an empty file, the second just sits there forever, and the file size becomes 2.0 TB, which is impossible given it's a 700 GB datastore.

Can you please advise me what I am doing wrong?
