VMware Cloud Community
Mufimufin
Contributor

The best way to swap two hard drives

Dear all,

First of all, I do not have any RAID on my home server; I plan to add it later, when I switch to some other solution (a matter of years).

One of the non-system hard drives (datastore2) is showing symptoms of a faulty drive. I have also run a few tests (Hiren's BootCD, Linux commands) that confirmed it.

I plan to do the following:

a) connect another 10 TB WD Red NAS hard drive to my ESXi 6.7 server (it's already connected to my MSI H61MA-E35 (B3) motherboard, but ESXi refuses to detect it - that is, however, another problem I need to solve first)

b) copy the datastore2 content to datastore3 (the new drive) using the datastore browser option

c) rename datastore2 to some other name (e.g. goodbye-my-old-hard-drive)

d) rename datastore3 (new hard drive) to datastore2

e) disconnect the old, failing hard drive

I do not see any vMotion options in my ESXi 6.7 web interface (vSphere) because I do not have vCenter Server installed.

I therefore believe that I will have to remove the VM from my inventory and re-register it after the move, so that ESXi knows it now lives on the other hard drive.

Potentially important information: there is another datastore attached to the same VM as the one on datastore2, and I would like to keep it that way as it contains some data. Does this represent a problem? I assume I will have to re-add that disk as well once the VM has been removed from the inventory, right?
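For the remove/re-add step I am thinking of something roughly like this on the ESXi shell (just a sketch - "MyVM", <vmid> and the datastore paths are placeholders, the real names will differ):

vim-cmd vmsvc/getallvms                                          # note the Vmid of the VM
vim-cmd vmsvc/unregister <vmid>                                  # remove the VM from the inventory
vim-cmd solo/registervm /vmfs/volumes/datastore2/MyVM/MyVM.vmx   # re-register it from the (renamed) new drive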

Thank you.

continuum
Immortal

I do not have the full background here ...
Are you trying to copy a VMDK from one datastore on host A to another datastore on host B while the VMDK fights back and reports I/O errors?
If yes, do this:
Create a small Linux VM on one of those hosts - for example using this ISO: http://sanbarrow.com/livecds/moa64-nogui/MOA64-nogui-incl-src-111014-efi.iso

Then boot into it and create two directories:
mkdir /vmfs-in
mkdir /vmfs-out
Then connect to the source ESXi like this:
sshfs -o ro root@esxi-A:/vmfs/volumes/source-datastore /vmfs-in
Connect to the target ESXi like this:
sshfs root@esxi-B:/vmfs/volumes/target-datastore /vmfs-out
Once that is done, you can use ddrescue like this:
ddrescue /vmfs-in/source-directory/name-flat.vmdk /vmfs-out/target-directory/name-flat.vmdk   /vmfs-out/target-directory/name-copy.log
This way you will have read-only access to the source and can write to the target.
ddrescue will help to skip I/O errors - make sure to write the log file ...
If you only have one host, use the same approach: mount the source datastore read-only and the target datastore in writable mode.
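On a single host that could look roughly like this (a sketch only - "esxi-host", the datastore names and the VM folder are placeholders):

mkdir /vmfs-in /vmfs-out
sshfs -o ro root@esxi-host:/vmfs/volumes/datastore2 /vmfs-in
sshfs root@esxi-host:/vmfs/volumes/datastore3 /vmfs-out
ddrescue /vmfs-in/MyVM/MyVM-flat.vmdk /vmfs-out/MyVM/MyVM-flat.vmdk /vmfs-out/MyVM/copy.log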
This is NOT the fastest approach.
But if the copy fails now due to I/O errors, it is the easiest way to work around them.
Feel free to call via Skype if you cannot handle this ...
Ulli

By the way - the OVA/OVF workaround will fail if there is even one small error in the source file.
ddrescue should not fail even with multiple errors.


________________________________________________
Do you need support with a VMFS recovery problem ? - send a message via skype "sanbarrow"
I do not support Workstation 16 at this time ...

Mufimufin
Contributor

Thank you for the answer, continuum.

Are you trying to copy a VMDK from one datastore on host A to another datastore on host B while the VMDK fights back and reports I/O errors?

As mentioned in my first post, I am trying to move one VM from host A / datastore A to host A / datastore B. In other words, I need to move my VM from one hard drive to a new hard drive on the same ESXi 6.7 computer. I totally understand that there is a lot of content in this thread now, so I hope this clears things up for you.

Anyway, it does not matter now whether it is VMDK, OVF or any other format, as long as the solution works.

If you only have one host, use the same approach: mount the source datastore read-only and the target datastore in writable mode.

This is NOT the fastest approach.

But if the copy fails now due to I/O errors, it is the easiest way to work around them.

So if I understand it correctly:

1) I will connect to ESXi using SSH and run this command in the terminal:

ddrescue /source-data-store/source-directory-VM/name-flat.vmdk /target-data-store/target-directory-VM/name-flat.vmdk   /target-data-store/target-directory-VM/name-copy.log

2) After that I will copy the remaining files from the old datastore (e.g. via the datastore browser or WinSCP) to the new datastore.

3) Rename the VM folder on the old datastore (not sure if this is even necessary, but it was mentioned everywhere).

4) Register the VM from the new datastore (see the sketch below).
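For steps 2) to 4) I imagine something roughly like this on the ESXi shell (only a sketch - "MyVM" and the datastore names are placeholders, the exact list of small files may differ, and the target folder already exists from step 1):

cp "/vmfs/volumes/datastore2/MyVM/MyVM.vmdk"  "/vmfs/volumes/datastore3/MyVM/"   # the small descriptor vmdk
cp "/vmfs/volumes/datastore2/MyVM/MyVM.vmx"   "/vmfs/volumes/datastore3/MyVM/"   # VM configuration
cp "/vmfs/volumes/datastore2/MyVM/MyVM.nvram" "/vmfs/volumes/datastore3/MyVM/"   # BIOS/EFI settings
mv "/vmfs/volumes/datastore2/MyVM" "/vmfs/volumes/datastore2/MyVM-old"           # rename the old folder
vim-cmd solo/registervm "/vmfs/volumes/datastore3/MyVM/MyVM.vmx"                 # register the copy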

Please let me know if I have missed something.

Thank you - I will try it as soon as the current copy through the datastore browser fails or finishes.

continuum
Immortal

Keep in mind that ddrescue does not exist in ESXi.
And if you really have to deal with VMDKs with I/O errors, it is essential to mount the source datastore read-only.
I use ddrescue for flat and delta vmdks. All other related files of a VM are so small that you can use cp.
For a quick test whether a flat.vmdk or delta.vmdk is healthy, simply run
vmkfstools -p 0 name-flat.vmdk > mapping.txt
If that works you can assume that there are no I/O errors in that file.
If a vmkfstools -i command hangs at 100%, wait up to one or two hours.
Before you give up and kill the command, first assume that it already did the job and check the output vmdk.
Try to dump the last 10 MB to a new file and inspect it with hexdump -C.
If you see all zeroes, the vmkfstools -i command probably failed too early.
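For example (a sketch - SIZE_MB is a placeholder for the size of the output vmdk in MB, and it should be run wherever hexdump is available, e.g. inside the Linux helper VM):

dd if=name-flat.vmdk of=/tmp/tail.bin bs=1048576 skip=$((SIZE_MB - 10))   # dump the last 10 MB of the output vmdk
hexdump -C /tmp/tail.bin | less                                           # all-zero data collapses to a single "*" line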


________________________________________________
Do you need support with a VMFS recovery problem ? - send a message via skype "sanbarrow"
I do not support Workstation 16 at this time ...

Mufimufin
Contributor

If it does not exist on ESXi (and I am not sure whether it can be installed), then I will connect from the Linux VM to ESXi as you described above, with the difference that both folders will be mapped to one host instead of two. I believe this should do the job.

I will also try that flat/delta check once the current datastore browser copy attempt has finished.

Thanks.

Mufimufin
Contributor

pastedImage_0.png

I will continue with the recovery procedure above now.

Mufimufin
Contributor

pastedImage_0.png

I know that you mentioned this not being the fastest option, but is there any way to speed things up? Waiting 2 days seems extreme to me, given that this is just a 1.82 TB vmdk file.

Thanks.

Mufimufin
Contributor

Here is the mapping.txt file output:

[           0:         517] --> [VMFS -- LVID:50eeddd7-bf71cab4-d5f7-8c89a57ce0bc/50eeddd7-ae2abe54-5a98-8c89a57ce0bc/1:(    54821888 -->     54822405)]

I have used the following command:

vmkfstools -p 0 Flawless\ Server.vmdk > mapping.txt

It was quite fast (2 seconds).

continuum
Immortal

vmkfstools -p 0 Flawless\ Server.vmdk > mapping.txt
When you use that command against the descriptor file, it will only display a single line.
That output is basically useless, as we already know the content of the descriptor file.
We expect to have the I/O error inside the flat.vmdk - not the descriptor.
So use the command against the flat.vmdk:
vmkfstools -p 0 Flawless\ Server-flat.vmdk > mapping.txt
This will result in a mapping.txt with lots of lines - I have seen up to 1 million lines for a heavily fragmented, thin-provisioned flat.vmdk.
Regarding the poor performance ....
2 days for a 2TB flat.vmdk is really very slow.
I normally would expect about 12 - 24 hours for a 2TB file.
But that probably means that the source file has several I/O errors.
ddrescue can slow down to a crawl around the location of I/O errors - but if that is the only option to clone the file at all you probably have to live with that.
I started to write a blog post about issues like this - see
VMFS-volumes with I/O-errors | VM-Sickbay


________________________________________________
Do you need support with a VMFS recovery problem ? - send a message via skype "sanbarrow"
I do not support Workstation 16 at this time ...

Mufimufin
Contributor

You are right. I completely missed that, as I thought the larger file was the one being checked. See the attachment for the new output (mapping.txt). I am not sure what I should be looking for in that file.

Regarding the slow performance - here is the current progress:

pastedImage_1.png

I do not see any errors so far.

Copy log:

pastedImage_2.png

However, seeing almost 2 days of remaining time, I am starting to wonder whether it would not be easier to simply ditch the current VM, install a new one, and move the files over SSH (server to server).

Mufimufin
Contributor

This is crazy slow. Yesterday the ETA was 2+ days, and today it is 2+ days again.

I am starting to wonder whether this is going through the local network, since I have mapped those folders using local IP network addresses on the same machine. It is all in the same pool, however, so it should be going directly. But if it is going through the network, I may cancel, disconnect the powerline adapter and connect directly to the router.

2018-08-29_10-14-25.png

continuum
Immortal

One major advantage of ddrescue is the option to restart a copy process.
So as long as you keep the output file and the copy log, you can abort at any time.
Then reconfigure your network and restart with the same command.
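For example (a sketch - same placeholder paths as in the earlier example): after aborting with Ctrl-C you simply rerun the identical command, and ddrescue resumes from the mapfile instead of starting over:

ddrescue /vmfs-in/MyVM/MyVM-flat.vmdk /vmfs-out/MyVM/MyVM-flat.vmdk /vmfs-out/MyVM/copy.log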


________________________________________________
Do you need support with a VMFS recovery problem ? - send a message via skype "sanbarrow"
I do not support Workstation 16 at this time ...

Mufimufin
Contributor

Ddrescue finished today.

2018-08-30_19-31-55.png

There are no read errors, etc.

The strange thing is that it had 3 hours of remaining time and then suddenly finished at 94.22%. What does that mean - that it was not able to rescue everything?

pastedImage_6.png

It does not seem to be complete.

Here is the output from its log file:

# Mapfile. Created by GNU ddrescue version 1.22

# Command line: ddrescue /vmfs-in/Flawless Server/Flawless Server-flat.vmdk /vmfs-out/Flawless Server/Flawless Server-flat.vmdk /vmfs-out/Flawless Server/copy1.log

# Start time:   2018-08-30 19:31:48

# Current time: 2018-08-30 19:31:51

# Finished

# current_pos  current_status  current_pass

0x1B495790000     +               1

#      pos        size  status

0x00000000  0x1B495790000  +

Regardless of this, I have moved the remaining files, renamed the previous folder (on the old datastore) and registered the VM.


The problem I face now is that it refuses to start and requires me to connect another vmdk on another hard drive (another datastore):

2018-08-30_19-52-39.png

Cannot open the disk '/vmfs/volumes/5b81bebc-162ed5e6-79ce-6805ca128dcc/Flawless Server/Flawless Server.vmdk' or one of the snapshot disks it depends on.

I have not moved the other volume anywhere and it is still present with the same name, so I do not get what is wrong or what needs to be done to fix it.

Mufimufin
Contributor

Here is the VMware log file:

https://paste.ee/p/1aRhm

2018-08-30T17:51:16.521Z| vmx| I125+ Power on failure messages: The file specified is not a virtual disk

2018-08-30T17:51:16.521Z| vmx| I125+ Cannot open the disk '/vmfs/volumes/5b81bebc-162ed5e6-79ce-6805ca128dcc/Flawless Server/Flawless Server.vmdk' or one of the snapshot disks it depends on.

2018-08-30T17:51:16.521Z| vmx| I125+ Module 'Disk' power on failed.

2018-08-30T17:51:16.521Z| vmx| I125+ Failed to start the virtual machine.

I have removed the second hard drive from the VM settings and added it again. The problem still persists.

a_p_
Leadership

The important part in the log file is:

2018-08-30T17:51:16.501Z| worker-2115541| I125: DISKLIB-VMFS  : VmfsExtentCommonOpen: possible extent truncation (?) realSize is 3662331008, size in descriptor 3886945402.

2018-08-30T17:51:16.501Z| worker-2115541| I125: DISKLIB-VMFS  : "/vmfs/volumes/5b81bebc-162ed5e6-79ce-6805ca128dcc/Flawless Server/Flawless Server-flat.vmdk" : failed to open (The file specified is not a virtual disk): Size of extent in descriptor file larger than real size. Type 3

So it seems that the copy job didn't complete, and 224,614,394 blocks (approximately 112 GB) are missing.
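If you want to double-check, something like this on the ESXi shell should show it (a sketch - run it in the VM's folder on the new datastore):

cat "Flawless Server.vmdk"            # the RW line of the descriptor shows the extent size in 512-byte sectors
ls -l "Flawless Server-flat.vmdk"     # actual file size in bytes; it should equal that sector count multiplied by 512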


André

Mufimufin
Contributor

When I tried to run the command again, it reported "finished".

As mentioned above, I also do not think it is complete. I have decided to give up on ESXi and ddrescue, have started the VM from the old datastore, and will soon switch to another virtual machine and copy the data over manually (rsync or cp).
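For the manual copy I am thinking of something along these lines, run from inside the guest (a sketch - the host name and paths are placeholders):

rsync -avP /data/ user@new-vm:/data/     # -a preserves permissions and timestamps, -P shows progress and allows resuming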

continuum
Immortal

Before you really decide to give up, I would like to have a closer look.
Feel free to call me via Skype: sanbarrow
Ulli


________________________________________________
Do you need support with a VMFS recovery problem ? - send a message via skype "sanbarrow"
I do not support Workstation 16 at this time ...
