Solved: Migrating 98TB VM

stray_guru · ‎12-13-2018

Hello all

I'm trying to migrate a powered-off VM containing two virtual disks with a total capacity of 98TB, from ESXi 6.7 to ESXi 6.5U2 via vCenter 6.7 with an Essentials license. I'm getting the error "An error occurred while communicating with the remote host".

The same happens when cloning. I should mention that smaller VMs migrate without any problem... Any ideas?

continuum · ‎12-13-2018

If the task is to copy 100TB from ESXi 1 to ESXi 2 and you expect problems along the way, you first of all need a no-nonsense approach that can be restarted whenever it fails.
Having to restart from the beginning after 90% is done is not an option when the whole process can take days.
This is what I do:
Start a Linux VM on one of the two hosts.
Boot that VM and connect to the source datastore on ESXi 1 and the target datastore on ESXi 2 via sshfs.
Mount both datastores like this:
sshfs -o ro root@source-esxi:/vmfs/volumes/datastore-source/ /vmfs-in

sshfs root@target-esxi:/vmfs/volumes/datastore-target/ /vmfs-out

Then, in a first batch, copy all the small files other than the large flat.vmdks with a command like the one below (adjust the pattern so that it skips the *-flat.vmdk files, which this wildcard would otherwise include):
cp /vmfs-in/directory/* /vmfs-out/directory/
This will take just a few minutes.
To copy the large flat.vmdks I use ddrescue, as it offers the option to restart whenever necessary:
ddrescue /vmfs-in/directory/name_1-flat.vmdk /vmfs-out/directory/name_1-flat.vmdk /vmfs-out/directory/name_1.flat.log

ddrescue /vmfs-in/directory/name_2-flat.vmdk /vmfs-out/directory/name_2-flat.vmdk /vmfs-out/directory/name_2.flat.log
Once both commands are started you can relaunch them whenever they get interrupted - and you can count on having that option!
And yes:
this is ugly
this may not be the fastest option
this may not be the option you learned in your VCP course
but if your boss asks you "will you manage this by next Friday?"
this is the only way to do it and sleep well over the next few days 😉
I use this approach in my VMware recovery work whenever I have to copy very large vmdks and need predictable results.
It is ugly and requires extra work, but it is predictable and reliable, and it is the only way I know that can handle any network problems along the way.
If the source or target ESXi reboots because of a power failure tomorrow, this approach can continue the copy from the place where it was interrupted.
This approach surely is overkill for small VMs - but in your case investing an extra 30 minutes to set this up is a very reasonable decision.
Just make sure that you give ddrescue a log file (the third argument in the commands above)!
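
The "relaunch whenever it gets interrupted" step can be wrapped in a small loop. This is only a minimal sketch, not part of the procedure above: the function name resumable_copy, the RETRY_DELAY variable and the DDRESCUE override are hypothetical, and it assumes that ddrescue returns a non-zero exit status when a run is cut short, which is worth checking on your version before relying on it.

```shell
#!/bin/sh
# Sketch only: resumable_copy, RETRY_DELAY and DDRESCUE are hypothetical names.

# Re-run ddrescue with the same three arguments until it exits cleanly.
# Because the log file (third argument) is reused, each new run can pick up
# from the position recorded by the previous one.
resumable_copy() {
    src=$1
    dst=$2
    log=$3
    until "${DDRESCUE:-ddrescue}" "$src" "$dst" "$log"; do
        echo "copy of $src interrupted, retrying in ${RETRY_DELAY:-60}s" >&2
        sleep "${RETRY_DELAY:-60}"
    done
}

# Usage with the placeholder paths from the post:
# resumable_copy /vmfs-in/directory/name_1-flat.vmdk \
#                /vmfs-out/directory/name_1-flat.vmdk \
#                /vmfs-out/directory/name_1.flat.log
```

If an sshfs mount drops, it has to be remounted before the next attempt can succeed, so the loop is a convenience and does not replace checking on the copy from time to time.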

________________________________________________
Do you need support with a VMFS recovery problem ? - send a message via skype "sanbarrow"
I do not support Workstation 16 at this time ...

dbalcaraz · ‎12-13-2018

vMotion won't work if you exceed the heartbeat timeout between ESXi hosts, so maybe your network has some latency from a particular ESXi host, which makes the vMotion fail (I don't know how you configured your storage).

Did you check the vpxd.log to see a detailed error for this?

-------------------------------------------------------- "I greet each challenge with expectation"

stray_guru · ‎12-13-2018

This is my vpxd log. I cut out only the part between the start and completion timestamps of the failed job. Can you help me debug it?

nestorleal641 · ‎12-13-2018

Check that you do not have a snapshot. You can also perform the migration with software such as Veeam Backup & Replication.

Nestor Luis Leal Ibara VMware Certified Professional 6 – Data Center Virtualization +51964392265 | nestorleal641@gmail.com Skype: nestor_leal641

stray_guru · ‎12-14-2018

Hello

Yes, this is the perfect solution I was thinking of. But before going for it I tried the Veeam migration option, which neatly bypasses the problem. Maybe not perfectly, because it is much slower than expected ;( but so far it is still copying. However, I'm still curious why it won't work with vCenter vMotion.

stray_guru · ‎12-14-2018

No snapshots... Yes, I switched to Veeam, but it's a slower solution and I'm still curious why the migration won't work.

dbalcaraz · ‎12-14-2018

I was thinking... are you trying to move only the compute, or also the storage (different datastores)?

About the vpxd.log, I only found this error: -- ERROR lro-29078 -- -- VmprovWorkflow: vmodl.fault.HostCommunication:

stray_guru · ‎12-14-2018

I'm also moving the storage, so it is simply copying 98TB of data.
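
For a sense of scale, here is a rough estimate of how long a 98TB copy takes. This is only a sketch: the thread does not state the link speed, so the 110 MB/s and 1100 MB/s figures (roughly what 1 Gbit/s and 10 Gbit/s links can sustain) are assumptions, and the function name estimate_hours is hypothetical.

```shell
#!/bin/sh
# Rough transfer-time estimate.
# $1 = size in decimal TB, $2 = assumed sustained rate in MB/s.
# Prints whole hours, rounded down (integer arithmetic only).
estimate_hours() {
    size_tb=$1
    rate_mbs=$2
    echo $(( size_tb * 1000000 / rate_mbs / 3600 ))
}

estimate_hours 98 110     # assumed ~1 Gbit/s link: prints 247 (about 10 days)
estimate_hours 98 1100    # assumed ~10 Gbit/s link: prints 24 (about 1 day)
```

At these rates any interruption that forces a restart from zero costs days, which is why a resumable copy matters for a VM of this size.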

dbalcaraz · ‎12-14-2018

What continuum said is nice.

Another thing you could try is to SDRS the disks individually, instead of moving the whole VM.

jlaurentino · ‎06-22-2023

The solutions provided seem good.

I have only 7 TB in my ESXi setup and have an offline redundant machine.

Every so often I do the backup from a workstation by running:

mkdir esxi-from

mkdir esxi-to

sshfs main-machine esxi-from

sshfs backup-machine esxi-to

sudo rsync -aAXHv esxi-from/ esxi-to/

Any suggestions to improve this would be greatly appreciated.

Thanks,

Laurentino
