VMware Cloud Community
theburnout
Contributor

Storage vMotion from ESXi 5.5 to 6.0 not working for some VMs

In a test cluster with two nodes and no shared storage (local datastores only), Storage vMotion from the 5.5 host to the 6.0 host is not working.

Before the upgrade I moved all VMs to one host.

Then I upgraded the other host to 6.0 and tried to move all VMs to the upgraded 6.0 host (changing host and storage at the same time).

Of the 20 VMs, 5 were moved to the new host.

All others always fail with the error below.

The client VMs are all virtual hardware version 10 (but some of them were upgraded from version 8 or 9).

Guest OSes are Windows 2008, Windows 2012 R2, and Linux. Some use vmxnet3 NICs, some e1000.

The machines can be moved when powered off.

After such a powered-off migration, Storage vMotion from 6.0 to 5.5 and back from 5.5 to 6.0 works again.

<code>

Failed to copy one or more of the virtual machine's disks. See the virtual machine's log for more details.

Failed to set up disks on the destination host.

vMotion migration [-1408040447:1427943699552189] failed to read stream keepalive: Connection closed by remote host, possibly due to timeout

Failed waiting for data. Error 195887167. Connection closed by remote host, possibly due to timeout.

</code>

VM log:

<code>

2015-04-02T03:01:34.299Z| vmx| I120: VMXVmdb_SetMigrationHostLogState: hostlog state transits to immigrating for migrate 'from' mid 1427943699552189

2015-04-02T03:01:34.307Z| vmx| I120: MigratePlatformInitMigration:  DiskOp file set to /vmfs/volumes/526d0605-8317b046-f37e-0025906cd27e/test1/test1-diskOp.tmp

2015-04-02T03:01:34.311Z| vmx| A115: ConfigDB: Setting migration.vmxDisabled = "TRUE"

2015-04-02T03:01:34.326Z| vmx| I120: MigrateWaitForData: waiting for data.

2015-04-02T03:01:34.326Z| vmx| I120: MigrateSetState: Transitioning from state 8 to 9.

2015-04-02T03:01:34.467Z| vmx| I120: MigrateRPC_RetrieveMessages: Informed of a new user message, but can't handle messages in state 4.  Leaving the message queued.

2015-04-02T03:01:34.467Z| vmx| I120: MigrateSetState: Transitioning from state 9 to 10.

2015-04-02T03:01:34.468Z| vmx| I120: MigrateShouldPrepareDestination: Remote host doesn't support an explicit step to prepare migrate destination.

2015-04-02T03:01:34.469Z| vmx| I120: MigrateBusMemPrealloc: BusMem preallocation complete.

2015-04-02T03:01:34.469Z| Worker#0| I120: SVMotion_RemoteInitRPC: Completed.

2015-04-02T03:01:34.471Z| Worker#0| W110: SVMotion_DiskSetupRPC: Related disk expected but not found for file: /vmfs/volumes/526d0605-8317b046-f37e-0025906cd27e/test1/test1-ctk.vmdk

2015-04-02T03:01:34.471Z| Worker#0| W110: MigrateRPCHandleRPCWork: RPC DiskSetup callback returned non-success status 4. Failing RPCs.

2015-04-02T03:01:34.493Z| vmx| I120: VMXVmdb_SetMigrationHostLogState: hostlog state transits to failure for migrate 'from' mid 1427943699552189

2015-04-02T03:01:34.502Z| vmx| I120: MigrateSetStateFinished: type=2 new state=12

2015-04-02T03:01:34.502Z| vmx| I120: MigrateSetState: Transitioning from state 10 to 12.

2015-04-02T03:01:34.502Z| vmx| I120: Migrate: Caching migration error message list:

2015-04-02T03:01:34.502Z| vmx| I120: [msg.migrate.waitdata.platform] Failed waiting for data.  Error bad003f. Connection closed by remote host, possibly due to timeout.

2015-04-02T03:01:34.502Z| vmx| I120: [vob.vmotion.stream.keepalive.read.fail] vMotion migration [ac130201:1427943699552189] failed to read stream keepalive: Connection closed by remote host, possibly due to timeout

2015-04-02T03:01:34.502Z| vmx| I120: Migrate: cleaning up migration state.

2015-04-02T03:01:34.502Z| vmx| I120: SVMotion_Cleanup: Cleaning up XvMotion state.

2015-04-02T03:01:34.502Z| vmx| I120: Closing all the disks of the VM.

2015-04-02T03:01:34.504Z| vmx| I120: Migrate: Final status reported through Vigor.

2015-04-02T03:01:34.504Z| vmx| I120: MigrateSetState: Transitioning from state 12 to 0.

2015-04-02T03:01:34.505Z| vmx| I120: Migrate: Final status reported through VMDB.

2015-04-02T03:01:34.505Z| vmx| I120: Module Migrate power on failed.

2015-04-02T03:01:34.505Z| vmx| I120: VMX_PowerOn: ModuleTable_PowerOn = 0

2015-04-02T03:01:34.505Z| vmx| I120: SVMotion_PowerOff: Not running Storage vMotion. Nothing to do

2015-04-02T03:01:34.507Z| vmx| A115: ConfigDB: Setting replay.filename = ""

2015-04-02T03:01:34.507Z| vmx| I120: Vix: [291569 mainDispatch.c:1188]: VMAutomationPowerOff: Powering off.

2015-04-02T03:01:34.507Z| vmx| W110: /vmfs/volumes/526d0605-8317b046-f37e-0025906cd27e/test1/test1.vmx: Cannot remove symlink /var/run/vmware/root_0/1427943694053879_291569/configFile: No such file or directory

2015-04-02T03:01:34.507Z| vmx| I120: WORKER: asyncOps=4 maxActiveOps=1 maxPending=0 maxCompleted=0

2015-04-02T03:01:34.529Z| vmx| I120: Vix: [291569 mainDispatch.c:4292]: VMAutomation_ReportPowerOpFinished: statevar=1, newAppState=1873, success=1 additionalError=0

2015-04-02T03:01:34.529Z| vmx| I120: Vix: [291569 mainDispatch.c:4292]: VMAutomation_ReportPowerOpFinished: statevar=0, newAppState=1870, success=1 additionalError=0

2015-04-02T03:01:34.529Z| vmx| I120: Transitioned vmx/execState/val to poweredOff

2015-04-02T03:01:34.529Z| vmx| I120: Vix: [291569 mainDispatch.c:4292]: VMAutomation_ReportPowerOpFinished: statevar=0, newAppState=1870, success=0 additionalError=0

2015-04-02T03:01:34.529Z| vmx| I120: Vix: [291569 mainDispatch.c:4331]: Error VIX_E_FAIL in VMAutomation_ReportPowerOpFinished(): Unknown error

2015-04-02T03:01:34.529Z| vmx| I120: Vix: [291569 mainDispatch.c:4292]: VMAutomation_ReportPowerOpFinished: statevar=0, newAppState=1870, success=1 additionalError=0

</code>

Any idea how to fix this issue? Maybe this is a bug in vSphere 6.0.

Since some machines could be moved, and vMotion back and forth between 6.0 and 5.5 worked afterwards, I don't believe this is a configuration error.

5 Replies
Sateesh_vCloud

Do you notice any specific differences between the working VMs and the non-working ones? For example (a PowerCLI sketch for collecting these follows the list):

VM Tools version

Disk controller

OS versions
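
For example, the details above could be collected for all VMs in one pass with a PowerCLI snippet along these lines (a rough sketch, assuming an existing vCenter connection):

<code>
# VMware Tools version, guest OS, SCSI controller and NIC types per VM
Get-VM | Select-Object Name,
    @{ N = 'ToolsVersion'; E = { $_.ExtensionData.Guest.ToolsVersion } },
    @{ N = 'GuestOS';      E = { $_.Guest.OSFullName } },
    @{ N = 'Controllers';  E = { ($_ | Get-ScsiController).Type -join ',' } },
    @{ N = 'NICs';         E = { ($_ | Get-NetworkAdapter).Type -join ',' } }
</code>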

-------------------------------------------------------------------------
Follow me @ www.vmwareguruz.com
Please consider marking this answer "correct" or "helpful" if you found it useful.
T. Sateesh, VCIX-NV, VCAP 5-DCA/DCD, VCP 6-NV, VCP 5 DCV/Cloud/DT, ZCP, IBM India Pvt. Ltd
theburnout
Contributor

After more testing: it is not true that Storage vMotion from 6.0 to 5.5 and then back from 5.5 to 6.0 works.

I can live-vMotion from 6.0 to 5.5, but I cannot move the machine back to 6.0.

OS and VMware Tools were the same on all systems (up to date). For the 6.0 to 5.5 test the tools were updated to the 6.0 version.

I compared a working VM with a non-working one; these are the differences:

(both Windows 2012 R2; unless noted otherwise, the entries do not exist on the other VM)

For the not-working machine:

<code>
floppy0.present = "FALSE"
svga.vramSize = "8388608"
sata0.present = "TRUE"
vhv.enable = "TRUE"
ctkEnabled = "true"
scsi0:0.ctkEnabled
sched.cpu.latencySensitivity = "normal"
vmotion.checkpointSVGASize = "11534336"
</code>

For the working machine:

<code>
evcCompatibilityMode = "TRUE"
guestCPUID.0 = "0000000d756e65476c65746e49656e69"
guestCPUID.1 = "000206d200010800969822030fabfbff"
guestCPUID.80000001 = "00000000000000000000000128100800"
hostCPUID.0 = "0000000d756e65476c65746e49656e69"
hostCPUID.1 = "000206d70020080017bee3ffbfebfbff"
hostCPUID.80000001 = "0000000000000000000000012c100800"
floppy0.startConnected = "FALSE"
floppy0.clientDevice = "TRUE"
floppy0.fileName = "vmware-null-remote-floppy"
sched.cpu.latencySensitivity = "low"
checkpoint.vmState = ""
</code>
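
For reference, these advanced settings can also be dumped with PowerCLI instead of diffing the .vmx files by hand; a rough sketch (the VM names are placeholders):

<code>
# List the advanced (extraConfig) settings of both VMs for comparison
Get-AdvancedSetting -Entity (Get-VM -Name "working-vm") |
    Sort-Object Name | Select-Object Name, Value
Get-AdvancedSetting -Entity (Get-VM -Name "broken-vm") |
    Sort-Object Name | Select-Object Name, Value
</code>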

Alistar
Expert
(Accepted solution)

Hello there,


It seems that there might be a bug, or that the hosts are having trouble negotiating the disk transfer over the network. Storage vMotion traffic goes through the Management Network (a hard limitation of the implementation) - are you sure the management networks are not overloaded and that they are running at the expected speed? Does this also happen with machines that have Changed Block Tracking (CBT) disabled? That looks like the likely culprit.
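
If it helps, a quick way to see which VMs currently have CBT enabled is something like the following PowerCLI snippet - just a sketch, assuming an existing vCenter connection; it reads the flag from each VM's vSphere API view:

<code>
# List all VMs and whether Changed Block Tracking is currently enabled
Get-VM | Select-Object Name,
    @{ Name = 'CBTEnabled'; Expression = { $_.ExtensionData.Config.ChangeTrackingEnabled } }
</code>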


Also, are you using the "old" vSphere Client or the Web Client? It is possible that some vSphere 6.0 functionality is not available in the old thick client.


The errors in question are these:

<code>
2015-04-02T03:01:34.468Z| vmx| I120: MigrateShouldPrepareDestination: Remote host doesn't support an explicit step to prepare migrate destination.

2015-04-02T03:01:34.471Z| Worker#0| W110: SVMotion_DiskSetupRPC: Related disk expected but not found for file: /vmfs/volumes/526d0605-8317b046-f37e-0025906cd27e/test1/test1-ctk.vmdk
</code>

Thanks for the answers in advance and good luck!

Stop by my blog if you'd like 🙂 I dabble in vSphere troubleshooting, PowerCLI scripting and NetApp storage - and I share my journeys at http://vmxp.wordpress.com/
theburnout
Contributor

Hello,

@Alistar: You are indeed right.

Setting "Change Block Tracking" for the corresponding scsi disk to disabled allowed me to storage vmotion the machines.

So they need to be powered off, reconfigured, and powered on again. After that, vMotion works.
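
For reference, the power-off / reconfigure / power-on sequence can also be scripted; this is only a rough PowerCLI sketch (the VM name is a placeholder, and it assumes the affected disk is scsi0:0 as in the .vmx comparison above):

<code>
# Placeholder VM name - adjust to the affected VM
$vm = Get-VM -Name "test1"

# Shut the guest down and wait until the VM is powered off
$vm | Shutdown-VMGuest -Confirm:$false
while ((Get-VM -Name "test1").PowerState -ne "PoweredOff") { Start-Sleep -Seconds 5 }

# Disable CBT globally and for the scsi0:0 disk (keys taken from the .vmx comparison)
$vm = Get-VM -Name "test1"
New-AdvancedSetting -Entity $vm -Name "ctkEnabled" -Value "FALSE" -Confirm:$false -Force
New-AdvancedSetting -Entity $vm -Name "scsi0:0.ctkEnabled" -Value "FALSE" -Confirm:$false -Force

# Power the VM back on
$vm | Start-VM
</code>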

I believe this setting is automatically enabled by VDP. The machines where vMotion was working weren't backed up by VDP.

The Management Network is Gigabit and almost idle; there is only traffic when moving the machines, and then it goes up to 1 Gbit/s (about 100 MByte/s).

Edit: I was using the Web Client, since Storage vMotion (moving disk and host at the same time) does not work with the standard client.

AFIARoth
Contributor

I found the following solution.

I used the following PowerCLI script to disable CBT on the running VM and then migrated it.

Just modify $vCenterServer to match your vCenter Server name and $VMName to match the name of the VM you want to move.

When you want to re-enable CBT, change $spec.ChangeTrackingEnabled = $false to $true.


<code>
cls

# Import the VMware PowerCLI snap-in if it is not already loaded
if ( (Get-PSSnapin -Name VMware.VimAutomation.Core -ErrorAction SilentlyContinue) -eq $null )
{
    Add-PSSnapin VMware.VimAutomation.Core
}

$vCenterServer = "Vcenter-server-name"
$VMName = "VMName"
$mute = Connect-VIServer $vCenterServer -WarningAction SilentlyContinue

$VMs = Get-VM -Name "$VMName"

# Create a VM specification with Changed Block Tracking disabled
$spec = New-Object VMware.Vim.VirtualMachineConfigSpec
$spec.ChangeTrackingEnabled = $false

# Apply the specification to each VM, then create and remove a snapshot
# so the new CBT setting takes effect on the running VM
foreach ($vm in $VMs) {
    $vm.ExtensionData.ReconfigVM($spec)
    $snap = $vm | New-Snapshot -Name 'Disable CBT'
    $snap | Remove-Snapshot -Confirm:$false
}

Disconnect-VIServer * -Confirm:$false
</code>
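
Once CBT is disabled, the combined host + datastore migration itself can also be started from PowerCLI. The snippet below is only a rough sketch with placeholder host and datastore names, and it assumes a PowerCLI version whose Move-VM cmdlet accepts both -Destination and -Datastore in a single call:

<code>
# Placeholder names - replace with your destination host and datastore
$vm            = Get-VM -Name "VMName"
$destHost      = Get-VMHost -Name "esxi60-host.example.local"
$destDatastore = Get-Datastore -Name "local-datastore-60"

# Migrate compute and storage in one operation (host + storage vMotion)
Move-VM -VM $vm -Destination $destHost -Datastore $destDatastore -Confirm:$false
</code>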
