AkeK
Contributor

VMotion migrations fail

Hi! I have problems with migrations.

Almost every migration from ESX2 or ESX3 to ESX1 fails (at 10%).

Migrations to ESX2 or ESX3 only work sometimes.

I have three ESX hosts, version 3.5.0 (build 153875), on HP ProLiant DL385 G1 servers, with storage on an MSA1000 over Fibre Channel.

VMware VirtualCenter version 2.5.0 (build 147633)

esxcfg-vswitch -l on source ESX3:

Switch Name      Num Ports   Used Ports  Configured Ports  MTU     Uplinks
vSwitch0         32          5           32                1500    vmnic9,vmnic0

  PortGroup Name        VLAN ID  Used Ports  Uplinks
  Service Console       0        1           vmnic0,vmnic9
  Vmotion               0        1           vmnic0,vmnic9

Switch Name      Num Ports   Used Ports  Configured Ports  MTU     Uplinks
vSwitch2         64          8           64                1500    vmnic8,vmnic7,vmnic6

  PortGroup Name        VLAN ID  Used Ports  Uplinks
  VMNetwork             0        4           vmnic6,vmnic7,vmnic8

Switch Name      Num Ports   Used Ports  Configured Ports  MTU     Uplinks
vSwitch1         64          2           64                1500    vmnic1

  PortGroup Name        VLAN ID  Used Ports  Uplinks
  DMZ                   0        0           vmnic1

esxcfg-vswitch -l on destination ESX1:

Switch Name      Num Ports   Used Ports  Configured Ports  MTU     Uplinks
vSwitch0         32          5           32                1500    vmnic5,vmnic0

  PortGroup Name        VLAN ID  Used Ports  Uplinks
  Service Console       0        1           vmnic0,vmnic5
  Vmotion               0        1           vmnic0,vmnic5

Switch Name      Num Ports   Used Ports  Configured Ports  MTU     Uplinks
vSwitch1         64          8           64                1500    vmnic4,vmnic3,vmnic2

  PortGroup Name        VLAN ID  Used Ports  Uplinks
  VMNetwork             0        4           vmnic2,vmnic3,vmnic4

Switch Name      Num Ports   Used Ports  Configured Ports  MTU     Uplinks
vSwitch2         64          4           64                1500    vmnic1

  PortGroup Name        VLAN ID  Used Ports  Uplinks
  DMZ                   0        2           vmnic1

esxcfg-vswif -l on source ESX3:

Name    Port Group       IP Address   Netmask        Broadcast    Enabled  DHCP
vswif0  Service Console  10.0.0.13    255.255.255.0  10.0.0.255   true     false

esxcfg-vswif -l on destination ESX1:

Name    Port Group       IP Address   Netmask        Broadcast    Enabled  DHCP
vswif0  Service Console  10.0.0.11    255.255.255.0  10.0.0.255   true     false

Log files on source ESX3 (migrating from):

vmware.log (spans two log files):

Sep 22 10:26:01.815: vmx| VMXVmdb_LoadRawConfig: Loading raw config

Sep 22 10:26:01.877: vmx| VMXVmdbCbVmVmxMigrate: Got SET callback for /vm/#_VMX/vmx/migrateState/cmd/##1_7b2/op/=from

Sep 22 10:26:01.877: vmx| VmxMigrateGetParam: srcIp=0xa010103 dstIp=0xa010101 mid=4742658671ed3 uuid=33373335-3435-4742-3835-33374a573732 priority=high

Sep 22 10:26:01.877: vmx| Received migrate 'from' request for mid id 1253607947574995, src ip <10.1.1.3>.

Sep 22 10:26:01.877: vmx| MigrateSetInfo: state=7 srcIp=<10.1.1.3> dstIp=<10.1.1.1> mid=1253607947574995 uuid=33373335-3435-4742-3835-33374a573732 priority=high

Sep 22 10:26:01.877: vmx| MigrateStateUpdate: Transitioning from state 0 to 7.

Sep 22 10:26:01.877: vmx| Migrate: Overriding message callbacks with migration handlers.

Sep 22 10:26:01.878: vmx| PowerOn

Sep 22 10:26:01.886: vmx| VMXVmdb_LoadRawConfig: Loading raw config

Sep 22 10:26:01.895: vmx| UNAME Linux kltv1s.klt.se 2.4.21-57.ELvmnix #1 Fri Mar 13 14:33:50 PDT 2009 i686 (uwglibc version 6)

Sep 22 10:26:01.895: vmx| DICT --- USER PREFERENCES

...

Sep 22 10:26:01.896: vmx| DICT --- GLOBAL SETTINGS

Sep 22 10:26:01.995: vmx| hostCpuFeatures = 0x126000fe

Sep 22 10:26:01.995: vmx| hostNumPerfCounters = 4

Sep 22 10:26:02.016: vmx| Resuming migrated virtual machine with 4096 MB of memory.

Sep 22 10:26:02.016: vmx| VMMon_CreateVM: vmmon.numVCPUs=2

Sep 22 10:26:02.018: vmx| Swap file path: '/vmfs/volumes/49d9e1e5-0f9df055-2eab-0019bb2ff5b4/KLTX1S/KLTX1S-1fad7ad4.vswp.67934'

Sep 22 10:26:02.041: vmx| Using temporary swap file '/vmfs/volumes/49d9e1e5-0f9df055-2eab-0019bb2ff5b4/KLTX1S/KLTX1S-1fad7ad4.vswp.67934'

Sep 22 10:26:02.041: vmx| VMXVmdb_LoadRawConfig: Loading raw config

Sep 22 10:26:03.952: vmx| vmm32-modules:

Sep 22 10:26:03.952: vmx| KHZEstimate 2205000

Sep 22 10:26:03.952: vmx| MHZEstimate 2205

Sep 22 10:26:03.952: vmx| NumVCPUs 2

Sep 22 10:26:03.953: vmx| UUID: location-UUID is 56 4d 88 d5 aa 08 58 e4-a7 51 af 04 ff 6d ad f1

Sep 22 10:26:03.953: vmx| UUID: canonical path is /vmfs/volumes/49d9e1e5-0f9df055-2eab-0019bb2ff5b4/KLTX1S/KLTX1S.vmx

Sep 22 10:26:03.953: vmx| UUID: location-UUID is 56 4d 88 d5 aa 08 58 e4-a7 51 af 04 ff 6d ad f1

Sep 22 10:26:03.953: vmx| UUID: Writing uuid.location 56 4d 88 d5 aa 08 58 e4-a7 51 af 04 ff 6d ad f1

Sep 22 10:26:03.963: vmx| WORKER: Creating new group with numThreads=2 (2)

Sep 22 10:26:03.965: vmx| MStat: Creating Stat vm.uptime

Sep 22 10:26:03.965: vmx| MStat: Creating Stat vm.suspendTime

Sep 22 10:26:03.965: vmx| MStat: Creating Stat vm.powerOnTimeStamp

Sep 22 10:26:03.967: vmx| MigrateWaitForData: waiting for data.

Sep 22 10:26:03.967: vmx| MigrateStateUpdate: Transitioning from state 7 to 8.

...

Sep 22 10:26:04.212: vmx| VMXVmdbCbVmVmxMigrate: Got SET callback for /vm/#_VMX/vmx/migrateState/cmd/##1_334/op/=start

Sep 22 10:26:04.212: vmx| VmxMigrateGetStartParam: mid=4742658671ed3 dstwid=2514

Sep 22 10:26:04.212: vmx| Received migrate 'start' request for mig id 1253607947574995, dest world id 2514.

Sep 22 10:26:04.213: vmx| MigrateStateUpdate: Transitioning from state 1 to 2.

Sep 22 10:26:24.218: vcpu-0| MigrateStatusFailure: switching to new log file.

Sep 22 10:26:24.220: vcpu-0| MigrateStatusFailure: Now in new log file.

Sep 22 10:26:24.231: vcpu-0| MigrateStatusFailure: Migration failed while copying data. Timeout.

Sep 22 10:26:24.231: vcpu-0|

Sep 22 10:26:24.231: vcpu-0| MigrateSetInfo: state=5 srcIp=<0.0.0.0> dstIp=<0.0.0.0> mid=0 uuid=(null) priority=(null)

Sep 22 10:26:24.231: vcpu-0| MigrateStateUpdate: Transitioning from state 2 to 5.

Sep 22 10:26:24.231: vcpu-0| Migrate_ClearDoneState: cleared state. State was 5.

Sep 22 10:26:24.231: vcpu-0| MigrateStateUpdate: Transitioning from state 5 to 0.

Sep 22 10:26:24.231: vcpu-0| Msg_Post: Error

Sep 22 10:26:24.231: vcpu-0| http://msg.checkpoint.precopyfailure Migration to host <10.1.1.1> failed with error Timeout (0xbad0020)

...

Sep 22 10:27:03.977: vmx| MigrateWaitForData: Waited for 60.01 seconds.

Sep 22 10:27:03.977: vmx| MigrateWaitForData: timed out. Migration has failed

Sep 22 10:27:03.977: vmx| MigrateStatusFailure: Timed out waiting for migration data.

Sep 22 10:27:03.977: vmx| MigrateSetInfo: state=11 srcIp=<0.0.0.0> dstIp=<0.0.0.0> mid=0 uuid=(null) priority=(null)

Sep 22 10:27:03.977: vmx| MigrateStateUpdate: Transitioning from state 8 to 11.

Sep 22 10:27:03.977: vmx| Migrate_ClearDoneState: cleared state. State was 11.

Sep 22 10:27:03.977: vmx| MigrateStateUpdate: Transitioning from state 11 to 0.

Sep 22 10:27:03.978: vmx| Module Migrate power on failed.

Sep 22 10:27:03.978: vmx| VMX_PowerOn: ModuleTable_PowerOn = 0

Sep 22 10:27:03.978: vmx| WORKER: asyncOps=0 maxActiveOps=0 maxPending=0 maxCompleted=0

Sep 22 10:27:05.145: vmx| vmdbPipe_Streams Couldn't read: OVL_STATUS_EOF

Sep 22 10:27:05.145: vmx| VMX idle exit

Sep 22 10:27:05.158: vmx| Flushing VMX VMDB connections

Sep 22 10:27:05.166: vmx| IPC_exit: disconnecting all threads

Sep 22 10:27:05.166: vmx| VMX exit (0).

Sep 22 10:27:05.166: vmx| VMX has left the building: 0.

vmkernel:

Sep 22 10:26:04 kltv3s vmkernel: 8:03:42:44.290 cpu1:1637)Migrate: vm 1638: 7383: Setting migration info ts = 1253607947574995, src ip = <10.1.1.3> dest ip = <10.1.1.1> Dest wid = 2514 using SHARED swap

Sep 22 10:26:04 kltv3s vmkernel: 8:03:42:44.290 cpu1:1637)World: vm 1700: 901: Starting world migSendHelper-1638 with flags 1

Sep 22 10:26:04 kltv3s vmkernel: 8:03:42:44.291 cpu1:1637)World: vm 1701: 901: Starting world migRecvHelper-1638 with flags 1

Sep 22 10:26:24 kltv3s vmkernel: 8:03:43:04.295 cpu1:1700)WARNING: MigrateNet: 323: 1253607947574995: 2-0xa021b98:Received only 0 of 68 bytes: Timeout

Sep 22 10:26:24 kltv3s vmkernel: 8:03:43:04.295 cpu1:1700)WARNING: Migrate: 6898: 1253607947574995: Failed to send PRECOPY_START msg: Timeout (0xbad0020)

Sep 22 10:26:24 kltv3s vmkernel: 8:03:43:04.295 cpu1:1700)WARNING: Migrate: 1243: 1253607947574995: Failed: Timeout (0xbad0020) @0x9efd9f

hostd.log:

PrepareSource , VM = '400'

State Transition (VM_STATE_ON -> VM_STATE_EMIGRATING)

VMotionPrepare (1253607947574995): Sending 'to' srcIp=10.1.1.3 dstIp=10.1.1.1

Disconnect check in progress: /vmfs/volumes/49d9e1e5-0f9df055-2eab-0019bb2ff5b4/KLTX1S/KLTX1S.vmx

InitiateSource , WID = 2514

GetWid: returning 1638

VMotionInitiateSrc (1253607947574995): wid=2514

Disconnect check in progress: /vmfs/volumes/49d9e1e5-0f9df055-2eab-0019bb2ff5b4/KLTX1S/KLTX1S.vmx

Disconnect check in progress: /vmfs/volumes/49d9e1e5-0f9df055-2eab-0019bb2ff5b4/KLTX1S/KLTX1S.vmx

Disconnect check in progress: /vmfs/volumes/49d9e1e5-0f9df055-2eab-0019bb2ff5b4/KLTX1S/KLTX1S.vmx

Question info: Migration to host <10.1.1.1> failed with error Timeout (0xbad0020)

, Id: 0 : Type : 2, Default: 0, Number of options: 1

Disconnect check in progress: /vmfs/volumes/49d9e1e5-0f9df055-2eab-0019bb2ff5b4/KLTX1S/KLTX1S.vmx

ResolveCb: VMX reports gone = false

ResolveCb: Failed with fault: (vmodl.fault.SystemError) {

dynamicType = <unset>,

reason = "Migration failed while copying data. Timeout.

",

msg = ""

}

State Transition (VM_STATE_EMIGRATING -> VM_STATE_ON)

VMotion cleanup completed

Received a duplicate transition from foundry: 1

Failed to find activation record, event user unknown.

Event 550 : Message on KLTX1S on kltv3s in ha-datacenter: Migration to host <10.1.1.1> failed with error Timeout (0xbad0020)

Log files on destination ESX1 (migrating to):

vmkernel.log:

Sep 22 10:26:01 kltv1s vmkernel: 11:15:42:28.488 cpu3:1036)World: vm 2513: 901: Starting world vmware-vmx with flags 4

Sep 22 10:26:02 kltv1s vmkernel: 11:15:42:29.363 cpu0:2513)World: vm 2514: 901: Starting world vmm0:KLTX1S with flags 8

Sep 22 10:26:02 kltv1s vmkernel: 11:15:42:29.363 cpu0:2513)Sched: vm 2514: 5333: adding 'vmm0:KLTX1S': group 'host/user': cpu: shares=-3 min=0 minLimit=-1 max=-1

Sep 22 10:26:02 kltv1s vmkernel: 11:15:42:29.363 cpu0:2513)Sched: vm 2514: 5352: renamed group 441 to vm.2513

Sep 22 10:26:02 kltv1s vmkernel: 11:15:42:29.363 cpu0:2513)Sched: vm 2514: 5366: moved group 441 to be under group 4

Sep 22 10:26:02 kltv1s vmkernel: 11:15:42:29.384 cpu3:2513)Swap: vm 2514: 2169: extending swap to 4194304 KB

Sep 22 10:26:02 kltv1s vmkernel: 11:15:42:29.385 cpu1:2513)World: vm 2515: 901: Starting world vmm1:KLTX1S with flags 8

Sep 22 10:26:03 kltv1s vmkernel: 11:15:42:31.311 cpu3:2513)Migrate: vm 2514: 7383: Setting migration info ts = 1253607947574995, src ip = <10.1.1.3> dest ip = <0.0.0.0> Dest wid = -1 using SHARED swap

Sep 22 10:26:03 kltv1s vmkernel: 11:15:42:31.311 cpu3:2513)World: vm 2516: 901: Starting world migSendHelper-2514 with flags 1

Sep 22 10:26:03 kltv1s vmkernel: 11:15:42:31.311 cpu3:2513)World: vm 2517: 901: Starting world migRecvHelper-2514 with flags 1

Sep 22 10:26:04 kltv1s vmkernel: 11:15:42:31.543 cpu0:1072)MigrateNet: vm 1072: 854: Accepted connection from <10.1.1.3>

Sep 22 10:27:03 kltv1s vmkernel: 11:15:43:31.321 cpu0:2513)WARNING: Migrate: 1346: 1253607947574995: Migration considered a failure by the VMX. It is most likely a timeout, but check the VMX log for the true error.

Sep 22 10:27:03 kltv1s vmkernel: 11:15:43:31.321 cpu0:2513)WARNING: Migrate: 1243: 1253607947574995: Failed: Migration determined a failure by the VMX (0xbad0091) @0x9f19b5

Sep 22 10:27:03 kltv1s vmkernel: 11:15:43:31.322 cpu0:2513)Sched: vm 2514: 1031: name='vmm0:KLTX1S'

Sep 22 10:27:03 kltv1s vmkernel: 11:15:43:31.322 cpu0:2513)CpuSched: vm 2514: 13868: zombified unscheduled world: runState=NEW

Sep 22 10:27:03 kltv1s vmkernel: 11:15:43:31.322 cpu0:2513)World: vm 2514: 2489: deathPending set; world not running, scheduling reap

Sep 22 10:27:03 kltv1s vmkernel: 11:15:43:31.322 cpu0:2513)Sched: vm 2515: 1031: name='vmm1:KLTX1S'

Sep 22 10:27:03 kltv1s vmkernel: 11:15:43:31.322 cpu0:2513)CpuSched: vm 2515: 13868: zombified unscheduled world: runState=NEW

Sep 22 10:27:03 kltv1s vmkernel: 11:15:43:31.322 cpu0:2513)World: vm 2515: 2489: deathPending set; world not running, scheduling reap

Sep 22 10:27:19 kltv1s vmkernel: 11:15:43:46.540 cpu2:2516)WARNING: MigrateNet: 398: 1253607947574995: Connect to <10.1.1.3>:8000 failed: Timeout

hostd.log:

Initiate: Waiting for WID

VMHS: Exec()'ing /usr/lib/vmware/bin/vmkload_app, /vmfs/volumes/49d9e1e5-0f9df055-2eab-0019bb2ff5b4/KLTX1S/KLTX1S.vmx

Established a connection. Killing intermediate child: 29971

Mounting virtual machine paths on connection: /db/connection/#7b3/, /vmfs/volumes/49d9e1e5-0f9df055-2eab-0019bb2ff5b4/KLTX1S/KLTX1S.vmx

Mount VM completion for vm: /vmfs/volumes/49d9e1e5-0f9df055-2eab-0019bb2ff5b4/KLTX1S/KLTX1S.vmx

Mount VM Complete: /vmfs/volumes/49d9e1e5-0f9df055-2eab-0019bb2ff5b4/KLTX1S/KLTX1S.vmx, Return code: OK

Disconnect check in progress: /vmfs/volumes/49d9e1e5-0f9df055-2eab-0019bb2ff5b4/KLTX1S/KLTX1S.vmx

GetWid: returning 2514

Initiate: Got WID 2514

Task Created : haTask-ha-root-pool-vim.ResourcePool.updateConfig-150028985

Task Completed : haTask-ha-root-pool-vim.ResourcePool.updateConfig-150028985

Task Created : haTask-ha-root-pool-vim.ResourcePool.updateConfig-150028986

Task Completed : haTask-ha-root-pool-vim.ResourcePool.updateConfig-150028986

Disconnect check in progress: /vmfs/volumes/49d9e1e5-0f9df055-2eab-0019bb2ff5b4/KLTX1S/KLTX1S.vmx

Unmounting the vm: /vmfs/volumes/49d9e1e5-0f9df055-2eab-0019bb2ff5b4/KLTX1S/KLTX1S.vmx

Unmounting VM complete: /vmfs/volumes/49d9e1e5-0f9df055-2eab-0019bb2ff5b4/KLTX1S/KLTX1S.vmx

Mount state values have changed: /vmfs/volumes/49d9e1e5-0f9df055-2eab-0019bb2ff5b4/KLTX1S/KLTX1S.vmx

ResolveCb: VMX reports gone = true

ResolveCb: Failed with fault: (vim.fault.Timedout) {

dynamicType = <unset>,

msg = ""

}

State Transition (VM_STATE_IMMIGRATING -> VM_STATE_OFF)

Reloading config state: /vmfs/volumes/49d9e1e5-0f9df055-2eab-0019bb2ff5b4/KLTX1S/KLTX1S.vmx

DISKLIB-VMFS : "/vmfs/volumes/49d9e1e5-0f9df055-2eab-0019bb2ff5b4/KLTX1S/KLTX1S-flat.vmdk" : open successful (23) size = 21474836480, hd = 0. Type 3

DISKLIB-VMFS : "/vmfs/volumes/49d9e1e5-0f9df055-2eab-0019bb2ff5b4/KLTX1S/KLTX1S-flat.vmdk" : closed.

DISKLIB-VMFS : "/vmfs/volumes/49d9e1e5-0f9df055-2eab-0019bb2ff5b4/KLTX1S/KLTX1S_1-flat.vmdk" : open successful (23) size = 161061273600, hd = 0. Type 3

DISKLIB-VMFS : "/vmfs/volumes/49d9e1e5-0f9df055-2eab-0019bb2ff5b4/KLTX1S/KLTX1S_1-flat.vmdk" : closed.

DISKLIB-VMFS : "/vmfs/volumes/49d9e1e5-0f9df055-2eab-0019bb2ff5b4/KLTX1S/KLTX1S_2-flat.vmdk" : open successful (23) size = 26843545600, hd = 0. Type 3

DISKLIB-VMFS : "/vmfs/volumes/49d9e1e5-0f9df055-2eab-0019bb2ff5b4/KLTX1S/KLTX1S_2-flat.vmdk" : closed.

DISKLIB-VMFS : "/vmfs/volumes/49d9e1e5-0f9df055-2eab-0019bb2ff5b4/KLTX1S/KLTX1S_3-flat.vmdk" : open successful (23) size = 177167400960, hd = 0. Type 3

DISKLIB-VMFS : "/vmfs/volumes/49d9e1e5-0f9df055-2eab-0019bb2ff5b4/KLTX1S/KLTX1S_3-flat.vmdk" : closed.

DISKLIB-VMFS : "/vmfs/volumes/49cb7b64-6390ace9-b15e-0019bb2ff5b4/KLTX1S/KLTX1S-flat.vmdk" : open successful (23) size = 107374182400, hd = 0. Type 3

DISKLIB-VMFS : "/vmfs/volumes/49cb7b64-6390ace9-b15e-0019bb2ff5b4/KLTX1S/KLTX1S-flat.vmdk" : closed.

DISKLIB-VMFS : "/vmfs/volumes/49d9e1e5-0f9df055-2eab-0019bb2ff5b4/KLTX1S/KLTX1S-flat.vmdk" : open successful (17) size = 21474836480, hd = 0. Type 3

DISKLIB-VMFS : "/vmfs/volumes/49d9e1e5-0f9df055-2eab-0019bb2ff5b4/KLTX1S/KLTX1S-flat.vmdk" : closed.

State Transition (VM_STATE_OFF -> VM_STATE_UNREGISTERING)

Destroying tools backup agent...

Failed to find activation record, event user unknown.

Event 1418 : Removed KLTX1S on kltv1s.klt.se from ha-datacenter

State Transition (VM_STATE_UNREGISTERING -> VM_STATE_GONE)

VMotion cleanup and unregistration completed

GetPropertyProvider failed for 3120

Virtual machine object cleanup

CompleteDestination

vpxa.log:

Generated list of transforming operations ...

oldOperations != new operations, trying with new operation set...

Trees are identical, nothing to do

No syncs pending, exiting loop...

Received callback in WaitForUpdatesDone

Applying updates from 85402 to 85403 (at 85402)

(1253607947574995) failed tracking VMotion progress at destination (vim.fault.Timedout)

RecordOp 2: info.state, task-63301

TriggerProcessGUReqs: Session 523e7c09-2387-fa43-e575-1e9889d0f985

RecordOp 2: info.cancelable, task-63301

RecordOp 2: info.error, task-63301

-- FINISH task-63301 -- -- vim.host.VMotionManager.initiateDestination:tracking

-- ERROR task-63301 -- -- vim.host.VMotionManager.initiateDestination:tracking: vim.fault.Timedout:

(vim.fault.Timedout) {

dynamicType = <unset>,

msg = "Operation timed out."

}

Can anybody see what the problem is? /Åke

MauroBonder
VMware Employee

An easy test: on the first host, go to Configuration > Advanced Settings > Migrate, set Migrate.Enabled to 0 and click OK, then set it back to 1 and click OK.

Do the same on the other hosts.
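
If you prefer to do it from the service console, I believe the same toggle can be done with vmware-vim-cmd (a sketch from memory; on ESXi the binary is vim-cmd instead, and it is worth checking the option name with the first command before updating anything):

# vmware-vim-cmd hostsvc/advopt/view Migrate.Enabled

# vmware-vim-cmd hostsvc/advopt/update Migrate.Enabled long 0

# vmware-vim-cmd hostsvc/advopt/update Migrate.Enabled long 1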




Post the result here.

krowczynski
Virtuoso

Can the ESX hosts ping each other from the console?

Also try a vmkping!
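
For example, from the ESX1 service console, using the addresses posted above (10.0.0.x is the Service Console network, 10.1.1.x the VMotion VMkernel network):

# ping 10.0.0.13

# vmkping 10.1.1.3

ping tests the Service Console path; vmkping goes out through the VMkernel interface that VMotion actually uses, so both need to succeed.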

MCP, VCP3, VCP4
MauroBonder
VMware Employee

You can also validate the VMotion port group configuration by checking that the VMotion port is reachable:

telnet IP_VMOTION 8000
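
For example, against the destination host's VMotion address (port 8000 is where VMotion listens, matching the "Connect to <10.1.1.3>:8000" line in the vmkernel log above):

# telnet 10.1.1.1 8000

Note that this only works from a machine that can actually reach the 10.1.1.x VMotion network.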

AkeK
Contributor

Thanks, but it didn't help.

I have done this on all three ESX hosts.

I have also tried disabling VMware HA and VMware DRS and enabling them again, with the same result!

Ping is OK.

I have tried vmkping between all the ESX hosts many times and it has worked, but when I tried just now (from ESX1 to ESX3) I lost 2 of 3 packets!

# vmkping 10.1.1.3

PING 10.1.1.3 (10.1.1.3): 56 data bytes

64 bytes from 10.1.1.3: icmp_seq=0 ttl=64 time=0.469 ms

--- 10.1.1.3 ping statistics ---

3 packets transmitted, 1 packets received, 66% packet loss

round-trip min/avg/max = 0.469/0.469/0.469 ms

When I tried vmkping again, it worked...?!

# vmkping 10.1.1.3

PING 10.1.1.3 (10.1.1.3): 56 data bytes

64 bytes from 10.1.1.3: icmp_seq=0 ttl=64 time=0.185 ms

64 bytes from 10.1.1.3: icmp_seq=1 ttl=64 time=0.212 ms

64 bytes from 10.1.1.3: icmp_seq=2 ttl=64 time=0.249 ms

--- 10.1.1.3 ping statistics ---

3 packets transmitted, 3 packets received, 0% packet loss

round-trip min/avg/max = 0.185/0.215/0.249 ms
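
Maybe I should let it run longer to catch the intermittent drops, something like:

# vmkping -c 1000 10.1.1.3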

MauroBonder
VMware Employee

Check the VLAN ID of the port group.

Test 1:

On the first host, click Configuration > Advanced Settings > Migrate, set Migrate.Enabled to 0, click OK, then change it back to 1 and click OK.

Do the same on all hosts.

AkeK
Contributor

VLAN ID is "NONE" on all ESX.

I have this settings for two years!

The migration-problemens starts for three weeks ago.

MIGRATE.ENABLED -> 0 -> 1 on all ESX doesnt do the trick!

From where should I run a telnet-session to the Vmotion-net. I don't run VLAN?

/Åke

krowczynski
Virtuoso

Try moving the VMkernel (VMotion) port groups to another vSwitch for testing!
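
Roughly, from the service console, that could look like this (just a sketch; vSwitchTest/VmotionTest are example names, vmnicX is whatever spare NIC you have, and you will have to re-enable VMotion on the new VMkernel port group in the VI Client afterwards):

# esxcfg-vswitch -a vSwitchTest

# esxcfg-vswitch -L vmnicX vSwitchTest

# esxcfg-vswitch -A VmotionTest vSwitchTest

# esxcfg-vmknic -d Vmotion

# esxcfg-vmknic -a VmotionTest -i 10.1.1.3 -n 255.255.255.0

Port group names containing spaces need quotes.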

MCP, VCP3, VCP4
bulletprooffool
Champion

AkeK, download one of the PowerShell cluster health-check scripts that have been posted on the community and use it to verify that your hosts actually see the same port group names and datastores (these names are case sensitive!).

If you find any problems, remedy them, then try VMotion again.

This one should do the trick:

http://communities.vmware.com/blogs/virtuallysi/2009/04/02/esx-healthcheck-script-winner
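
While the script runs, a quick manual check of just the networking names is also possible: dump the vSwitch config on each host, copy the dumps to one box, and diff them (the file names here are only examples):

# esxcfg-vswitch -l > /tmp/vswitch-esx1.txt

# diff /tmp/vswitch-esx1.txt /tmp/vswitch-esx3.txt

Any case difference in a port group name will show up immediately.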

One day I will virtualise myself . . .
AkeK
Contributor

Hi

Now I have run OpsCheck and healthcheck.ps1. Neither of them reports any errors.

Yesterday I saw another interesting thing:

Just when the migration stopped at 10%, I tried to vmkping from ESX1 to ESX2 and ESX3 in two PuTTY sessions, and they hung (or lost all packets).

I also ran vmkping -c 10000 from ESX2 and ESX3 to ESX1 at the same time, and they got answers (no packet loss), but the response time was very slow, about 2-3 ms! (It is normally around 0.2 ms on the VMotion network.)

The migration failed, and 10-15 minutes after that my two vmkping sessions from ESX1 started getting answers from ESX2 and ESX3 again!

The Service Console and VMotion networks are connected to an HP ProCurve 2824, and the switch reported the event "Lost connection to multiple devices on port: 9".

Port 9 is one of the two ports connected to ESX1 for the Service Console and VMotion networks.

Do I have a NIC problem, an HP switch problem, or perhaps a cable problem?
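
I guess I can also check what speed/duplex the NICs on ESX1 have negotiated from the service console, in case it is a mismatch with the ProCurve:

# esxcfg-nics -l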

/Åke

Khal
Contributor

An easy test: on the first host, go to Configuration > Advanced Settings > Migrate, set Migrate.Enabled to 0 and click OK, then set it back to 1 and click OK. Do the same on the other hosts. Post the result here.

I was just experiencing this problem on one of my 5 ESX hosts (the other 4 were fine). I could actually VMotion stuff onto the problem host, but not off of it, so the host was running at about 95% memory usage while the other 4 were at ~40%. I tried MauroBonder's suggestion and it resolved the issue. Thanks!
