Travis_83
Enthusiast

Live vMotion to 'single' host in cluster extremely slow

Hi guys,

vCenter 7 / vSphere 7 Enterprise Plus / ESX Hosts - VMware ESXi, 6.7.0, 17700523

We have a cluster of 4 hosts in total. I can vMotion between 3 of the hosts without issue, but when I try to vMotion a single live VM to one of the hosts (.41) in this cluster, it gets to 20% and then slows down drastically, taking ages to complete (only moving between hosts, not storage!). Weirdly, I can vMotion a VM off this particular host to another without issue; it's only a problem when this host is receiving a VM. If I power off the VM and move it, then it's fine. It's only an issue when moving the VM while it is powered on.

I can vmkping the vMotion network (no latency or drops) to and from this problem host, and I don't currently see any issue in the network configuration, conflicting IPs or anything untoward really. I have rebooted the host, etc.

Looking at the vmkernel logs, this is the difference I see on a host when it receives a vMotion:

Working host receiving a vMotion request

2021-10-11T15:38:02.124Z cpu20:2458438)Hbr: 3561: Migration start received (worldID=2458439) (migrateType=1) (event=0) (isSource=0) (sharedConfig=1)
2021-10-11T15:38:02.166Z cpu19:2099511)MigrateNet: vm 2099511: 3263: Accepted connection from <192.168.30.41>
2021-10-11T15:38:02.166Z cpu19:2099511)MigrateNet: vm 2099511: 3351: dataSocket 0x4310e57cf3f0 receive buffer size is 563272
2021-10-11T15:38:02.166Z cpu19:2099511)Migrate: 358: Remote machine is ESX 6.5 or newer.
2021-10-11T15:38:02.167Z cpu15:2458449)MigrateNet: 1751: 5608673866367299248 D: Successfully bound connection to vmknic vmk3 - '192.168.30.43'
2021-10-11T15:38:02.168Z cpu19:2099511)MigrateNet: vm 2099511: 3263: Accepted connection from <192.168.30.41>
2021-10-11T15:38:02.168Z cpu19:2099511)MigrateNet: vm 2099511: 3351: dataSocket 0x4310e5a5eac0 receive buffer size is 563272
2021-10-11T15:38:02.168Z cpu19:2099511)Migrate: 358: Remote machine is ESX 6.5 or newer.
2021-10-11T15:38:02.168Z cpu19:2099511)VMotionUtil: 5199: 5608673866367299248 D: Stream connection 1 added.
2021-10-11T15:38:06.689Z cpu8:2097388)<6>qlcnic 0000:27:00.0: vmnic6:qlcnic_netq_alloc_queue_with_attr:1650:Feature RSS needed.
2021-10-11T15:38:10.394Z cpu23:2458450)VMotionRecv: 761: 5608673866367299248 D: Estimated network bandwidth 220.429 MB/s during pre-copy

Versus what I see on the problem host when it receives a vMotion request:

2021-10-11T10:50:12.609Z cpu36:2109839)Hbr: 3561: Migration start received (worldID=2109840) (migrateType=1) (event=0) (isSource=0) (sharedConfig=1)
2021-10-11T10:50:12.655Z cpu63:2099517)MigrateNet: vm 2099517: 3263: Accepted connection from <192.168.30.43>
2021-10-11T10:50:12.655Z cpu63:2099517)MigrateNet: vm 2099517: 3351: dataSocket 0x4310ae3e5790 receive buffer size is 563272
2021-10-11T10:50:12.655Z cpu63:2099517)Migrate: 358: Remote machine is ESX 6.5 or newer.
2021-10-11T10:50:12.655Z cpu59:2109850)MigrateNet: 1751: 5608673849097790831 D: Successfully bound connection to vmknic vmk3 - '192.168.30.41'
2021-10-11T10:50:12.656Z cpu63:2099517)MigrateNet: vm 2099517: 3263: Accepted connection from <192.168.30.43>
2021-10-11T10:50:12.656Z cpu63:2099517)MigrateNet: vm 2099517: 3351: dataSocket 0x4310ae31e700 receive buffer size is 563272
2021-10-11T10:50:12.656Z cpu63:2099517)Migrate: 358: Remote machine is ESX 6.5 or newer.
2021-10-11T10:50:12.656Z cpu63:2099517)VMotionUtil: 5199: 5608673849097790831 D: Stream connection 1 added.
2021-10-11T10:54:48.632Z cpu8:2098471)NMP: nmp_ThrottleLogForDevice:3872: Cmd 0x89 (0x459b41334b80, 2097225) to dev "naa.600c0ff00052be845d94486101000000" on path "vmhba64:C0:T1:L1" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x6 0x3f 0xe. Act:EVAL
2021-10-11T10:54:48.632Z cpu8:2098471)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "naa.600c0ff00052be845d94486101000000" state in doubt; requested fast path state update...
2021-10-11T10:54:48.632Z cpu12:2097743)ScsiDevice: 4599: Handle REPORTED LUNS CHANGED DATA unit attention
2021-10-11T10:54:48.632Z cpu8:2098471)ScsiDeviceIO: 3448: Cmd(0x459b41334b80) 0x89, CmdSN 0x90c4 from world 2097225 to dev "naa.600c0ff00052be845d94486101000000" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x6 0x3f 0xe.
2021-10-11T10:54:48.632Z cpu8:2098471)NMP: nmp_ThrottleLogForDevice:3872: Cmd 0x89 (0x459b4d210bc0, 2097225) to dev "naa.600c0ff00052be8450907a6001000000" on path "vmhba64:C0:T1:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x6 0x3f 0xe. Act:EVAL
2021-10-11T10:54:48.632Z cpu8:2098471)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "naa.600c0ff00052be8450907a6001000000" state in doubt; requested fast path state update...
2021-10-11T10:54:48.632Z cpu8:2098471)ScsiDeviceIO: 3448: Cmd(0x459b4d210bc0) 0x89, CmdSN 0xabf2 from world 2097225 to dev "naa.600c0ff00052be8450907a6001000000" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x6 0x3f 0xe.
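
For anyone wanting to compare receiving hosts the same way, here is a rough Python sketch that pulls the pre-copy bandwidth estimate out of vmkernel log lines. It is only an illustration: the function name precopy_bandwidth and the regular expression are my own assumptions based on the line format in the excerpts above, not a VMware tool or API. In the excerpts, the working host logs an estimate (220.429 MB/s), while the excerpt from the problem host contains no such line.

```python
import re

# Assumed line format, taken from the working-host excerpt above, e.g.:
# ...VMotionRecv: 761: <migration id> D: Estimated network bandwidth 220.429 MB/s during pre-copy
BANDWIDTH_RE = re.compile(
    r"VMotionRecv: \d+: (?P<mig_id>\d+) \S+ "
    r"Estimated network bandwidth (?P<mbps>[\d.]+) MB/s"
)


def precopy_bandwidth(log_lines):
    """Return {migration_id: MB/s} for every pre-copy bandwidth estimate found."""
    estimates = {}
    for line in log_lines:
        match = BANDWIDTH_RE.search(line)
        if match:
            estimates[match.group("mig_id")] = float(match.group("mbps"))
    return estimates


# Two lines copied from the working-host excerpt.
sample = [
    "2021-10-11T15:38:02.168Z cpu19:2099511)VMotionUtil: 5199: "
    "5608673866367299248 D: Stream connection 1 added.",
    "2021-10-11T15:38:10.394Z cpu23:2458450)VMotionRecv: 761: "
    "5608673866367299248 D: Estimated network bandwidth 220.429 MB/s during pre-copy",
]

if __name__ == "__main__":
    print(precopy_bandwidth(sample))
```

An empty result for a migration ID that does appear in the MigrateNet lines would mean the host accepted the connection but never logged a bandwidth estimate in the lines supplied, which matches what the problem-host excerpt shows.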

Any ideas?

Many thanks, Travis

Travis_83
Enthusiast

Issue was resolved on host - swapped out fibre module.

 

Thanks.
