georgemason
Contributor

Very slow Storage vMotion (VSAN migration)

Hi,

I am seeing an issue which I don't fully understand. I'm in the process of planning the last stages of a VSAN migration and need to move some big VMs from hosts with locally attached storage to a VSAN 6.0 environment. The hosts have separate 10G networking for vMotion, and I have made sure that the management VMkernel interfaces also traverse the 10G network, on a different VLAN.

The source host is a DL360 G6 running ESXi 5.5, with 900GB SAS disks and the source VM hosted on a RAID5 volume. The destination host is a DL380 Gen9 running ESXi 6.0 U2 with VSAN configured, using an HP SSD and 7x 1TB 10k SAS drives. Both hosts are connected to the same Cisco Nexus 10G switch and both links are confirmed as running at 10G. The source datastore has a block size of 1MB (VMFS v5.60). Jumbo frames are not enabled.
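
For reference, this is roughly how I confirmed the link speeds and MTU on each host (interface names like vmk1 are just examples from our setup; adjust as needed):

# Physical NIC link speed and duplex
esxcli network nic list

# VMkernel interfaces and their MTU (currently 1500, since jumbo frames are off)
esxcli network ip interface list

# Basic reachability from the vMotion vmknic to the other host's vMotion IP
vmkping -I vmk1 <destination-vmotion-ip>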

When I try to cold vMotion a machine across, the transfer runs at around 400-500 Mbit/s. I have read that vMotion can easily saturate the network, so I am a bit surprised at how little bandwidth is being used.

[Attached: performance chart showing network throughput during the transfer]

The performance monitor output above shows that, with a large VM being transferred, the speed is pretty constant at around 500 Mbit/s.

I would appreciate any input on why the speed is so low, and what I can do to try and improve the performance.

EDIT: I note that vMotion will load balance across multiple vmknics - would this be a good approach?
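
In case it's useful to anyone else, my understanding is that multi-NIC vMotion just means tagging a second VMkernel port for vMotion, with each vmknic pinned to a different physical uplink. Something along these lines on the 6.0 host, I think (vmk2, the port group name and the IP are only examples, and the same thing can be done in the Web Client):

# Add a second VMkernel interface on a dedicated port group
esxcli network ip interface add -i vmk2 -p vMotion-2
esxcli network ip interface ipv4 set -i vmk2 -t static -I 192.168.10.12 -N 255.255.255.0

# Mark it for vMotion traffic
esxcli network ip interface tag add -i vmk2 -t VMotion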

Thanks

bmidd31
VMware Employee

Are your VSAN VMkernel ports on the 10G network as well?

Also, you may want to look at upgrading those 6.0 U2 hosts to 6.5d. VSAN 6.6 has some significant improvements in performance and features over VSAN 6.2!

georgemason
Contributor

Yes, the VSAN vmk ports are on the 10G network too. The NIC failover order is configured in such a way that vMotion does not conflict with VSAN unless one of the NICs is offline, although to be honest, even then I think that with our usage the load would still not swamp a 10G card.
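
For reference, this is the sort of check I did on the failover order (we're on standard vSwitches, and the port group names below are just ours):

# Active/standby uplink order for the vMotion and VSAN port groups
esxcli network vswitch standard portgroup policy failover get -p vMotion
esxcli network vswitch standard portgroup policy failover get -p VSAN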

The switch is a Cisco Nexus 3K which, as far as I understand, can switch at wire rate on all 48 ports concurrently, so I'm pretty sure that's not the bottleneck!

Your comments about v6.6 are interesting, but from what I have read, even 6.2 should be able to manage more vMotion throughput than I am seeing. I just tried removing all snapshots from the source VM and it does seem slightly faster (about 633 Mbit/s), but that's still nowhere near saturating even a 1G connection.
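
For completeness, one way I can see to check where the bottleneck is during the copy is to watch esxtop on both hosts while the migration runs (it's interactive, but these are the views I mean):

esxtop
# press 'n' for the network view (per-vmnic/vmknic throughput in Mb/s)
# press 'd' for the disk adapter view (DAVG/KAVG latency on the local RAID controller)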
