I've been troubleshooting a bunch of servers that each have a single 10 Gb connection: when moving VMs around using cold migration, transfer speeds sit between 400-500 Mbit/s. I've got SSDs in RAID-0 for testing on all these hosts that can easily do 1.2 GB/s (1000 MB/s+) reads/writes, so storage is not the bottleneck. Using vMotion I can move the VMs at around 6-7 Gbit/s, but cold migration never goes beyond 400-500 Mbit/s, even though it goes to the same storage, over the same network and the same physical wire/switch/NIC. I've tested across 5 different hosts, ranging from a Dell R515 to R710s and an R720XD, all of them with decent RAID controllers. For some reason it seems ESXi artificially limits the network speed for cold migration (for testing I've got one NIC on each server, used for both management and vMotion): the graphs never spike up or down, they are very much flat.
I already checked the virtual switches and nothing is using limits. I don't know where else to look. I also tested SMB traffic between VMs hosted on different hosts, connected physically by the 10 Gb network, and I can get 6-7 Gbit/s on file transfers, so the link works for everything except cold migration. I had this problem in the past where, out of nowhere, a 1 Gb connection would not go faster than 300 Mbit/s no matter what I did. Then, again out of nowhere, it would reach 1 Gb speeds. No reason why!
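In case it helps anyone checking the same thing, the shaping policy can also be verified from the ESXi shell instead of the vSphere client. A minimal sketch, assuming a standard vSwitch named vSwitch0 and a port group named "Management Network" (substitute your own names):

```shell
# "Enabled: false" in the output means no traffic shaping is in effect
esxcli network vswitch standard policy shaping get --vswitch-name=vSwitch0

# Port groups can override the vSwitch policy, so check them as well
# ("Management Network" is an assumption; use your own port group name)
esxcli network vswitch standard portgroup policy shaping get \
  --portgroup-name="Management Network"
```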
I'm using an Intel X540-T1 PCIe card in each server.
The servers have 128 GB of RAM, are all fully updated firmware-wise, and run ESXi 5.5 build 2068190.
I've also tested Veeam using the quick migration force mode, which I suppose is similar to the old FastSCP client. Pretty much the same result, around 500 Mbit/s.
Any suggestions how to speed up cold migration?
I've updated the NIC drivers to 3.21.5, no difference at all. Cold migration is still at 400-500 Mbit/s no matter which host it runs on, and storage is not the bottleneck, as I have a bunch of very fast SSDs installed. If I move a VM from one set of SSDs to another set within the same host, I get 1100 MB/s+, but that is simply a datastore migration inside the same host, not using the network. This clearly shows it is not a storage bottleneck.
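For comparing hosts, the exact driver and firmware each NIC is running can be dumped from the ESXi shell. A minimal sketch, assuming the 10 Gb uplink is vmnic0 (the name is an assumption; take it from the list output):

```shell
# List all uplinks with their link speed and driver name
esxcli network nic list

# Show driver version, firmware version and link details for one uplink
# (vmnic0 is an assumption; use the name from the list above)
esxcli network nic get -n vmnic0
```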
Got some more numbers; the same interface is used for vMotion and cold migration (management network and vMotion on the same network and NIC).
Moving VMs from one host to another with no shared storage: first cold migration, then vMotion. The VM moves from one datastore to the other.
Check the graphs attached
We have been experiencing the exact same issue since we moved to 10 Gb NICs and switches last year. vMotion and VM-to-VM data transfers run close to 10 Gb/s; cold migration, or anything else that uses the management network, is limited to 1 Gb/s. All settings are correct.
Anyone with any ideas?
I'm investigating the drivers, testing many different versions. I may try ESXi 6 next week, but again, that's a lot of work testing something that seems so stupidly wrong on VMware's side.
VMware takes the open source driver from Intel and modifies it, so Intel can't give any support.
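For what it's worth, the driver VIB that is actually installed and any module parameters set on it can be checked from the ESXi shell. A sketch, assuming the X540 is handled by the ixgbe module:

```shell
# Confirm which ixgbe driver VIB is installed
esxcli software vib list | grep -i ixgbe

# Show the ixgbe module parameters (empty values mean driver defaults)
esxcli system module parameters list -m ixgbe
```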
Cold migration for me doesn't even get close to 1 Gbit/s, always around 600 Mbit/s. Very, very frustrating!!!