Greetings,
We recently upgraded from 6.7 to 7.0.3 (build 19482537), and we had never had any similar problems with vMotion before. Normally, when a network failure affects the ESXi hosts, they return to normal as soon as the Cisco ports or the entire network environment re-balances.
Yesterday, after one such network incident, we were unable to vMotion several VMs onto two ESXi hosts. I looked through the vpxa, hostd and vmkernel logs and found:
Based on some Cisco log entries, I decided to replace the SFP modules in one ESXi host (and also replaced the corresponding module on the Cisco side). Even so, I was still not able to vMotion any VMs.
The only workaround so far seems to be a reboot: after the reboot, the vMotion problems are gone, and not a single VM gets stuck at 20% while being moved onto another host. This suggests that there are no configuration problems (MTU mismatch, etc.). Could this be a bug in 7.0.3?
Regards,