I thought I would bounce this off the community, since our network guy and even support have been kind of flat on this...
We upgraded from 1GbE network switches to 10GbE switches earlier this year. In the network configuration on my VMware hosts, I removed all the 1GbE uplinks and replaced them with the 10GbE uplinks. The management VMkernel port and the vMotion VMkernel port got the same treatment: 1GbE removed, 10GbE added. No subnet changes, no IP changes, and no problems for quite a while.

Fast forward to last month: my desktop migrations start taking a very unusual amount of time, from 80 desktops migrated in under 60 seconds to 4-5 hours to migrate one. VMware support said it was my addressing scheme, since we used 172.x.x.x for the management and desktop networks and 192.168.x.x for vMotion. OK... I switched my vMotion stack to a 172.x.x.x range we had open, and I am still having the same issue with migrations.

Here's where it gets weird. Trying anything we could, the network guy saw there was a firmware update for our 10GbE switches, so he applied it to one and downed the switch for a reboot. The moment he did - WHAM! - all the queued migrations finished in record time. We downed the second switch but didn't power it right back up; I put a host with ~100 desktops into maintenance mode, and it took ~80 seconds to migrate all objects and replicas.

Everything is in one vSwitch, with one port of the NIC going to physical switch A and the other port to physical switch B. The vMotion VMkernel port is set to use port A as its active NIC, with port B as standby. Management uses both NIC ports, just as the desktop vSwitch does. The physical switches are not VLANed; they're flat back to our core. This is how we had it with 1GbE, and we never had any issues.
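In case it helps anyone compare notes, here is roughly how the setup above can be checked from the ESXi shell. The vmk and vSwitch names below are examples, not necessarily what's in our environment, and the target IP is a placeholder for a peer host's vMotion address:

```shell
# List VMkernel interfaces to confirm which vmk carries vMotion
# and which carries management traffic
esxcli network ip interface list

# Show the NIC teaming / failover policy on the standard vSwitch
# (vSwitch0 is an example name; this shows active vs standby uplinks)
esxcli network vswitch standard policy failover get -v vSwitch0

# Test the vMotion path to another host's vMotion IP.
# -I picks the VMkernel port; -d -s 8972 sends an unfragmented
# jumbo-sized ping, which catches an MTU mismatch on the physical switches
vmkping -I vmk1 -d -s 8972 172.16.x.x
```

The jumbo-frame ping is worth running against each physical switch path separately (fail the uplinks over one at a time), since a per-switch setting that only one switch has would match the "fixed when one switch went down" symptom.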
So my question is: is there something we're doing wrong with 1GbE vs. 10GbE in the virtual switches?