I have a question about networking in ESX 3.5 Update 4 (fully patched).
I have 4 ESX hosts, all configured exactly the same. They each have ten 1 Gb NICs: 2 on-board and 4 two-port PCI-E adapters. I have used one NIC as a connection to the DMZ on its own vSwitch, since the DMZ is on a separate Cisco switch. That leaves me 9. They are laid out as follows:
We have created an 8-NIC port channel on the Cisco side spanning two Cisco 6500s for redundancy and performance. As you can see in the picture (the black box is to protect the VM names), I have added all 8 into one vSwitch and am using VLAN tagging for the respective networks. We are using "Route based on IP hash" for load balancing, and it seems to be working extremely well. Our configured VMware port groups are: SPY - used to capture inline traffic between ESX hosts with Wireshark, with promiscuous mode on (not the default); NETWORK - for the default VLAN 1; SERVICE CONSOLE; NETAPP (IP storage); and VMOTION. The NETAPP and VMOTION VLANs do not route on the Cisco side, for security and better traffic performance.
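For reference, a layout like the one above can be reproduced from the service console with `esxcfg-vswitch`; this is only a sketch, and the vSwitch name, vmnic numbers, and VLAN IDs are placeholders, not our actual values (the IP-hash load balancing policy itself is set in the VI Client, not with this tool):

```
# Sketch only -- names, vmnic numbers, and VLAN IDs are examples
esxcfg-vswitch -a vSwitch1                 # create the vSwitch
for nic in vmnic1 vmnic2 vmnic3 vmnic4 vmnic5 vmnic6 vmnic7 vmnic8; do
    esxcfg-vswitch -L $nic vSwitch1        # attach each of the 8 uplinks
done
esxcfg-vswitch -A NETAPP vSwitch1          # add a port group...
esxcfg-vswitch -v 20 -p NETAPP vSwitch1    # ...and tag it with a VLAN ID
esxcfg-vswitch -A VMOTION vSwitch1
esxcfg-vswitch -v 30 -p VMOTION vSwitch1
```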
All of this seems really cool and the performance is amazing - I mean, heck, it is an 8 Gb pipe, theoretically... The problem we had today was with VMotion. Normal migration works great and is super fast. What used to take 5 or 6 minutes with one pNIC in its own vSwitch now takes less than a minute. Today, however, we were moving around some VMs and applied the DRS recommendations, which moved 7 VMs at once. It did not queue them; it proceeded to move them all at once. Looking at Infinistream on the networking side, we saw 800 Mb/s going up to the switches between 3 ESX hosts. Well, to make a long story short, communication with the ESX hosts went away for about 10 minutes, which caused a large service interruption.
My questions are as follows:
Best practices say to isolate VMotion on physical NICs or with VLANs. Well, we did VLANs, so: can I use traffic shaping to limit how much traffic VMotion can use outbound? Has anyone tried this?
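For concreteness, this is the kind of cap I have in mind; the numbers are hypothetical, and on ESX 3.5 shaping only applies to outbound traffic and is set per vSwitch or per port group in the VI Client (vSwitch Properties -> port group -> Edit -> Traffic Shaping):

```
# Hypothetical shaping values for the VMOTION port group (placeholders):
#   Status:             Enabled
#   Average Bandwidth:  500000 Kbps    (~0.5 Gb/s sustained)
#   Peak Bandwidth:     1000000 Kbps   (~1 Gb/s bursts)
#   Burst Size:         102400 KB
```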
If I isolate VMotion traffic physically, that means I will have to take 4 NICs out of the port channel (1, 2, 4, or 8 work best). Should I use two for NetApp and the other two for VMotion, and then use the remaining 4 for SC and NETWORK (SPY doesn't really matter at this point)? This seems the most logical option, but I hate breaking up 8 NICs if I don't have to...
The last option is to go to 10 Gb NICs (2 of them per ESX host) and then use 4 of the 1 Gb NICs for VMotion. We are probably headed this way, but it is a huge investment.
Any thoughts, ideas or experienced insight is welcome.
I would try to separate VMotion onto its own vSwitch with a dedicated pNIC uplink. Maybe do the same for the service console. Having them all together on one vSwitch with all the uplinks is not really the best thing to do, VLANs or not. However, you can configure traffic shaping in the vSwitch properties.
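Pulling VMotion onto its own vSwitch could be sketched from the service console roughly like this (vmnic numbers, vSwitch names, and the VLAN ID are examples only, and the Cisco-side port channel would have to be shrunk to match):

```
# Sketch only -- adjust names/numbers to your environment
esxcfg-vswitch -U vmnic7 vSwitch1     # unlink two uplinks from the big vSwitch
esxcfg-vswitch -U vmnic8 vSwitch1
esxcfg-vswitch -D VMOTION vSwitch1    # drop the old VMOTION port group
esxcfg-vswitch -a vSwitch2            # dedicated VMotion vSwitch
esxcfg-vswitch -L vmnic7 vSwitch2
esxcfg-vswitch -L vmnic8 vSwitch2
esxcfg-vswitch -A VMOTION vSwitch2
esxcfg-vswitch -v 30 -p VMOTION vSwitch2
```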
If you found this or other information useful, please consider awarding points for "Correct" or "Helpful".