I have just spent some time getting VLAN trunking working on ESX 2.5.3 in our lab environment. Once I had this working in the version of ESX that I am most comfortable with, I figured I would do a new install of ESX 3 and set up VLAN trunking there.
I am however having connectivity issues in ESX 3.
We have been testing with 2 NICs bonded, each one connected to its own switch to allow some redundancy.
I have 2 VMs on 2 different VLANs, and I run a ping from each to its gateway and to the other VM.
When I disconnect one of the network cables that has trunking set up, I initially lose one ping, then a few seconds later the VMs stop pinging altogether, both their gateways and each other. They do not recover until I replace the cable.
Obviously we want to be able to survive a hardware fault in either a NIC or a switch if we are using trunking.
There are a lot more configuration options in ESX 3, so I suspect that this is a config step I have missed... currently I have Load Balancing set to Route based on source MAC hash and Network Failover Detection set to Beacon Probing. The reason I chose these is that my 2.5.3 server that had working trunking was set to out-mac and had beaconing enabled.
Any help will be greatly appreciated.
Thank you in advance.
I had issues getting it to work with ESX 3, but after the Cisco config was done, I just played with the IP and MAC hash. They both worked for me, although I ended up using the IP hash.
That is the strange thing... this is exactly the same server I had trunking working on with 2.5.3. There have been no changes on the network switch side.
Use route based on IP Hash.
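To make the difference between the two load balancing policies a bit more concrete, here is a rough Python sketch of how a hash could pick an uplink. This is only an illustration of the idea and not ESX 3's actual implementation: the function names, the XOR-and-modulo formula, and the example addresses are all assumptions.

```python
import ipaddress


def uplink_by_ip_hash(src_ip: str, dst_ip: str, n_uplinks: int) -> int:
    """Pick an uplink index from the source AND destination IPv4 address.

    Illustrative formula only (XOR of both addresses, modulo the uplink count).
    """
    src = int(ipaddress.IPv4Address(src_ip))
    dst = int(ipaddress.IPv4Address(dst_ip))
    return (src ^ dst) % n_uplinks


def uplink_by_src_mac_hash(src_mac: str, n_uplinks: int) -> int:
    """Pick an uplink index from the source MAC only (here: its last byte)."""
    last_byte = int(src_mac.split(":")[-1], 16)
    return last_byte % n_uplinks


if __name__ == "__main__":
    vm_ip = "192.168.10.21"        # hypothetical VM address
    vm_mac = "00:50:56:aa:bb:01"   # hypothetical VM MAC
    for dst in ("192.168.10.1", "192.168.10.22", "192.168.20.30"):
        print(dst,
              "ip-hash ->", uplink_by_ip_hash(vm_ip, dst, 2),
              "| mac-hash ->", uplink_by_src_mac_hash(vm_mac, 2))
```

The point of the sketch: with a source MAC hash a given VM always lands on the same uplink, while with an IP hash the same VM can use different uplinks depending on who it is talking to.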
What brand of physical switches do you use? I saw some weird things with HP Procurve switches and nic teaming.
bk
I ran into a very odd observation with trunking and ESX 3 today. I could not get the trunk VLANs to allow a guest to do anything on the network. On the vswitch, one of the NICs was showing an odd network, 0.0.0.xx
xx = I cannot remember the octets.
I shut down the port on the pswitch and all of a sudden had network connectivity; I re-enabled the pswitch port and it still works. Only have 2 VLANs currently on the system.
The switches are Cisco 3750s, stacked, with 2 EtherChannels into the core. The vswitch has 2 discrete connections, not EtherChanneled, with one port from the ESX host into each switch.
Have not rebooted either the esx server or pswitches to see if the behaviour reoccurs.
I am using HP Procurves and the only problem I have is that when I reboot one of my ESX servers the ports go into blocked mode. If I enable them they work fine again until I reboot the server again.
We are using Cisco switches, would have to check with the network team for the exact model. I know this config worked with 2.5.3 and VLAN trunking so I was hoping it would just be a case of configuring ESX 3 with the same VLANs...
I now have this config:
Load Balancing - Route based on ip hash
Network Failover Detection - Beacon Probing
Notify Switches - No
Rolling Failover - No
Failover Order - Not configured
Pull NIC1 - All pings fail - never recovers
Replace NIC1 - pings recover after approx 45-60 seconds
Pull NIC2 - No loss of ping
Replace NIC2 - No loss of ping
So with the IP hash it is better but not perfect
I have the settings set at the vSwitch level with no over-ride set for each port group.
When the cable is removed from vmnic0 I get a complete failure. vmnic1 can be removed and replaced with no loss of ping.
Try without beacon probing.....
I have tried the following...
Load Balancing - Route based on IP hash
Network Failover Detection - Link Status Only
Notify Switches - Yes
Rolling Failover - No
Pulled the first cable - No loss of ping
Replaced the first cable - lost pings to VMs for approx 20-30 seconds
Pulled the second cable - no loss of ping
Replaced the second cable - lost pings to Gateways for 20-30 seconds
This is similar to what I was seeing in ESX 2.5.3 prior to enabling beaconing... but beaconing does not seem to improve the situation in ESX 3.
We are using two 6509 Cisco switches with CatOS. Trunking is set up on both switches. Then the ESX server has a connection to each switch. I have set a vSwitch with both NICs included and then set up the port groups for the required VLANs. We want to be able to use two physical switches to allow for hardware redundancy.
Is EtherChannel set up on the switch? You do need IP hash on the ESX side, and make sure portfast is turned on on the switch side.
I know this is quite a late response, but my 2c's:
Beacon probing is broken in many situations. The exact situations are not well documented, but the fact that it's broken and that it will be "fixed" by 3.0.2 has been stated by VMware.
The 20-30 second timeouts you're seeing are most likely due to Spanning Tree. With Cisco switches, the spanning tree listening/learning delay is applied to trunked ports even if you have portfast enabled.
You need to set the "portfast trunk" option on the necessary ports. On CatOS, it would appear that that command is:
    set spantree portfast 5/1 enable trunk
...substituting 5/1 for your particular blade and port number, of course.
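As a rough back-of-the-envelope check on where the 20-30 seconds comes from, here is a small Python sketch. It assumes classic 802.1D spanning tree with the default forward delay of 15 seconds; the function name is made up for illustration and your switches may be configured with different timers.

```python
# Default IEEE 802.1D forward delay, in seconds. A port spends this long in
# the listening state and then again in the learning state before forwarding.
FORWARD_DELAY = 15


def seconds_until_forwarding(portfast_effective: bool,
                             forward_delay: int = FORWARD_DELAY) -> int:
    """Approximate time from link-up until the switch port forwards traffic."""
    if portfast_effective:
        # With portfast actually in effect, the port skips listening/learning.
        return 0
    return 2 * forward_delay  # listening + learning


if __name__ == "__main__":
    print("trunk port, portfast not applied:", seconds_until_forwarding(False), "s")
    print("trunk port, portfast trunk applied:", seconds_until_forwarding(True), "s")
```

With the defaults that works out to 30 seconds of dropped traffic on a freshly reconnected trunk port, which is in the same range as the outages reported above.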
In my particular environment (very old Nortel switches), they had no way of doing "portfast trunk", so we opted for active/standby adapters in the vswitches instead. This also solved the "my VMs go down when I reconnect the failed cable" problem, as it's unlikely we'll have another cable failure within 30 seconds of the originally failed cable being reconnected.
While this theoretically limits bandwidth, we're not even close to using a full Gigabit link yet, so we're ok on that front for now.
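For anyone wondering why active/standby sidesteps the reconnect outage, here is a toy Python timeline. It is a simplified model and not a description of ESX internals: it assumes the reconnected NIC rejoins as standby with no failback, that with both adapters active the affected traffic moves back as soon as link is up, and that the switch port needs 30 seconds before it forwards. The function and policy names are made up.

```python
def lost_seconds(policy: str, reconnect_at: int = 10,
                 stp_delay: int = 30, duration: int = 60) -> int:
    """Count seconds of dropped traffic after a failed cable is plugged back in.

    policy: "active-active"  -> traffic returns to the recovered NIC at link-up
            "active-standby" -> recovered NIC becomes standby (assumed, no failback)
    """
    if policy not in ("active-active", "active-standby"):
        raise ValueError(f"unknown policy: {policy}")

    lost = 0
    for t in range(duration):
        link_up = t >= reconnect_at
        port_forwarding = t >= reconnect_at + stp_delay
        # Only the active-active case puts traffic back on the recovered NIC.
        uses_recovered_nic = link_up and policy == "active-active"
        if uses_recovered_nic and not port_forwarding:
            lost += 1  # switch port is still in listening/learning
    return lost


if __name__ == "__main__":
    for policy in ("active-active", "active-standby"):
        print(policy, "->", lost_seconds(policy), "seconds lost")
```

In this model the standby NIC has plenty of time to get through spanning tree before it is ever needed, which is the reasoning above.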