VMware Cloud Community
virtualfish321
Contributor

HA Failover Question

My environment:

5 HP DL380 G6 servers, each with 60 GB RAM and 8 network ports:

     - 4 onboard NICs

               - vmnic0, vmnic1, vmnic2, vmnic3

               - All NICs go to switch 1

     - PCI card with 4 NIC ports

               - vmnic4, vmnic5, vmnic6, vmnic7

               - All NICs go to switch 2

Fibre Channel SAN attached through Emulex HBAs

My network config as follows:

vSwitch0: Service Console only

     Contains vmnic0 and vmnic7

vSwitch1: vMotion network, Service Console 2

     Contains vmnic1 and vmnic6

vDistributed Switch (vDS): Production server network

     Contains vmnic2 and vmnic5
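The uplink layout above can be written down as a small data model to double-check the redundancy claim: every virtual switch should have at least one uplink on each physical switch. This is a minimal Python sketch, not a VMware tool; the dictionary names and the helper `single_switch_dependencies` are hypothetical, and the NIC-to-switch mapping is taken from the cabling described in the post.

```python
# Hypothetical model of the cabling described above: which physical switch
# each vmnic is plugged into, and which vmnics back each virtual switch.
NIC_TO_SWITCH = {
    **{f"vmnic{i}": "switch 1" for i in range(0, 4)},  # 4 onboard NICs
    **{f"vmnic{i}": "switch 2" for i in range(4, 8)},  # 4-port PCI card
}

VSWITCH_UPLINKS = {
    "vSwitch0 (Service Console)": ["vmnic0", "vmnic7"],
    "vSwitch1 (vMotion, Service Console 2)": ["vmnic1", "vmnic6"],
    "vDS (Production)": ["vmnic2", "vmnic5"],
}


def single_switch_dependencies(vswitch_uplinks, nic_to_switch):
    """Return the virtual switches whose uplinks all land on one physical switch."""
    at_risk = []
    for name, nics in vswitch_uplinks.items():
        switches = {nic_to_switch[nic] for nic in nics}
        if len(switches) < 2:
            at_risk.append(name)
    return at_risk


if __name__ == "__main__":
    risky = single_switch_dependencies(VSWITCH_UPLINKS, NIC_TO_SWITCH)
    print(risky or "every virtual switch spans both physical switches")
```

Note that this only checks physical diversity of the uplinks; as the rest of the thread shows, it says nothing about whether each physical switch can still reach the core.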

So, as you can see, I have redundant network connections from each vSwitch to each physical switch. Physical switch layout is:

A Cisco 2350 in each server rack (hence switch 1, switch 2, etc.)

Each 2350 uplinks to a Cisco 6506 core via a 20 Gb EtherChannel

One ESX server, esx10, is connected to switches 1 and 2.

The other ESX servers, esx01, 02, 03, and 04, are connected to switches 3, 4, 5, and 6.

Recently, one of the 2350 switches that esx10 connects to went down. Instead of esx10 simply failing over to its other NICs, the rest of the cluster saw esx10 as down and HA restarted the guests that were on it, resulting in downtime while those VMs rebooted on the other ESX hosts.

My question (maybe my problem) is: why didn't the network connections fail over instead of HA restarting the guests? The failure was in the switch's EtherChannel uplink, not the switch itself, so the physical NIC links were still up. But shouldn't there be some kind of heartbeat that accounts for failures further along the network path? I put in all this network redundancy for exactly this kind of failure, but it seems moot if I get downtime anyway from an HA restart.

Any ideas are greatly appreciated.  Thanks!
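The failure described in the question can be modeled to show why link-status-only failover did not react: each vmnic's physical link stayed up, so from the host's point of view nothing had failed, even though frames sent through switch 1 went nowhere. The sketch below is a simplified illustration, not ESX's actual teaming code; the `Uplink` class, its fields, and both helper functions are hypothetical, and it assumes vmnic0 was the uplink carrying the Service Console traffic at the time.

```python
from dataclasses import dataclass


@dataclass
class Uplink:
    nic: str
    link_up: bool       # physical link between the vmnic and its access switch
    reaches_core: bool  # whether frames on this uplink actually get past the access switch


def active_by_link_status(team):
    """Link-status-only detection: an uplink is failed over only when its own link drops."""
    return [u.nic for u in team if u.link_up]


def actually_usable(team):
    """Uplinks that can really carry traffic (e.g. HA heartbeats) beyond the access switch."""
    return [u.nic for u in team if u.link_up and u.reaches_core]


# Hypothetical state of esx10's Service Console team (vSwitch0) during the outage:
# switch 1's EtherChannel to the core is down, but its port facing vmnic0 is still up.
vswitch0_team = [
    Uplink("vmnic0", link_up=True, reaches_core=False),  # switch 1
    Uplink("vmnic7", link_up=True, reaches_core=True),   # switch 2
]

print(active_by_link_status(vswitch0_team))  # ['vmnic0', 'vmnic7']
print(actually_usable(vswitch0_team))        # ['vmnic7']
```

The gap between the two lists is the problem: link status still considers vmnic0 healthy, so any traffic the teaming policy places on it is silently dropped, which is consistent with the other hosts losing contact with esx10.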

a_p_
Leadership

By default, only the link state is tracked. Depending on how many NICs are involved, this issue may be avoided either by configuring Beacon Probing on the ESX host or by configuring Link State Tracking on the physical switch (see http://kb.vmware.com/kb/1005577).

André
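The two options in the reply above can be compared with a simplified model. Beacon Probing (the "Network Failover Detection" setting in the vSwitch teaming policy) sends beacons out of each uplink and checks whether teammates receive them; Link State Tracking makes the access switch shut its server-facing ports when its upstream links fail, so plain link-status detection sees the failure. This is a minimal sketch under stated assumptions, not the real ESX or Cisco IOS logic; `UPSTREAM_OK`, `beacon_delivered`, `beacon_probe`, and `link_state_tracking` are hypothetical names.

```python
from itertools import combinations

# Hypothetical state during the outage: switch 1's uplink to the core is down.
UPSTREAM_OK = {"switch 1": False, "switch 2": True}


def beacon_delivered(sw_a, sw_b, upstream_ok):
    """A beacon crosses between access switches only via the core;
    NICs on the same access switch still reach each other locally."""
    return sw_a == sw_b or (upstream_ok[sw_a] and upstream_ok[sw_b])


def beacon_probe(team, upstream_ok):
    """Simplified beacon probing over a NIC team (vmnic -> access switch).

    Returns (suspects, ambiguous): suspects are uplinks whose beacons reach
    no teammate; ambiguous is True when every uplink is cut off from every
    other one, so the host cannot tell which side failed.
    """
    heard = {nic: 0 for nic in team}
    for a, b in combinations(team, 2):
        if beacon_delivered(team[a], team[b], upstream_ok):
            heard[a] += 1
            heard[b] += 1
    suspects = [nic for nic, n in heard.items() if n == 0]
    return suspects, len(suspects) == len(team)


def link_state_tracking(team, upstream_ok):
    """Simplified Link State Tracking: a server-facing port goes down when its
    access switch loses its upstream, so the host sees the link drop."""
    return {nic: upstream_ok[sw] for nic, sw in team.items()}


two_nics = {"vmnic0": "switch 1", "vmnic7": "switch 2"}
three_nics = {"vmnic0": "switch 1", "vmnic6": "switch 2", "vmnic7": "switch 2"}

print(beacon_probe(two_nics, UPSTREAM_OK))         # (['vmnic0', 'vmnic7'], True)
print(beacon_probe(three_nics, UPSTREAM_OK))       # (['vmnic0'], False)
print(link_state_tracking(two_nics, UPSTREAM_OK))  # {'vmnic0': False, 'vmnic7': True}
```

This illustrates the "depending on how many NICs" caveat: with the two-uplink teams in the question, beacons are lost in both directions and the model cannot say which uplink is bad, while a hypothetical third uplink lets it isolate vmnic0. In this model beacons only show which uplink is cut off from the rest of its team, so that uplink is the failed one only when most of the team sits on the healthy path. Link State Tracking sidesteps the question by turning the upstream failure into a link-down event the host already handles.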

a_p_
Leadership

Discussion moved from VMware ESX™ 4 to Availability: HA & FT
