VMware Cloud Community
peter79
Enthusiast

ESXi host enters isolation mode after management NIC is reconnected

Guys,

I've discovered a very strange problem.

I have 2 ESXi hosts (both running 4.0) in a DRS/HA cluster and am running vCenter 4.1.  The management network has 2 NICs.  We were testing the redundancy on the management network, so we unplugged one of the NICs.  The management network stayed up.  However, when we reconnected the NIC, the host went into isolation mode and migrated all the VMs to the remaining host in the cluster.

We had a second cluster connected to a different switch (its hosts were running ESXi 4.1), so we tried to recreate the situation.  The same thing happened.  We tried setting the management NICs to 1000/Full as well as to auto-negotiate, but the result was the same.  I've attached a log file from one of the boxes in case that helps.

I have no idea what's going on.  Any ideas?

MauroBonder
VMware Employee

Change the HA isolation response to "Leave powered on".

If you use NIC teaming, check the vSwitch configuration.

Recommended:

  • Network failover detection = Link Status Only
  • Notify switches = Yes
  • Failback = No
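To make the comparison against these recommendations concrete, here is a minimal Python sketch that diffs a host's teaming settings against the three values above. The dictionary keys and the function name are hypothetical, chosen only for illustration; they are not vSphere API property names, and the current settings would have to be read off the vSwitch properties by hand.

```python
# Recommended teaming values from the reply above.
# The key names are made up for this sketch; they are not vSphere API names.
RECOMMENDED = {
    "failover_detection": "Link Status Only",
    "notify_switches": True,
    "failback": False,
}


def teaming_deviations(current, recommended=RECOMMENDED):
    """Return {setting: (current_value, recommended_value)} for every mismatch."""
    return {
        key: (current.get(key), wanted)
        for key, wanted in recommended.items()
        if current.get(key) != wanted
    }


if __name__ == "__main__":
    # Example: a vSwitch that still has Failback enabled.
    observed = {
        "failover_detection": "Link Status Only",
        "notify_switches": True,
        "failback": True,
    }
    for setting, (have, want) in teaming_deviations(observed).items():
        print(f"{setting}: currently {have!r}, recommended {want!r}")
```

An empty result means the vSwitch already matches the recommended values.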


Please, don't forget the awarding points for "helpful" and/or "correct" answers. 

Mauro Bonder - Moderator

peter79
Enthusiast

We have run the following tests.
Test 1
We configured vNIC 4 and 5 as active/active (in that order) and disconnected vNIC 4.  When we reconnected it, isolation mode was triggered and the host vMotioned the test VM that was running on it.  HA was enabled.
Test 2
We configured vNIC 5 and 4 as active/active (in that order) and disconnected vNIC 5.  When we reconnected it, isolation mode was not triggered and the test VM remained on the host.  HA was enabled.
Test 3
We removed vNIC 4 from the vSwitch and replaced it with vNIC 6.  We configured vNIC 5 and 6 as active/active (in that order) and disconnected vNIC 5.  When we reconnected it, isolation mode was triggered and the host vMotioned the test VM that was running on it.  HA was enabled.
Test 4
With vNIC 6 still in place of vNIC 4, we configured vNIC 6 and 5 as active/active (in that order) and disconnected vNIC 6.  When we reconnected it, isolation mode was not triggered and the test VM remained on the host.  HA was enabled.
Test 5
We repeated test 1 with HA disabled.  Isolation mode was not triggered and the test VM remained on the host.
Conclusion
It appears that HA has an issue when the vNIC that is lowest in the ordering (the lowest-numbered vNIC on the vSwitch in these tests) is reconnected.  Has anyone seen this type of behavior before?
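For anyone who wants to check that conclusion against the five results, here is a small Python sketch that tabulates them and tests one reading of the hypothesis: isolation is triggered when HA is enabled and the reconnected vNIC is the lowest-numbered one on the vSwitch. The class and function names are made up for this sketch, and the rule is only a summary of the observations above, not a description of how HA actually works.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailoverTest:
    name: str
    nic_order: tuple   # active/active order configured on the vSwitch
    reconnected: int   # vNIC that was unplugged and then plugged back in
    ha_enabled: bool
    isolated: bool     # observed: did the host enter isolation mode?


# Results exactly as reported in tests 1 to 5.
TESTS = [
    FailoverTest("Test 1", (4, 5), 4, True, True),
    FailoverTest("Test 2", (5, 4), 5, True, False),
    FailoverTest("Test 3", (5, 6), 5, True, True),
    FailoverTest("Test 4", (6, 5), 6, True, False),
    FailoverTest("Test 5", (4, 5), 4, False, False),
]


def predicted_isolation(test):
    """Hypothesis: HA is on and the lowest-numbered vNIC was reconnected."""
    return test.ha_enabled and test.reconnected == min(test.nic_order)


def mismatches(tests):
    """Names of tests whose observed result contradicts the hypothesis."""
    return [t.name for t in tests if predicted_isolation(t) != t.isolated]


if __name__ == "__main__":
    for t in TESTS:
        print(t.name, "predicted:", predicted_isolation(t), "observed:", t.isolated)
    print("contradicting tests:", mismatches(TESTS))
```

Note that the position in the active/active order does not explain the results on its own: in every test with HA enabled, the vNIC that was pulled was listed first.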
MauroBonder
VMware Employee

Is there any difference in the VLAN configuration for vmnic4 on your core switch?


peter79
Enthusiast

The physical switch ports are configured identically.  Also, the problem isn't specific to vNIC 4.  As you can see from the tests, when I used vNICs 5 and 6 I had the same problem with vNIC 5.  It seems that whichever vNIC is lowest in the ordering (the lowest-numbered one in these tests) causes the problem!
