VMware Cloud Community
AndyBauer
Contributor

Rebooting one host causes other hosts to lose network redundancy

Hi all,

I have a cluster with about 20 ESXi hosts. HA and DRS are enabled.

If I put one host into maintenance mode and then reboot it, I get a network redundancy loss alert on three of my other hosts. Those hosts are up and running with no failures and the VMs work fine, but I wonder why this happens. Any suggestions?

15 Replies
abhilashhb
VMware Employee

This alert usually pops up if you have just one uplink on your management network. To avoid it, configure the management network with two network cards and team them, either with both active or with one active and one standby.

If you already have the teaming in place and are still getting the error, you can refer to this KB article: http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=100470...
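
If you are on a distributed switch you can also check the uplinks directly from the host. A rough sketch (the vSwitch name and NIC numbers are only examples, adjust them to your environment):

    # List the distributed switches the host is connected to and their uplinks
    esxcli network vswitch dvs vmware list

    # On a standard vSwitch, check the teaming/failover policy instead
    esxcli network vswitch standard policy failover get --vswitch-name=vSwitch0

    # Show the host's physical NICs and their link state
    esxcli network nic list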

Abhilash B
LinkedIn : https://www.linkedin.com/in/abhilashhb/

AndyBauer
Contributor

Hi,

thanks for the reply.

All hosts are connected with two NICs to the management port group on a vDS. The teaming and failover setting is active/active.

Are the servers heartbeating each other and is that the cause? I thought the datastores are used for cluster heartbeating?

I just want to make sure there is no misconfiguration in my environment that could cause trouble later. But it sounds like this is normal behavior.

tomtom901
Commander

Hi,


Do the physical switches report any loss of link? I don't think this is by design, because I don't see this issue in any of my environments. Are both NICs added as active?
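
You could also check the link state from the ESXi side, so you are not relying only on the switch logs. Just a sketch, the vmnic number is an example:

    # Show all physical NICs with their link status, speed and duplex
    esxcli network nic list

    # More detail on a single NIC, e.g. vmnic1
    esxcli network nic get -n vmnic1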

AndyBauer
Contributor

All hosts are added to the port group with two NICs, and yes, both are placed under active uplinks in the management port group. I don't understand why one host affects the others when it is rebooted.

tomtom901
Commander

Do the physical switches report loss of link? Is it always the same machine reporting this?

AndyBauer
Contributor

I have to simulate this again while the networking guys keep an eye on it. I think it's always the same thing that happens: I reboot vm033, and vm031, vm035 and vm036 get the alert. Nothing happens when I reboot other hosts.

I will try to reproduce and let you know.

tomtom901
Commander

Thanks, that would help. You could also monitor the vmkernel.log (tail -f /var/log/vmkernel.log via SSH) once you do this.
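
For example, something like this from an SSH session on one of the affected hosts while the other host reboots (the paths are the default ESXi log locations):

    # Follow the vmkernel log live during the reboot of the other host
    tail -f /var/log/vmkernel.log

    # Afterwards, search the vmkernel and vobd logs for link state changes
    grep -i "link down" /var/log/vmkernel.log /var/log/vobd.log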

abhilashhb
VMware Employee

Did you see the KB article I posted?

Abhilash B
LinkedIn : https://www.linkedin.com/in/abhilashhb/

AndyBauer
Contributor

Hi,

Digging deeper, I found an interesting fact. It's not the management NIC that creates the alert, it's the iSCSI one!!

iSCSI is also configured with two physical NICs on a vDS, but with two port groups, iscsi_A and iscsi_B. Each port group has its own VMkernel port with an IP in the same subnet. On iscsi_A, dvUplink1 is active and dvUplink2 is unused; on iscsi_B it is the other way around.
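
In case it helps, this is roughly how I check the port binding and connectivity from the host (just a sketch; the vmhba/vmk numbers and the target IP are placeholders for my environment):

    # Show which VMkernel ports are bound to the software iSCSI adapter
    esxcli iscsi networkportal list --adapter=vmhba33

    # Test reachability of the iSCSI target through each bound VMkernel interface
    vmkping -I vmk1 192.168.10.50
    vmkping -I vmk2 192.168.10.50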

tomtom901
Commander

So you have multipathing configured. I'm still curious to see if the networking guys see a physical link go down.
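
You could also watch whether any iSCSI path actually drops during the reboot, for example with something like this (a sketch; device names will differ in your environment):

    # List all storage paths and their state (active / dead)
    esxcli storage core path list

    # Show the NMP configuration per device, including the path selection policy
    esxcli storage nmp device list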

AndyBauer
Contributor

OK, I now have feedback from networking: the links went down at the moment I rebooted the host.

tomtom901
Commander

The links on ANOTHER host went down once you rebooted a different host?

AndyBauer
Contributor

That's what's confusing me.

I have vm020, vm021, vm022, ... up to vm041.

If I reboot vm035, I get the alert on vm029, vm031 and vm033.

Mysterious.

tomtom901
Commander

Check this together with the network team. That might be your issue:

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=100380...

AndyBauer
Contributor

OK, thank you very much. I will check with the networking division and mark your answer as correct if it solves the issue.
