VMware Cloud Community
AndyBauer
Contributor

Rebooting one host causes other hosts to lose network redundancy

Hi all,

I have a cluster with about 20 ESXi hosts. HA and DRS are enabled.

If I put one host into maintenance mode and then reboot it, I get a network redundancy loss alert on three of my other hosts. Those hosts are up and running with no failures and the VMs work fine, but I wonder why this happens. Any suggestions?

15 Replies
abhilashhb
VMware Employee

This alert usually pops up if you have just one uplink on your management network. To avoid it, configure the management network with two network cards and team them, either with both active or with one active and one standby.

If you already have the teaming in place and are still getting the error, you can refer to this KB article: http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=100470...
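
If you are on a distributed switch you can also check the uplinks directly from the host. A rough sketch (the vSwitch name and NIC numbers are only examples, adjust them to your environment):

    # List the distributed switches the host is connected to and their uplinks
    esxcli network vswitch dvs vmware list

    # On a standard vSwitch, check the teaming/failover policy instead
    esxcli network vswitch standard policy failover get --vswitch-name=vSwitch0

    # Show the host's physical NICs and their link state
    esxcli network nic list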

Abhilash B
LinkedIn : https://www.linkedin.com/in/abhilashhb/

AndyBauer
Contributor

Hi,

thanks for the reply.

All hosts are connected with two NICs to the management port group on a vDS. The teaming and failover setting is active/active.

Are the servers heartbeating each other and is that the cause? I thought the datastores are used for cluster heartbeating?

I just want to make sure there is no misconfiguration in my environment that could cause trouble later. But it sounds like this is normal behavior.

tomtom901
Commander

Hi,


Do the physical switches report any loss of link? I don't think this is by design, because I don't see this issue in any of my environments. Are both NICs added as active?
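
You could also check the link state from the ESXi side, so you are not relying only on the switch logs. Just a sketch, the vmnic number is an example:

    # Show all physical NICs with their link status, speed and duplex
    esxcli network nic list

    # More detail on a single NIC, e.g. vmnic1
    esxcli network nic get -n vmnic1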

AndyBauer
Contributor

All hosts are added to the port group with two NICs, and yes, both are placed under active uplinks in the management port group. I don't understand why one host affects the others when it is rebooted.

tomtom901
Commander

Do the physical switches report loss of link? Is it always the same machine reporting this?

AndyBauer
Contributor

I have to simulate this again while the networking guys keep an eye on it. I think it's always the same thing that happens: I reboot vm033, and vm031, vm035 and vm036 get the alert. Nothing happens when I reboot other hosts.

I will try to reproduce and let you know.

tomtom901
Commander

Thanks, that would help. You could also monitor the vmkernel.log (tail -f /var/log/vmkernel.log via SSH) once you do this.
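
For example, something like this from an SSH session on one of the affected hosts while the other host reboots (the paths are the default ESXi log locations):

    # Follow the vmkernel log live during the reboot of the other host
    tail -f /var/log/vmkernel.log

    # Afterwards, search the vmkernel and vobd logs for link state changes
    grep -i "link down" /var/log/vmkernel.log /var/log/vobd.log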

abhilashhb
VMware Employee

Did you see the KB article I posted?

Abhilash B
LinkedIn : https://www.linkedin.com/in/abhilashhb/

AndyBauer
Contributor

Hi,

Digging deeper, I found an interesting fact. It's not the management NIC that creates the alert, it's the iSCSI one!!

iSCSI is also configured with two physical NICs on a vDS, but with two port groups, iscsi_A and iscsi_B. Each port group has its own VMkernel port with an IP in the same subnet. On iscsi_A, dvUplink1 is active and dvUplink2 is unused; on iscsi_B it is the other way around.
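
In case it helps, this is roughly how I check the port binding and connectivity from the host (just a sketch; the vmhba/vmk numbers and the target IP are placeholders for my environment):

    # Show which VMkernel ports are bound to the software iSCSI adapter
    esxcli iscsi networkportal list --adapter=vmhba33

    # Test reachability of the iSCSI target through each bound VMkernel interface
    vmkping -I vmk1 192.168.10.50
    vmkping -I vmk2 192.168.10.50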

tomtom901
Commander

So you have multipathing configured. I'm still curious to see if the networking guys see a physical link go down.
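
You could also watch whether any iSCSI path actually drops during the reboot, for example with something like this (a sketch; device names will differ in your environment):

    # List all storage paths and their state (active / dead)
    esxcli storage core path list

    # Show the NMP configuration per device, including the path selection policy
    esxcli storage nmp device list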

AndyBauer
Contributor

OK, I now have feedback from networking: the links went down at the moment I rebooted the host.

tomtom901
Commander

The links on ANOTHER host went down once you rebooted a different host?

AndyBauer
Contributor

That's what's confusing me.

I have vm020, vm021, vm022, ... up to vm041.

If I reboot vm035, I get the alert on vm029, vm031 and vm033.

Mysterious.

tomtom901
Commander

Check this together with the network team. That might be your issue:

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=100380...

AndyBauer
Contributor

OK, thank you very much. I will check with the networking division and mark your answer as correct if it solves the issue.
