asrarguna
Enthusiast

Alarm 'Network uplink redundancy lost': an SNMP trap for entity "ESXI Server" was sent info 2/28/2013 10:27:10 AM

Hello Everybody,

I have an 8-server cluster (ESXi 5.1) that I manage with vCenter 5.1. If I click on any ESXi host in vCenter and go to the Tasks & Events tab, I can see the following 2 alerts occurring continuously every 5 minutes.

1. Alarm 'Network connectivity lost' on "esxi server" triggered an action
   info 2/28/2013 10:32:10 AM
2. Right after this first alert, the second alert says:
   Alarm 'Network uplink redundancy lost': an SNMP trap for entity "esxi server" was sent
   info 2/28/2013 10:32:10 AM

This repeats every 5 minutes on all the servers in the cluster. I have 2 more servers that are not part of the cluster. I checked those and they have the same thing going on.
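
One quick way to confirm that the interval really is fixed is to diff the event timestamps. This is only a minimal sketch: it assumes the timestamps are copied by hand from the Tasks & Events tab in the format shown above, and the function name is made up for illustration.

```python
from datetime import datetime

# Timestamps copied from the events quoted in this thread.
event_times = ["2/28/2013 10:27:10 AM", "2/28/2013 10:32:10 AM"]


def intervals_in_seconds(stamps):
    """Return the gaps, in seconds, between consecutive event timestamps."""
    # Assumed format: month/day/year with a 12-hour clock and AM/PM.
    parsed = sorted(datetime.strptime(s, "%m/%d/%Y %I:%M:%S %p") for s in stamps)
    return [(later - earlier).total_seconds()
            for earlier, later in zip(parsed, parsed[1:])]


print(intervals_in_seconds(event_times))
```

If every gap comes out the same, the events are probably driven by a timer rather than by real link flaps.
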
At first I thought it was an issue with the servers in the cluster, so I right-clicked the server and reconfigured HA. That didn't work. But now I have noticed that it is also happening on the servers that are not part of the cluster.
Any suggestions would be highly appreciated.
Kind Regards,
AG
19 Replies
nive1103
Enthusiast

Hi,

I think one of your vmnics is down. Have you configured your uplinks as active-active on your vSwitch?

Regards, Nivedan
0 Kudos
asrarguna
Enthusiast

Hi,

All the vmnics are up and running. I have 2 NICs each for management, vMotion, storage (NFS), and the VM port group, and both NICs in each port group (vSwitch) are active-active.

There are no drops on the physical switch either.

Still, the alerts in vCenter show network connectivity lost and uplink redundancy lost.

AG

0 Kudos
nive1103
Enthusiast

Hi,

Since you have an active-active setup, can you check the EtherChannel or link aggregation configuration on the physical switch end?

Also, can you post a screenshot of the Configuration tab for the vSwitch? And what load balancing policy are you using for the vSwitch?

Please take a look at http://communities.vmware.com/thread/249314

Regards, Nivedan
0 Kudos
asrarguna
Enthusiast

Hello Nivedan,

Neither EtherChannel nor link aggregation/LACP is enabled on the physical switch side. The policy that I am using is the default one, route based on originating virtual port ID, for all my VSS and VDS.

Thanks,

AG

0 Kudos
Gkeerthy
Expert

Are you using any CNA adapters or FlexNICs?

What is your hardware infrastructure? Please give those details, and also refer to the recent thread http://communities.vmware.com/message/2203049#2203049

Also check the firmware/driver compatibility.

Please don't forget to award point for 'Correct' or 'Helpful', if you found the comment useful. (vExpert, VCP-Cloud. VCAP5-DCD, VCP4, VCP5, MCSE, MCITP)
0 Kudos
nive1103
Enthusiast

Hi AG,

Can you confirm whether both vmnics (which you described as active-active) are connected to the same physical switch or to different physical switches?

Please check the advanced performance tab for the network and see whether any packet drops happened in the past day.

If they are connected to the same physical switch, you should be using EtherChannel for an active-active setup.

If they are connected to different physical switches, you should be using an active-standby configuration (in the vSwitch) unless your network is not configured for Primary/Secondary or Active/Standby configuration. Kindly consult your network admin regarding this.

Also, if you are using active-active, the best load balancing choice would be route based on IP hash.
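
To illustrate why IP hash and EtherChannel go together, here is a rough sketch of the idea as it is commonly described: XOR the source and destination IP addresses and take the result modulo the number of uplinks. Treat it as a simplified model, not ESXi's exact implementation; the function name and the IP addresses are made up for the example.

```python
import ipaddress


def ip_hash_uplink(src_ip: str, dst_ip: str, num_uplinks: int) -> int:
    """Pick an uplink index from a source/destination IP pair (simplified model)."""
    src = int(ipaddress.IPv4Address(src_ip))
    dst = int(ipaddress.IPv4Address(dst_ip))
    # XOR the two addresses, then map the result onto the available uplinks.
    return (src ^ dst) % num_uplinks


vm_ip = "10.0.0.10"  # hypothetical VM address
for dst in ("10.0.0.20", "10.0.0.21"):  # hypothetical destinations
    print(dst, "-> uplink", ip_hash_uplink(vm_ip, dst, 2))
```

In this model the same VM ends up on different uplinks depending on the destination, so its MAC address can show up on more than one switch port. That is why the physical switch ports need to be bundled when IP hash is used.
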

Regards, Nivedan
0 Kudos
asrarguna
Enthusiast

Hi Nivedan,

Thank you. This is indeed helpful. I will check and get back to you. In the meantime, I would like to tell you that if I set 1 of the NICs to unused, I get the alert "Network functionality has been restored". After that, I make the unused NIC active again and I get "Network uplink redundancy restored".

This might be related to what you are saying.

Thanks

AG

0 Kudos
asrarguna
Enthusiast

Hi Gkeerthy,

I am using the Emulex Corporation NC551i Dual Port FlexFabric 10Gb Adapter on HP BL685c blade servers. I upgraded the firmware and the NIC driver, but I am still getting this error.

Hi Nivedan,

Both NICs are connected to the same physical switch.

The advanced performance chart for the network shows "Received Packet Drop": Max=11, Min=0, Avg=0.061.

As mentioned, our 2 management ports are connected to the same physical switch. The same applies to the 2 vMotion ports and the 2 storage ports. But we do not have EtherChannel, LACP, or link aggregation enabled on the physical switch.

Thanks,

Asrar

0 Kudos
asrarguna
Enthusiast

Hi Nivedan,

As I mentioned, I am using HP BL685c blade servers, and I am getting these alerts in this exact order even after the firmware and NIC driver upgrade:

Host is not responding error (this causes a red alarm on the host in vCenter).
3/10/2013 10:23:45 AM
VMs show as disconnected (greyed out) and so do the ESXi hosts.
Alarm 'Host connection and power state' on esxi-Host changed from Green to Red
info 3/10/2013 10:23:45 AM
Alarm 'Host memory usage' on esxi-Host changed from Green to Gray
info 3/10/2013 10:23:45 AM
Alarm 'Host cpu usage' on esxi-Host changed from Green to Gray
info 3/10/2013 10:23:45 AM
Alarm 'Host service console swap rates' on esxi-Host changed from Green to Gray
info 3/10/2013 10:23:45 AM
Alarm 'Network connectivity lost': an SNMP trap for entity esxi-Host was sent
info 3/10/2013 10:24:06 AM
Alarm 'Network connectivity lost' on esxi-Host triggered an action
info 3/10/2013 10:24:06 AM
Alarm 'Network uplink redundancy lost': an SNMP trap for entity esxi-Host was sent
info 3/10/2013 10:24:06 AM
Alarm 'Network uplink redundancy lost' on esxi-Host triggered an action
info 3/10/2013 10:24:06 AM
The hosts return to normal after 3 or 4 seconds, and the VMs and hosts that were greyed out come back as well.
Thanks,
AG
0 Kudos
asafayan
Contributor

Hi AG,

Were you able to find a resolution to this issue?

TIA,

Amir

0 Kudos
BrianDVS
Contributor

Good day to you,

We are having the same issue on a server that is running 5.1. All the rest are 4.x or 5.0 and they are fine.

It appears that the server is running fine with no interruptions in service, only the pesky event every 5 minutes, to the second.

Were you able to resolve the issue?

0 Kudos
asrarguna
Enthusiast

Hi Amir/ Brian,

I did manage to get rid of this error by upgrading the firmware and drivers on the servers (especially for the NICs). I also upgraded the BIOS (ROM upgrade, etc.), and now I don't see this error.

This might be helpful to you as well, especially if you have HP blade hardware, which is known to have this issue.

Thanks -AG

0 Kudos
ALDC
Contributor

Hi everybody,

I had the same problem as well. I updated each of my 10 servers to 5.1 with the latest patch level (first to 102189 and finally to 1065491). It made no difference whether the server was connected via EtherChannel (8 of my servers) or with just 1 cable. By the way, the servers are all the same: HP ProLiant DL380 G7, patched with the latest firmware.

It was always the same after the update: these messages appeared, I think every 5 minutes, in the events in vSphere:

"Alarm 'Network connectivity lost' on <SERVERNAME> triggered an action" followed by "Alarm 'Network connectivity lost': an SNMP trap for entity <SERVERNAME> was sent".

First I checked all my network configuration and did not find anything.

After the first 4 updates I contacted VMware support. They had a look at our servers and did not find anything. But one of the VMware technicians restarted the "VMware VirtualCenter Server" service on the vCenter Server, and then the messages disappeared...

I repeated the procedure with the next updates: every time the messages appeared I restarted the service, and the messages did not come back.

So much for my experience. Perhaps this helps you a little bit.

Cheers!

ChristianFenebe
Contributor

Hello ALDC,

Thanks for sharing your answer.

After rebooting the vCenter Server, the message didn't appear any more.

0 Kudos
animesh41
Enthusiast

Hi All,

I was getting the same issue on just one of the hosts in the cluster. All NICs were in an active state, none were down, and there were no errors on the switch.

So I just disabled the alarm in vCenter, waited a minute, and re-enabled it.

It works fine after that.

cheers!

VCIX-NV | VCAP 3x | AWS-SAA | TOGAF | vExpert
0 Kudos
DaneTruscott
Contributor

Hi,

Just wanted to add that I had the same issue, happening on only one host. Disabling and re-enabling the alert fixed the issue for me too.

Animesh, thanks for the tip.

0 Kudos

VMware says: "This issue occurs if the number of active NICs is less than three. For Beacon Probing to be effective, you must have at least three active NICs. To resolve this issue, verify the network failover detection settings and ensure that Beacon Probing is used only when there are at least three active NICs configured for the vSwitch." But I'm not sure why that is.

0 Kudos
animesh41
Enthusiast

Beaconing is most useful with three or more uplinks in a team because ESXi/ESX can then detect the failure of a single uplink. When there are only two NICs in service and one of them loses connectivity, it is unclear which NIC needs to be taken out of service, because neither of them receives beacons, and as a result all packets are sent to both uplinks. Using at least three NICs in such a team allows for n-2 failures, where n is the number of NICs in the team, before reaching an ambiguous situation. These uplink NICs should be in an active/active or active/standby configuration, because NICs in an Unused state do not participate in the beacon probing process.
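
The ambiguity is easier to see with a toy model. The sketch below is not how ESXi implements beacon probing; it only assumes that a beacon from one NIC reaches another NIC when both links are healthy, and that a NIC hearing no beacons is a failure candidate. The function names are made up for illustration.

```python
def beacons_received(num_nics, failed):
    """For each NIC, count the beacons it hears from the other NICs in the team.

    Assumption: a beacon from NIC a reaches NIC b only if both links are healthy.
    """
    healthy = [nic not in failed for nic in range(num_nics)]
    return [
        sum(1 for a in range(num_nics) if a != b and healthy[a] and healthy[b])
        for b in range(num_nics)
    ]


def suspected_failures(num_nics, failed):
    """Return the NICs that hear no beacons at all (the failure candidates)."""
    counts = beacons_received(num_nics, failed)
    return [nic for nic, count in enumerate(counts) if count == 0]


# Two-NIC team, NIC 0 fails: both NICs hear nothing, so both are suspects.
print(suspected_failures(2, {0}))
# Three-NIC team, NIC 0 fails: only NIC 0 hears nothing.
print(suspected_failures(3, {0}))
```

With two NICs a single failure silences both sides, so the host cannot tell which uplink is bad. With three NICs the two healthy ones still hear each other, which singles out the failed one.
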


Hope this helps : http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=100557...

VCIX-NV | VCAP 3x | AWS-SAA | TOGAF | vExpert
0 Kudos
SupportS2L
Contributor

Same issue using vCenter/ESXi 5.1. Restarting the vCenter server fixed it.

Thanks

0 Kudos