Hello Everybody,
I have an 8-server cluster (ESXi 5.1) that I manage with vCenter 5.1. If I click on any ESXi host in vCenter and go to the Tasks & Events tab, I can see the following 2 alerts continuously occurring every 5 minutes.
1.
Hi,
I think one of your vmnics is down. Have you configured your uplinks as active-active on your vSwitch?
Hi,
All the vmnics are up and running. I have 2 NICs each for management, vMotion, storage (NFS) and the VM port group, and both NICs in each port group (vSwitch) are active-active.
There are no drops on the physical switch either.
Still, the alerts in vCenter show network connectivity lost and uplink redundancy lost.
AG
Hi,
Since you have an active-active setup, can you check the EtherChannel or link aggregation configuration on the physical switch end?
Also, can you post a screenshot of the Configuration tab for the vSwitch? And what load balancing policy are you using for the vSwitch?
Please take a look at http://communities.vmware.com/thread/249314
Hello Nivedan,
Neither EtherChannel nor link aggregation/LACP is enabled on the physical switch side. The policy that I am using is the default one, route based on originating virtual port ID, for all my VSS and VDS.
Thanks,
AG
Are you using any CNA adapters or FlexNICs?
What is your hardware infrastructure? Please give those details, and also refer to the recent thread http://communities.vmware.com/message/2203049#2203049
and check the firmware/driver compatibility.
Hi AG,
Can you confirm whether both vmnics (which you described as active-active) are connected to the same physical switch or to different physical switches?
Please check the advanced performance tab for the network and see whether any packet drops happened in the past day.
If they are connected to the same physical switch, you should be using EtherChannel for an active-active setup.
If they are connected to different physical switches, you should be using an active-standby configuration (in the vSwitch) unless your network is not configured for a Primary/Secondary or Active/Standby configuration. Kindly consult your network admin regarding this.
Also, if you are using active-active, the best load balancing choice would be route based on IP hash.
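For anyone weighing the IP hash suggestion above, here is a rough Python sketch of how that policy spreads traffic. It assumes the commonly described model in which the uplink is chosen by XORing the source and destination IPv4 addresses and taking the result modulo the number of uplinks. The function name `ip_hash_uplink` and the vmnic names are made up for illustration; this is not VMware code.

```python
import ipaddress

def ip_hash_uplink(src_ip, dst_ip, uplinks):
    """Pick an uplink for a source/destination pair (simplified model).

    Assumption: uplink index = (src XOR dst) mod number of uplinks.
    """
    src = int(ipaddress.IPv4Address(src_ip))
    dst = int(ipaddress.IPv4Address(dst_ip))
    return uplinks[(src ^ dst) % len(uplinks)]

if __name__ == "__main__":
    team = ["vmnic0", "vmnic1"]  # hypothetical two-NIC team
    # The same source can leave on different uplinks depending on the destination.
    for dst in ("10.0.0.2", "10.0.0.3"):
        print("10.0.0.1 ->", dst, "uses", ip_hash_uplink("10.0.0.1", dst, team))
```

Because one VM's traffic can leave on either uplink under this policy, the physical switch sees the same MAC address on both ports, which is why IP hash has to be paired with a static EtherChannel on the switch side.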
Hi Nivedan,
Thank you. This is indeed helpful. I will check and get back. In the meantime, I would like to tell you that if I set one of the NICs to unused, I get the alert "Network functionality has been restored". After that, I make the unused NIC active again and I get "Network uplink redundancy restored".
This might be related to what you are saying..
Thanks
AG
Hi Gkeerthy,
I am using Emulex Corporation NC551i Dual Port FlexFabric 10Gb Adapter on HP BL685c Blade servers. I did the firmware and NIC driver upgrade but I am still getting this error.
Hi Nivedan,
Both NICs are connected to same physical switch.
Advanced Performance for Network shows "Received Packet Drop": Max=11, Min=0, Avg=0.061.
As mentioned, our 2 management ports are connected to the same physical switch. The same applies to the 2 vMotion ports and the 2 storage ports. But we do not have EtherChannel, LACP or link aggregation enabled on the physical switch.
Thanks,
Asrar
Hi Nivedan,
As I mentioned, I am using HP BL685c Blade servers and I am getting these alerts in the exact order even after the Firmware and NIC driver upgrade.
Hi AG,
Were you able to find the resolution to this issue?
TIA,
Amir
Good day to you,
We are having the same issue on a server that is on 5.1. All the rest are on 4.x or 5.0, and they are fine.
It appears that the server is running fine with no interruptions in service, only the pesky event every 5 minutes to the second.
Were you able to resolve the issue?
Hi Amir/ Brian,
I did manage to get rid of this error by upgrading the firmware and drivers on the servers (especially for the NICs). I also upgraded the BIOS (ROM upgrade, etc.) and now I don't see this error.
Might be helpful to you as well, especially if you have HP blade hardware, which is known to have this issue.
Thanks -AG
Hi everybody,
I had the same problem as well, on every one of the 10 servers I updated to 5.1 with the latest patch level (first to 102189 and finally to 1065491). It made no difference whether the server was connected via EtherChannel (8 of my servers) or with just 1 cable. By the way, the servers are all the same: HP ProLiant DL380 G7, patched with the latest firmware.
It was always the same after the update: there were the messages every, I think, 5 minutes, in the events in vSphere:
"Alarm 'Network connectivity lost' on <SERVERNAME> triggered an action" followed by "Alarm 'Network connectivity lost': an SNMPtrap for entity <SERVERNAME> was sent".
First I checked all my network configuration - did not find anything.
After the first 4 updates I contacted VMware support. They had a look at our servers and did not find anything. But one of the VMware technicians restarted the "VMware VirtualCenter Server" service on the vCenter Server, and then the messages disappeared...
I repeated the procedure with the next updates - every time the messages appeared I restarted the service and the messages were not coming again.
So much for my experience. Perhaps this helps you a little bit.
Cheers!
Hello ALDC,
thanks for sharing your answer.
After rebooting the vCenter Server the message didn't appear any more.
Hi All,
I was getting the same issue on just one of the hosts in the cluster. All NICs were in the active state, none were down, and there were no errors on the switch.
So then I just disabled the alarm in vCenter, waited a minute and re-enabled it.
It works fine after that.
cheers!
Hi,
Just wanted to add that I had the same issue, happening on only one host. Disabling and re-enabling the alert fixed the issue for me too.
Animesh, thanks for the tip
VMware says: "This issue occurs if the number of active NICs is less than three. For Beacon Probing to be effective, you must have at least three active NICs. To resolve this issue, verify the network failover detection settings and ensure that Beacon Probing is used only when there are at least three active NICs configured for the vSwitch." But I'm not sure why that is?
Beaconing is most useful with three or more uplinks in a team because ESXi/ESX can then detect the failure of a single uplink. When there are only two NICs in service and one of them loses connectivity, it is unclear which NIC needs to be taken out of service, because neither receives beacons, and as a result all packets are sent to both uplinks. Using at least three NICs in such a team allows for n-2 failures, where n is the number of NICs in the team, before reaching an ambiguous situation. These uplink NICs should be in an active/active or active/standby configuration, because NICs in an Unused state do not participate in the beacon probing process.
Hope this helps : http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=100557...
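To make the two-NIC versus three-NIC point above concrete, here is a toy Python model of beacon probing. It assumes a simplified picture: every healthy NIC in the team hears the beacons of every other healthy NIC, and a NIC with a broken link hears nothing and is heard by nobody. The names `beacons_received` and `diagnose` are made up for this sketch and are not part of any VMware tool.

```python
def beacons_received(nics, failed):
    """Map each NIC to the set of team members whose beacons it hears."""
    healthy = [n for n in nics if n not in failed]
    return {
        n: ({p for p in healthy if p != n} if n in healthy else set())
        for n in nics
    }

def diagnose(nics, failed):
    """Return the NICs that can be blamed, or None if the result is ambiguous."""
    heard = beacons_received(nics, failed)
    silent = sorted(n for n in nics if not heard[n])
    if not silent:
        return []      # every NIC hears at least one peer
    if len(silent) == len(nics):
        return None    # nobody hears anybody: cannot tell which NIC is bad
    return silent      # NICs that hear nothing while others still do

if __name__ == "__main__":
    print(diagnose(["vmnic0", "vmnic1"], {"vmnic1"}))            # two-NIC team
    print(diagnose(["vmnic0", "vmnic1", "vmnic2"], {"vmnic2"}))  # three-NIC team
```

In this model a two-NIC team with one failed link leaves both NICs hearing no beacons, so the failure cannot be pinned on either one. A three-NIC team still identifies a single failed NIC and only becomes ambiguous once two of the three have failed, which matches the n-2 figure quoted above.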
Same issue using vCenter/ESXi 5.1. Restarting the vCenter server fixed it.
Thanks