Solved: Troubleshooting ESX 4.1 "Network connectivity lost...

wealvescabral · ‎03-22-2013

Hi Folks,

I changed the network failover detection setting on one vSS from "Link status only" to "Beacon probing".

Now I have an active alarm with red status, "Network connectivity lost", but the message in the host's events doesn't show which vmnic among my active uplinks has the problem.

Is there any way to find out which physical interface has this configuration issue?

Thanks

------

Wellington Cabral


grasshopper · ‎03-22-2013

Also, if you would like to check for linkstate changes on your hosts, you can SSH to the desired host (or use the iLO console, etc.) and simply grep for "link" (gives the most output) or "linkstate" (gives minimal output, showing only up/down events).

Command Syntax (depending on ESXi host version you may need to modify slightly)

ESXi 4.x

cat /var/log/messages | grep -i link

-or-

ESXi 5.x

cat /var/log/vmkernel.log | grep -i link
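
For the narrower "linkstate" filter mentioned above, a minimal sketch could look like the following. The function name linkstate_events is hypothetical, and the default path is simply the ESXi 4.x log file from the command above; pass a different file if your version logs elsewhere.

```shell
# Hypothetical helper: show only lines that mention "linkstate"
# (the up/down events), skipping the noisier driver messages.
# $1 is the log file to search; it defaults to /var/log/messages.
linkstate_events() {
    grep -i 'linkstate' "${1:-/var/log/messages}"
}
```

Usage: linkstate_events /var/log/messages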

Here's some example output captured while doing network maintenance:

~ # tail -fv /var/log/messages | grep -i link
Jul 26 16:34:43 shell[372172]: tail -fv /var/log/messages | grep -i link
Jul 26 16:35:52 vmkernel: 1:00:26:20.001 cpu13:4462)<3>qlge 0000:08:00.1: ql_link_down: vmnic5: Link Down.
Jul 26 16:35:52 vobd: Jul 26 16:35:52.540: 87980025771us: [vob.net.vmnic.linkstate.down] vmnic vmnic5 linkstate down.
Jul 26 16:35:52 vmkernel: 1:00:26:20.057 cpu15:4947)sfqosq: del pc member - q_ulink is already NULL, return
Jul 26 16:35:52 vmkernel: 1:00:26:20.057 cpu15:4947)sfqosq: del pc member - q_ulink is already NULL, return
Jul 26 16:35:53 vobd: Jul 26 16:35:53.542: 87981034666us: [esx.problem.net.vmnic.linkstate.down] Physical NIC vmnic5 linkstate is down.
Jul 26 16:38:14 vmkernel: 1:00:28:41.743 cpu8:4455)<3>qlge 0000:08:00.1: ql_link_up: vmnic5: Link Up.
Jul 26 16:38:14 vobd: Jul 26 16:38:14.432: 88121917666us: [vob.net.vmnic.linkstate.up] vmnic vmnic5 linkstate up.
Jul 26 16:38:15 vobd: Jul 26 16:38:15.434: 88122926678us: [esx.clear.net.vmnic.linkstate.up] Physical NIC vmnic5 linkstate is up.
Jul 26 16:39:57 vmkernel: 1:00:30:25.243 cpu0:4947)sfqosq: del pc member - q_ulink is already NULL, return
Jul 26 16:39:58 vmkernel: 1:00:30:25.541 cpu2:4443)<3>qlge 0000:11:00.1: ql_link_down: vmnic7: Link Down.
Jul 26 16:39:58 vobd: Jul 26 16:39:58.224: 88225709691us: [vob.net.vmnic.linkstate.down] vmnic vmnic7 linkstate down.
Jul 26 16:39:59 vobd: Jul 26 16:39:59.226: 88226718868us: [esx.problem.net.vmnic.linkstate.down] Physical NIC vmnic7 linkstate is down.
Jul 26 16:40:58 vmkernel: 1:00:31:26.091 cpu13:4459)<3>qlge 0000:11:00.1: ql_link_up: vmnic7: Link Up.
Jul 26 16:40:58 vobd: Jul 26 16:40:58.890: 88286375827us: [vob.net.vmnic.linkstate.up] vmnic vmnic7 linkstate up.
Jul 26 16:40:59 vobd: Jul 26 16:40:59.892: 88287384684us: [esx.clear.net.vmnic.linkstate.up] Physical NIC vmnic7 linkstate is up.

Note: The above example uses the tail command in follow (-f) mode to watch link events in real time (i.e. during your change). To review linkstate changes that have already happened, use the cat command noted at the top of the post instead.
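
If the goal is just to see which vmnics ended up down, the historic review can be condensed to one line per NIC. This is only a sketch: the function name last_linkstate is hypothetical, it assumes the vobd line format shown in the sample output above ("... vmnic vmnic5 linkstate down."), and it assumes grep, awk and sort are available in the host's shell.

```shell
# Hypothetical helper: print the last observed link state per vmnic.
# $1 is the log file to read; it defaults to /var/log/messages.
last_linkstate() {
    log="${1:-/var/log/messages}"
    # Keep only the [vob.net.vmnic.linkstate.*] lines so each event
    # is counted once (the esx.problem/esx.clear lines repeat it).
    grep 'vob\.net\.vmnic\.linkstate\.' "$log" |
    awk '{
        # Assumed layout: the last three fields are
        # "<vmnic> linkstate <up|down>."
        nic = $(NF-2)
        state = $NF
        sub(/\.$/, "", state)            # drop the trailing period
        last[nic] = state                # later lines overwrite earlier ones
        stamp[nic] = $1 " " $2 " " $3    # syslog timestamp of that event
    }
    END {
        for (nic in last) print nic, last[nic], stamp[nic]
    }' | sort
}
```

Run against the sample output above, this would report vmnic5 and vmnic7 as "up" with the time of their last change. Any vmnic still reported as "down" is the uplink to look at.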


a_p_ · ‎03-22-2013

Can you confirm it's not just an Alarm which has been triggered once when you changed the setting and just needs to be acknowledged/reset?

An easy way to check network activity is to use esxtop from the host's command line. After starting it press "n" for network monitoring.

André

wealvescabral · ‎03-23-2013

Thanks Folks,

You have clarified all my doubts about this issue.

Apparently, the red status message only appears on a vSS that has a VMkernel port group while the failover detection method is being reconfigured. I have another vSS that has only a VM port group, and that vSwitch doesn't show this type of alarm during reconfiguration.

While digging into the logs I could see that the uplinks had come back to the "up" state, so I reset the alarm and everything returned to normal status.

