I would recommend restarting the management agents:
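A minimal sketch of restarting the management agents over SSH (assuming SSH is enabled on the host; the same restart is also available from the DCUI under Troubleshooting Options):

```shell
# Restart hostd (the host management daemon) and vpxa (the vCenter agent)
/etc/init.d/hostd restart
/etc/init.d/vpxa restart

# Alternatively, restart all management services at once:
# services.sh restart
```

Note that restarting the agents briefly interrupts management connectivity but does not affect running VMs.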
What happens if you right-click the disconnected server in vCenter and select Reconnect? Does it ask you for credentials?
Maybe the host lost its certificates during the upgrade to vSphere 6.0, and vCenter now thinks it could be a different server.
In that case you have to reconnect it manually.
I haven't tried disconnecting and reconnecting the host, but I did right-click it and select Connect; this did nothing.
When I have the problem again, I'll try restarting the management agents.
Hello all!
I had the same issue some days ago in two different environments: one standalone free ESXi 6.0 hypervisor and one two-node cluster managed by vCenter Server Appliance 6.0.
I tried to reconnect the host, but it didn't work for me.
At the DCUI I tried to enter my password, but the host did not respond. Only a reboot solved my problem; after that everything was fine.
I'm running ESXi 6.0 on a Fujitsu RX200 S6 and an RX200 S7.
Please let me know if there is a fix for this issue.
I have had something similar on an upgraded test host: the server would randomly disconnect, and a reboot resolved it. Eventually the host wouldn't reconnect to vCenter at all.
The fix for me was to uninstall the vpxa agent, restart the host, then reconnect it to vCenter (as though connecting a new host).
Could you please confirm how you uninstalled the vpxa agent...
If you're seeing this in your vmkernel.log at the time of the disconnect, it could be related to an issue that will one day be described at the link below (it is not live at this time). We see this after a random amount of time, and nothing VMware technical support could do, short of rebooting the host, helped.
2015-07-19T08:22:35.552Z cpu0:33257)WARNING: LinNet: netdev_watchdog:3678: NETDEV WATCHDOG: vmnic4: transmit timed out
2015-07-19T08:22:35.552Z cpu0:33257)WARNING: at vmkdrivers/src_92/vmklinux_92/vmware/linux_net.c:3707/netdev_watchdog()(inside vmklinux)
2015-07-19T08:22:35.552Z cpu0:33257)Backtrace for current CPU #0,worldID=33257, rbp=0x430609af4380
2015-07-19T08:22:35.552Z cpu0:33257)0x4390cf49be10:[0x418029896b4e]vmk_LogBacktraceMessage@vmkernel#nover+0x22 stack: 0x430609af4380, 0
2015-07-19T08:22:35.552Z cpu0:33257)0x4390cf49be30:[0x418029f1e7b7]email@example.comAPI#9.2+0x27f stack: 0x430609ac3ce
2015-07-19T08:22:35.552Z cpu0:33257)0x4390cf49bea0:[0x418029f44a5f]firstname.lastname@example.orgAPI#9.2+0xd7 stack: 0x4306
2015-07-19T08:22:35.552Z cpu0:33257)0x4390cf49bf30:[0x41802984f872]helpFunc@vmkernel#nover+0x4e6 stack: 0x0, 0x430609ac3ce0, 0x27, 0x0,
2015-07-19T08:22:35.552Z cpu0:33257)0x4390cf49bfd0:[0x418029a1231e]CpuSched_StartWorld@vmkernel#nover+0xa2 stack: 0x0, 0x0, 0x0, 0x0,
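For anyone checking whether their own hosts are hitting the same condition, here is a quick sketch of searching for this signature (assuming shell access to the host; /var/log/vmkernel.log is the standard location on ESXi):

```shell
# Look for the transmit-timeout watchdog in the current vmkernel log
grep "NETDEV WATCHDOG" /var/log/vmkernel.log

# Include rotated logs as well (rotated copies are gzip-compressed)
zcat /var/log/vmkernel.*.gz 2>/dev/null | grep "NETDEV WATCHDOG"
```

If the message appears shortly before each disconnect, you are likely hitting the same issue.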
sdnbtech, have you heard or seen any updates on the issue you described? I haven't been able to get an update on the status of a fix from VMware in the few weeks since they confirmed engineering is working on a solution. A host downgrade to 5.5 was the only recommendation, aside from rebooting the 6.0 hosts each time networking drops.
I seem to be having very similar issues:
2015-08-11T11:14:53.340Z cpu23:33256)WARNING: LinNet: netdev_watchdog:3678: NETDEV WATCHDOG: vmnic4: transmit timed out
2015-08-11T11:14:53.340Z cpu23:33256)<6>ixgbe 0000:41:00.0: vmnic4: Fake Tx hang detected with timeout of 160 seconds
When this happens, both ports on a dual-port NIC die at the same time, and only a reboot fixes it. I opened an SR with VMware support referencing this thread and the not-yet-existing KB posted above, and will follow up if/when I hear something back on this.
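As a data point for others comparing setups, the NIC driver and firmware versions can be pulled with esxcli (a sketch; vmnic4 is the affected NIC from my logs above, substitute your own):

```shell
# List all physical NICs with their drivers and link state
esxcli network nic list

# Detailed driver and firmware info for the affected NIC
esxcli network nic get -n vmnic4
```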
Troubleshooting a non-responsive host without looking at the logs is not really effective. You can open a service request with VMware.
Please share the log details; without logs it is hard to find the root cause. Storage might also be the reason; the APD recovery issue is still unresolved in 6.0.
What about the VMs on the host? Are they still live when the host goes unresponsive? Even a time-sync issue can make a host disconnect.
Confirmed what sdnbtech stated above: the "transmit timed out" is a known issue. There is no ETA for a fix yet, and they were not very forthcoming with details. Basically I was told to downgrade if this issue is affecting me, as there is no workaround. The engineer I spoke to says he sees this at least once a week.
I checked this morning and there are a few options: 1) apply a debug build of ESXi that will still be affected by the problem but gathers more information for the development team; 2) run a script at each boot of each ESXi server, which they believe fixes the issue entirely but can cause performance degradation; 3) downgrade to 5.5 or below.
My case has now been open 60 days regarding this issue. It's very disappointing.