I have 20 standalone ESXi 5 servers in our VMware farm. For some reason they were never put in a cluster, and no SAN is attached; all 20 hosts have only their local storage. I am facing several recurring issues with them, such as:
1. Network connectivity lost
2. Host disconnected
3. Host unresponsive
4. Host not responding
When these alerts trigger, the ESXi server turns gray and becomes unresponsive. We cannot connect to the host for a while, but after some time it turns green again and works as if nothing had happened.
Could you please help me understand the reason for this?
Hi,
Based on the information above, this KB article should be helpful:
VMware KB: Network Connectivity Lost Due to Physical Link Down
Regards
Mohammed
I'd start by checking the host's events and vmkernel logs. If the network comes back without any intervention, you may also want to check whether the same IP addresses are being used by other systems.
André
I checked the events and logs. The ESXi servers come back from the unresponsive state by themselves. I saw a lot of disk latency warnings as the VMs on the host access the local disks, and disk space is running low. We are trying to add more disks. Do you think adding disks can solve the issue? Or can disk latency lead to network latency? What steps can help me fix it?
Hi,
First, I would like to know whether there were any recent changes on the ESXi hosts: any driver upgrades, or any physical networking changes?
Regards
Mohammed
We upgraded ESXi from version 4 to version 5. I checked with the network team; there was no physical switch upgrade.
Hi,
I need to know which physical NICs you are using.
Regards
Mohammed
IIRC, I read about driver issues with specific network adapters. What type/model of hosts and network adapters do you use? Maybe it can be solved by simply updating the drivers.
André
I can see disk latency messages, and right after them the host shows as disconnected / unresponsive. After a while the ESXi host comes back as if nothing had happened.
We are using IBM x3650 M3 servers with the Broadcom NetXtreme 1000Base-T driver for the NICs.
Hi,
First run this command:
esxcfg-nics -l
Then run this command on the ESXi host:
ethtool -i vmnic0
Regards
Mohammed
~ # esxcfg-nics -l
Name    PCI            Driver     Link  Speed     Duplex  MAC Address        MTU   Description
vmnic0  0000:0b:00.00  bnx2       Up    1000Mbps  Full    5c:f3:fc:e5:8b:cc  1500  Broadcom Corporation Broadcom NetXtreme II BCM5709 1000Base-T
vmnic1  0000:0b:00.01  bnx2       Up    1000Mbps  Full    5c:f3:fc:e5:8b:ce  1500  Broadcom Corporation Broadcom NetXtreme II BCM5709 1000Base-T
vmnic2  0000:10:00.00  bnx2       Down  0Mbps     Half    5c:f3:fc:6a:2c:38  1500  Broadcom Corporation Broadcom NetXtreme II BCM5709 1000Base-T
vmnic3  0000:10:00.01  bnx2       Down  0Mbps     Half    5c:f3:fc:6a:2c:3a  1500  Broadcom Corporation Broadcom NetXtreme II BCM5709 1000Base-T
vusb0   Pseudo         cdc_ether  Up    10Mbps    Half    5e:f3:fc:dd:8b:cf  1500  Unknown Unknown
~ # ethtool -i vmnic0
driver: bnx2
version: 2.0.15g.v50.11-7vmw
firmware-version: bc 6.2.0 NCSI 2.0.11
bus-info: 0000:0b:00.0
~ #
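With 20 hosts, it may be easier to collect the `esxcfg-nics -l` listing from each one and scan it with a script than to read it by eye. The following is only a rough Python sketch, assuming the column layout shown in the listing above; the names `parse_esxcfg_nics`, `down_links`, and `SAMPLE` are made up for illustration and are not part of any VMware tool.

```python
# Sketch: parse `esxcfg-nics -l` text and list the NICs whose link is down.
# Assumption: whitespace-separated columns in the order shown in this thread:
# Name, PCI, Driver, Link, Speed, Duplex, MAC Address, MTU, Description.

SAMPLE = """\
Name PCI Driver Link Speed Duplex MAC Address MTU Description
vmnic0 0000:0b:00.00 bnx2 Up 1000Mbps Full 5c:f3:fc:e5:8b:cc 1500 Broadcom Corporation Broadcom NetXtreme II BCM5709 1000Base-T
vmnic1 0000:0b:00.01 bnx2 Up 1000Mbps Full 5c:f3:fc:e5:8b:ce 1500 Broadcom Corporation Broadcom NetXtreme II BCM5709 1000Base-T
vmnic2 0000:10:00.00 bnx2 Down 0Mbps Half 5c:f3:fc:6a:2c:38 1500 Broadcom Corporation Broadcom NetXtreme II BCM5709 1000Base-T
vmnic3 0000:10:00.01 bnx2 Down 0Mbps Half 5c:f3:fc:6a:2c:3a 1500 Broadcom Corporation Broadcom NetXtreme II BCM5709 1000Base-T
vusb0 Pseudo cdc_ether Up 10Mbps Half 5e:f3:fc:dd:8b:cf 1500 Unknown Unknown
"""


def parse_esxcfg_nics(text):
    """Return one dict per NIC row; header, prompt, and blank lines are skipped."""
    nics = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("Name"):
            continue  # blank line or column header
        # The description may contain spaces, so split only the first 8 columns.
        fields = line.split(None, 8)
        if len(fields) < 9:
            continue  # not a NIC row (for example a shell prompt line)
        name, pci, driver, link, speed, duplex, mac, mtu, description = fields
        nics.append({
            "name": name, "pci": pci, "driver": driver, "link": link,
            "speed": speed, "duplex": duplex, "mac": mac,
            "mtu": int(mtu), "description": description,
        })
    return nics


def down_links(nics):
    """Return only the NICs that report their link as Down."""
    return [nic for nic in nics if nic["link"] == "Down"]


if __name__ == "__main__":
    for nic in down_links(parse_esxcfg_nics(SAMPLE)):
        print(nic["name"], nic["driver"], nic["link"])
```

For the listing posted here, this reports vmnic2 and vmnic3 as down. Whether those ports are simply unused or are expected to be uplinks is something only the poster can confirm.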
What I would do is check whether a firmware upgrade is available for the network adapters, and install the latest drivers on one of the hosts to see whether this solves the issue. Drivers can be downloaded from https://my.vmware.com/web/vmware/details?downloadGroup=DT-ESXI5X-BROADCOM-BNX2X-17654V501&productId=...
André
Hi,
As a.p. mentioned, use the link below to get the latest driver. I see that newer drivers are available for the NIC you are using.
VMware Compatibility Guide: I/O Device Search
Regards
Mohammed
Do you think a driver upgrade can fix this issue? And what about the disk latency: can it lead to network latency that causes the host network issues / disconnects?
Hard to say without all the details (log files, firmware versions, ...). Maybe it's a good idea to check whether IBM has firmware updates available for all hardware components (BIOS, network, RAID controller, ...) and apply those.
André
Hi,
Can you get me the esxtop output from the ESXi host? Connect to the host over SSH and run this command:
esxtop
Then press 'd' and note down the DAVG, KAVG, and GAVG values. If these values are high, the storage vendor needs to investigate.
You can also refer to this KB: VMware KB: Using esxtop to identify storage performance issues for ESX / ESXi (multiple versions)
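To make "high" a bit more concrete, here is a small Python sketch for judging the three numbers once they are noted down. It assumes the usual reading of the counters (DAVG is latency at the device, KAVG is time spent in the VMkernel, GAVG is what the guest sees, roughly DAVG + KAVG). The thresholds of 25 ms for DAVG and GAVG and 2 ms for KAVG are commonly quoted rules of thumb and are an assumption here, not official limits; `assess_latency` and `THRESHOLDS_MS` are made-up names.

```python
# Sketch: flag esxtop latency counters that exceed rule-of-thumb thresholds.
# Assumption: the thresholds below are community rules of thumb in
# milliseconds, not vendor limits. Adjust them for your own environment.
THRESHOLDS_MS = {"DAVG": 25.0, "KAVG": 2.0, "GAVG": 25.0}


def assess_latency(davg, kavg, gavg=None):
    """Return the names of the counters above their threshold, in a fixed order.

    davg, kavg, gavg are the per-command latencies in milliseconds as read
    from esxtop. If gavg is not given, it is approximated as davg + kavg.
    """
    if gavg is None:
        gavg = davg + kavg  # the guest-visible latency is roughly the sum
    values = {"DAVG": davg, "KAVG": kavg, "GAVG": gavg}
    return [name for name, value in values.items()
            if value > THRESHOLDS_MS[name]]


if __name__ == "__main__":
    # Hypothetical readings, only to show how the function is called.
    print(assess_latency(davg=3.1, kavg=0.02))   # nothing flagged
    print(assess_latency(davg=40.0, kavg=0.5))   # device-side latency flagged
    print(assess_latency(davg=5.0, kavg=4.0))    # kernel-side latency flagged
```

A high DAVG points at the disks or the RAID controller, while a high KAVG suggests commands are queuing inside the hypervisor. Since esxtop shows a moving snapshot, the readings matter most when taken while a host is actually going unresponsive.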
Regards
Mohammed
It sounds to me like your VMs could be generating enough disk activity to cause the hypervisor console, which is using the same disks, to become unresponsive. It is possible that the kernel just can't process the management traffic quickly enough due to the latency on the disks, resulting in vCenter temporarily losing the management connection.
Does this seem to follow certain windows, like backup processing for example?
[esxtop output: DAVG, KAVG, and GAVG values]
Hi,
I don't see any disk latency on your ESXi host.
In the same way, you can check whether there are any packet drops. Once esxtop is running, press 'n' to see the network data. Check whether there are any packet drops when the host becomes unresponsive.
Regards
Mohammed
[esxtop network view ('n') output: packet drops]
Hi,
This also looks good. So the question is: do you see any pattern in the disconnections, such as a specific time of day or specific hosts disconnecting?
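One way to look for such a pattern is to export the "host disconnected" / "not responding" events and count them per hour of day and per host. The Python sketch below assumes you already have the events as (host name, timestamp) pairs in `YYYY-MM-DD HH:MM:SS` format; the function name `summarize_disconnects`, the host names, and the timestamps in `EXAMPLE_EVENTS` are all invented for illustration and are not taken from this environment.

```python
# Sketch: count disconnect events per hour of day and per host.
# Assumption: events were exported by hand as (host, "YYYY-MM-DD HH:MM:SS").
from collections import Counter
from datetime import datetime

# Made-up example data, NOT real events from the hosts in this thread.
EXAMPLE_EVENTS = [
    ("esx01", "2013-01-07 02:10:00"),
    ("esx02", "2013-01-07 02:25:00"),
    ("esx01", "2013-01-08 02:05:00"),
    ("esx03", "2013-01-08 14:40:00"),
]


def summarize_disconnects(events):
    """Return (events per hour of day, events per host) as two Counters."""
    by_hour = Counter()
    by_host = Counter()
    for host, timestamp in events:
        when = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S")
        by_hour[when.hour] += 1
        by_host[host] += 1
    return by_hour, by_host


if __name__ == "__main__":
    by_hour, by_host = summarize_disconnects(EXAMPLE_EVENTS)
    for hour, count in by_hour.most_common():
        print(f"{hour:02d}:00  {count} event(s)")
    for host, count in by_host.most_common():
        print(f"{host}  {count} event(s)")
```

If most events fall in the same hour, it is worth comparing that window against backup jobs or other scheduled disk-heavy work. If they pile up on a few hosts, those hosts are the place to compare hardware, firmware, and driver versions first.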
Regards
Mohammed