gaganvmware
Enthusiast

ESXi 5 critical Issue

I have 20 standalone ESXi 5 servers in our VMware farm. For various reasons we did not put them in a cluster, and no SAN is attached; all 20 ESXi 5 servers have only their local storage. I am facing many issues with them, such as:

1. Network connectivity lost

2. Host disconnected

3. Host unresponsive

4. Host not responding

When these alerts trigger, the ESXi server turns gray and becomes unresponsive. We cannot connect to the host for a while, but after some time the ESXi server turns green again and starts working as if nothing had happened.

Could you please help me understand the reason for this?

1 Solution

Accepted Solutions
memaad
Commander

Hi,

Based on the details above, this KB article should be helpful:

VMware KB: Network Connectivity Lost Due to Physical Link Down

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=201987...

Regards

Mohammed

Mohammed Emaad |VCP 3, 4,5 |VCP -NV 6 | VCP-DT 51 | vCAP4-DCA | VCAP5DCA | | Mark it as helpful or correct if my suggestion is useful.

25 Replies
a_p_
Leadership

I'd start by checking the host's Events and vmkernel logs. If the network comes back without any intervention, you may also want to check whether another system is using the same IP addresses.

André

gaganvmware
Enthusiast

I checked the events and logs. The ESXi servers come back from the unresponsive state by themselves. I saw a lot of disk latency messages: the VMs on each host access the local disks, which generates disk latency, and free disk space is getting low. We are trying to add more disks. Do you think adding disks can solve the issue? Or can disk latency lead to network latency? What steps would help me fix it?

memaad
Commander

Hi,

First, I would like to know what changed recently on the ESXi hosts. Any driver upgrades? Any physical networking changes?

Regards

Mohammed

gaganvmware
Enthusiast

We upgraded ESXi from version 4 to version 5. I checked with the network team: there was no physical switch upgrade.

memaad
Commander

Hi,

I need to know which physical NICs you are using.

Regards

Mohammed

a_p_
Leadership

IIRC I read about driver issues with specific network adapters. What type/model of hosts and network adapters do you use? Maybe it can be solved by simply updating the drivers.

André

gaganvmware
Enthusiast

I can see disk latency messages, and right after the disk latency I see the host disconnected / unresponsive alerts. After a while the ESXi host comes back as if nothing had happened.

We are using IBM x3650 M3 servers with the Broadcom NetXtreme 1000Base-T driver for the NICs.

memaad
Commander

Hi,

First run this command on the ESXi host:

esxcfg-nics -l

Then run this command:

ethtool -i vmnic0

Regards

Mohammed

gaganvmware
Enthusiast

~ # esxcfg-nics -l
Name    PCI           Driver      Link Speed     Duplex MAC Address       MTU    Description
vmnic0  0000:0b:00.00 bnx2        Up   1000Mbps  Full   5c:f3:fc:e5:8b:cc 1500   Broadcom Corporation Broadcom NetXtreme II BCM5709 1000Base-T
vmnic1  0000:0b:00.01 bnx2        Up   1000Mbps  Full   5c:f3:fc:e5:8b:ce 1500   Broadcom Corporation Broadcom NetXtreme II BCM5709 1000Base-T
vmnic2  0000:10:00.00 bnx2        Down 0Mbps     Half   5c:f3:fc:6a:2c:38 1500   Broadcom Corporation Broadcom NetXtreme II BCM5709 1000Base-T
vmnic3  0000:10:00.01 bnx2        Down 0Mbps     Half   5c:f3:fc:6a:2c:3a 1500   Broadcom Corporation Broadcom NetXtreme II BCM5709 1000Base-T
vusb0   Pseudo        cdc_ether   Up   10Mbps    Half   5e:f3:fc:dd:8b:cf 1500   Unknown Unknown

~ # ethtool -i vmnic0
driver: bnx2
version: 2.0.15g.v50.11-7vmw
firmware-version: bc 6.2.0 NCSI 2.0.11
bus-info: 0000:0b:00.0

~ #
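
With 20 standalone hosts, comparing this listing by eye gets tedious. The following is a minimal Python sketch that parses saved `esxcfg-nics -l` output and reports which adapters have no link. It assumes the column layout shown in the listing above; `Nic`, `parse_esxcfg_nics`, and `SAMPLE_OUTPUT` are hypothetical names for illustration, not part of any VMware tool.

```python
from typing import List, NamedTuple


class Nic(NamedTuple):
    name: str
    pci: str
    driver: str
    link: str
    speed: str
    duplex: str
    mac: str
    mtu: int
    description: str


def parse_esxcfg_nics(text: str) -> List[Nic]:
    """Parse the table printed by `esxcfg-nics -l` (layout assumed from the thread)."""
    nics = []
    for line in text.splitlines():
        # The first eight columns are single tokens; the description is the rest.
        fields = line.split(None, 8)
        # Skip the header, shell prompts, blank lines, and incomplete rows.
        if len(fields) < 9 or fields[0] == "Name" or not fields[7].isdigit():
            continue
        name, pci, driver, link, speed, duplex, mac, mtu, desc = fields
        nics.append(Nic(name, pci, driver, link, speed, duplex, mac, int(mtu), desc))
    return nics


# Output copied from the post above.
SAMPLE_OUTPUT = """\
Name    PCI           Driver      Link Speed     Duplex MAC Address       MTU    Description
vmnic0  0000:0b:00.00 bnx2        Up   1000Mbps  Full   5c:f3:fc:e5:8b:cc 1500   Broadcom Corporation Broadcom NetXtreme II BCM5709 1000Base-T
vmnic1  0000:0b:00.01 bnx2        Up   1000Mbps  Full   5c:f3:fc:e5:8b:ce 1500   Broadcom Corporation Broadcom NetXtreme II BCM5709 1000Base-T
vmnic2  0000:10:00.00 bnx2        Down 0Mbps     Half   5c:f3:fc:6a:2c:38 1500   Broadcom Corporation Broadcom NetXtreme II BCM5709 1000Base-T
vmnic3  0000:10:00.01 bnx2        Down 0Mbps     Half   5c:f3:fc:6a:2c:3a 1500   Broadcom Corporation Broadcom NetXtreme II BCM5709 1000Base-T
vusb0   Pseudo        cdc_ether   Up   10Mbps    Half   5e:f3:fc:dd:8b:cf 1500   Unknown Unknown
"""

if __name__ == "__main__":
    for nic in parse_esxcfg_nics(SAMPLE_OUTPUT):
        if nic.link == "Down":
            print(f"{nic.name} ({nic.driver}) has no link")
```

Running the same parser over output collected from each host makes it easy to spot hosts where an uplink that should be up is reported as Down, or where the driver differs from the rest of the farm.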

a_p_
Leadership

What I would do is check whether a firmware upgrade is available for the network adapters, and install the latest drivers on one of the hosts to see whether this solves the issue. Drivers can be downloaded from https://my.vmware.com/web/vmware/details?downloadGroup=DT-ESXI5X-BROADCOM-BNX2X-17654V501&productId=...

André

memaad
Commander

Hi,

As a.p. mentioned, use the link below to get the latest driver. I see that newer drivers are available for the NIC you are using.

VMware Compatibility Guide: I/O Device Search

Regards

Mohammed

gaganvmware
Enthusiast

Do you think a driver upgrade can fix this issue? What about the disk latency issue: can it lead to network latency that causes the host network issues / disconnections?

a_p_
Leadership

Hard to say without all the details (log files, firmware versions, ...). Maybe it's a good idea to check whether IBM has firmware updates available for all hardware components (BIOS, network, RAID controller, ...) and apply those.

André

memaad
Commander

Hi,

Can you get me the esxtop output from the ESXi host? Connect to the ESXi host using SSH and run:

esxtop

Then press 'd' and note down the DAVG, KAVG, and GAVG values. If these values are high, the storage vendor needs to investigate.

You can also refer to this KB: VMware KB: Using esxtop to identify storage performance issues for ESX / ESXi (multiple versions)
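
To make "high" a little more concrete, here is a minimal Python sketch that takes DAVG and KAVG readings noted down from esxtop and flags the ones that look suspicious. GAVG is treated as the sum of DAVG and KAVG. The thresholds (25 ms for DAVG, 2 ms for KAVG) are commonly cited rules of thumb and are assumptions here, so adjust them for your hardware; `DiskLatency` and `diagnose` are hypothetical names, and the sample values are made up for illustration.

```python
from dataclasses import dataclass
from typing import List

# Rule-of-thumb thresholds in milliseconds (assumptions, not hard limits).
DAVG_WARN_MS = 25.0
KAVG_WARN_MS = 2.0


@dataclass
class DiskLatency:
    device: str
    davg_ms: float  # latency measured at the device/driver level
    kavg_ms: float  # latency added inside the vmkernel (for example, queuing)

    @property
    def gavg_ms(self) -> float:
        # Latency as seen by the guest: device latency plus kernel latency.
        return self.davg_ms + self.kavg_ms


def diagnose(sample: DiskLatency) -> List[str]:
    """Return human-readable findings for one esxtop disk sample."""
    findings = []
    if sample.davg_ms > DAVG_WARN_MS:
        findings.append(
            f"{sample.device}: DAVG {sample.davg_ms} ms is high, "
            "check disks, RAID controller, and firmware"
        )
    if sample.kavg_ms > KAVG_WARN_MS:
        findings.append(
            f"{sample.device}: KAVG {sample.kavg_ms} ms is high, "
            "check for queuing in the hypervisor"
        )
    return findings


if __name__ == "__main__":
    # Made-up reading for illustration only.
    sample = DiskLatency(device="local-disk", davg_ms=30.0, kavg_ms=0.5)
    print(f"GAVG = {sample.gavg_ms} ms")
    for finding in diagnose(sample):
        print(finding)
```

Because these hosts use only local storage, a high DAVG would point at the local disks or RAID controller rather than at a SAN.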

Regards

Mohammed

jdptechnc
Expert

It sounds to me like your VMs could be generating enough disk activity to cause the hypervisor console, which is using the same disks, to become unresponsive.  It is possible that the kernel just can't process the management requests quickly enough due to the latency on the disks, resulting in vCenter losing the management connection temporarily.

Does this seem to follow certain windows, like backup processing for example?

Please consider marking as "helpful", if you find this post useful. Thanks!... IT Guy since 12/2000... Virtual since 10/2006... VCAP-DCA #2222

gaganvmware
Enthusiast

[Attached screenshot: esxtop disk view showing the DAVG, KAVG, and GAVG values]

memaad
Commander

Hi,

I don't see any disk latency on your ESXi host.

In the same way, you can check whether there are any packet drops. Once you start esxtop, press 'n' and you will see the network data. Check whether there are any packet drops when the host becomes unresponsive.
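
The esxtop network view reports dropped packets per port in the %DRPTX (transmit) and %DRPRX (receive) columns. As a rough illustration, the Python sketch below flags ports whose noted values exceed a threshold. The function name `ports_with_drops`, the port labels, and the numbers are all hypothetical, and the default threshold of 0 (flag any drop at all) is an assumption.

```python
from typing import Dict, List, Tuple


def ports_with_drops(
    samples: Dict[str, Tuple[float, float]],
    threshold_pct: float = 0.0,
) -> List[str]:
    """Return ports whose %DRPTX or %DRPRX exceeds threshold_pct.

    `samples` maps a port name to a (%DRPTX, %DRPRX) pair noted from esxtop.
    """
    return sorted(
        port
        for port, (drptx, drprx) in samples.items()
        if drptx > threshold_pct or drprx > threshold_pct
    )


if __name__ == "__main__":
    # Made-up readings for illustration only.
    readings = {
        "vmnic0": (0.0, 0.0),
        "vmnic1": (0.0, 1.5),
        "vmk0": (0.25, 0.0),
    }
    print(ports_with_drops(readings))
```

The useful comparison is between readings taken while a host is gray and readings taken while it is healthy; drops that only show up during the outage point toward the network path.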

Regards

Mohammed

gaganvmware
Enthusiast

[Attached screenshot: esxtop network view (key 'n') showing the packet drop counters]

memaad
Commander

Hi,

This also looks good. So the question is: do you see any pattern in the disconnections, such as a specific time of day or specific hosts disconnecting?
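
One way to look for such a pattern is to collect the timestamps of the disconnect events from the vCenter Events tab and count them per hour and per host. The Python sketch below shows the idea; the host names, timestamps, and timestamp format are made up for illustration, and `summarize_disconnects` and `dominant_hours` are hypothetical helper names.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, List, Tuple

Event = Tuple[str, str]  # (host name, "YYYY-MM-DD HH:MM:SS")


def summarize_disconnects(events: Iterable[Event]) -> Tuple[Counter, Counter]:
    """Count disconnect events per hour of day and per host."""
    by_hour: Counter = Counter()
    by_host: Counter = Counter()
    for host, stamp in events:
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        by_hour[when.hour] += 1
        by_host[host] += 1
    return by_hour, by_host


def dominant_hours(by_hour: Counter, min_share: float = 0.5) -> List[int]:
    """Return hours that account for at least min_share of all events."""
    total = sum(by_hour.values())
    if total == 0:
        return []
    return sorted(h for h, n in by_hour.items() if n / total >= min_share)


if __name__ == "__main__":
    # Hypothetical events for illustration only.
    events = [
        ("esx01", "2013-05-01 02:05:10"),
        ("esx02", "2013-05-02 02:10:00"),
        ("esx01", "2013-05-03 02:07:45"),
        ("esx07", "2013-05-03 14:30:00"),
    ]
    by_hour, by_host = summarize_disconnects(events)
    print("events per hour:", dict(by_hour))
    print("events per host:", dict(by_host))
    print("dominant hours:", dominant_hours(by_hour))
```

If most events cluster in one hour, that window is worth comparing against scheduled jobs such as backups; if they cluster on a few hosts, those hosts' hardware and firmware are the first place to look.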

Regards

Mohammed
