VMware Cloud Community
gaganvmware
Enthusiast

ESXi 5 critical Issue

I have 20 standalone ESXi 5 servers in our VMware farm. For various reasons we did not put them in a cluster, and no SAN is attached; all 20 ESXi 5 servers use only their local storage. I am facing many issues with them, such as:

1. Network connectivity lost
2. Host disconnected
3. Host unresponsive
4. Host not responding

When these alerts trigger, the ESXi server turns gray and becomes unresponsive. We cannot connect to the host for a while, but after some time the server turns green again and works as if nothing had happened.

Could you please help me understand the reason for this?

Accepted Solutions
memaad
Virtuoso

Hi,

Based on the details above, this KB article should help:

VMware KB: Network Connectivity Lost Due to Physical Link Down

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=201987...

Regards

Mohammed

Mohammed | Mark it as helpful or correct if my suggestion is useful.

25 Replies
a_p_
Leadership

I'd start by checking the host's Events and vmkernel logs. If the network comes back without any intervention, you may also want to check whether another system is using the same IP addresses.

André

gaganvmware
Enthusiast

I checked the events and logs. The ESXi servers recover from the unresponsive state by themselves. I saw a lot of disk latency messages, as the VMs on each host access the local disk, and disk space is running low. We are trying to add more disks. Do you think adding disks can solve the issue? Can disk latency lead to network latency? What steps would help me fix it?

memaad
Virtuoso

Hi,

First, I would like to know what changed recently on the ESXi host. Was there a driver upgrade, or any change to the physical network?

Regards

Mohammed

gaganvmware
Enthusiast

We upgraded ESXi from version 4 to version 5. I checked with the network team: there was no physical switch upgrade.

memaad
Virtuoso

Hi,

I need to know which physical NICs you are using.

Regards

Mohammed

a_p_
Leadership

IIRC, I read about driver issues with specific network adapters. What type/model of hosts and network adapters do you use? Maybe it can be solved simply by updating the drivers.

André

gaganvmware
Enthusiast

I can see disk latency messages; right after them the host shows as disconnected/unresponsive, and after a while ESXi comes back as if nothing had happened.

We are using IBM x3650 M3 servers with Broadcom NetXtreme 1000Base-T NICs.

memaad
Virtuoso

Hi,

First run this command:

esxcfg-nics -l

Then run this command on the ESXi host:

ethtool -i vmnic0

Regards

Mohammed

gaganvmware
Enthusiast

~ # esxcfg-nics -l
Name    PCI           Driver      Link Speed     Duplex MAC Address       MTU    Description
vmnic0  0000:0b:00.00 bnx2        Up   1000Mbps  Full   5c:f3:fc:e5:8b:cc 1500   Broadcom Corporation Broadcom NetXtreme II BCM5709 1000Base-T
vmnic1  0000:0b:00.01 bnx2        Up   1000Mbps  Full   5c:f3:fc:e5:8b:ce 1500   Broadcom Corporation Broadcom NetXtreme II BCM5709 1000Base-T
vmnic2  0000:10:00.00 bnx2        Down 0Mbps     Half   5c:f3:fc:6a:2c:38 1500   Broadcom Corporation Broadcom NetXtreme II BCM5709 1000Base-T
vmnic3  0000:10:00.01 bnx2        Down 0Mbps     Half   5c:f3:fc:6a:2c:3a 1500   Broadcom Corporation Broadcom NetXtreme II BCM5709 1000Base-T
vusb0   Pseudo        cdc_ether   Up   10Mbps    Half   5e:f3:fc:dd:8b:cf 1500   Unknown Unknown

~ # ethtool -i vmnic0
driver: bnx2
version: 2.0.15g.v50.11-7vmw
firmware-version: bc 6.2.0 NCSI 2.0.11
bus-info: 0000:0b:00.0
~ #
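
With 20 standalone hosts, comparing this output by eye gets tedious. As a rough sketch only, the Python below parses saved `esxcfg-nics -l` text and lists the NICs whose link is not up. It assumes the column layout shown in the output above; the names `Nic`, `parse_esxcfg_nics` and `SAMPLE` are made up for illustration and are not part of any VMware tool.

```python
from typing import NamedTuple


class Nic(NamedTuple):
    name: str
    driver: str
    link: str
    speed: str
    duplex: str


def parse_esxcfg_nics(text: str) -> list[Nic]:
    """Parse saved `esxcfg-nics -l` output (assumed column layout:
    Name, PCI, Driver, Link, Speed, Duplex, MAC, MTU, Description)."""
    nics = []
    for line in text.splitlines():
        fields = line.split(None, 8)
        # Skip the header row, shell prompts and blank lines.
        if len(fields) < 8 or fields[0] == "Name":
            continue
        name, _pci, driver, link, speed, duplex = fields[:6]
        nics.append(Nic(name, driver, link, speed, duplex))
    return nics


# Two rows copied from the output posted in this thread.
SAMPLE = """\
Name    PCI           Driver      Link Speed     Duplex MAC Address       MTU    Description
vmnic0  0000:0b:00.00 bnx2        Up   1000Mbps  Full   5c:f3:fc:e5:8b:cc 1500   Broadcom Corporation Broadcom NetXtreme II BCM5709 1000Base-T
vmnic2  0000:10:00.00 bnx2        Down 0Mbps     Half   5c:f3:fc:6a:2c:38 1500   Broadcom Corporation Broadcom NetXtreme II BCM5709 1000Base-T
"""

down = [nic.name for nic in parse_esxcfg_nics(SAMPLE) if nic.link != "Up"]
print(down)  # ['vmnic2']
```

A NIC that is down is not necessarily a fault (vmnic2 and vmnic3 here may simply be uncabled), so the list is only a starting point for comparing hosts.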

a_p_
Leadership

What I would do is check whether a firmware upgrade is available for the network adapters, and install the latest drivers on one of the hosts to see whether this solves the issue. Drivers can be downloaded from https://my.vmware.com/web/vmware/details?downloadGroup=DT-ESXI5X-BROADCOM-BNX2X-17654V501&productId=...

André

memaad
Virtuoso

Hi,

As a.p. mentioned, use the link below to get the latest driver. I see that newer drivers are available for the NIC you are using.

VMware Compatibility Guide: I/O Device Search

Regards

Mohammed

gaganvmware
Enthusiast

Do you think a driver upgrade can fix this issue? What about the disk latency: can it lead to network latency that causes the host network issues and disconnects?

a_p_
Leadership

Hard to say without all the details (log files, firmware versions, ...). Maybe it's a good idea to check whether IBM has firmware updates available for all hardware components (BIOS, network, RAID controller, ...) and apply those.

André

memaad
Virtuoso

Hi,

Can you get me the esxtop output from the ESXi host? Connect to the host over SSH and run:

esxtop

Then press 'd' and note the DAVG, KAVG, and GAVG values. If these values are high, the storage vendor needs to investigate.

You can also refer to this KB: VMware KB: Using esxtop to identify storage performance issues for ESX / ESXi (multiple versions)
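
To make "high" a little more concrete, here is a minimal Python sketch for judging the values noted from esxtop. The limits of 25 ms for DAVG and 2 ms for KAVG are commonly quoted rules of thumb and are treated here as assumptions, not hard limits; the function name `assess_latency` is hypothetical. GAVG is what the guest sees, roughly DAVG plus KAVG.

```python
# Rule-of-thumb limits in milliseconds per command.
# These are assumptions; tune them for your own hardware.
DAVG_LIMIT_MS = 25.0  # latency attributed to the device/controller
KAVG_LIMIT_MS = 2.0   # latency added inside the VMkernel


def assess_latency(davg_ms: float, kavg_ms: float) -> list[str]:
    """Return human-readable findings for one adapter's esxtop values."""
    findings = []
    if davg_ms > DAVG_LIMIT_MS:
        findings.append(f"DAVG {davg_ms:.1f} ms: look at the disks/RAID controller")
    if kavg_ms > KAVG_LIMIT_MS:
        findings.append(f"KAVG {kavg_ms:.1f} ms: look at queuing in the VMkernel")
    if findings:
        gavg_ms = davg_ms + kavg_ms  # approximate guest-visible latency
        findings.append(f"GAVG is about {gavg_ms:.1f} ms as seen by the VMs")
    return findings


print(assess_latency(3.0, 0.1))   # []
for line in assess_latency(40.0, 0.5):
    print(line)
```

A single snapshot says little; the values matter most when they are captured while a host is actually gray in vCenter.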

Regards

Mohammed

jdptechnc
Expert

It sounds to me like your VMs could be generating enough disk activity to cause the hypervisor console, which uses the same disks, to become unresponsive. It is possible that the kernel just can't process management traffic quickly enough because of the latency on the disks, so vCenter temporarily loses the management connection.

Does this seem to follow certain windows, like backup processing for example?

Please consider marking as "helpful", if you find this post useful. Thanks!... IT Guy since 12/2000... Virtual since 10/2006... VCAP-DCA #2222
gaganvmware
Enthusiast

$5E6E98EA2532D674.jpg

esxtop output showing the DAVG, KAVG, and GAVG values.

memaad
Virtuoso

Hi,

I don't see any disk latency on your ESXi host.

In the same way, you can check for packet drops. Once esxtop is running, press 'n' to see the network data. Check whether any packets are dropped when the host becomes unresponsive.

Regards

Mohammed

gaganvmware
Enthusiast

$67F2EB168278E24_09.jpg

Packet drop output (esxtop, after pressing 'n').

memaad
Virtuoso

Hi,

This also looks good. So the question is: do you see any pattern in the disconnections, such as a specific time of day or specific hosts?
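
One way to look for such a pattern is to collect the times of the disconnect alerts and count them per hour of day. The Python below is only a sketch of that idea: the timestamps are invented for illustration and would have to be copied from the vCenter events by hand, and `disconnects_by_hour` is a hypothetical helper name.

```python
from collections import Counter
from datetime import datetime


def disconnects_by_hour(timestamps: list[str]) -> Counter:
    """Count disconnect events per hour of day (ISO 8601 timestamps)."""
    hours = Counter()
    for ts in timestamps:
        hours[datetime.fromisoformat(ts).hour] += 1
    return hours


# Hypothetical event times, not taken from the hosts in this thread.
events = [
    "2012-05-01T02:05:00",
    "2012-05-02T02:10:00",
    "2012-05-03T14:30:00",
]

print(disconnects_by_hour(events).most_common(1))  # [(2, 2)]
```

If most events cluster in one hour, that window is worth comparing against scheduled jobs such as backups.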

Regards

Mohammed
