Hi.
After installing vSphere vCenter and ESXi 5.1 (with latest patches), we are starting to see hosts disconnecting randomly from vCenter.
We have different server models, but so far we have only seen the problem on our HP ProLiant BL465c Gen8 servers.
To work around this, we right-click the host in vCenter and click Reconnect, and sometimes we need to use the shell to restart the management agent.
Anyone familiar with this issue?
Are you running vCenter on Windows Server 2008 R2? If so, check to make sure that after the upgrade the Windows firewall profile or rules have not changed.
I have three generations of HP blades in one c7000 enclosure using Virtual Connect Flex-10.
BL460c G6 (intel)
BL465c G7 (Amd)
BL465c Gen8 (Amd)
I too experience random disconnects from vCenter, and I see the same “Receive packets dropped” pattern as the rest of you, but NOT on our G6 blades (which have no disconnects either)!
I tried to chase the HBA angle as suggested by jquest21, but I don’t see the “ALERT: APIC: 1823: APICID 0x00000000 - ESR = 0x40” message in the logs, so I’m not eager to just disable interrupt remapping.
Driver/firmware:
BL465c Gen8 (Amd)
NIC:
Emulex Corporation HP FlexFabric 10Gb 2-port 554FLB Adapter
driver: be2net
version: 4.4.231.0
firmware-version: 4.2.401.605
HBA:
Emulex Corporation LPe12000 8Gb Fibre Channel Host Adapter
vmhba5 lpfc820 link-up fc.2000009c0224964e:1000009c0224964e (0:5:0.0) Emulex Corporation LPe12000 8Gb Fibre Channel Host Adapter
Version: Version 0:8.2.3.1-127vmw, Build: 799733, Interface: 9.2 Built on: Aug 1 2012
BL465c G7 (Amd)
NIC:
Emulex Corporation NC551i Dual Port FlexFabric 10Gb Adapter
driver: be2net
version: 4.1.255.11
firmware-version: 4.1.450.16
bus-info: 0000:04:00.0
HBA:
QLogic Corp ISP2532-based 8Gb Fibre Channel to PCI Express HBA
Version: Version 911.k1.1-26vmw, Build: 472560, Interface: 9.2 Built on: Feb 8 2012
BL460c G6 (intel)
NIC:
Broadcom Corporation NetXtreme II BCM57711E/NC532i 10 Gigabit Ethernet
driver: bnx2x
version: 1.74.17.v50.1
firmware-version: bc 6.2.25 phy baa0.105
bus-info: 0000:02:00.0
HBA:
QLogic Corp ISP2432-based 4Gb Fibre Channel to PCI Express HBA
Version: Version 934.5.6.0-1vmw, Build: 472560, Interface: 9.2 Built on: Sep 21 2012
jquest21 wrote:
Not sure if any of this is 100% network related. There is an issue with HBAs disconnecting, and with 5.1, hosts that lose their connection to storage will also show as disconnected.
You need to enable SSH on the ESXi host and make sure the SSH port is open in the host firewall. Then you can connect via SSH with PuTTY.
Enter this command to show the current status:
esxcli system settings kernel list -o iovDisableIR
If it shows FALSE, run the next command to set it to TRUE:
esxcli system settings kernel set --setting=iovDisableIR -v TRUE
Then reboot the host for the setting to take effect. You can re-run the first command after the reboot to verify the setting.
Please see VMware KB 1030265
I think this is the ticket. However, when you have a cluster of ESXi hosts running hundreds of production servers and they can't stay connected to vCenter long enough to migrate them, it's hard to reboot the host, isn't it?
In our case, when we first had that issue, we would see the HBAs drop connectivity on the SAN switch. It was not always both ports that disconnected.
Sometimes it was just one, and then the second one would disconnect later.
Not all hosts were affected simultaneously.
We have a number of ESXi hosts with hundreds of servers running, and we didn't have the issue of being unable to vMotion off the host.
However, lately we have been having random disconnects as well: about once every few weeks a host would disconnect. We already had the HBA setting above in place, so that was not the cause this time.
VMware support said a hotfix was released last week to address this:
Patch ESXi510-201307401-BG
We applied it this week, so time will tell whether it ultimately resolves the issue.
We normally patch the ESX environment monthly, in the second week of the month, so we would have applied this in the next couple of weeks anyway, but we accelerated the patching because of the issue.
Hmm... yeah, mine are disconnecting constantly. They stay connected for about 1 minute and then disconnect. It makes it hard to vMotion. We're going to have to shut down all the customer VMs to be able to reboot.
And I think it's storage, because the clusters share datastores, except for a few hosts that use a different set of datastores, and those ones stay connected. This is affecting Dell on 4.1, and HP and Cisco UCS on 5.1. The ones staying connected are on the same UCS fabric as the others too; they are just not mapped to the common datastores.
The "1 minute" issue reminds me of a DNS resolution issue. Can you confirm that the FQDNs of all hosts as well as vCenter Server can properly be resolved by all hosts?
André
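In case it helps, here is a minimal Python sketch of such a check. It only tests resolution from whichever machine it runs on, so it would need to be run from each relevant vantage point. The function name `resolve_all` and the host names in the usage comment are made up for illustration.

```python
import socket

def resolve_all(names):
    """Return {name: sorted list of resolved IPs, or None if resolution failed}."""
    results = {}
    for name in names:
        try:
            infos = socket.getaddrinfo(name, None)
            # Each entry's sockaddr starts with the IP address; de-duplicate them.
            results[name] = sorted({info[4][0] for info in infos})
        except socket.gaierror:
            results[name] = None
    return results

# Usage (hypothetical names, replace with your own hosts and vCenter Server):
# for name, addrs in resolve_all(["esx01.example.com", "vcenter.example.com"]).items():
#     print(name, "->", addrs if addrs else "FAILED")
```

Any name that comes back as FAILED, or that resolves to an unexpected address, would be worth a closer look before chasing storage or drivers.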
Do you have access to the fiber switches?
On ours, they would show no link when the issue occurred.
So if it is truly a storage-related issue as described in that article, you would see no link on your fiber switches.
If that is not the case, I think the issue may be elsewhere.
Also, to reiterate: at times only one switch would show no link, but the host connection didn't drop until both switches were showing no link, because with one down there was still a path. So be sure to look at both switches when a host disconnects.
Fiber switches seem to be ok. My VNX is also showing the hosts connected and the ones that are staying connected are on the same SAN, just some different LUNs. It is weird.
Found the problem. One of my engineers is implementing the Splunk/VMware app and was installing the OVAs. He made a change to the firewall between the ESX host network and vCenter and accidentally blocked UDP 902. When we opened it again, great things began to happen.
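For anyone wanting to test a firewall path like this, here is a rough Python sketch of a one-way UDP probe. UDP has no handshake, so the only way to confirm a datagram gets through is to listen on the far side. The function names are made up for this example. The listener would have to run on a spare test machine on the vCenter side of the firewall, since vCenter itself is expected to be using port 902 already, and binding to a port below 1024 usually needs administrator rights.

```python
import socket

def open_udp_listener(bind_addr, port):
    """Bind a UDP socket; port 0 lets the OS pick a free port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((bind_addr, port))
    return sock

def wait_for_datagram(sock, timeout):
    """Return (payload, sender) for the first datagram, or None on timeout."""
    sock.settimeout(timeout)
    try:
        return sock.recvfrom(2048)
    except socket.timeout:
        return None

def send_probe(dest_addr, port, payload=b"probe"):
    """Send a single UDP datagram; no reply is expected."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (dest_addr, port))

# Usage: on the listening side
#   sock = open_udp_listener("0.0.0.0", 902); print(wait_for_datagram(sock, 30))
# and on the sending side (address is a placeholder)
#   send_probe("192.0.2.10", 902)
```

If the listener times out while the sender reports no error, something in between is dropping the traffic, which is the situation described above.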
Yes, the same thing happened to me on my c7000 enclosure blades. I too have a mix of blade servers like you:
On the HP BL465c G7 and BL465c G8 (all AMD), the hosts disconnect intermittently and at random, while my HP BL460c G7 and BL460c G8 (all Intel) have never had this problem.
Any help, suggestions, or updates would be greatly appreciated.
Thanks
We had this issue too with c7000 enclosures, HP Flex-10 and ESXi 5.1; it turned out to be cached MAC addresses in ESXi.
We had physically moved some blades around and assigned new Virtual Connect profiles, but ESXi retained the old MAC address that Virtual Connect had assigned to it.
Once the old profile is deleted, its MAC address goes back into the free pool to be handed out again, and we ended up with MAC address conflicts.
The following KB was what helped me out;
Hope this helps
Ian.
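A small Python sketch of how such conflicts could be spotted once the MAC addresses have been collected. How the addresses are gathered (by hand from each host and from the Virtual Connect profiles, for example) is left open here, and the function names and sample addresses are made up for illustration.

```python
from collections import defaultdict

def normalize_mac(mac):
    """Lower-case and strip separators so differently formatted MACs compare equal."""
    return mac.lower().replace(":", "").replace("-", "").replace(".", "")

def find_duplicate_macs(inventory):
    """inventory: {owner: [mac, ...]} -> {normalized mac: sorted owners} for MACs seen on 2+ owners."""
    seen = defaultdict(set)
    for owner, macs in inventory.items():
        for mac in macs:
            seen[normalize_mac(mac)].add(owner)
    return {mac: sorted(owners) for mac, owners in seen.items() if len(owners) > 1}

# Usage with made-up data:
# find_duplicate_macs({"blade1": ["00:17:A4:77:00:02"],
#                      "blade2": ["00-17-a4-77-00-02"]})
```

Any MAC reported for more than one owner is a candidate for the stale-address problem described above.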
OK, I have found the resolution for my case here:
My ESXi host version is ESXi 5.1.0 build-1065491
This issue occurs because the vCenter Server agent on the host (vpxa) fails to send heartbeats to vCenter Server.
This is a known issue affecting ESXi/ESX 4.x and ESXi 5.x. The resolution is to update the ESXi host to build 1157734 (ESXi 5.1 Patch 2).
Reference - VMware KB: ESXi and ESX hosts randomly disconnect from VMware vCenter Server
Patch details - VMware KB: VMware ESXi 5.1, Patch Release ESXi510-201307001
Hope this helps you guys.
Hi All,
Also make sure that you use the HP custom vSphere images.
Has anyone updated to a newer version or update of vSphere, and does the issue still occur?
Why not upgrade the hosts to 5.5, or to Update 2 of 5.1, and check the state then?
Kind Regards
This issue reappeared for me when using the HP Custom ISO. The problem I had was with driver version ‘10.2.453.0-2263645’, as discussed at http://ict-freak.nl/2014/10/01/hps-september-vmware-driver-bundle-and-issues-with-emulex-cnas/
The latest driver from VMware for the Emulex card I am using is ‘be2net-4.9.234.8-2365770.zip’, dated 2014-12-17, and it works great.
However, Update Manager does not detect this driver as an upgrade, so I rolled back the driver and installed it manually as per KB 2079279.
Before doing this, make sure that the firmware on the NIC is up to date as per:
http://vibsdepot.hp.com/hpq/recipes/HP-VMware-Recipe.pdf
The hardware that I have had the issue on:
HP ProLiant BL465c G7 - Emulex Corporation HP NC552m Dual Port Flex-10 10Gbe BL-c Adapter
HP ProLiant BL460c Gen8 - Emulex Corporation HP FlexFabric 10Gb 2-port 554FLB Adapter
Hope this helps.
Sorry for replying to an old post, but Windows Firewall caused all this for me, and for days I couldn't find the reason.
My hosts sit in a different VLAN, and when I installed vCenter it only opened the ports for the Domain profile, not for Private or Public.
Thanks for your helpful tip.
Yes, I'm having the same problem as you guys: all of a sudden the ESXi 5.1 Update 1 host is disconnected from vCenter and I cannot right-click to reconnect it, so I had to perform a hard reset from the iLO.
I'm running HP BL465c G7 and G8 blades, and here are the detailed firmware versions and hardware models:
NIC: Emulex Corporation NC551i
HBA: ISP2532
~ # ethtool -i vmnic0
driver: be2net
version: 4.2.327.0
firmware-version: 4.2.401.6
bus-info: 0000:04:00.0
May I know what the fix for this issue was?
Hi AlbertWT,
VMware has released a KB specifically for this: KB 2044681.
Essentially you need to upgrade the firmware and the driver. I have a BL465c G7 with NC551i too, and it is working fine now with the following configuration:
driver: be2net
version: 4.9.234.8
firmware-version: 4.9.416.2
If you can get an SSH connection to your blade by other means, such as a mezzanine card, copying the .scexe firmware file to the host and running it locally is much easier; if not, HP SUM will do the job.
http://h20564.www2.hp.com/hpsc/swd/public/readIndex?sp4ts.oid=5033634
https://my.vmware.com/web/vmware/details?downloadGroup=DT-ESXI51-EMULEX-BE2NET-492348&productId=285
@lmcbride: Yes, HP has an advisory page (HP Support document - HP Support Center), but it states the following:
Update the Emulex adapter firmware to version 4.2.401.2215 (or later) for the HP ProLiant BL465 G7 server and the HP ProLiant BL685 G7 server:
But when I check the firmware version on the ESXi host, it gives me:
~ # ethtool -i vmnic0
driver: be2net
version: 4.2.327.0
firmware-version: 4.2.401.6
bus-info: 0000:04:00.0
So should I apply the firmware version suggested by HP, given that my version above appears to be greater than what the advisory suggests?
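One thing worth checking before deciding: the comparison depends on how the version string is read. A minimal Python sketch follows, assuming each dot-separated field is compared as a whole number. That is the usual convention, but it is an assumption about Emulex's numbering here, and the function names are made up for this example. Under that reading, 4.2.401.6 is older than 4.2.401.2215, because 6 is less than 2215, even though a plain text comparison puts "6" after "2".

```python
def parse_ethtool_info(text):
    """Parse 'key: value' lines as printed by `ethtool -i` into a dict."""
    info = {}
    for line in text.splitlines():
        # Split on the first colon only, so values like 0000:04:00.0 stay intact.
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info

def version_tuple(version):
    """Split a purely numeric dotted version into integers (2215 > 6)."""
    return tuple(int(part) for part in version.split("."))

def meets_minimum(installed, minimum):
    """True if the installed version is at least the minimum, field by field."""
    return version_tuple(installed) >= version_tuple(minimum)

# Usage with the output quoted above:
# info = parse_ethtool_info("driver: be2net\nfirmware-version: 4.2.401.6\n")
# meets_minimum(info["firmware-version"], "4.2.401.2215")
```

This only handles purely numeric dotted versions; a string such as "bc 6.2.25 phy baa0.105" would need different handling.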
Have you tried deleting the vpxa user? That automatically disconnects the server from vCenter. Then try adding the host again, which recreates the user and the vpxa agent; that should fix the issue. I had this issue on ESX 4.0 and fixed it the same way:
service vmware-vpxa stop
service mgmt-vmware stop
cat /etc/passwd |grep vpxuser
userdel vpxuser
rpm -qa |grep -iE 'vpx|aam'
rpm -e VMware-vpxa-5.0.0-773848
service mgmt-vmware start
