VMware Cloud Community
Elmars
Contributor

ESXi 5.5 build 2456374 fails to reconnect to vCenter after reboot

Hi

I have been banging my head against this one for the past few weeks. A host that had been running fine on an earlier 5.5 build fails to reconnect to vCenter after a reboot once it is upgraded to build 2456374. This also happens to newly built hosts. When a reconnect is initiated manually, it proceeds to 80% (processing data from host) and then dies with a "general system error: internal error". I have tried deleting the vpxa account, resetting the network config via the console, removing NICs from the vSwitch, and removing extra VMkernel interfaces. Resetting the network config via the console allows me to reconfigure the network, reattach the host to vCenter, and complete host configuration. However, as soon as I reboot the host, it no longer reconnects to vCenter and I am back where I started.

Unfortunately, this does not happen 100% of the time. If I rebuild a host enough times, one of the rebuilds will finally work correctly.

Any ideas on how to resolve this issue would be appreciated.

Elmars

DavoudTeimouri
Virtuoso

Hi,

What is your vCenter server version?

BR

-------------------------------------------------------------------------------------
Davoud Teimouri - https://www.teimouri.net - Twitter: @davoud_teimouri Facebook: https://www.facebook.com/teimouri.net/
Elmars
Contributor

Davoud

This is happening with both vCenter 5.5u2 and 5.5u2b. The hosts include both Intel- and AMD-based systems from both HP and IBM, built with the respective customized install images. All hosts are in HA/DRS clusters and have multiple VMkernel interfaces on standard vSwitches. Some hosts are still 1G based, others are 10G. All are configured for vFRC. Some hosts are configured with software iSCSI, others with NFS.

Elmars 

DavoudTeimouri
Virtuoso

Hi Elmars,

I suspect it's related to your host's self-signed certificate. Follow the article below to regenerate the certificate.

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=102050...

Elmars
Contributor

Davoud-

Thanks for the suggestion. I tried it out, but it did not do the trick, and yes, I even rebooted the host after deleting the certs. I checked the cert on the host before deleting it and it was good until 2024. I have also had this happen on newly built hosts that were built with the HP or IBM customized ESXi images. The installs work great until the host is updated with Update Manager, i.e. even a freshly built host using the HP or IBM image, only hours old, shows the problem.

Elmars

Elmars
Contributor

Update-

Spent half a day on the phone with VMware yesterday on this issue. We retried most of the steps already performed above, including confirming the hostd agent is running on the host, replacing the host certificate, attempting to connect the host to two different vCenter servers, and probably something more, none of which produced the desired result. I have forwarded the VMware and host log bundles for review, and they will get back to me today or tomorrow. Meanwhile, one of the suggestions was to upgrade vCenter to the latest version and see if this fixes the problem. I will be testing this today in my lab environment, where I have yet another host exhibiting the same issue.

Elmars

Elmars
Contributor

ISSUE RESOLVED!!!

OK, so that was the easy part. I opened a case with VMware to identify the root cause. After much digging, fretting, and hair pulling, we found a clue in the hostd.log on the ESXi host.

2015-04-02T04:49:05.481Z [23381B70 error 'Hostsvc.NetworkProvider' opID=HB-host-185064@917-1d029f4d-d2 user=vpxuser] An error occurred while fetching stack instance configuration: Inconsistent value of ccalgo for Netstack instance: defaultTcpipStack.
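
For anyone wanting to check their own host for the same entry, here is a minimal sketch of a search for that message. The function name find_ccalgo_error is made up for illustration, and the default path /var/log/hostd.log is an assumption about where the log lives on your host; pass a different path if yours differs.

```shell
# Sketch only: search a hostd log for the netstack inconsistency message.
# find_ccalgo_error is a hypothetical helper name, not a VMware tool.
# The default log path is an assumption; pass another path as argument 1.
find_ccalgo_error() {
    log_file="${1:-/var/log/hostd.log}"
    # -F treats the pattern as a fixed string, so no regex escaping is needed.
    grep -F "Inconsistent value of ccalgo for Netstack instance" "$log_file"
}
```

Calling find_ccalgo_error with no arguments prints any matching lines from the default log. The exit status follows grep: zero when the message is found, non-zero when it is not.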

In network performance testing I did recently, I was able to significantly improve iSCSI networking performance by switching from the NewReno network congestion control algorithm to Cubic. For each of these hosts, as part of the setup procedure, I now set the congestion control algorithm to Cubic via the web interface. It turns out there is a bit of a bug with this that VMware had thought fixed. It isn't.

To confirm you are seeing this issue, log in to the host (SSH or remote CLI) and run the following command:

esxcli network ip netstack list

You should get a message stating that there is an inconsistency in the netstack configuration.

To fix, run the following command:

esxcli network ip netstack set -c cubic -N defaultTcpipStack

The netstack list command from above should now show detailed information, and the host should now be able to connect to vCenter.

As I understand it, a KB article will be published on this in the near future.