VMware Cloud Community
jcreek01
Enthusiast

ESXi maintenance mode communication error

When attempting to enter maintenance mode:

From vCenter:

A general system error occurred: Connection reset by peer: The connection is terminated by the remote end with a reset packet. Usually, this is a sign of a network problem, timeout, or service overload.

From the command line, running:

esxcli system maintenanceMode set --enable true

I receive the error:

A general system error occurred: Broken pipe: The communication pipe/socket is explicitly closed by the remote service.

VMware ESXi, 6.7.0, 10764712

vCenter: (version shown in screenshots)

Happening on two different clusters.

14 Replies

kenbshinn
Enthusiast

Are the vCenter server and the two clusters you mentioned located in the same data center (physical location, not the VMware Datacenter object)? Are there any firewalls or routers in between that could be blocking communication?

jcreek01
Enthusiast

Two separate vCenter servers, each in the same physical data center as its cluster.

No firewalls in between.

This worked fine for months.

Verified with the network team that nothing has changed.

It seems to be a service issue on the ESXi hosts.

I tried:

/etc/init.d/hostd restart

/etc/init.d/vpxa restart

No change.

Then I tried:

services.sh restart

Then I waited a few minutes, and entering maintenance mode from the CLI worked.

I have done this successfully on two of the 10 servers so far.
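For reference, the full sequence I have been running on each affected host is roughly this (a sketch, assuming an SSH session on the host; services.sh restarts all of the management agents, so the host briefly drops out of vCenter while they come back):

/etc/init.d/hostd restart
/etc/init.d/vpxa restart

# If restarting hostd/vpxa alone is not enough, restart all management agents
services.sh restart

# Wait a few minutes, then retry and confirm
esxcli system maintenanceMode set --enable true
esxcli system maintenanceMode get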

kenbshinn
Enthusiast

Did you make any changes to the ESXi hosts before this started, such as applying updates?

I have not seen this issue before, but I wonder if it could be caused by the build you are running.

jcreek01
Enthusiast

Updated ESXi and vCenter about a month ago.

kenbshinn
Enthusiast

Okay. Based on your earlier comments that this worked for months and that the update happened about a month ago, I assume the problem started around that time.

Have you tried connecting to the host directly with the host web console and entering maintenance mode from there? If that works, it shifts the blame to vCenter.

It sounds like it might be an issue with ESXi, but I wouldn't rule out vCenter either. While you have the hosts in maintenance mode, I would try removing them from vCenter and re-adding them to see if that resolves the issue. Doing that should reinstall the agents that vCenter puts on ESXi in order to manage it.
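Another quick host-side test while you are connected directly (a sketch, assuming SSH access): vim-cmd goes through hostd just like the host web console does, so if these calls also fail, hostd itself is the likely suspect rather than anything between the host and vCenter.

# Check and toggle maintenance mode via hostd
vim-cmd hostsvc/hostsummary | grep inMaintenanceMode
vim-cmd hostsvc/maintenance_mode_enter
vim-cmd hostsvc/maintenance_mode_exit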

SureshKumarMuth
Commander

Even the command line gives an error here, so it looks like an issue at the host level rather than with vCenter.

Is the issue occurring randomly? Is it reproducible?

Next time you face the issue, use localcli instead of esxcli and check the outcome (localcli bypasses hostd).
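For example, something like this from an SSH session should show whether hostd is the broken link (a sketch; localcli mirrors the esxcli namespaces but talks to the host directly without going through hostd):

# Goes through hostd - fails if hostd is hung
esxcli system maintenanceMode get

# Bypasses hostd
localcli system maintenanceMode get

If the localcli call succeeds while the esxcli call errors out, the problem is in hostd rather than the network or vCenter. Keep in mind that after changing state with localcli you should restart hostd so it picks up the change.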

1. What image are you using for ESXi (an OEM image from the hardware vendor or the standard VMware image)?

2. Is the issue happening only on the updated hosts?

Regards,
Suresh
https://vconnectit.wordpress.com/
kenbshinn
Enthusiast

The build listed above is VMware ESXi, 6.7.0, 10764712, and it sounds like it is happening on all hosts across two different vCenters.

jcreek01
Enthusiast

I connected directly to the web console of one of the hosts that would not go into maintenance mode from vCenter, then tried to enter maintenance mode there. Received the following:

Failed to enter maintenance mode: A general system error occurred: Connection reset by peer: The connection is terminated by the remote end with a reset packet. Usually, this is a sign of a network problem, timeout, or service overload.

Then tried esxcli system maintenanceMode set --enable true.  Received the same error.

Then localcli system maintenanceMode set --enable true.  Received the same error.

Then /etc/init.d/hostd restart (since you are supposed to restart hostd after using localcli, right?).

Then esxcli system maintenanceMode set --enable true worked.

So, it does not seem to be a vCenter issue.  Seems to be an ESXi issue.
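For anyone following along, these are the quick checks I used after the restart to confirm the agents were back and the host actually went into maintenance mode (nothing exotic, just the standard status commands):

/etc/init.d/hostd status
/etc/init.d/vpxa status
esxcli system maintenanceMode get   # should now report Enabled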

The systems are Dell R730s.  Originally installed with DellEMC Customized Image ESXi 6.7 A01 (based on ESXi VMKernel Release Build 8169922) then upgraded to 6.7.0 Update 1 (Build 10764712). 
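To answer the image question, this is roughly how I checked which image and build a host is running (a sketch; the image profile name should indicate whether the DellEMC customized image is installed):

vmware -vl
esxcli system version get
esxcli software profile get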

Thanks,

Jeff

jcreek01
Enthusiast

Also seeing issues with the vSAN health check: hosts are reported with connectivity issues. I am assuming this may be related; if the management agents are having trouble communicating locally, then vCenter would have trouble communicating with them as well.
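In case it is useful, these are the host-side checks I have been using to see whether the vSAN side is affected too (a sketch; command availability can vary by build):

esxcli vsan cluster get          # cluster membership as this host sees it
esxcli vsan health cluster list  # per-check health summary from the host
/etc/init.d/vsanmgmtd status     # vSAN management daemon; restart it like hostd if it is hung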

SureshKumarMuth
Commander

So it indicates that hostd frequently goes into a not-responding state and does not recover automatically unless you restart the agent. Since these are vSAN-enabled hosts, it is better to raise a case with VMware and ask them to generate a hostd dump and investigate why the agent is hanging; they will help you find the actual cause. It may be a bug as well.
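If you do open the case, collecting a support bundle from an affected host while hostd is in the bad state will save time, for example (run from the ESXi shell; the bundle path is printed when it completes):

# Full log/diagnostic bundle for the SR
vm-support

# The live hostd log is also worth a quick look
tail /var/run/log/hostd.log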

Regards,
Suresh
https://vconnectit.wordpress.com/
jcreek01
Enthusiast

Having trouble opening a support case. We ordered VMware support through Dell, and now Dell says we do not have support for VMware.

jcreek01
Enthusiast

Have figured out the Dell/VMware support issue.

Working with Dell and VMware on the issue.

mtront1
Contributor

Can you tell me how you resolved this?  We are having the same issue.

jcreek01
Enthusiast

I worked with VMware for the past couple of months. They were not able to find exactly what was causing hostd and vsanmgmtd to crash. However, upgrading ESXi (and the other prerequisite components) seems to have solved the issue.

I am using Dell servers.

I upgraded to:

Component                   Version          Build
vCenter                     6.7.0.20000      10244745
Intel X710 NIC firmware     18.5.17
NIC driver                  1.7.1
HBA330 driver               17.00.01.00
ESXi                        6.7.0            11675023
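For what it is worth, this is roughly how I confirmed the new versions on each host after patching (a sketch; vmnic and adapter names will differ per host):

vmware -vl                           # ESXi version and build
esxcli network nic get -n vmnic0     # NIC driver and firmware versions
esxcli storage core adapter list     # HBAs and their drivers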

I would have liked to know exactly what the cause was. It was probably the ESXi upgrade that fixed it, but that is just a theory.

Now to upgrade my second cluster.

Jeff
