VMware Networking Community
romuald_geo
Contributor

NSX 6.3.3 Controllers Disconnected

Hello,

I have NSX Manager version 6.3.3 Build 6276725 deployed with 3 controllers. In Networking & Security > Installation > Management, the status of all the controllers is Disconnected. According to KB2151719, this is a known bug. I followed the procedure in the article; however, when I run the API call from step 3, I get status code 200, but the body of the reply is the following:

# of controllers: 3

------------------------------------------------------------------

Fix controller: {Controller: 10.60.1.223 [controller-9], apiUser: admin }

  live prompt: <Password: >, SEND reply: <??>

  live prompt: <Password: >, SEND reply: <??>

  live prompt: <Password: >, SEND reply: <??>

  live prompt: <Password: >, SEND reply: <??>

  live prompt: <Password: >, SEND reply: <??>

ERROR in connect: SSH_MSG_DISCONNECT: 2 Too many authentication failures

script failed: com.jcraft.jsch.JSchException: SSH_MSG_DISCONNECT: 2 Too many authentication failures

------------------------------------------------------------------

Fix controller: {Controller: 10.60.1.222 [controller-8], apiUser: admin }

  live prompt: <Password: >, SEND reply: <??>

  live prompt: <Password: >, SEND reply: <??>

  live prompt: <Password: >, SEND reply: <??>

  live prompt: <Password: >, SEND reply: <??>

  live prompt: <Password: >, SEND reply: <??>

ERROR in connect: SSH_MSG_DISCONNECT: 2 Too many authentication failures

script failed: com.jcraft.jsch.JSchException: SSH_MSG_DISCONNECT: 2 Too many authentication failures

------------------------------------------------------------------

Fix controller: {Controller: 10.60.1.221 [controller-7], apiUser: admin }

  live prompt: <Password: >, SEND reply: <??>

  live prompt: <Password: >, SEND reply: <??>

  live prompt: <Password: >, SEND reply: <??>

  live prompt: <Password: >, SEND reply: <??>

  live prompt: <Password: >, SEND reply: <??>

ERROR in connect: SSH_MSG_DISCONNECT: 2 Too many authentication failures

script failed: com.jcraft.jsch.JSchException: SSH_MSG_DISCONNECT: 2 Too many authentication failures

Patch script completed successfully.

All the controllers remain in the Disconnected state. Any ideas what could be causing the issue? What credentials does NSX Manager use when it connects to the controllers, and where can they be verified?
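Since the call returns status code 200 and ends with "Patch script completed successfully." even though every controller failed, it can help to summarize the reply body per controller rather than trust the status code. Below is a minimal sketch, assuming the body keeps the exact layout quoted above; the names ControllerResult and summarize_patch_reply are hypothetical and not part of any VMware tooling.

```python
import re
from dataclasses import dataclass, field

# Matches the per-controller header line as it appears in the quoted reply body.
HEADER = re.compile(
    r"Fix controller: \{Controller: (\S+) \[(controller-\d+)\], apiUser: (\w+) \}"
)


@dataclass
class ControllerResult:
    ip: str
    controller_id: str
    api_user: str
    password_prompts: int = 0
    errors: list = field(default_factory=list)

    @property
    def failed(self) -> bool:
        return bool(self.errors)


def summarize_patch_reply(body: str) -> list:
    """Group the lines of the reply body by controller (hypothetical helper)."""
    results = []
    current = None
    for raw in body.splitlines():
        line = raw.strip()
        match = HEADER.search(line)
        if match:
            # A new "Fix controller" header starts a new record.
            current = ControllerResult(*match.groups())
            results.append(current)
        elif current is None:
            # Lines before the first header (e.g. "# of controllers: 3").
            continue
        elif line.startswith("live prompt: <Password:"):
            current.password_prompts += 1
        elif line.startswith(("ERROR", "script failed")):
            current.errors.append(line)
    return results
```

Calling summarize_patch_reply(response_text) on the saved reply and checking each result's failed flag makes it obvious that a 200 response here does not mean the controllers were actually fixed.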

Thanks

10 Replies
rajeevsrikant
Expert

Kindly refer to the article below. Is it the same issue?

https://vswitchzero.com/2017/09/27/controller-disconnect-and-api-bug-in-nsx-6-3-3/

romuald_geo
Contributor

Thanks for your time rajeevsrikant.

It's the same issue described in KB2151719, and I followed the procedure in the resolution, including restarting the API servers on the controllers. But to no avail; they are still in the Disconnected state.

rajeevsrikant
Expert

VMware Knowledge Base

Kindly go through this KB; it mentions the step below. Could you please try this?

Note: As a part of Step 3, the script will set a temporary password on the Controller, log in to the root shell and change the password for the user account back to the original password set during initial Controller deployment. If any or all of the Controllers are re-deployed, repeat the preceding steps again.

romuald_geo
Contributor

The scripts in KB51144 are the same as in the previously mentioned KBs. When I run the script in step 2, I get a 500 Internal Server Error. The very same script returned status code 200 the first time I ran it. When I issue the API call from step 3, I get the same result:

# of controllers: 3

------------------------------------------------------------------

Fix controller: {Controller: 10.60.1.223 [controller-9], apiUser: admin }

  live prompt: <Password: >, SEND reply: <??>

  live prompt: <Password: >, SEND reply: <??>

  live prompt: <Password: >, SEND reply: <??>

  live prompt: <Password: >, SEND reply: <??>

  live prompt: <Password: >, SEND reply: <??>

ERROR in connect: SSH_MSG_DISCONNECT: 2 Too many authentication failures

script failed: com.jcraft.jsch.JSchException: SSH_MSG_DISCONNECT: 2 Too many authentication failures

------------------------------------------------------------------

Fix controller: {Controller: 10.60.1.222 [controller-8], apiUser: admin }

  live prompt: <Password: >, SEND reply: <??>

  live prompt: <Password: >, SEND reply: <??>

  live prompt: <Password: >, SEND reply: <??>

  live prompt: <Password: >, SEND reply: <??>

  live prompt: <Password: >, SEND reply: <??>

ERROR in connect: SSH_MSG_DISCONNECT: 2 Too many authentication failures

script failed: com.jcraft.jsch.JSchException: SSH_MSG_DISCONNECT: 2 Too many authentication failures

------------------------------------------------------------------

Fix controller: {Controller: 10.60.1.221 [controller-7], apiUser: admin }

  live prompt: <Password: >, SEND reply: <??>

  live prompt: <Password: >, SEND reply: <??>

  live prompt: <Password: >, SEND reply: <??>

  live prompt: <Password: >, SEND reply: <??>

  live prompt: <Password: >, SEND reply: <??>

ERROR in connect: SSH_MSG_DISCONNECT: 2 Too many authentication failures

script failed: com.jcraft.jsch.JSchException: SSH_MSG_DISCONNECT: 2 Too many authentication failures

Patch script completed successfully.

romuald_geo
Contributor

In my opinion, the problem here is that NSX Manager is providing incorrect credentials when establishing the SSH connection with the controllers. I wonder whether it's possible to configure which credentials are used or, more generally, which credentials are used for such communication.

Thanks 

romuald_geo
Contributor

Hi All,

Any ideas?

mdac
Enthusiast

Hi romuald_geo,

I'd say your assessment is probably correct. The current controller 'admin' account password may not match what NSX Manager thinks it should be. Did you happen to change the password when prompted during an earlier login attempt? If so, VMware GSS may be able to assist in setting it back to its previous value so that it matches again.

If that proves problematic and you have already upgraded NSX Manager to one of the 're-released' 6.3.3 or 6.3.4 builds (or 6.3.5 and later), it may be easiest to just delete all three controller nodes and then re-deploy them. (Keep in mind that if you don't want to upgrade the entire environment, upgrading NSX Manager to 6.3.3 build 7087283 won't require the upgrade of any other components aside from the control cluster.) With the controllers in their current state, a force-delete of all three before re-deployment may be necessary. If you do proceed with this, make sure you do it during a maintenance window and be careful to ensure the appliances are deleted as part of the process. If not, they'll need to be manually removed. A force sync of routing and VXLAN services on all NSX-prepared clusters would be a good idea once finished as well.
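For anyone scripting the force-delete step rather than using the UI, here is a rough dry-run sketch that only builds the requests and sends nothing. It assumes the controller delete endpoint and its forceRemoval query parameter work as I recall them from the NSX for vSphere API guide, so verify both against the guide for your build before using this. The manager hostname and the function name plan_force_delete are hypothetical; the controller IDs are the ones from the output earlier in this thread.

```python
from urllib.parse import urlencode


def plan_force_delete(manager_host, controller_ids, force=True):
    """Return the (method, url) pairs a force-delete would issue; sends nothing.

    Assumption: DELETE /api/2.0/vdn/controller/{id}?forceRemoval=true is the
    correct call for this NSX build. Check the API guide before running it.
    """
    query = urlencode({"forceRemoval": str(force).lower()})
    return [
        ("DELETE", f"https://{manager_host}/api/2.0/vdn/controller/{cid}?{query}")
        for cid in controller_ids
    ]


if __name__ == "__main__":
    # Hypothetical manager hostname; controller IDs taken from the thread.
    for method, url in plan_force_delete(
        "nsxmgr.example.local", ["controller-7", "controller-8", "controller-9"]
    ):
        print(method, url)
```

Printing the plan first gives you something to review with support during the maintenance window before any request is actually sent.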

Either way, I'd recommend opening an SR with VMware's NSX support team. They can help to debug the API call and try to get you back on track. If the API fix can't be used, they can also provide more guidance on the re-deployment process I outlined above.

I hope this helps.

Regards,

Mike

My blog: https://vswitchzero.com Follow me on Twitter: @vswitchzero
romuald_geo
Contributor

Hi vswitchzero,

Thank you for your answer. I've upgraded NSX Manager to 6.3.3 Build 7087283 as you suggested. When I log in to NSX Manager via SSH and issue the show controller list all command, all three controllers are listed with state UNKNOWN. However, in vSphere Web Client > Networking & Security > Installation, the NSX Controller nodes list is empty, so I'm not able to upgrade them to the same version. If I try to deploy a new controller, I get the error "Controller controller-12 creation failed - there is no active controller node for join".

Could anyone please advise? Thanks.

mdac
Enthusiast

Hi romuald_geo,

Sorry for the very slow reply on this. Were you able to get it sorted out? If not, let me know. Based on your description, it'll likely be necessary to delete all three controllers and re-deploy them now that you are at build 7087283.

Regards,

Mike

Beingnsxpaddy
Enthusiast

I agree with the solution: the correct approach is to force-delete the controllers and deploy new ones. Do it during a maintenance window.

NSX Command Line Quick Reference

Troubleshoot NSX Controller cluster status, roles and connectivity

These are worth reading before execution.

Regards Pradhuman VCIX-NV, VCAP-NV, vExpert, VCP2X-DCVNV If my Answer resolved your query don't forget to mark it as "Correct Answer".