Hello,
I have NSX Manager version 6.3.3 Build 6276725 deployed with 3 controllers. Under Networking & Security > Installation > Management, all the controllers show a status of Disconnected. According to KB2151719 this is a known bug. I followed the procedure in the article; however, when I run the API call from step 3 I get status code 200, but the body of the reply is the following:
# of controllers: 3
------------------------------------------------------------------
Fix controller: {Controller: 10.60.1.223 [controller-9], apiUser: admin }
live prompt: <Password: >, SEND reply: <??>
live prompt: <Password: >, SEND reply: <??>
live prompt: <Password: >, SEND reply: <??>
live prompt: <Password: >, SEND reply: <??>
live prompt: <Password: >, SEND reply: <??>
ERROR in connect: SSH_MSG_DISCONNECT: 2 Too many authentication failures
script failed: com.jcraft.jsch.JSchException: SSH_MSG_DISCONNECT: 2 Too many authentication failures
------------------------------------------------------------------
Fix controller: {Controller: 10.60.1.222 [controller-8], apiUser: admin }
live prompt: <Password: >, SEND reply: <??>
live prompt: <Password: >, SEND reply: <??>
live prompt: <Password: >, SEND reply: <??>
live prompt: <Password: >, SEND reply: <??>
live prompt: <Password: >, SEND reply: <??>
ERROR in connect: SSH_MSG_DISCONNECT: 2 Too many authentication failures
script failed: com.jcraft.jsch.JSchException: SSH_MSG_DISCONNECT: 2 Too many authentication failures
------------------------------------------------------------------
Fix controller: {Controller: 10.60.1.221 [controller-7], apiUser: admin }
live prompt: <Password: >, SEND reply: <??>
live prompt: <Password: >, SEND reply: <??>
live prompt: <Password: >, SEND reply: <??>
live prompt: <Password: >, SEND reply: <??>
live prompt: <Password: >, SEND reply: <??>
ERROR in connect: SSH_MSG_DISCONNECT: 2 Too many authentication failures
script failed: com.jcraft.jsch.JSchException: SSH_MSG_DISCONNECT: 2 Too many authentication failures
Patch script completed successfully.
And all the controllers remain in disconnected status. Any ideas what could be causing the issue? What credentials are used when NSX Manager tries to connect to controllers and where can it be verified?
Thanks
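The reply body quoted above ends with "Patch script completed successfully." and arrives with status code 200, yet every controller block contains a "script failed:" line. As a rough aid for reading such output, here is a minimal Python sketch that summarizes the result per controller. It assumes only the line formats visible in the paste above; the function name summarize_patch_output is hypothetical and is not part of NSX or any KB script.

```python
import re

# Header line format copied from the reply body quoted above (an assumption:
# other NSX builds may format this line differently).
HEADER = re.compile(r"Fix controller: \{Controller: (\S+) \[(controller-\d+)\]")
FAILED_PREFIX = "script failed:"


def summarize_patch_output(body):
    """Return {controller_id: {"ip": str, "failed": bool, "error": str or None}}."""
    results = {}
    current = None
    for raw_line in body.splitlines():
        line = raw_line.strip()
        match = HEADER.search(line)
        if match:
            # A new per-controller block starts here.
            ip, current = match.group(1), match.group(2)
            results[current] = {"ip": ip, "failed": False, "error": None}
        elif current is not None and line.startswith(FAILED_PREFIX):
            # Record the failure against the controller whose block we are in.
            results[current]["failed"] = True
            results[current]["error"] = line[len(FAILED_PREFIX):].strip()
    return results


if __name__ == "__main__":
    sample = """# of controllers: 3
------------------------------------------------------------------
Fix controller: {Controller: 10.60.1.223 [controller-9], apiUser: admin }
live prompt: <Password: >, SEND reply: <??>
ERROR in connect: SSH_MSG_DISCONNECT: 2 Too many authentication failures
script failed: com.jcraft.jsch.JSchException: SSH_MSG_DISCONNECT: 2 Too many authentication failures
Patch script completed successfully."""
    for controller_id, info in summarize_patch_output(sample).items():
        status = "FAILED" if info["failed"] else "ok"
        print(controller_id, info["ip"], status, info["error"] or "")
```

The point of the sketch is only that the status code and the final line of the body are not enough to judge the outcome; each controller block has to be checked on its own.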
Kindly refer to the article below. Is it the same issue?
https://vswitchzero.com/2017/09/27/controller-disconnect-and-api-bug-in-nsx-6-3-3/
Thanks for your time rajeevsrikant.
It's the same issue described in KB2151719, and I followed the procedure in the resolution, including restarting the API servers on the controllers. But to no avail: they are still in the Disconnected state.
Kindly go through this KB; it mentions the step below. Could you please try it?
Note: As a part of Step 3, the script will set a temporary password on the Controller, log in to the root shell and change the password for the user account back to the original password set during initial Controller deployment. If any or all of the Controllers are re-deployed, repeat the preceding steps again.
The scripts in KB51144 are the same as in the previously mentioned KBs. When I run the script from step 2 I get a 500 Internal Server Error. The very same script returned status code 200 the first time I ran it. When I issue the API call from step 3 I get the same result:
# of controllers: 3
------------------------------------------------------------------
Fix controller: {Controller: 10.60.1.223 [controller-9], apiUser: admin }
live prompt: <Password: >, SEND reply: <??>
live prompt: <Password: >, SEND reply: <??>
live prompt: <Password: >, SEND reply: <??>
live prompt: <Password: >, SEND reply: <??>
live prompt: <Password: >, SEND reply: <??>
ERROR in connect: SSH_MSG_DISCONNECT: 2 Too many authentication failures
script failed: com.jcraft.jsch.JSchException: SSH_MSG_DISCONNECT: 2 Too many authentication failures
------------------------------------------------------------------
Fix controller: {Controller: 10.60.1.222 [controller-8], apiUser: admin }
live prompt: <Password: >, SEND reply: <??>
live prompt: <Password: >, SEND reply: <??>
live prompt: <Password: >, SEND reply: <??>
live prompt: <Password: >, SEND reply: <??>
live prompt: <Password: >, SEND reply: <??>
ERROR in connect: SSH_MSG_DISCONNECT: 2 Too many authentication failures
script failed: com.jcraft.jsch.JSchException: SSH_MSG_DISCONNECT: 2 Too many authentication failures
------------------------------------------------------------------
Fix controller: {Controller: 10.60.1.221 [controller-7], apiUser: admin }
live prompt: <Password: >, SEND reply: <??>
live prompt: <Password: >, SEND reply: <??>
live prompt: <Password: >, SEND reply: <??>
live prompt: <Password: >, SEND reply: <??>
live prompt: <Password: >, SEND reply: <??>
ERROR in connect: SSH_MSG_DISCONNECT: 2 Too many authentication failures
script failed: com.jcraft.jsch.JSchException: SSH_MSG_DISCONNECT: 2 Too many authentication failures
Patch script completed successfully.
In my opinion the problem here is that NSX Manager is providing incorrect credentials when establishing the SSH connection with the controllers. I wonder whether it's possible to configure which credentials are used or, more generally, what credentials are used for this communication.
Thanks
Hi All,
Any ideas ?
Hi romuald_geo,
I'd say your assessment is probably correct. The current controller 'admin' account password may not match what NSX Manager thinks it should be. Did you happen to change the password from an earlier login attempt when prompted? If so, VMware GSS may be able to assist in setting it back to its previous value to match again.
If that proves problematic and you have already upgraded NSX Manager to one of the 're-released' 6.3.3 or 6.3.4 builds (or 6.3.5 and later), it may be easiest to just delete all three controller nodes and then re-deploy them. (Keep in mind that if you don't want to upgrade the entire environment, upgrading NSX Manager to 6.3.3 build 7087283 won't require the upgrade of any other components aside from the control cluster.) With their current state, a force-delete of all three controllers before re-deployment may be necessary. If you do proceed with this, make sure you do it during a maintenance window and be careful to ensure the appliances are deleted as part of the process. If not, they'll need to be manually removed. A force sync of routing and VXLAN services of all NSX prepared clusters would be a good idea once finished as well.
Either way, I'd recommend opening an SR with VMware's NSX support team. They can help to debug the API call and try to get you back on track. If the API fix can't be used, they can also provide more guidance on the re-deployment process I outlined above.
I hope this helps.
Regards,
Mike
Hi vswitchzero,
Thank you for your answer. I've upgraded NSX Manager to version 6.3.3 Build 7087283 as you suggested. When I log in to NSX Manager via SSH and issue the show controller list all command, all three controllers are listed with state UNKNOWN. However, in vSphere Web Client > Networking & Security > Installation, the NSX Controller nodes list is empty, so I'm not able to upgrade them to the same version. If I try to deploy a new controller I get the error: Controller controller-12 creation failed - there is no active controller node for join.
Could anyone please advise? Thanks
Hi romuald_geo,
Sorry for the very slow reply on this. Were you able to get it sorted out? If not, let me know. Based on your description, it'll likely be necessary to delete all three controllers and re-deploy them now that you are at build 7087283.
Regards,
Mike
I agree with the solution. The correct approach is to force-delete the controllers and deploy new ones; do it during a maintenance window.
NSX Command Line Quick Reference
Troubleshoot NSX Controller cluster status, roles and connectivity
These are worth reading before you proceed.