NSX-T Intelligence Appliance | Error SSL Handshake Failed

muneebali67 · ‎04-21-2020

Greetings,

I have been attempting to deploy the NSX Intelligence Appliance under NSX-T Manager. Each deployment stalls at 75%.

On the NSX-IA console, I am getting an "SSL handshake failed" error with the NSX-T Manager cluster.

I restarted the kafka service, but that did not help.

Please suggest what could be missing in this deployment, or how to overcome the SSL handshake failed error.

mauricioamorim · ‎04-22-2020

What NSX-T version are you running?

Have you made any change to NSX-T manager certificates?

muneebali67 · ‎04-22-2020

Hi, the NSX-T version is 3.0.0.

I think I have got past that error. It seems I was redeploying a new instance of NSX-IA with the same FQDN and IP, and NSX-T Manager had already accepted the certificate of the previous NSX-IA instance. The workaround was to redeploy NSX-IA with a new FQDN and a new IP address.
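For anyone who wants to check the stale-certificate theory before redeploying, here is a minimal sketch that compares two PEM certificates by SHA-256 fingerprint. It assumes you have already obtained both PEM texts (the one NSX-T Manager has stored for the appliance, and the one the new appliance presents); how you export them is not covered here. The function names `pem_fingerprint` and `same_certificate` are hypothetical helpers, not part of any NSX tooling.

```python
import hashlib
import ssl


def pem_fingerprint(pem_text: str) -> str:
    """Return the SHA-256 fingerprint of one PEM-encoded certificate.

    The PEM body is decoded to DER bytes and hashed; the result is
    formatted as colon-separated uppercase hex pairs.
    """
    der = ssl.PEM_cert_to_DER_cert(pem_text.strip())
    digest = hashlib.sha256(der).hexdigest().upper()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


def same_certificate(pem_a: str, pem_b: str) -> bool:
    """True when both PEM texts encode the same certificate bytes."""
    return pem_fingerprint(pem_a) == pem_fingerprint(pem_b)
```

If the fingerprints differ for the same FQDN/IP, that would be consistent with the Manager still trusting the certificate of the previous instance. One possible way to fetch the certificate a host currently presents is the standard library call `ssl.get_server_certificate((host, port))`; the right host and port for your setup are assumptions you would need to confirm.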

Running the following in the admin CLI on NSX-IA:

get log-file syslog | find pace-monitor

I am getting this response:

Based on https://docs.vmware.com/en/VMware-NSX-T-Data-Center/2.5/administration/GUID-FEEA2270-580F-47C9-B471-..., I cross-checked all the services, and they are all in STABLE status:

From root CLI access, cat /var/log/pace/health-monitor.log shows:

Finished health monitor task at Wed Apr 22 19:32:19 UTC 2020.

Start health monitor task at Wed Apr 22 19:45:02 UTC 2020.

NSX-Intelligence Status: {
  "_schema": "IntelligenceApplianceHealthProperties",
  "_self": {
    "href": "/node/intelligence/appliance-health",
    "rel": "self"
  },
  "appliance_health": {
    "reason": "",
    "status": "STABLE",
    "sub_system_status": {
      "app_services": {
        "reason": "",
        "services": [
          {
            "health": "STABLE",
            "reason": "",
            "service_name": "anomaly-detection"
          },
          {
            "health": "STABLE",
            "reason": "",
            "service_name": "continuous-monitoring"
          },
          {
            "health": "STABLE",
            "reason": "",
            "service_name": "pace-server"
          },
          {
            "health": "STABLE",
            "reason": "",
            "service_name": "nsx-config"
          },
          {
            "health": "STABLE",
            "reason": "",
            "service_name": "proxy"
          },
          {
            "health": "STABLE",
            "reason": "",
            "service_name": "configure-zookeeper"
          },
          {
            "health": "STABLE",
            "reason": "",
            "service_name": "configure-druid"
          },
          {
            "health": "STABLE",
            "reason": "",
            "service_name": "pace-monitor.timer"
          },
          {
            "health": "STABLE",
            "reason": "",
            "service_name": "processing"
          },
          {
            "health": "STABLE",
            "reason": "",
            "service_name": "spark-job-scheduler"
          },
          {
            "health": "STABLE",
            "reason": "",
            "service_name": "configure-hadoop-hdfs"
          },
          {
            "health": "STABLE",
            "reason": "",
            "service_name": "pre-hadoop-hdfs"
          }
        ],
        "status": "STABLE"
      },
      "base_infra_services": {
        "reason": "",
        "services": [
          {
            "druid_health": {
              "broker": {
                "health": "STABLE",
                "reason": ""
              },
              "coordinator": {
                "health": "STABLE",
                "reason": ""
              },
              "historical": {
                "health": "STABLE",
                "reason": ""
              },
              "middlemanager": {
                "health": "STABLE",
                "reason": ""
              },
              "middlemanager_correlatedflow": {
                "health": "STABLE",
                "reason": ""
              },
              "overlord": {
                "health": "STABLE",
                "reason": ""
              }
            },
            "service_name": "druid"
          },
          {
            "health": "STABLE",
            "reason": "",
            "service_name": "kafka"
          },
          {
            "health": "STABLE",
            "reason": "",
            "service_name": "postgres"
          },
          {
            "health": "STABLE",
            "reason": "",
            "service_name": "spark"
          },
          {
            "health": "STABLE",
            "reason": "",
            "service_name": "zookeeper"
          },
          {
            "health": "STABLE",
            "reason": "",
            "service_name": "hadoop-hdfs"
          }
        ],
        "status": "STABLE"
      },
      "metadata_services": {
        "reason": "",
        "services": [
          {
            "health": "STABLE",
            "reason": "",
            "service_name": "nsx-config-sync"
          }
        ],
        "status": "STABLE"
      }

Fetching NSX-Intelligence information from NSX manager.

NSX-Intelligence health DEGRADED. Return code 403 is not HTTP OK.

Finished health monitor task at Wed Apr 22 19:45:04 UTC 2020.

root@nsxtm-ia:/opt/vmware/pace/monitor#
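The log above has two separate signals: every local service reports STABLE, yet the final verdict is DEGRADED because the call to NSX Manager returned 403. A rough sketch for separating the two when reading such output is below. It assumes the status JSON has already been parsed into a Python dict (the paste above is cut off before its closing braces, so it will not parse as-is), and the regular expression matches only the message wording seen in this log. The names `unstable_entries` and `manager_return_codes` are hypothetical.

```python
import re

# Matches the wording seen in health-monitor.log above.
RETURN_CODE = re.compile(r"Return code (\d{3}) is not HTTP OK")


def unstable_entries(node, path=""):
    """Walk parsed health JSON and collect (path, value, reason) for
    every 'health' or 'status' field that is not STABLE."""
    found = []
    if isinstance(node, dict):
        name = node.get("service_name")
        here = f"{path}/{name}" if name else path
        for key in ("health", "status"):
            value = node.get(key)
            if isinstance(value, str) and value != "STABLE":
                found.append((here or "/", value, node.get("reason", "")))
        for key, child in node.items():
            found.extend(unstable_entries(child, f"{here}/{key}"))
    elif isinstance(node, list):
        for child in node:
            found.extend(unstable_entries(child, path))
    return found


def manager_return_codes(log_text):
    """Return the HTTP codes reported as 'not HTTP OK' in the log text."""
    return [int(code) for code in RETURN_CODE.findall(log_text)]
```

With output like the one posted, `unstable_entries` would be expected to come back empty while `manager_return_codes` reports 403, which points at the Manager-side request being refused rather than at a failing service on the appliance.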

cosmin_vmw · ‎05-11-2020

I believe this is a known issue. Please take a look at the VMware Knowledge Base.


FranckF1 · ‎06-19-2020

I am running into the same issue on NSX-T 3.0 and Intelligence Appliance 1.1. I am not using a CA-signed certificate.

It does seem to match "Issue 2543655 - SSL handshake failure might occur between a transport node and a Kafka Broker in NSX Intelligence." mentioned in the release notes (VMware NSX Intelligence 1.1.0 Release Notes). Unfortunately, the workaround (restart service kafka) did not work for me.

I have tried reinstalling with a different name and IP for the appliance, as suggested in this thread. This did not change the outcome, and I still cannot get past 75% on the install.

Any suggestion will be appreciated!

cosmin_vmw · ‎06-19-2020

Contact VMware Technical Support and reference KB article 76583, requesting the FTP link to download the WAR files (proton and policy).


FranckF1 · ‎06-19-2020

Thanks for this. Will do, but that KB (76583) specifically states:

"This issue is resolved in VMware NSX-T Data Center 3.0.0 with NSX Intelligence 1.1.0, available at VMware Downloads."

I will try it nonetheless and see what happens.

Best,
