VMware Networking Community
Serendipitous
Contributor

I found a bug in NSX 3.2 preventing any redeployment of NSX-ALB

If you deploy NSX-ALB through the Appliances menu on the System configuration tab of NSX 3.2, wait until it is fully deployed, delete it, and then attempt to deploy it again while monitoring /var/log/syslog.log, you will find the following messages at 75% deployment (configuration phase):

2022-01-13T00:25:35.099Z nsx.somesite.local NSX 6565 POLICY [nsx@6876 comp="nsx-manager" level="WARNING" reqId="d8b8c035-0bca-4343-b39b-c7c2b80901d4" subcomp="manager" username="system"] Invalid operation. Entity /infra/sites/default/enforcement-points/alb-endpoint marked for delete
2022-01-13T00:25:35.099Z nsx.somesite.local NSX 6565 - [nsx@6876 comp="nsx-manager" level="WARNING" reqId="d8b8c035-0bca-4343-b39b-c7c2b80901d4" subcomp="manager" username="system"] EnforcementPoint com.vmware.nsx.management.policy.policyframework.service.EnforcementPointManagementServiceImpl.createOrUpdate(EnforcementPoint) failed with class com.vmware.nsx.management.common.exceptions.InvalidArgumentException.
2022-01-13T00:25:35.099Z nsx.somesite.local NSX 6565 POLICY [nsx@6876 comp="nsx-manager" level="WARNING" subcomp="manager"] [ALB Policy] EXCEPTION DURING CONFIGURATION : com.vmware.nsx.management.common.exceptions.InvalidArgumentException: An object with the same path=[/infra/sites/default/enforcement-points/alb-endpoint] is marked for deletion. Either use another path or wait for the purge cycle (max 5 minutes) for permanent removal of the object.

This instantly fails the configuration.

If you run an API query for 
https://nsx.somesite.local/policy/api/v1/infra/deployment-zones/:deployment-zone-id/enforcement-poin...

You get back:

{
    "results": [
        {
            "connection_info": {
                "username": "nsxt-alb",
                "tenant": "admin",
                "expires_at": "2022-01-12T04:30:33.365Z",
                "managed_by": "LCM",
                "enforcement_point_address": "10.7.77.100",
                "resource_type": "AviConnectionInfo"
            },
            "auto_enforce": true,
            "resource_type": "EnforcementPoint",
            "id": "alb-endpoint",
            "display_name": "alb-endpoint",
            "path": "/infra/sites/default/enforcement-points/alb-endpoint",
            "relative_path": "alb-endpoint",
            "parent_path": "/infra/sites/default",
            "unique_id": "53dccaa5-478a-40c6-b3aa-b4b19f0c2f2d",
            "realization_id": "53dccaa5-478a-40c6-b3aa-b4b19f0c2f2d",
            "marked_for_delete": true,
            "overridden": false,
            "_create_time": 1641940233616,
            "_create_user": "system",
            "_last_modified_time": 1642044635878,
            "_last_modified_user": "admin",
            "_system_owned": false,
            "_protection": "NOT_PROTECTED",
            "_revision": 4
        },
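
For anyone checking a longer response by hand, the stuck entry is the one with "marked_for_delete": true. Below is a minimal Python sketch (standard library only) that lists those paths. It assumes the response has been saved as text and uses the results/path/marked_for_delete fields shown above; marked_for_delete_paths is just an illustrative name, not part of any NSX tooling:

```python
import json


def marked_for_delete_paths(response_text: str) -> list[str]:
    """Return the policy paths of all entries flagged marked_for_delete."""
    data = json.loads(response_text)
    # Entries that lack the flag are treated as not pending deletion.
    return [
        item["path"]
        for item in data.get("results", [])
        if item.get("marked_for_delete", False)
    ]


if __name__ == "__main__":
    sample = """{"results": [
        {"id": "alb-endpoint",
         "path": "/infra/sites/default/enforcement-points/alb-endpoint",
         "marked_for_delete": true}
    ]}"""
    print(marked_for_delete_paths(sample))
```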

I have tried everything I can think of to get this entry removed from /infra/sites/default/enforcement-points, but nothing seems to work.

10 Replies
VitaliyChirkov
Contributor

Thanks. I have the same problem.

grimsrue
Enthusiast

I ran across the same issue. I looked around to figure out what is trying to delete, and in my lab environment I found that I must have tried to delete the HTTP Health Monitor and then, immediately afterwards, deleted the controllers. It is just stuck now. I will open a ticket with VMware and see if they can figure out how to fix this issue. Once I get a fix, I'll post it here.

pascal-saul
Contributor

I thought I was the only one who had opened an SR (22306475302).

It fails not at 75% but at 85%, if you keep staring at the progress. While the deployment process is doing the post-configuration and adding the cluster VIP, it fails after ~10 pings. When you deploy it for the first time in a new environment, it does not fail; it succeeds at exactly the same point. It almost looks like something is stuck in an internal DB, but you can't see anything within the NSX Manager.

grimsrue
Enthusiast

I had a Zoom session with a VMware engineer last week, and he agrees that the controllers will not install because the alb-endpoint will not delete... at least, that is what the engineer thinks is happening. They have a full log bundle from my NSX-T environment, including 3 ESXi hosts. The engineer could not get rid of the alb-endpoint either, so he has escalated my case to the next-level NSX-T engineering team. That was Thursday the 17th. No one has gotten back to me since.

pascal-saul
Contributor

The log (/var/log/proton/nsxapi.log) should contain something like:

2022-02-20T19:09:20.322Z INFO ActivityWorkerPool-1-19 AlbControllerNodeConfigurationTask 5551 POLICY [nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] [ALB Controller] Controller configuration failed during on-boarding task in Policy. com.vmware.nsx.management.policy.advanceloadbalancer.exceptions.AdvancedLoadBalancerException: Error: An object with the same path=[/infra/sites/default/enforcement-points/alb-endpoint] is marked for deletion. Either use another path or wait for the purge cycle (max 5 minutes) for permanent removal of the object.
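
If the log is large, a short script can pull out every path that this error reports. Below is a rough Python sketch. It assumes Python 3 is available wherever you read the log (copy the file off the manager if it is not) and that the message text matches the line above. stuck_paths is a hypothetical helper name:

```python
import re
import sys

# Matches the "marked for deletion" error quoted above and captures the path.
MARKED_RE = re.compile(
    r"An object with the same path=\[([^\]]+)\] is marked for deletion"
)


def stuck_paths(lines):
    """Return distinct paths reported as marked for deletion, in first-seen order."""
    seen = []
    for line in lines:
        match = MARKED_RE.search(line)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen


if __name__ == "__main__":
    # Assumed usage: python3 stuck_paths.py /var/log/proton/nsxapi.log
    log_file = sys.argv[1] if len(sys.argv) > 1 else "/var/log/proton/nsxapi.log"
    try:
        with open(log_file, encoding="utf-8", errors="replace") as fh:
            for path in stuck_paths(fh):
                print(path)
    except FileNotFoundError:
        print(f"log file not found: {log_file}", file=sys.stderr)
```

Each reported path is a candidate for the clean-up call that follows.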
========================

Then force a clean-up:

POST https://<NSX_MANAGER_IP>/policy/api/v1/troubleshooting/infra/tree/realization?action=cleanup
Body (JSON):
{
    "paths": [
        "/infra/sites/default/enforcement-points/alb-endpoint"
    ]
}
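
If curl quoting or Postman gets in the way, the same clean-up call can be made from Python's standard library. Because the script builds the JSON body itself, no shell quoting is involved. This is a sketch only. It assumes the admin account accepts HTTP basic auth (as with curl's -u admin) and reuses the URL and body from this post. build_cleanup_request and send are hypothetical names:

```python
import base64
import json
import ssl
import urllib.request


def build_cleanup_request(manager: str, paths: list[str],
                          user: str, password: str) -> urllib.request.Request:
    """Build the realization clean-up POST for the given policy paths."""
    url = (f"https://{manager}/policy/api/v1/troubleshooting/infra/tree/"
           "realization?action=cleanup")
    body = json.dumps({"paths": paths}).encode("utf-8")
    # HTTP basic auth, the same scheme curl uses for "-u admin".
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )


def send(req: urllib.request.Request, verify_tls: bool = True) -> int:
    """Send the request and return the HTTP status.

    urllib raises HTTPError for 4xx/5xx responses.
    """
    ctx = ssl.create_default_context()
    if not verify_tls:
        # Equivalent of curl -k: accept the manager's self-signed certificate.
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(req, context=ctx) as resp:
        return resp.status
```

For example, call send(build_cleanup_request("nsx.somesite.local", ["/infra/sites/default/enforcement-points/alb-endpoint"], "admin", password), verify_tls=False), where password holds the admin password. Only turn off certificate checks for a lab manager that uses a self-signed certificate.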

Before and after the clean-up, you can check with: GET https://<NSX_MANAGER_IP>/policy/api/v1/infra/deployment-zones/default/enforcement-points?include_mark_for_delete_objects=true
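
To automate the before/after check, the sketch below builds the same GET URL and polls until the path no longer appears. It assumes you supply a fetch function that performs the GET and returns the response body (for example, a urllib.request.urlopen call with basic auth). The 300-second default simply mirrors the "max 5 minutes" purge window in the error message, and all function names are hypothetical:

```python
import json
import time
from typing import Callable


def enforcement_points_url(manager: str, zone: str = "default") -> str:
    """URL of the GET check, including objects still marked for delete."""
    return (f"https://{manager}/policy/api/v1/infra/deployment-zones/{zone}"
            "/enforcement-points?include_mark_for_delete_objects=true")


def path_listed(response_text: str, path: str) -> bool:
    """True if the given policy path still appears in the response."""
    results = json.loads(response_text).get("results", [])
    return any(item.get("path") == path for item in results)


def wait_until_gone(fetch: Callable[[], str], path: str,
                    timeout: float = 300.0, interval: float = 15.0,
                    sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll fetch() until path disappears; return False once timeout is reached.

    Elapsed time is approximated by summing the sleep intervals
    (the time spent inside fetch() is ignored).
    """
    waited = 0.0
    while True:
        if not path_listed(fetch(), path):
            return True
        if waited >= timeout:
            return False
        sleep(interval)
        waited += interval
```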

grimsrue
Enthusiast

VMware got back to me today with the same API call that "pascal-saul" posted.

I can never get Postman to work correctly, so if you want to run a curl command from PowerShell, this is the command that worked for me:


curl -k -H "Content-Type:application/json" -u admin -X POST "https://<NSX_MANAGER_IP>/policy/api/v1/troubleshooting/infra/tree/realization?action=cleanup" -d '{"paths" : ["/infra/sites/default/enforcement-points/alb-endpoint"]}'

Note: I have not had time to try another controller install yet, but the "alb-endpoint" enforcement point is now gone when I run:

GET https://<NSX_MANAGER_IP>/policy/api/v1/infra/deployment-zones/default/enforcement-points?include_mark_for_delete_objects=true

pascal-saul
Contributor

I did and it worked 🙂

VitaliyChirkov
Contributor

Thank you. Working solution.

simonpenny408
Enthusiast

That curl command doesn't work from PowerShell; it throws an error, so it must be incorrect. (Obviously, I replaced the NSX Manager part with the IP address.)

Postman won't let me enter JSON as the body type for the API call. I think this fix needs more accurate documentation: I had 2 kits experience this in my class this week and had to raise a support ticket to clone

grimsrue
Enthusiast

curl is not built into older versions of Windows, and in Windows PowerShell 5.1 a plain curl is an alias for Invoke-WebRequest, which does not understand curl's options, so call curl.exe explicitly. If running curl.exe --help in PowerShell does not print a help menu, you don't have it installed and will have to download and install it separately. One way is through Chocolatey: install the Choco package manager first, then install curl.

Set-ExecutionPolicy Bypass -Scope Process -Force; iex ((New-Object System.Net.WebClient).DownloadString('https://chocolatey.org/install.ps1'))
choco install curl -y

If you do have curl installed then it would help to know the error you are getting.

I can send JSON in the body section of Postman just fine; I have done it many times. You have to add (Key) Content-Type, (Value) application/json on the "Headers" tab. I don't use Postman that often, though.

As an FYI, there is no official documentation because this is just a workaround for a known issue that VMware plans to fix in a later patch release. As far as I am aware, none of the people in this thread are VMware employees. If you want more accurate documentation, I suggest asking VMware support for it.
