VMware Cloud Community
vmsysadmin20111
Enthusiast

VCF upgrade to 4.2 fails during "Configuration drift bundle for 4.2.0 update" stage

Hi all,

I'm trying to upgrade VCF from 4.1 to 4.2. The SDDC Manager upgrade completed, but the next step, "Configuration drift bundle for 4.2.0 update", now fails.

It looks like it's trying to update the cluster HA config and fails with "VSPHERE_HA_UPDATE_ISOLATION_RESPONSE_FAILED Failed to set vSphere HA Isolation Response for VM(s)" error (full log is attached).

Any help would be appreciated.


Accepted Solutions
vmsysadmin20111
Enthusiast

It turned out that one of the Edge VMs somehow got removed from the "VM Overrides" configuration section in vCenter (on the cluster level). Once all Edge nodes were added to the "VM Overrides" with "vSphere Restart Priority" set to "high", the upgrade was able to get past this error.


29 Replies
shank89
Expert

Have you looked for more information in the LCM and operations manager logs?

Shashank Mohan

VCIX-NV 2022 | VCP-DCV2019 | CCNP Specialist

https://lab2prod.com.au
LinkedIn https://www.linkedin.com/in/shankmohan/
Twitter @ShankMohan
Author of NSX-T Logical Routing: https://link.springer.com/book/10.1007/978-1-4842-7458-3
vmsysadmin20111
Enthusiast

Unfortunately, I don't see anything relevant in those logs (attached).

I've noticed that after the upgrade the SDDC manager started to display "Retrieving tasks list failed. Something went wrong. Please retry or contact the service provider and provide the reference token." message.

Another error in the Security->Password Manager tab - "Failed to get tasks data. Something went wrong. Please retry or contact the service provider and provide the reference token."

Filtering for ERROR|WARN in the operationsmanager.log shows some errors related to password update?

2021-02-12T22:10:47.554+0000 ERROR [vcf_om,bd3bf5535c359fbc,8be1] [c.v.v.p.service.RestModelTranslator,http-nio-127.0.0.1-7300-exec-3] Diagnostic Message JSON parsing failed, Error : No enum constant com.vmware.vcf.passwordmanager.exception.PasswordManagerErrorCode.PASSWORD_UPDATE_CSS_PASSWORD_TEST_FAILED
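
For anyone repeating this check, the ERROR|WARN filter mentioned above can be sketched as a small shell helper. `filter_log` is only an illustrative name, and the log path in the example comment is an assumption about where the appliance keeps this log, so adjust it to your system.

```shell
# Print lines containing ERROR or WARN from a log file, most recent last.
filter_log() {
    # $1 = log file, $2 = how many trailing matches to show (default 50)
    grep -E 'ERROR|WARN' "$1" | tail -n "${2:-50}"
}

# Example call (path is an assumption, not confirmed in this thread):
# filter_log /var/log/vmware/vcf/operationsmanager/operationsmanager.log
```

Limiting the output with `tail` keeps the most recent matches on screen, which is usually where the failed task shows up.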

 

 

vmsysadmin20111
Enthusiast

I should also add that the vCenter shows a cluster reconfiguration error, so it's probably not related to the password manager issues.

 

[Screenshot: cluster reconfiguration error in vCenter (vmsysadmin20111_0-1613168926533.png)]

 

shank89
Expert

I would try rebooting SDDC Manager, or restarting the domainmanager, operationsmanager, and commonsvc services, and see if that gets rid of those UI errors for a start.
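
A minimal sketch of the service restart, assuming the systemd unit names match the service names given above (that mapping is an assumption, so check with `systemctl list-units` first). The helper defaults to a dry run and only prints the commands; set `DRY_RUN=0` as root on the SDDC Manager VM to actually run them.

```shell
# Services named in the post; the unit names are assumed to match.
SERVICES="domainmanager operationsmanager commonsvc"

restart_services() {
    for svc in $SERVICES; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            # Dry run: show the command instead of executing it.
            echo "systemctl restart $svc"
        else
            # Restart, then report whether the unit came back up.
            systemctl restart "$svc" && systemctl is-active "$svc"
        fi
    done
}

# Review the commands first:
# restart_services
# Then run them for real:
# DRY_RUN=0 restart_services
```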

shank89
Expert

It'll likely be part of the upgrade process that is failing. What did you get in the logs when the task fails? There is usually a Java exception around the time of failure, with some more details. It could also be in the domain manager log.
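
One way to look for that exception is to print the failing line with some surrounding context. This is only a sketch: `show_failure_context` is an illustrative name, and the log path in the example comment is an assumption about the appliance layout.

```shell
# Print each line matching a pattern, with two lines before it and a block
# of lines after it, so a stack trace following the failure is visible.
show_failure_context() {
    # $1 = log file, $2 = pattern, $3 = lines of trailing context (default 20)
    grep -n -B 2 -A "${3:-20}" -e "$2" "$1"
}

# Example call (path is an assumption, adjust it to your appliance):
# show_failure_context /var/log/vmware/vcf/domainmanager/domainmanager.log \
#     VSPHERE_HA_UPDATE_ISOLATION_RESPONSE_FAILED
```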

erikgraa
Enthusiast

I have just upgraded SDDC Manager to 4.2, but I am getting "No available updates" when I select vCloud Foundation 4.2 under available updates for the Management Domain. I was expecting to install the configuration drift bundle since I am upgrading from vCF 4.1.0.1. The bundle has been uploaded/validated. There is also a (1) and "READY" indicating that there is an update waiting, but it won't show.

shank89
Expert

If you want to make completely sure there is nothing available and it's not just an SDDC Manager thing, you can try to find all offline bundles applicable to your environment.

https://docs.vmware.com/en/VMware-Cloud-Foundation/4.2/vcf-42-lifecycle/GUID-8FA44ACE-8F04-47DA-845E...

erikgraa
Enthusiast

I did do an offline bundle download, 98G worth. I can see the downloaded/validated 4.2.0.0 drift bundle. Wondering how to proceed. Might just re-install to be honest.

erikgraa
Enthusiast

Coincidentally, I have a different environment that has Internet access through a proxy, and I have no such problems there: the configuration drift bundle appears just after upgrading SDDC Manager to 4.2.0.0.

shank89
Expert

Yep, hence my suggestion to attempt an offline package sync to see if it is available that way.

 

erikgraa
Enthusiast

The environment failing is offline and I downloaded all the packages from a fully updated 4.1.0.1 installation to no avail. The first update bundle gets applied, whereupon "no updates available". Can't see anything funny in the logs either. Ah well, I have gotten used to re-deploying, just takes a few hours now that I made scripts for it. 😛

yankinlk
Enthusiast

This issue may be related to installing 4.1 with a single VDS. There is a known issue in the release notes for VCF on VxRail that attempts (badly) to explain a workaround:

https://docs.vmware.com/en/VMware-Cloud-Foundation/4.2/rn/vmware-cloud-foundation-on-dell-emc-vxrail...

I would prefer not to have to redeploy the entire stack for this - is there someone that can help? I have deployed this twice and got the same exact result. 

@HCIdiver
shank89
Expert

Have you opened a case with support?

yankinlk
Enthusiast

Sure I have... but please can someone read those release notes? There is a detailed fix there that doesn't work, or at least it's not working for me!

This thread is the first hit on Google, and so far it only says to redeploy!

shank89
Expert

I've read it. Is the process not working for you? Are you running into an error, or is the port group not being updated?

We'll need more information.

 

yankinlk
Enthusiast

The process below, detailed in the release notes, doesn't work. Step 1 does not create a file "vds.json". I tried creating the file in step 2 and running the curl command in step 5, but no joy.

  1. SSH to SDDC Manager VM and run the following CURL command to retrieve existing vDS information:
       curl 127.0.0.1/inventory/vds | json_pp
     The command retrieves information on all vDSes available in the inventory and saves it in a JSON file (vds.json).
  2. Copy the management cluster vDS information from the JSON file to a new JSON file (for example, vds-updated.json).
  3. Prepare information for AVN specific VLAN port groups as shown below.
       {
         "vlanId": 1008,
         "name": "sfo-m01-cl01-vds01-pg-uplink01",
         "mtu": 0,
         "type": "EARLY_BINDING",
         "standbyUplinks": [
           "uplink2"
         ],
         "activeUplinks": [
           "uplink1"
         ],
         "transportType": "PUBLIC"
       },
       {
         "transportType": "PUBLIC",
         "activeUplinks": [
           "uplink2"
         ],
         "standbyUplinks": [
           "uplink1"
         ],
         "type": "EARLY_BINDING",
         "mtu": 0,
         "name": "sfo-m01-cl01-vds01-pg-uplink02",
         "vlanId": 1009
       }
  4. Add the AVN specific VLAN port group information to the JSON file saved in step 2 (vds-updated.json).
  5. Run the following command to populate the inventory with AVN specific VLAN port groups:
       curl -X PUT -H "Content-Type:application/json" --data @vds-updated.json 127.0.0.1/inventory/vds/
shank89
Expert

What happens when you run the command in step 5 after creating the JSON file? Also be careful of improperly formatted JSON.

If for whatever reason you are not able to get a file from the output, you could also try a tool like Postman, which displays the output so it can then be saved as JSON, keeping the formatting intact.
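
As a rough sketch of both points: the release-notes command only prints to the terminal, so redirecting its output is what actually creates the file, and the edited file can be checked for well-formed JSON before it is sent. The function names here are illustrative, and the fallback to `python3 -m json.tool` when `json_pp` is missing is an assumption about what is installed.

```shell
# Save the inventory output to a file; the redirect creates the file.
save_vds_inventory() {
    # $1 = output file, for example vds.json
    curl -s 127.0.0.1/inventory/vds | json_pp > "$1"
}

# Return success only if the file parses as JSON.
is_valid_json() {
    # $1 = JSON file to check
    if command -v json_pp >/dev/null 2>&1; then
        json_pp < "$1" >/dev/null 2>&1
    else
        python3 -m json.tool "$1" >/dev/null 2>&1
    fi
}

# Example:
# save_vds_inventory vds.json
# is_valid_json vds-updated.json && echo "JSON is well formed"
```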

yankinlk
Enthusiast

Thanks for the offer of help. The PUT command listed in the release notes is definitely mistaken. I have reported it to VMware support at this stage, so hopefully an update will be issued soon. The problem only occurs with single-VDS deployments. I have tested it on 3 clusters at this stage, and the fix is to add the missing info directly into the SDDC Manager database.

I won't publish that workaround here as it is not for the faint of heart! Open a ticket, refer to the VDS issue in the release notes, and they will fix it.

MatthewRitchart
Contributor

If anyone runs into this issue, here are the updated commands that I have successfully tested and validated at multiple VCF sites. This will affect any site upgrading from an earlier version to 4.2 where the AVN networks were deployed on a single VDS on VxRail. With this fix we're adding the uplink port groups (01 and 02) that were created automatically by SDDC Manager to support the AVN Edge uplinks:

  1. The first command, which pulls the VDS info from inventory, will return all the VDSs in the system, including both MGMT and the WLDs. Copy the configuration to a notepad and edit it externally to make it easier.
  2. You need to edit the JSON to only include the VDS info for the MGMT domain (remove all of the workload VDS information).
    1. Then add the new content as listed in the VMware release notes to add the port group info for the NSX-T Edge Cluster uplinks. This goes in the portgroup section of the JSON.
    2. Remove the square brackets [ and ] from the beginning and end of the JSON. The JSON should start and end with { and }.
    3. Copy the id for the MGMT VDS, as it's needed for the PUT command.
    4. "vi vds-update.json" to create a new JSON file, paste the MGMT VDS configuration into the new JSON file and ":wq".
    5. Set the permissions on vds-update.json: "chmod 755 vds-update.json" (may not be necessary, more of a just in case).
  3. The syntax for the PUT command needs to include the VDS ID for the MGMT VDS. Here is the correct syntax:

curl -X PUT -H "Content-Type:application/json" --data @vds-update.json 127.0.0.1/inventory/vds/[MGMT VDS ID]
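
The last two checks in the steps above can be sketched in shell: confirm the edited file is a single JSON object rather than an array, then print the PUT command for review before running it. The function names are illustrative, and the VDS id passed in is a placeholder for the id copied from the inventory output.

```shell
# Succeeds only if the file's first and last non-blank characters are
# "{" and "}", meaning the square brackets around the array were removed.
check_single_object() {
    # $1 = edited JSON file
    first=$(tr -d ' \t\r\n' < "$1" | head -c 1)
    last=$(tr -d ' \t\r\n' < "$1" | tail -c 1)
    [ "$first" = "{" ] && [ "$last" = "}" ]
}

# Print the PUT command instead of running it, so it can be reviewed.
build_put_command() {
    # $1 = MGMT VDS id (placeholder), $2 = edited JSON file
    printf 'curl -X PUT -H "Content-Type:application/json" --data @%s 127.0.0.1/inventory/vds/%s\n' "$2" "$1"
}

# Example, with a placeholder id:
# check_single_object vds-update.json && build_put_command "<MGMT VDS ID>" vds-update.json
```

Printing the command rather than executing it is deliberate: the PUT changes the SDDC Manager inventory, so it is worth reading the final command once before pasting it.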