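Solved: VSAN Disk Format to 3.0 failed // Failed to realign following Virtual SAN objects // due to being locked or lack of vmdk descriptor file, which requires manual fix.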

KurtDePauw1 · ‎03-17-2016

Hello all,

After the upgrade to "6.0 Update 2" and trying to update the disk format to version 3.0 I received the below error message.

Failed to realign following Virtual SAN objects:

be4db256-0ab4-9801-c523-0cc47a3a34ca, c2bcb056-5d5f-0f02-f096-0cc47a3a3320, c0bcb056-70a6-180c-c583-0cc47a3a3320, e413b256-90b5-dd1a-cce6-0cc47a3a34ca, c2bcb056-64aa-ee1e-0bb3-0cc47a3a3320, c0bcb056-4b47-5529-b59d-0cc47a3a3320, 6644b256-e0f8-3338-737d-0cc47a3a34ca, d411b256-08ea-f83b-9774-0cc47a3a34ca, c0bcb056-a489-9541-efbe-0cc47a3a3320, 3f6abc56-28fd-4a48-4f8a-0cc47a3a34ce, d411b256-784e-a45a-714d-0cc47a3a34ca, c1bcb056-6671-c45c-2254-0cc47a3a3320, 3f6abc56-1061-ab60-e4c8-0cc47a3a34ce, f17bb356-fcdf-3469-502a-0cc47a3a3320, cfe8b156-606f-ee6a-4356-0cc47a3a34ce, d411b256-4450-3777-e3bf-0cc47a3a34ca, c1bcb056-b24e-cb7a-7b1a-0cc47a3a3320, f17bb356-f008-9581-2468-0cc47a3a3320, 31bdb056-0a02-4882-9094-0cc47a3a34ca, c1bcb056-2a83-a596-ed2c-0cc47a3a3320, f17bb356-08a1-529f-c375-0cc47a3a3320, c1bcb056-4edc-ffaf-bf04-0cc47a3a3320, c1bcb056-c9e2-fec8-1930-0cc47a3a3320, c1bcb056-4e92-e4e4-9b92-0cc47a3a3320, e413b256-dcd0-f6fd-8ae2-0cc47a3a34ca,

due to being locked or lack of vmdk descriptor file, which requires manual fix.

Does anyone have an idea how to fix this?

CHogan · ‎03-30-2016

The KB articles and the script are now available to resolve this issue. A more permanent fix is in the works.

Details of the issue, including links to KBs and scripts can be found here - http://cormachogan.com/2016/03/31/vsan-6-2-upgrade-failed-realign-objects/

Thanks for your patience.

http://cormachogan.com

Jasemccarty · ‎03-17-2016

Can you describe what your environment looks like?

Are you running a general workload? Virtual desktops? etc?

Thanks,

Jase

Jase McCarty - @jasemccarty

zdickinson · ‎03-17-2016

Good morning, we had something similar. I tried to vMotion a VM on a vSAN datastore and got an error. I powered it down and was able to move it to a different host; however, when I powered it on, I received an error: VMDK not accessible.

When I contacted support, they said it was because the VMDK descriptor file was missing. I did not fix it myself; support was able to, with a bit of work. See below. Thank you, Zach.

I managed to retrieve the UUID of the disk from the vmware logs of the VM:

- vmware-1.log:2015-11-10T13:24:36.609Z| vmx| I120: DISKLIB-VMFS : "vsan://f5c26155-5e58-a555-bc8d-ecf4bbcfca10" : open successful (21) size = 75161927680, hd = 0. Type 3

- vmware-1.log:2015-11-10T13:24:36.612Z| vmx| I120: DISKLIB-VMFS : "vsan://f5c26155-5e58-a555-bc8d-ecf4bbcfca10" : closed.
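If there are many log lines to go through, the same lookup can be scripted. This is only a rough sketch, assuming the vsan:// reference always appears in the form shown in the two lines above; the function name is made up for illustration and is not part of any VMware tool.

```python
import re

# Matches the vsan://<uuid> reference as it appears in the DISKLIB-VMFS lines above.
VSAN_URI = re.compile(r'vsan://([0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12})')


def vsan_uuids_from_log(lines):
    """Return the distinct object UUIDs found in the log lines, in first-seen order.

    Hypothetical helper for illustration only.
    """
    seen = []
    for line in lines:
        for uuid in VSAN_URI.findall(line):
            if uuid not in seen:
                seen.append(uuid)
    return seen


# The two log lines quoted in this post.
sample = [
    '2015-11-10T13:24:36.609Z| vmx| I120: DISKLIB-VMFS : '
    '"vsan://f5c26155-5e58-a555-bc8d-ecf4bbcfca10" : open successful (21) '
    'size = 75161927680, hd = 0. Type 3',
    '2015-11-10T13:24:36.612Z| vmx| I120: DISKLIB-VMFS : '
    '"vsan://f5c26155-5e58-a555-bc8d-ecf4bbcfca10" : closed.',
]

print(vsan_uuids_from_log(sample))
```

Both lines refer to the same object, so only one UUID comes back.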

We confirmed that this was the correct UUID with the following command and associated output:

/usr/lib/vmware/osfs/bin/objtool getAttr -u f5c26155-5e58-a555-bc8d-ecf4bbcfca10

Object Attributes --

UUID:f5c26155-5e58-a555-bc8d-ecf4bbcfca10

Object type:vsan

Object size:75161927680

User friendly name:(null)

HA metadata:(null)

Allocation type:Zeroed thick

Policy:((\"stripeWidth\" i1) (\"cacheReservation\" i0) (\"proportionalCapacity\" i0) (\"hostFailuresToTolerate\" i1) (\"forceProvisioning\" i0) (\"spbmProfileId\" \"aa6d5a82-1c88-45da-85d3-3d74b91a5bad\") (\"spbmProfileGenerationNumber\" l+0))

Object class: vdisk

Object path: /vmfs/volumes/vsan:52ce5c856108f1cb-fffcad0808c892b3/f1c26155-5ae8-5013-c1fb-ecf4bbcfca10/GVvCenter_2.vmdk

We then created a temp VM and copied the temp.vmdk to the VM directory.

I edited the newly created GVvCenter_2.vmdk so that it contained the following:

# Extent description

RW 146800640 VMFS "vsan://f5c26155-5e58-a555-bc8d-ecf4bbcfca10"

The RW value is the object size above (75161927680 bytes) divided by 512, which gives 146800640.
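For anyone repeating this on several disks, the arithmetic and the extent line can be produced with a few lines of Python. This is a minimal sketch based only on the values in this post; the function name is hypothetical, and the rest of the descriptor file still has to come from the temp VMDK as described above.

```python
SECTOR_BYTES = 512  # divisor used in this post


def extent_line(object_size_bytes, uuid):
    """Build the 'RW <sectors> VMFS "vsan://<uuid>"' line for a descriptor file.

    Hypothetical helper for illustration only.
    """
    sectors, remainder = divmod(object_size_bytes, SECTOR_BYTES)
    if remainder:
        # A size that is not a multiple of 512 suggests the wrong value was copied.
        raise ValueError("object size is not a whole number of 512-byte sectors")
    return f'RW {sectors} VMFS "vsan://{uuid}"'


# Object size and UUID from the objtool output above.
print(extent_line(75161927680, "f5c26155-5e58-a555-bc8d-ecf4bbcfca10"))
# prints: RW 146800640 VMFS "vsan://f5c26155-5e58-a555-bc8d-ecf4bbcfca10"
```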

I got the VMID of the VM by running:

vim-cmd vmsvc/getallvms

Then reloaded the .vmx file by running the following, with the VMID from the previous command:

vim-cmd vmsvc/reload <vmid>

KurtDePauw1 · ‎03-17-2016

The environment is a standard setup

3 servers ( Supermicro ) - same config for all 3 of them
- hybrid VSAN
  - Intel NVMe - 800GB SSD ( per server )
  - 4 X 2 TB SAS ( per server)
- 256GB memory ( per server )
- VMs
  - 12 Microsoft Windows Servers
    - Nothing special
      - SQL, Exchange, Terminal Server, etc.

Jasemccarty · ‎03-17-2016

I'm wondering if you have any unassociated objects.

What do you see in the RVC when you run a vsan.obj_status_report ClusterName -t -u command?

Jase McCarty - @jasemccarty

KurtDePauw1 · ‎03-17-2016

Confirmed as a bug by VMware technical support.

I'll keep you posted when a solution is available.

MichaelGi · ‎03-18-2016

Any news on a solution? I'm having the same issue.

KurtDePauw1 · ‎03-19-2016

The only news I have for now is ....

Hello Kurt,

I'm the VSAN Escalation Engineer from the VSAN Team.

Wanted to tell you that our Engineering Team has root caused the issue and are working on a fix.

We will keep you updated in the coming days.

If I have news .... I will share it.

alienjoker · ‎03-19-2016

I managed to fix my issue, which largely came down to the presence of AppStacks within AppVolumes. Note that my setup is a lab environment, so exercise caution if you decide to proceed:

http://www.acmcomputers.co.uk/?p=240

Best regards

Andrew

MichaelGi · ‎03-19-2016

Thanks for the help. Mine seem to be coming from 3 different production VMs. I'll wait for a solution and maybe open a case with GSS.

AlexanderLiucka · ‎03-22-2016

Hi manumixx,

The workaround is to put the hosts into maintenance mode one by one with full data migration, then manually remove the disk groups and recreate them.

douglasarcidino · ‎03-22-2016

Not entirely true. My home datacenter built on an HP Bladesystem had this exact issue. The locks prevent you from performing the data migration. Now, it's possible that some of the storage policies I was playing with contributed to this issue, but I have only just begun collecting the logs. I'm planning on submitting them to VMware for the purpose of root cause.

The solution, which may not work for people with production data, is to find the objects that are holding up progress, power them off and then perform the upgrade. You may be able to power the VMs back on after the migration starts, but I didn't try that as I have not had a backup of my environment since I started the VSAN 6.2 upgrade.

If you found this reply helpful, please mark as answer VCP-DCV 4/5/6 VCP-DTM 5/6

alienjoker · ‎03-22-2016

Hi,

I've been monitoring the array of problems people have been having on this matter and I've picked up a response in my digging from Cormac Hogan on this:-

"It looks like there are 2 upgrade issues. The first is associated objects where an rm -r may have been run against files and folders on the VSAN datastore. This leaves the VMDK objects stranded. Since we won’t ever delete an object automatically, we need admins to either recreate the objects or remove them completely.

There is a second issue, and this seems to be related to CBT – Change Block Tracking. This is not specific to any application (AppVolumes, View, VCD). But if you are backing up or replicating the VMs using CBT, these get locked. We are working out the best way to deal with this automatically (probably KB article plus an attached script to automate the handling)."

In the meantime, I've managed to fix three different upgrade attempts (each with their own problems) as per the following articles (note these are homelabs):

http://www.acmcomputers.co.uk/?p=258

http://www.acmcomputers.co.uk/?p=252

http://www.acmcomputers.co.uk/?p=240

Best of luck in your Upgrades - it'll all be worth it in the end!

Andrew

MichaelGi · ‎03-22-2016

The workaround for me was to find the virtual machine with the lock then clone it and delete the original.

AlexanderLiucka · ‎03-22-2016

Very strange. I have the same error as you:

Failed to realign following Virtual SAN objects:

1f3f8856-9d12-4806-e14d-002590358112, 730a8b56-3a9b-6009-334f-002590358112, e3318c56-47cb-5711-1f42-002590358176, 511d8c56-7624-d513-79ec-002590358112, c8128b56-f409-fb17-c3f4-002590369a20, 950e8c56-e94b-d319-a361-002590358112, ce228c56-de04-a21f-0efe-002590369a20, f2578856-3642-ce3c-2555-002590369a20, 65f98a56-6c8c-bb47-3ccf-002590358176, fb128b56-8ae5-6d50-f32c-002590358112, 53ef8b56-4822-aa51-a493-002590369a20, af908856-9259-a65c-ac24-002590358176, 4a0e8b56-6d4b-e15c-39bd-002590369a20, 9fd38b56-c2f1-5d5e-216e-002590358176, 1ce88756-00f5-8763-51b4-002590358112, cac58a56-2aa9-846a-1a4e-002590358176, afb28a56-ac29-de6d-b529-002590358112, 5be58756-866b-ae6e-4b2b-002590358112, 56ef8a56-a880-086f-0647-002590369c08, 4af98a56-2ed8-d272-95fd-002590369c08, af908856-e2d7-7e73-8482-002590358176, 6be68b56-4f8d-c97f-9615-002590358112, af908856-ace3-a288-043f-002590358176, d3d48956-5273-b58b-2f62-002590358176, 1de88756-9f0d-ba8e-a545-002590358112, 6df98a56-7768-0d91-9d48-002590358112, d3e48a56-1698-6091-2451-002590358112, 66f98a56-5a83-3b95-5552-002590358176, 269a8956-26d6-9198-eafe-002590358176, 54ef8a56-e4e8-e999-db49-002590369c08, 06018b56-caef-1ea4-b5cb-002590358176, a42f8c56-b6ef-96ab-616b-002590358176, f9598856-9ec8-09c3-39c7-002590358176, 7c1b8c56-5ad9-79c3-9929-002590369a20, cac58a56-93e4-9ec3-e932-002590358176, 9c0a8b56-60c2-6fd2-d3db-002590369a20, 61fa8a56-d37a-8fd5-cb7a-002590358112, f9598856-b668-e8da-2060-002590358176, cac58a56-5ebc-7edd-eb7f-002590358176, 47028956-ea87-73e7-9f58-002590358112, fa598856-ca9b-7df2-9aa1-002590358176, 279a8956-96e4-82f9-64a6-002590358176, 628e8856-aeff-9fff-b981-002590369a20,

due to being locked or lack of vmdk descriptor file, which requires manual fix.

I have 6 hosts in vSAN. With my suggestion (maintenance mode with full data migration, then manually recreating the disk group), I have upgraded 2 hosts so far. At the moment I'm putting the 3rd host into maintenance mode.

If I reach a host that can't migrate its data I will let you know, but for now everything is working normally.

douglasarcidino · ‎03-22-2016

If anyone at VMware would like them, I have my logs from my environment that I am willing to send up. I use EVAL Experience licenses so obviously, no support. I have also fixed my own issues but I am offering the logs to help VMware better understand how the product is working and to see if there is anything helpful they can get. As a VMware partner, I like to help improve the product because I will have clients using this in production soon. Please PM me if you would like them.

KurtDePauw1 · ‎03-22-2016

The logs have been sent to VMware.

They have found the root cause, but unfortunately there is still no fix for it.

Still waiting for an answer from VMware.

srodenburg · ‎03-24-2016

Be careful when upgrading and analyze the type of objects that the upgrade tool complains about. We had a disastrous upgrade which I posted on Cormac's Blog. I wanted to share it here as well, just for reference.

_______________________________

Hi Cormac,

About that FS 2.0 to 3.0 Upgrade with no impact to running VMs…

We have a 6 node cluster. We tried for more than a day to get this to work (after upgrading vCenter and all 6 hosts to U2) and we always ran into locking issues with running VMs. Only when a VM was powered off was the lock gone. As soon as the VM started, the lock was back and the upgrade refused to work because of these locks. Kind of like a dog chasing its own tail.
We used RVC to establish that all the component IDs that the upgrade-failed message talked about (a very long list…) were normal VMDKs. So no AppVolumes stuff or left-overs lying around. It found that all the VMDKs (every VMDK of every VM) were causing a lock issue and refused the upgrade as a consequence.
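With a list that long, it helps to pull the UUIDs out of the message before looking each one up. The following is a rough sketch only: it assumes the message has the comma-separated format quoted earlier in this thread, the function name is made up, and the sample uses the first three UUIDs from the original post. It does not talk to RVC or vCenter at all; it only tidies the text.

```python
import re

# Plain 8-4-4-4-12 lowercase hex UUIDs, as they appear in the error message.
UUID_RE = re.compile(r'\b[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}\b')


def failed_objects(message):
    """Return the object UUIDs named in the error text, de-duplicated, in order.

    Hypothetical helper for illustration only.
    """
    return list(dict.fromkeys(UUID_RE.findall(message)))


# Shortened copy of the error message from the first post in this thread.
message = (
    "Failed to realign following Virtual SAN objects: "
    "be4db256-0ab4-9801-c523-0cc47a3a34ca, "
    "c2bcb056-5d5f-0f02-f096-0cc47a3a3320, "
    "c0bcb056-70a6-180c-c583-0cc47a3a3320, "
    "due to being locked or lack of vmdk descriptor file, "
    "which requires manual fix."
)

for uuid in failed_objects(message):
    print(uuid)
```

The resulting list, one UUID per line, is easier to work through than the raw message.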

We never got out of this vicious circle. In the end, we attached a NAS via NFS and storage-migrated the vCenter VM to the NAS, because its own presence as a running VM caused a lock issue (with itself).
Then we powered off all other VMs, which made the locks go away (and did not start them, to avoid a new lock). Only then did the upgrade run, because nothing was locked anymore.

The upgrade ran almost 48 hours for a 6 node cluster with 4x 600 10k SAS drives each (capacity tier). We were down for the entire weekend.
_______________________________

Cormac replied with:

Sorry to hear that Steven. It looks like there are 2 upgrade issues. The first is associated objects where an rm -r may have been run against files and folders on the VSAN datastore. This leaves the VMDK objects stranded. Since we won’t ever delete an object automatically, we need admins to either recreate the objects or remove them completely.

There is a second issue, and this seems to be related to CBT – Change Block Tracking. This is not specific to any application (AppVolumes, View, VCD). But if you are backing up or replicating the VMs using CBT, these get locked. We are working out the best way to deal with this automatically (probably KB article plus an attached script to automate the handling). This will mitigate the manual effort that you had to go through. We’ll then get a patch out to take care of this automatically. I’ll provide an update as soon as I know more.

And I replied:

CBT? Hmmm. We used to use CBT with Veeam, until the CBT bug that was in pre ExpressPatch 4 / U1b. We stopped Veeam from using CBT altogether, but there are still a lot of CBT files lying around.
The first issue was true for one stranded file after a storage-vMotion off the vsanDatastore to a NAS. But we found that and cleared it up quickly.
So I'm betting my money on your CBT suggestion.

CHogan · ‎03-24-2016

We're just finalizing the testing on the scripts, folks.

The issue is well understood, so as soon as I can, I'll share an update with you.

Thanks for your patience

Cormac

http://cormachogan.com

Bill_Oyler · ‎03-30-2016

We are also receiving the same error message in our upgrade from VSAN 6.1 (v2) to VSAN 6.2 format (v3):

Failed to realign following Virtual SAN objects: 1ccdfa56-cd13-4b02-612c-bc305bf7dc40, 366f8055-7b32-292e-ec13-bc305bf7de10, 6d718055-cdd8-a141-c448-bc305bf7de10, aacefa56-7e6e-cf53-b719-bc305bf7de10, 76d9fa56-656d-6a59-2fe3-bc305bf7e010, e9bafa56-f85c-a78c-0d61-bc305bf7dc40, 827b6755-b4a4-af8d-a8f2-bc305bf7e010, 46fb4556-957a-f196-cb28-bc305bf7dc40, 66fa3f55-d99e-7fa9-0eaf-bc305bf7dc40, adb3fa56-bcc0-f7d8-09f0-bc305bf7e010, eb8adb55-9df3-50e8-dcb1-bc305bf7e010, ee8adb55-1c86-83ed-726d-bc305bf7dc40, due to being locked or lack of vmdk descriptor file, which requires manual fix.

"Failed to migrate vsanSparse objects on cluster"

Looking forward to a fix that doesn't involve manually deleting lots of data. We are using VMware Horizon View, App Volumes, and a handful of other VMs on VSAN.

Thanks,

Bill

Bill Oyler Systems Engineer
