Good day.
There is a vSAN cluster with 3 hosts (esxi-1, esxi-2, esxi-5).
After one of the hosts (esxi-5) froze and was rebooted, I have 3 inaccessible VMs and 7 inaccessible objects. All of these objects reference an unknown capacity-disk UUID.
After searching the logs, I found that this unknown UUID previously belonged to one of the capacity disks. That disk now has a different UUID.
Is it possible to manually change a component's disk UUID?
Inaccessible objects:
Object summary:
Device names and UUIDs:
esxi-1
1 Device: naa.50014ee0aed0a4f1
2 Device: naa.50014ee004267364
3 Device: naa.5000c500a408519f
4 Device: t10.NVMe____Samsung_SSD_970_PRO_1TB
5 Device: naa.50014ee213b45c75
6 Device: naa.50014ee00418dc4f
7 Device: naa.50014ee0aed13061
8 Device: naa.50014ee6066bd906
1 VSAN UUID: 5210573d-e816-d108-9b7d-9b45af86fbc8
2 VSAN UUID: 521cf0f2-77db-7bb1-7bd4-45611763eed1
3 VSAN UUID: 5230c047-a3ca-8d41-638f-0f69ae922a7f
4 VSAN UUID: 524b625a-31dd-dd28-a051-5a2105194a66
5 VSAN UUID: 5250b0ac-7aaa-faa1-9fba-7a5fb5dfc82c
6 VSAN UUID: 528a38a0-5862-0a4a-8935-a215e9a6029b
7 VSAN UUID: 52ad47c2-d86d-ac79-7242-63c35799c077
8 VSAN UUID: 52f9bba5-d665-f168-fe1c-7e93bf50550d
esxi-2
1 Device: naa.50014ee0597b9571
2 Device: naa.5000c500a407e9ab
3 Device: naa.50014ee0042675ef
4 Device: naa.50014ee213b45db3
5 Device: naa.5000c500a407e844
6 Device: naa.50014ee0aed13b34
7 Device: naa.50014ee0597b8d34
8 Device: t10.NVMe____Samsung_SSD_970_PRO_1TB
1 VSAN UUID: 520cc0da-e1ba-d145-fed5-24804768aab0
2 VSAN UUID: 523b1798-9965-2aff-9fcb-c1371c40a036
3 VSAN UUID: 52643015-388c-6f7b-9de4-d97e4164023a
4 VSAN UUID: 528df8c2-0f7e-8225-9d48-843a23301ccf
5 VSAN UUID: 528f6cb0-1b49-8b69-a5bf-76ac9805361e
6 VSAN UUID: 52a29813-12a5-11e1-2743-a0794aa13e37
7 VSAN UUID: 52c21ced-6be9-540b-666e-fadd51d9657d
8 VSAN UUID: 52ed6a0c-9b26-304c-8bf6-395ee2352136
esxi-5
1 Device: t10.ATA_____6N3JZSS04
2 Device: t10.ATA_____15W0RKDVA
3 Device: t10.NVMe____Samsung_SSD_970_PRO_1TB
1 VSAN UUID: 5242a099-4c29-154e-6734-5007c195a56f
2 VSAN UUID: 52657cd9-b4ca-a1fe-5ce3-969aa4d57a26
3 VSAN UUID: 529d5fc5-7cb4-e71a-2fa6-c9639a0638b7
Disk info before reboot:
@iFFgen, Sorry, but I sincerely hope this is a homelab, as Samsung 970 drives are far, far, far from supported vSAN cache- or capacity-tier devices, and you should not be storing any data you need on such a cluster (unless you are not using them for vSAN and listed them for some other reason?).
The only way that a vSAN capacity-tier device's UUID changes is if the device is wiped and new vSAN partitions are put on it (e.g. it was removed and re-added to a Disk-Group as a new blank disk). So no, there is no way to change a disk UUID to another UUID, and besides, this won't bring back the data if the disk was wiped/corrupted and reformatted.
Was all data stored as FTT=1, or was some stored as FTT=0? If the former, and everything was compliant with that policy, then you shouldn't have anything inaccessible (assuming you don't have a second problem here that you have not noted). If the data was FTT=0 and that disk was removed/reformatted (which is a manual thing; vSAN will never do this without human or scripted intervention), then that data will probably remain unavailable.
I know that this is far from an ideal installation. It is a home lab, but one of the VMs is critically needed. It was placed here for a while, and restoring it from backup fails with an error.
Is there any chance to restore it? Maybe getting an official VMware support contract would help?
Or is using hardware that is not in the VMware Compatibility Guide a reason for refusal?
@iFFgen, While there *likely* is some provision in the EULA about opening a Support Request in this regard (e.g. it needs to be a supported configuration for us to be able to provide support), I myself have never turned anyone away for using unsupported devices. That being said, this is always done on the understanding that we do what we can to get the data accessible again, and that the data then needs to be moved off this cluster or the unsupported devices replaced with supported ones (e.g. we can't really keep bandaging it together again when the source problem is clear).
Note as well that vSAN GS are not a data-recovery company and depending on the state of the remaining data and/or devices they reside on, there may be little or nothing we can do to help.
Can you please run the following on any ESXi host and attach the output here:
# esxcli vsan debug object list --all > /tmp/objout
(Omit the --all if on a version lower than 6.7 U3)
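Once the output is saved, a quick way to see how many objects are in each health state is to count its `Health:` lines. The following is only a minimal Python sketch, meant to be run on a copy of the file on any machine with Python 3. The function name `summarize_health` is made up for illustration, and it assumes each object block carries a single `Health:` line, as in the object output quoted in this thread.

```python
from collections import Counter


def summarize_health(objout_text: str) -> Counter:
    """Count objects per health state in 'esxcli vsan debug object list' text.

    Assumption: every object block contains exactly one 'Health:' line.
    """
    counts = Counter()
    for line in objout_text.splitlines():
        line = line.strip()
        if line.startswith("Health:"):
            # Keep everything after the first colon as the state label.
            counts[line.split(":", 1)[1].strip()] += 1
    return counts


# Excerpt taken from the object output quoted in this thread.
sample = """\
Object UUID: e3c79760-3c90-201c-014a-0cc47a6cd44e
Version: 10
Health: inaccessible - Lost quorum.(APD)
Owner: esxi-2
"""

for state, count in summarize_health(sample).items():
    print(f"{count:4d}  {state}")
```

To use it on the real file, read the saved output with `open(path).read()` (wherever you copied it to) and pass the text to `summarize_health`.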
Attached output file.
Maybe there is a way to edit which disk a component belongs to, i.e. to change the component's disk UUID?
Here in the screenshot is a component with the "old" UUID of one of the disks:
For some unknown reason, one disk changed its UUID and now has a new one, while the components of 7 virtual objects still refer to the old UUID. The disk was not changed, replaced or removed. It has been in its bay the whole time and is still there.
@iFFgen, Sorry to have to repeat this, but changing a disk's UUID (which isn't possible other than maybe with very breaking-stuff-level hackery) isn't going to magically make the lost data appear on that disk. The data belonged to what is basically the past life of that disk (assuming what you said regarding the new UUID is true).
As I said, if a disk has a new vSAN UUID then to vSAN and the Disk-Group this is a part of, this was a claimable blank disk that has now been added to the Disk-Group as a blank disk - do you have auto-claim enabled? Asking as that is another possibility over human intervention (but still, vSAN would have had to see it as a blank disk with no partitions for that to occur).
Regardless, looking at the inaccessible Objects it is clear that they were not in a redundant state when the failure occurred (e.g. FTT=1 with 2 failures) - the data components on esxi-5 were being resynced from the data stored on esxi-1 (e.g. in the below it had synced 8.66GB of 58.61GB) when the only full remaining copy of the data stored on esxi-1 was lost due to (presumably) hardware failure on esxi-1.
Object UUID: e3c79760-3c90-201c-014a-0cc47a6cd44e
Version: 10
Health: inaccessible - Lost quorum.(APD)
Owner: esxi-2
Size: 0.00 GB
Used: 8.67 GB
Policy:
Configuration:
RAID_1
Component: e3c79760-ad57-f91c-5a95-0cc47a6cd44e
Component State: ABSENT, Address Space(B): 108447924224 (101.00GB), Disk UUID: 52a32d00-2307-8dd5-8c14-90604418bc75, Disk Name: N/A
Votes: 1, Host UUID: None
Component: 246b9a60-46ac-46ff-4de5-0cc47a6cd44e
Component State: RECONFIGURING, Address Space(B): 108447924224 (101.00GB), Disk UUID: 52657cd9-b4ca-a1fe-5ce3-969aa4d57a26, Disk Name: t10.ATA_____Hitachi_HDS723020BLE640_______________________MS77215W0RKDVA:2
Votes: 1, Capacity Used(B): 62935531520 (58.61GB), Physical Capacity Used(B): 9302966272 (8.66GB), Host Name: esxi-5.xxxxxxxxxxxxx
Witness: 246b9a60-6ed3-48ff-c622-0cc47a6cd44e
Component State: ACTIVE, Address Space(B): 0 (0.00GB), Disk UUID: 528f6cb0-1b49-8b69-a5bf-76ac9805361e, Disk Name: naa.5000c500a407e844:2
Votes: 1, Capacity Used(B): 12582912 (0.01GB), Physical Capacity Used(B): 4194304 (0.00GB), Host Name: esxi-2
Type: vdisk
Path: /vmfs/volumes/vsan:528c37b83342ce2a-6b0ecc1731c834e9/e2c79760-8447-b999-d99d-0cc47a6cd44e/Test-1CRent-De.vmdk (Exists)
Group UUID: e2c79760-8447-b999-d99d-0cc47a6cd44e
Directory Name: N/A
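The same check can be applied to all of the inaccessible objects: compare each component's Disk UUID with the vSAN UUIDs the hosts currently report. The following is only a minimal Python sketch of that idea. The UUID set is copied from the lists earlier in this thread, the component lines are copied from the object above, and the regular expression and the name `orphaned_components` are assumptions made for illustration, not part of any VMware tool.

```python
import re

# vSAN disk UUIDs currently reported by esxi-1, esxi-2 and esxi-5 (from this thread).
KNOWN_DISK_UUIDS = {
    "5210573d-e816-d108-9b7d-9b45af86fbc8", "521cf0f2-77db-7bb1-7bd4-45611763eed1",
    "5230c047-a3ca-8d41-638f-0f69ae922a7f", "524b625a-31dd-dd28-a051-5a2105194a66",
    "5250b0ac-7aaa-faa1-9fba-7a5fb5dfc82c", "528a38a0-5862-0a4a-8935-a215e9a6029b",
    "52ad47c2-d86d-ac79-7242-63c35799c077", "52f9bba5-d665-f168-fe1c-7e93bf50550d",
    "520cc0da-e1ba-d145-fed5-24804768aab0", "523b1798-9965-2aff-9fcb-c1371c40a036",
    "52643015-388c-6f7b-9de4-d97e4164023a", "528df8c2-0f7e-8225-9d48-843a23301ccf",
    "528f6cb0-1b49-8b69-a5bf-76ac9805361e", "52a29813-12a5-11e1-2743-a0794aa13e37",
    "52c21ced-6be9-540b-666e-fadd51d9657d", "52ed6a0c-9b26-304c-8bf6-395ee2352136",
    "5242a099-4c29-154e-6734-5007c195a56f", "52657cd9-b4ca-a1fe-5ce3-969aa4d57a26",
    "529d5fc5-7cb4-e71a-2fa6-c9639a0638b7",
}

# Assumed line shape: "Component State: <STATE>, ... Disk UUID: <36-char UUID>, ..."
COMPONENT_RE = re.compile(r"Component State: (\w+),.*?Disk UUID: ([0-9a-f-]{36})")


def orphaned_components(text):
    """Return (state, disk_uuid) pairs whose disk UUID no host currently reports."""
    return [(state, uuid) for state, uuid in COMPONENT_RE.findall(text)
            if uuid not in KNOWN_DISK_UUIDS]


excerpt = """\
Component State: ABSENT, Address Space(B): 108447924224 (101.00GB), Disk UUID: 52a32d00-2307-8dd5-8c14-90604418bc75, Disk Name: N/A
Component State: RECONFIGURING, Address Space(B): 108447924224 (101.00GB), Disk UUID: 52657cd9-b4ca-a1fe-5ce3-969aa4d57a26, Disk Name: t10.ATA_____Hitachi_HDS723020BLE640_______________________MS77215W0RKDVA:2
Component State: ACTIVE, Address Space(B): 0 (0.00GB), Disk UUID: 528f6cb0-1b49-8b69-a5bf-76ac9805361e, Disk Name: naa.5000c500a407e844:2
"""

for state, uuid in orphaned_components(excerpt):
    print(f"{state}: disk {uuid} is not present on any host")

# Share of the RECONFIGURING component that had been physically written,
# using the byte counts from its "Votes:" line above.
resynced_fraction = 9302966272 / 62935531520
print(f"resynced: {resynced_fraction:.1%}")
```

For this object, only the ABSENT component points at a disk UUID that no host reports any more, and the resyncing copy on esxi-5 had reached a little under 15% of its used capacity.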
I am going to hazard an educated guess and assume the data in FS_1.vmdk is what is valuable to you here. From looking at the remaining components, full recovery of this will not be possible, as there is no complete copy of the data (and the reconfiguring/partial components hold only a fraction of it). A best-effort recovery attempt from the rest of the components (e.g. minus the ~158GB of data that was on the disk with UUID 52a32d00-2307-8dd5-8c14-90604418bc75) would at best yield partially usable data and at worst nothing usable (it depends on too many factors to say). That being said, if you have no backup recovery plan and can't engage Kroll or someone similar to see what is on that disk and whether it is usable (assuming it is not over-written partitions/blank), then do open an SR with us in vSAN GS.