vSAN 6.5
4-node cluster
Each node has 1 SSD and 2 capacity disks
FTT=0
RAID=1
We had a disk failure in node 4. Did we lose data, or with RAID=1 was the data mirrored to another host? Please help.
Hello piseema,
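If the policy is FTT=0, the data is not mirrored or replicated in any form: there is a single copy of the data (split into one or more components depending on its size), no replicas and no witness component. The RAID-1 setting has nothing to mirror when FTT=0. If you lost that copy, the data is gone.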
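To illustrate why FTT=0 gives no protection even with RAID-1 selected, here is a deliberately simplified model. It assumes each full replica sits on a different host and that one disk or host failure removes at most one replica; witness components and vSAN's vote-based quorum are ignored, and the function names are hypothetical, so treat this as a sketch of the idea rather than how vSAN itself decides object accessibility.

```python
"""Simplified model of RAID-1 object layout in vSAN (illustrative only).

Assumptions: each full replica lives on a different host, and a single
disk or host failure removes at most one replica. Witness components and
vote-based quorum are ignored.
"""


def replica_count(ftt: int) -> int:
    """RAID-1 keeps FTT + 1 full copies of the data."""
    if ftt < 0:
        raise ValueError("ftt must be >= 0")
    return ftt + 1


def min_hosts(ftt: int) -> int:
    """RAID-1 needs 2 * FTT + 1 hosts (replicas plus witnesses) to place an object."""
    if ftt < 0:
        raise ValueError("ftt must be >= 0")
    return 2 * ftt + 1


def survives(ftt: int, failed_replicas: int) -> bool:
    """True if at least one full replica remains after the given number of failures."""
    return failed_replicas < replica_count(ftt)


if __name__ == "__main__":
    for ftt in (0, 1):
        print(
            f"FTT={ftt}: {replica_count(ftt)} replica(s), "
            f">= {min_hosts(ftt)} host(s), "
            f"survives one failure: {survives(ftt, 1)}"
        )
```

With FTT=0 there is exactly one replica, so a single failed disk holding it makes the object unrecoverable; with FTT=1 the second replica on another host keeps the data available, and a 4-node cluster comfortably meets the 3-host minimum.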
Don't store anything you care about as FTT=0. Restore what you lost from backup, and if you don't take adequate backups, start doing so now. Regardless of whether data is FTT=0 or FTT=1, and regardless of the platform, always keep backups of any data you care about and cannot afford to lose.
What happened to the disk? Has it actually failed physically? You could try putting the host into Maintenance Mode (MM) and rebooting it, in case this is a driver, firmware or controller issue (though if you have a lot of FTT=0 data, consider the implications of entering MM first). Entering MM, powering off and reseating the drive might also breathe some life into it, but of course make sure you are handling the correct drive.
Bob
Hi Bob, Thanks for responding.
Yes, it was a physical drive failure, and unfortunately we have no backup, so the data is lost.
Hello piseema,
In that case, start checking your inventory to confirm what you have lost, e.g. any VMs that won't power on.
Is this a hybrid or an all-flash cluster?
Depending on what was lost and how important it is, recovering data from failed mechanical drives is in some cases possible (at a price, and depending on which internal component failed), so consider calling Kroll OnTrack or a similar service if this data was critical (which I assume it was not, given it was stored as FTT=0 with no backups).
Do still try the steps I suggested previously (with the cautions mentioned); that drive may have some life left in it.
Bob
Yes, absolutely, we are in the process of changing the policy to FTT=1.
It is an all-flash cluster. Thanks for the great advice, much appreciated.
Hello piseema,
I am glad I could help clarify the situation - I hope you didn't lose anything that can't be rebuilt.
Just a note though:
If you have a large proportion of FTT=0 data that you are changing to FTT=1, take a measured approach: don't apply the new policy to all objects at once. Apply it to a few VMs/objects first and monitor the effects of the resync, as this can generate a lot of IO and cause contention. If the impact is negligible, apply it to a larger set of VMs/objects. Also be aware of how much free capacity you have in the cluster, as changing all of your data from FTT=0 to FTT=1 will double its capacity usage.
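The capacity arithmetic and the "a few VMs at a time" approach can be sketched as a rough planning helper. This is a minimal sketch under stated assumptions: RAID-1 stores FTT + 1 full copies, so FTT=0 to FTT=1 doubles each object's footprint; witness components, thin provisioning, dedup/compression and the temporary space used during resync are all ignored. The VM names, sizes, batch size and `reserve_gb` headroom are hypothetical inputs you would replace with your own figures.

```python
"""Rough capacity plan for moving VMs from FTT=0 to FTT=1 (RAID-1) in batches.

Simplified model: RAID-1 keeps FTT + 1 full copies, so each object's raw
footprint is used_gb * (ftt + 1). Witnesses, thin provisioning, dedup/
compression and transient resync space are ignored.
"""


def raid1_footprint_gb(used_gb: float, ftt: int) -> float:
    """Raw capacity consumed by an object stored with RAID-1 at the given FTT."""
    if ftt < 0:
        raise ValueError("ftt must be >= 0")
    return used_gb * (ftt + 1)


def extra_capacity_gb(vm_sizes_gb: dict[str, float], from_ftt: int = 0, to_ftt: int = 1) -> float:
    """Additional raw capacity needed to re-protect every listed VM."""
    return sum(
        raid1_footprint_gb(size, to_ftt) - raid1_footprint_gb(size, from_ftt)
        for size in vm_sizes_gb.values()
    )


def plan_batches(
    vm_sizes_gb: dict[str, float],
    free_gb: float,
    batch_size: int = 2,
    reserve_gb: float = 0.0,
) -> tuple[list[list[str]], float]:
    """Split VMs into small batches, stopping before free space drops below reserve_gb.

    Returns the batches that fit and the free capacity left after applying them.
    """
    batches: list[list[str]] = []
    remaining = free_gb
    names = list(vm_sizes_gb)
    for i in range(0, len(names), batch_size):
        batch = names[i:i + batch_size]
        needed = extra_capacity_gb({name: vm_sizes_gb[name] for name in batch})
        if remaining - needed < reserve_gb:
            break  # not enough headroom: add capacity before going further
        remaining -= needed
        batches.append(batch)
    return batches, remaining


if __name__ == "__main__":
    # Hypothetical VMs and their used capacity in GB at FTT=0.
    vms = {"app01": 200, "db01": 500, "web01": 100, "web02": 100}
    print("extra raw capacity needed:", extra_capacity_gb(vms), "GB")
    batches, left = plan_batches(vms, free_gb=1500, batch_size=2, reserve_gb=300)
    for n, batch in enumerate(batches, start=1):
        print(f"batch {n}: {batch}")
    print("free capacity after all batches:", left, "GB")
```

In practice you would apply the policy to one batch, let the resync finish and check its performance impact before moving on; the helper only tells you whether the capacity for each step is there.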
Bob