I have five ESXi 5.5 servers with 20 TB of VSAN storage. Suddenly one of the VMs reported a soft lockup error. I tried to reboot it, but it froze. After some investigation I found that one 1 TB HDD had failed. I was not worried, because of VSAN.
But now the VM will not start. In the VSAN datastore we can see that the VM's 1 TB VMDK is lost!
Please take a look at the attachments.
The problem seems to be that a double fault has occurred, and your policy can only tolerate one failure.
That is, both RAID-0 copies of the RAID-1 mirror are in a bad state, with at least one bad component in each (your VM storage policy image confirms this).
You should check why the other RAID-0 component went down. It is on a different spinning disk (the disk UUID can be found in the VM storage policy image).
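To illustrate why one bad component in each copy is enough to lose the object, here is a minimal Python sketch of the layout described above. It is a simplified model only: the disk labels and the two-components-per-stripe layout are made up for the example, and it ignores witness components and quorum, which real VSAN also takes into account.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Component:
    disk: str       # hypothetical disk label, not a real VSAN disk UUID
    healthy: bool


def stripe_usable(stripe: List[Component]) -> bool:
    """RAID-0 has no redundancy: every component must be healthy."""
    return all(c.healthy for c in stripe)


def mirror_usable(replicas: List[List[Component]]) -> bool:
    """RAID-1 survives as long as at least one replica is fully usable."""
    return any(stripe_usable(stripe) for stripe in replicas)


def make_object(bad_disks: set) -> List[List[Component]]:
    """Build a RAID-1 object made of two RAID-0 stripes (assumed layout)."""
    layout = [["a1", "a2"], ["b1", "b2"]]
    return [[Component(d, d not in bad_disks) for d in stripe] for stripe in layout]


# One failed disk: the other copy is still complete.
single_fault = make_object({"a1"})
# One bad component in EACH copy: neither copy is complete.
double_fault = make_object({"a1", "b2"})

print("one failed disk  ->", mirror_usable(single_fault))
print("one in each copy ->", mirror_usable(double_fault))
```

In this model a policy that tolerates one failure keeps the object available in the first case but not in the second, which matches the situation in the storage policy image.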
Yes, but look at the first RAID-0 object: five (!) components (hard drives) are missing there. At least that is what VSAN can "see". So I suggest opening a support ticket with VMware. I am pretty sure the first RAID-0 object is showing some bug that could eventually be cleared. I don't believe five hard drives failed at once...
Yes, you understood what I was trying to say.
Also, the other VMs that use those hard drives do not have any problems.
Unfortunately we don't have a support contract, so we can't open a ticket. Is there any other way?
You *need* to have a service contract, at least when running critical VMs on it.
Now, IF I had this in a test lab (and ONLY in a test lab), and everything in that lab were purely for fun and play, only THEN would I MAYBE shut down all the running VMs residing on the VSAN, and after that shut down all hosts participating in the VSAN cluster. Then I'd boot them up again and check whether the error persists. If it did persist, I would MAYBE think about disabling VSAN and re-enabling it.
But this is only trial and error. This is NOT advice to you. It is all hypothetical. So don't do this!
I'd *really* suggest getting a fresh contract (or renewing yours) and then opening an official case with VMware.
Sorry, you misunderstood what I said.
What I am saying is that in this case there are two failures, which leaves both copies unusable. We don't handle double faults, and the VM cannot be recovered in such cases.
That said, I agree with you on the other point: we need to know why and how this condition occurred in the first place, and if there is a defect here we need to address it.