I've begun the process of updating my vSAN on-disk format from v7 to v13. Almost immediately, I got messages about being close to running out of disk space in my vSAN datastore, and also got this error on one of my cache drives:
Observed excessive log congestion, data evacuation is complete
Remove the diskgroup from vSAN and add it back
I have 6 hosts, each with 1 SSD and 6 HDDs, and one disk group per host. The above error is pointing to the SSD on one of the hosts.
Does this mean:
1. That my disk group is now not adding capacity to vsan, and I can simply remove it and add it back with no impact?
2. That if I remove/re-add this group I'll lose this significant amount of storage, which *will* cause me to run out of disk space?
3. Something else?
I've searched everywhere, but I can't find an answer. I've created a ticket, but had to call it sev 2 since I'm not technically "down".
I believe this is an effect of DDH (Dying Disk Handling).
Dying Disk Handling is a method that vSAN uses to check the health of disks/diskgroups in order to detect an impending disk/diskgroup failure or a poorly performing diskgroup due to congestion.
I suspect the DG was unmounted automatically because vSAN detected congestion on the SSD cache tier, and so it is no longer contributing to capacity.
The congestion may have been caused by movement of VMs or objects after the upgrade, which could have overloaded one particular disk group.
Another possible source of data movement: large VMDKs may have been reformatted during the object conversion, which may have triggered a resync or movement of data.
I suspect that putting the affected host into maintenance mode, deleting the affected DG and re-creating it, and then letting the objects resync back will probably help.
vSAN also has a protection mechanism in case a resync fills up a disk group: it will suspend resyncs if it runs low on space.
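To get a rough feel for whether the remaining disk groups can hold your data while the affected one is out, you could do the arithmetic like this. It is only a sketch: the capacity and usage figures are made-up placeholders (not taken from your cluster), the function name is hypothetical, and it ignores slack space, storage policy overhead and resync traffic. Also bear in mind that if the DG is already unmounted, the capacity your datastore reports may already exclude it.

```python
def remaining_headroom_tb(disk_group_capacities_tb, used_tb, removed_index):
    """Return the free raw capacity (TB) left once one disk group is removed.

    used_tb should be the raw space consumed, replicas included.
    A negative result means the data would not fit.
    """
    remaining = sum(
        capacity
        for index, capacity in enumerate(disk_group_capacities_tb)
        if index != removed_index
    )
    return remaining - used_tb


# Hypothetical figures: six equal disk groups of 6 TB raw each, 28 TB consumed.
capacities = [6.0] * 6
used = 28.0

headroom = remaining_headroom_tb(capacities, used, removed_index=0)
print(f"Headroom with one disk group removed: {headroom:.1f} TB")
```

Substitute the real per-disk-group capacities and consumed space from your own capacity view. If the headroom comes out small or negative, that is worth raising with support before you remove anything.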
It's probably best to let support advise you, though, as they will probably want a closer look.