Good afternoon. We had a power outage for about 10 seconds, after which two of my three vCLS VMs turned red. I'm not sure what happened because, as far as I can tell, my host servers, my NAS storage devices, and my switches are all on UPS units. Anyway, I've tried to get them back to green but I can't see how to do this. I did read that as long as I have one functional vCLS I'm good, but is the fact that the other two are still red indicative of a problem? All VMs are working fine so I'm good there.
So, is there something I need to do or should I just leave well enough alone?
Good afternoon maksym007,
I just added a picture of the alert banner. Unfortunately there is no option to "Acknowledge" or "Reset to green." I also uploaded a picture showing the two red vCLS VMs and the one green one.
There are no errors or alerts except for the insufficient resources message, and everything else is happy. I've been working from home for the last couple of days but will be in the office tomorrow. I am going to take a very close look to see where every power cable is plugged in. Again, we had a power outage for about 10 seconds on Friday and I was off over the weekend. When the power went out my coworker went into the server room to make sure everything was running. Curious...
Thanks for the screenshots. If I were you, I would disable HA for that cluster, delete all 3 of these vCLS VMs, and enable HA again.
I assume it might help.
A bit harsh, yes, but it should get a result.
Enable retreat mode and then disable retreat mode.
This will remove and replace the vCLS appliances in the most controlled way.
We have this technique built into our management processes. So when funnies are suspected with the vCLS they can be refreshed at the click of a button. It seems to cure all problems we have encountered where the appliances 'seem' to be sort of functioning but not behaving 100% as expected.
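For anyone who wants to script the retreat mode toggle, here is a minimal sketch of the bookkeeping involved. It assumes the vCenter advanced setting `config.vcls.clusters.<cluster ID>.enabled` described in VMware's retreat mode procedure, where `false` turns retreat mode on and `true` turns it off again. The function names and the `domain-c8` cluster ID are made up for illustration, and the sketch only builds the key/value steps; it does not talk to vCenter, and it is not necessarily how the poster above automated it.

```python
import re


def retreat_mode_key(cluster_moid: str) -> str:
    """Build the advanced-setting key for one cluster.

    cluster_moid is the cluster's managed object ID, e.g. 'domain-c8'
    (a placeholder here; it is visible in the vSphere Client URL when
    the cluster is selected).
    """
    if not re.fullmatch(r"domain-c\d+", cluster_moid):
        raise ValueError(f"unexpected cluster ID format: {cluster_moid!r}")
    return f"config.vcls.clusters.{cluster_moid}.enabled"


def retreat_cycle(cluster_moid: str) -> list[tuple[str, str]]:
    """Return the two setting changes for a full vCLS refresh, in order.

    Step 1 ('false') enables retreat mode, which removes the vCLS VMs.
    Step 2 ('true') disables retreat mode, which redeploys them.
    Wait for the vCLS VMs to disappear before applying step 2.
    """
    key = retreat_mode_key(cluster_moid)
    return [(key, "false"), (key, "true")]


if __name__ == "__main__":
    for key, value in retreat_cycle("domain-c8"):
        print(f"set {key} = {value}")
```

Each pair can be applied by hand under the vCenter Server advanced settings, or fed to whatever automation tool you already use.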
I think VMware are still learning about vCLS and how to make it work better. So it has a feel of being "work in progress".
We have problems with the way vCLS VMs are placed at system startup because, in my opinion, there are not enough control mechanisms available to users to control placement. It is possible, for example, to end up with all your vCLS VMs in one blade chassis, in one rack, or in one room. So a major outage of some sort can obliterate all the vCLS VMs for a cluster in one go. This leaves HA recovery at the mercy of the old HA placement engine, which still exists inside the hypervisors. This engine is now a little bit broken in that it no longer obeys the advanced affinity rule controls, so if you lose all vCLS VMs there is a chance that HA recovery will run without any affinity rules being applied at all.
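The placement concern above can be checked with a few lines of script once you have an inventory export. This is only a sketch: the host names, the chassis/rack/room labels, and the function name are all hypothetical, and it assumes you already know which host each vCLS VM is running on and that every host has a label for every level.

```python
def shared_failure_domains(vcls_hosts, host_location):
    """Return the failure domains that contain every vCLS VM.

    vcls_hosts:    vCLS VM name -> host it runs on
    host_location: host -> {level: label}, e.g. {"chassis": "A", "rack": "R1"}
    A level appears in the result only if all vCLS VMs share one label for it.
    """
    labels_per_level = {}
    for host in vcls_hosts.values():
        for level, label in host_location[host].items():
            labels_per_level.setdefault(level, set()).add(label)
    return {
        level: next(iter(labels))
        for level, labels in labels_per_level.items()
        if len(labels) == 1
    }


if __name__ == "__main__":
    # Hypothetical inventory: three vCLS VMs across two chassis in one rack.
    vcls_hosts = {"vCLS-1": "esx01", "vCLS-2": "esx02", "vCLS-3": "esx03"}
    host_location = {
        "esx01": {"chassis": "A", "rack": "R1", "room": "DC1"},
        "esx02": {"chassis": "A", "rack": "R1", "room": "DC1"},
        "esx03": {"chassis": "B", "rack": "R1", "room": "DC1"},
    }
    print(shared_failure_domains(vcls_hosts, host_location))
```

Anything the function returns is a single point of failure for the cluster's vCLS VMs, so in the made-up inventory above losing rack R1 or room DC1 would take out all three at once.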