VMware Cloud Community
jobee1
Enthusiast

Two of my three vCLS are red - how do I get them back to green?

Good afternoon. We had a power outage for about 10 seconds, after which two of my three vCLS turned red. I'm not sure what happened because, as far as I can tell, my host servers, NAS storage devices, and switches are all on UPS units. I've tried to get them back to green, but I can't see how to do this. I did read that as long as I have one functional vCLS I'm good, but is the fact that the other two are still red indicative of a problem? All VMs are working fine, so I'm good there.

So, is there something I need to do or should I just leave well enough alone? 

Thanks, 

Joe B

10 Replies
maksym007
Expert

Do you have a screenshot maybe? Do you have any errors, or alerts? 

Is the "Reset to green" option available?

jobee1
Enthusiast

Good afternoon maksym007, 

I just added a picture of the alert banner. Unfortunately, there is neither an "Acknowledge" nor a "Reset to green" option. I also uploaded a picture showing the two red vCLS and the one green vCLS.

There are no errors or alerts except for the insufficient resources message, and everything else is happy. I've been working from home for the last couple of days but will be in the office tomorrow. I am going to take a very close look to see where every power cable is plugged in. Again, we had a power outage for about 10 seconds on Friday, and I was off over the weekend. When the power went out, my coworker went into the server room to make sure everything was running. Curious...

Thanks, 

Joe B

maksym007
Expert

Thanks for the screenshots. If I were you, I would disable HA for that cluster, delete all three vCLS VMs, and enable HA again.

I assume it might help.

It's a bit harsh, yes, but it should get results.

StephenMoll
Expert

Enable retreat mode and then disable retreat mode.

https://docs.vmware.com/en/VMware-vSphere/7.0/com.vmware.vsphere.resmgmt.doc/GUID-F98C3C93-875D-4570...

This will remove and replace the vCLS appliances in the most controlled way.
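
For reference, retreat mode is toggled through a vCenter Server advanced setting whose name embeds the cluster's managed object ID. The sketch below only builds that setting name and the two values to apply in order; the helper names and the example ID `domain-c8` are made up for illustration, and the real ID has to be read from your own environment (for example, from the cluster's URL in the vSphere Client).

```python
import re


def retreat_mode_key(cluster_moid: str) -> str:
    """Build the advanced-setting name that controls vCLS for one cluster.

    cluster_moid is the cluster's managed object ID, e.g. "domain-c8"
    (an example value, not taken from this thread).
    """
    if not re.fullmatch(r"domain-c\d+", cluster_moid):
        raise ValueError(f"unexpected cluster ID format: {cluster_moid!r}")
    return f"config.vcls.clusters.{cluster_moid}.enabled"


def retreat_mode_steps(cluster_moid: str) -> list[tuple[str, str]]:
    """Return the (setting, value) pairs to apply, in order."""
    key = retreat_mode_key(cluster_moid)
    return [
        (key, "false"),  # enable retreat mode: vCLS VMs are removed
        (key, "true"),   # disable retreat mode: vCLS VMs are redeployed
    ]


if __name__ == "__main__":
    for setting, value in retreat_mode_steps("domain-c8"):
        print(f"{setting} = {value}")
```

The setting goes under the vCenter Server's Advanced Settings. Wait for the vCLS VMs to disappear after the first step before applying the second.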

maksym007
Expert

Oh, that's something new even for me.
Thanks for the article.

StephenMoll
Expert

We have this technique built into our management processes. So when funnies are suspected with the vCLS they can be refreshed at the click of a button. It seems to cure all problems we have encountered where the appliances 'seem' to be sort of functioning but not behaving 100% as expected.

I think VMware are still learning about vCLS and how to make it work better. So it has a feel of being "work in progress".

We have problems with the way vCLS are placed at system startup because, in my opinion, there are not enough mechanisms available to users to control placement. It is possible, for example, to end up with all your vCLS in one blade chassis, in one rack, or in one room, so a major outage of some sort can obliterate all the vCLS for a cluster in one go. This leaves HA recovery at the mercy of the old HA placement engine, which still exists inside the hypervisors. This engine is now a little bit broken in that it no longer obeys the advanced affinity rule controls, so if you lose all vCLS there is a chance that HA recovery will proceed without any affinity rules being applied at all.
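
The placement risk described here can be checked with a few lines of script. This is only a sketch: it assumes you already have a mapping from each vCLS VM to the failure domain (chassis, rack, or room) of the host it runs on. The function names, VM names, and domain labels are hypothetical, and no vSphere API is called.

```python
from collections import Counter


def vcls_per_domain(placement: dict[str, str]) -> Counter:
    """Count vCLS VMs per failure domain.

    placement maps a vCLS VM name to the failure domain of its host.
    """
    return Counter(placement.values())


def all_in_one_domain(placement: dict[str, str]) -> bool:
    """True when every vCLS VM sits in the same failure domain."""
    return len(placement) > 0 and len(set(placement.values())) == 1


if __name__ == "__main__":
    # Hypothetical inventory: three vCLS VMs, all in one chassis.
    placement = {
        "vCLS-1": "chassis-A",
        "vCLS-2": "chassis-A",
        "vCLS-3": "chassis-A",
    }
    print(vcls_per_domain(placement))
    if all_in_one_domain(placement):
        print("Warning: one outage could take out every vCLS VM")
```

A check like this only reports the situation. It does not move any vCLS VM.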

maksym007
Expert

One more question: 

What is your vCenter version? Maybe the issues you have are fixed in the latest 7.0.3 release.

StephenMoll
Expert

We are on 7u3. It has been recognised as a weakness in vCLS for our use case by our TAM.

jobee1
Enthusiast

@StephenMoll, thanks very much, creating that setting did the trick! 

Thanks again, 

Joe B

maksym007
Expert

Thanks for the feedback. I'll keep an eye on how it works in future.
