VMware Cloud Community
carvaled
Enthusiast

A general system error occurred: Operation failed due to a VSAN error. Another host in the cluster is already entering maintenance mode.

Hello!

I did hit up Dr Google... but couldn't find this specific error.

Long story short, I was doing some shady stuff on my 3-node vSAN home lab... I had no patience, did some quick and dirty steps, and now I am getting the following error when trying to get the host to enter Maintenance Mode (regardless of the evacuation method).

A general system error occurred: Operation failed due to a VSAN error. Another host in the cluster is already entering maintenance mode.

Retry operation after it completes.  Cause: General vSAN error.

I did restart hostd and vpxa but that didn't appear to help... any ideas? I am about to bounce the vCenter but was wondering if anyone had seen this before.
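
For reference, I bounced them the usual way from the ESXi shell, roughly like this:

# restart the management agents on the host
/etc/init.d/hostd restart
/etc/init.d/vpxa restart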

Cheers

vMan

5 Replies
TheBobkin
Champion

Hello vMan,

Try:

localcli vsan maintenancemode cancel

Then try re-applying e.g.:

localcli system maintenanceMode set -e true -m noAction

Check the decom state of the other nodes in the cluster:

# cmmds-tool find -t NODE_DECOM_STATE -f json

These should all show "content": {"decomState": 0, if they are neither in MM nor entering MM. Do note that there can be stale node entries in CMMDS, so if any of them show decomState 4 or 6, check whether that node UUID is actually a current node UUID in the cluster.
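
If you want to eyeball that quickly, something along these lines should do it (just a sketch: it assumes the JSON layout above and busybox grep in the ESXi shell), and you can cross-check any suspect UUIDs against the member list from esxcli vsan cluster get:

# list each node entry's UUID alongside its decom state
cmmds-tool find -t NODE_DECOM_STATE -f json | grep -E '"uuid"|"decomState"'

# the Sub-Cluster Member UUIDs here are the nodes actually in the cluster
esxcli vsan cluster get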

Bob

carvaled
Enthusiast

Thanks Bob,

I tried it, and I did have a 4th host with decom state 6.

I ran

cmmds-tool delete -u UUID

.... on all 3 vSAN hosts for the host that did not exist... but I still get the same error when I try to put it into MM.
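
For what it's worth, the stale entry does look gone now; checking with something like this on each host (going from memory of cmmds-tool's find options, with the dead host's node UUID filled in) returns nothing:

# should return no entries once the stale node record is removed
cmmds-tool find -u <stale-node-UUID> -f json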

Same from the command line:

[root@esx2:~] localcli system maintenanceMode set -e true -m noAction

Errors:

A general system error occurred: Operation failed due to a VSAN error.

Would I need to reboot the hosts?

carvaled
Enthusiast

For those interested, I ended up rebooting all 3 vSAN hosts and the issue went away.

Cheers

vMan

TheBobkin
Champion

Hello vMan,

Generally with vSAN I wouldn't advise the old 'turn it off and on again' (as you know... pulling the storage with it!), but alas in some states a refresh of everything that is loaded does the job.

Out of interest, what kind of "shady stuff" are we talking about here? :D

I mostly use HOL for breakage/repro/carnage, but anything running on physical hardware or VMware Workstation seems to be more reliable/consistent in the results (of thoroughly unsupported workflows).

Bob

carvaled
Enthusiast

Yeah, I totally understand :) and to be honest I would never do this outside my home lab... worst case I would have just rebuilt it.

So I was upgrading ESXi from 6.0 to 6.5 using Auto Deploy on my unsupported white boxes (i7's on some old ASUS motherboards), running "community supported" drivers for storage and network adapters. I then added the host back into the vSAN cluster and created the disk group while the host was not in Maintenance Mode, so a resync kicked off. I tried to stop it by putting the host into MM so I could update it from Update Manager, but it was hanging on vMotioning VMs and the MM task timed out. After that I tried a few more times and always got an MM timeout, so I dragged the host out of the cluster and then added it back in again... that's when I started to get the new MM error I posted about.

Yeah, I know, what a mess, but hey, I was in a rush :)... but it's honestly a testament to vSAN. I did some very nasty stuff to the cluster while it was running VMs and it didn't skip a beat!

It's all running now and updated, and the host is re-syncing, so I'm happy.
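
For anyone following along, I'm just watching the resync from RVC on the vCenter with something like this (the cluster path below is a placeholder for my lab's path):

# shows per-object resync progress for the cluster
vsan.resync_dashboard /localhost/<datacenter>/computers/<cluster>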

Cheers

vMan
