Discord
Contributor

Remove vSAN host with locked drives and rebuild?

Background -

I have a Gemalto KeySecure KMS server that was migrated from a single disk to a mirrored array. I didn't notice that it was no longer communicating with the cluster. A host got wonky, so I reinstalled ESXi and preserved the existing VMFS datastore (I did not back up the host config this round, however, stupid me!) with the intent of doing a quick reconfig and moving on with life. The drives locked due to the lack of comms with the KMS.

I have since determined that the KMS wasn't too fond of being copied to the new array, despite booting and appearing to operate normally. The KMS has now been recovered to its original state and is confirmed for at least one-way trust (vCenter trusts the KMS right now). I'm unable to get the KMS to trust vCenter again, though. Another storage array was able to communicate with the KMS again and do its thing without intervention on my part.

The vSAN cluster has 5 hosts running 6.7. All drives are unlocked with the exception of those in the host that was rebuilt and reintroduced into the cluster. On that host, host encryption mode will not enable; it gives "general runtime error, cannot generate key" when I try. I'm unable to do much with it at this point.

I'm not super savvy in this vSAN/encryption realm. My thought is that if I were to remove that host and its locked drives as gracefully as possible from the cluster, that may allow two-way KMS comms to be restored with the remaining hosts, since no other keys would have changed. At that point the removed host and drives could be reintroduced and encrypted again.

Am I way off base with my thinking? Does anyone have suggestions for another path to sort out this mess? At this point, the devs using the VMs are working, so that's good.

I'm a bit terrified of restarting anything on the back end until the KMS trusts are two-way again, probably for good reason.

4 Replies
GreatWhiteTec
VMware Employee

The issue here is that your host does not know about the KMS. Adding the host to the cluster should pass this info back down to the host; however, vCenter and the KMS are not trusting each other.

As a workaround, you can try showing the host how to reach the KMS: edit the KMS settings in esx.conf (/etc/vmware/esx.conf) on that host to match the info from the other hosts.
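Before editing anything by hand, it may help to see exactly which KMS-related entries the rebuilt host is missing. The following is a minimal sketch, assuming you have copied the text of esx.conf from a healthy host and from the rebuilt host to a workstation. The `kmip`/`kms` keyword filter and the sample keys are assumptions for illustration, not the real esx.conf key names; check a healthy host for the actual entries.

```python
import re

# Matches lines of the form: /some/path = "value"
LINE_RE = re.compile(r'^\s*(/\S+)\s*=\s*"(.*)"\s*$')

def parse_conf(text):
    """Return {key: value} for every key = "value" line in the text."""
    entries = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            entries[match.group(1)] = match.group(2)
    return entries

def kms_entries(entries, keywords=("kmip", "kms")):
    """Keep only keys containing one of the keywords (an assumed filter)."""
    return {k: v for k, v in entries.items()
            if any(word in k.lower() for word in keywords)}

def diff_kms(good_text, rebuilt_text):
    """List KMS-related keys that are missing or different on the rebuilt host."""
    good = kms_entries(parse_conf(good_text))
    rebuilt = kms_entries(parse_conf(rebuilt_text))
    report = []
    for key in sorted(good):
        if key not in rebuilt:
            report.append(("missing", key))
        elif rebuilt[key] != good[key]:
            report.append(("differs", key))
    return report

if __name__ == "__main__":
    # Made-up sample lines, not real esx.conf contents.
    good = ('/example/kmipServer/address = "10.0.0.5"\n'
            '/example/kmipServer/port = "5696"\n'
            '/net/hostname = "esx1"\n')
    rebuilt = ('/example/kmipServer/address = "10.0.0.9"\n'
               '/net/hostname = "esx5"\n')
    for status, key in diff_kms(good, rebuilt):
        print(status, key)
```

The report deliberately lists only key names, not values, so that any key material in the file does not end up in a terminal scrollback or a log.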

I wrote a blog post a while back. It doesn't cover this exact issue, but you can see where the information lives.

https://greatwhitetec.com/2017/06/06/replacing-vcenter-with-vsan-encryption-enabled/

Hope this helps.

A+, DCSE, MCP, MCSA, MCSE, MCTS, MCITP, MCDBA, NCDA, NCIE-SAN, NCIE-BR, VCP4, VCP5, VCP5-DT, VCAP5-DCA _____________________ If you find this or any other answer useful please consider awarding points by marking the answer correct or helpful.
Discord
Contributor

Awesome, thank you for the response and input.  I'll give it a shot in the morning and see if it helps move things in the right direction.

Discord
Contributor

Got this figured out today. 

Turns out I DID capture the configuration of all the hosts after all; I had just forgotten about it with all the stress of this KMS issue. I ended up restoring the host config, which restored the missing key that the host was looking for, which in turn allowed host encryption mode to be enabled again on the wonky host. After a little log reading on the KMS server, I was able to fix a password issue. That let me restore the two-way trust with the KMS server and begin the long process of decrypting the vSAN, which is almost done now.

Takeaway - back up ALL of your configs on a regular basis, no matter what they belong to. If it has a config, back it up. Also, if a 3rd party configures any service or appliance, double check that their work is in line with what you paid for, and make them prove the setup if needed, especially for something critical. Don't ever take their word for it, no matter how nice they are. What sparked this whole issue was the KMS server not being on a redundant disk array. This led to the discovery that it wasn't replicating to its partner, or in a cluster at all, actually. In short, it was a single point of failure and we were misled and/or lied to.
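One way to keep the "back it up on a regular basis" habit honest is a small check that flags hosts whose newest saved config is too old. This is a minimal sketch under assumptions: the directory layout (one subfolder per host under a hypothetical `backup_dir`), the host names, and the 7-day threshold are all made up for illustration, and it only inspects files that some other job has already copied off the hosts.

```python
import os
import time

def stale_hosts(backup_dir, hosts, max_age_days=7, now=None):
    """Return hosts with no backup file newer than max_age_days.

    Assumes a hypothetical layout: backup_dir/<host>/<any backup file>.
    A host with no folder or no files counts as stale.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    stale = []
    for host in hosts:
        host_dir = os.path.join(backup_dir, host)
        newest = 0.0  # modification time of the newest file seen so far
        if os.path.isdir(host_dir):
            for name in os.listdir(host_dir):
                path = os.path.join(host_dir, name)
                if os.path.isfile(path):
                    newest = max(newest, os.path.getmtime(path))
        if newest < cutoff:
            stale.append(host)
    return stale

if __name__ == "__main__":
    # Hypothetical path and host names; adjust to your environment.
    for host in stale_hosts("/backups/esxi", ["esx1", "esx2", "esx3"]):
        print("no recent config backup for", host)
```

Run from a scheduler, something like this would have been a reminder that the rebuilt host's config had in fact been captured, and it would complain loudly about any host that had been skipped.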

Now on to figuring out how to decrypt a Dell Compellent array, then begin the process of redeploying a KeySecure KMS server.  In a cluster.  On a mirror.  And actually replicating its keys with its partner this time. :smileyangry:

GreatWhiteTec
VMware Employee

Glad to hear it. To your point, you should always be backing up your keys from the KMS as well... very important.
