VMware Cloud Community
BHagenSPI
Enthusiast

VSAN and shutting down all hosts in a stack

The question: What is the officially recommended way to shut down all physical hosts in a vSphere datacenter so that VSAN stays happy when the hosts are powered back up?

The scenario:

We are spinning up a new VMware infrastructure: vSphere 6.0 U3 and VSAN 6.2. All hardware is (finally) on the HCL.

We'll have 2 identical stacks, each with 6 physical hosts and 2 physical switches: one stack for HQ, one stack for the DR site.

We'd like to build both stacks in the same rack so that after moving all the VMs from our current hosts to the new HQ stack, we can use Veeam and our 10Gb switches to do the initial replication from the HQ stack to the DR stack...rather than doing it over a 100Mb WAN link.
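To put rough numbers on the 10Gb versus 100Mb difference, here is a back-of-the-envelope sketch in Python. The 10 TB data size and the 80% link efficiency are made-up assumptions for illustration (the post does not state how much data is involved), and `transfer_hours` is a hypothetical helper, not part of Veeam or any VMware tool.

```python
def transfer_hours(data_tb: float, link_mbps: float, efficiency: float = 0.8) -> float:
    """Estimate hours needed to move data_tb terabytes over a link_mbps link.

    efficiency is an assumed fraction of the nominal link speed that is
    actually usable for replication traffic.
    """
    bits = data_tb * 1e12 * 8                  # terabytes -> bits (decimal TB)
    usable_bps = link_mbps * 1e6 * efficiency  # megabits/s -> usable bits/s
    return bits / usable_bps / 3600            # seconds -> hours


# Assumed (hypothetical) initial replication size: 10 TB
lan = transfer_hours(10, 10_000)  # 10Gb switches in the same rack
wan = transfer_hours(10, 100)     # 100Mb WAN link

print(f"10Gb LAN:  {lan:.1f} hours")
print(f"100Mb WAN: {wan:.1f} hours ({wan / 24:.1f} days)")
```

Under those assumptions the local copy takes about 2.8 hours, while the same data over the WAN takes about 11.6 days. The 100x ratio holds for any data size.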

Once replication is complete and we have a few "incrementals" completed, we want to shut down the DR stack and physically move it to the DR site.

I understand we'll need to change IPs on the physical hosts and switches and on all the virtual networking and VMkernel ports, but I need to know how to keep VSAN happy during the process.

Thoughts? Best Practices? Links?

Thanks!

9 Replies
GreatWhiteTec
VMware Employee

Here is a link with the recommended procedure for vSAN shutdown/power on.

Shutting down and powering on a vSAN 6.x Cluster when vCenter Server is running on top of vSAN (2142...

BHagenSPI
Enthusiast

Thanks; I've seen that before and it makes sense.

But isn't there a "timeout" period for VSAN, so when it doesn't see a host for 60 minutes (or something like that) it removes the host from the cluster and all the vsan data gets erased?

If yes, then what happens when *all* the hosts are down for more than 60 minutes? Will all VSAN data be erased?

(I'm looking for references to this but am not finding them yet...)

TheBobkin
Champion

Hello,

Once a component becomes inaccessible (e.g. when a host is put in MM), it is marked as Absent. Once it has been gone for longer than 60 minutes, it is marked Degraded and rebuilt from the remaining copy of the data (assuming it is an FTT=1 Object; an FTT=0 Object will never be marked as such, as it is the only copy).

So basically it does not erase anything; it just recreates the data from a more current copy or, for an FTT=0 Object, does nothing and just waits and hopes it comes back!

In short: no, it will not erase any data while the whole cluster is shut down.

You can also manually increase or decrease the default timer of 60 minutes before rebuild kicks in:

kb.vmware.com/kb/2075456

Here is a great article by CHogan explaining this further (it is an old article but the concepts have not changed much):

cormachogan.com/2014/12/04/vsan-part-30-difference-between-absent-degraded-components/

One thing not mentioned in the KB article linked in the previous comment is that if you have an external PSC, you should also note this and treat it pretty much the same way as the vCenter VM here.

Bob


GreatWhiteTec
VMware Employee

OK. You are referring to the rebuild time (clomrepairdelay, default 60 minutes). When a host is in MM for more than 60 minutes, the absent components are rebuilt on any surviving hosts. When all the hosts are down, there is no check for this, since the cluster is down as a whole. The hosts do not get removed from the cluster automatically in this scenario.
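The behavior described in this thread can be summarized as a small decision function. This is only a simplified model of what the posts above say, not vSAN's actual logic; `ClusterState` and `rebuild_expected` are hypothetical names invented for the sketch.

```python
from dataclasses import dataclass

DEFAULT_REPAIR_DELAY_MIN = 60  # default repair delay mentioned in the thread


@dataclass
class ClusterState:
    hosts_running: int       # hosts currently up and participating in the cluster
    absent_minutes: float    # how long the components have been absent
    ftt: int                 # failures to tolerate for the object in question
    repair_delay_min: int = DEFAULT_REPAIR_DELAY_MIN


def rebuild_expected(state: ClusterState) -> bool:
    """Return True if, under this simplified model, a rebuild would start."""
    if state.hosts_running == 0:
        # Whole cluster is down: nothing is running to act on the timer.
        return False
    if state.ftt == 0:
        # Only one copy exists, so there is nothing to rebuild from.
        return False
    return state.absent_minutes > state.repair_delay_min


# One host in maintenance mode for 90 minutes while five others keep running
one_host_mm = rebuild_expected(ClusterState(hosts_running=5, absent_minutes=90, ftt=1))

# All six hosts powered off for three weeks during the move
three_weeks = 3 * 7 * 24 * 60
full_shutdown = rebuild_expected(ClusterState(hosts_running=0, absent_minutes=three_weeks, ftt=1))

print(one_host_mm, full_shutdown)
```

In this model the single-host case triggers a rebuild and the full-shutdown case does not, which matches the answers given above for the 2 to 3 week move.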

This may help in understanding MM behavior:

JeffHunter.info: Virtual SAN Availability Part 6 - Maintenance Mode

BHagenSPI
Enthusiast

Thanks for the responses; I'm not sure I have an answer yet. All the hosts will be down for probably 2 to 3 weeks as we unrack them and move them to a new site. That's obviously past the 60 minute window. I'm sure VSAN will figure out that everything has been down longer than 60 minutes; I'm worried about what will happen then. Will all disks be marked as degraded? Absent? Not marked? I guess not many people do this, because so far it's all just a guess.

Hopefully somebody who has actually done it will chime in, and then we'll all know for sure! In the meantime, I've opened a support case with VMware in hopes they can tell me. If I hear back with a definitive answer I'll let you know.

TheBobkin
Champion

Hello,

As soon as you put a host in Maintenance Mode with the 'No Action' option, all of the components on its capacity drives will be marked Absent.

This process doesn't move any data as might occur with 'Ensure Accessibility' and provided you have shut down all VMs and all Objects/components are healthy then no data is being written to data Objects. Thus when the cluster wakes up (assuming you have all the networking correct) it will be in the same state as when all were placed in this state.

This would be the same whether you put these in this decom state for 1 minute, 1 hour or 1 week.

True, it is not often that this is done, but I have assisted multiple customers with this in the past, and the only time I have seen issues was when the network at the new site was not configured correctly, so I am guessing my colleagues on the vSAN team will advise you the same.

Either way, take the obvious precautions of:

- Making sure all Objects are healthy.

- Backing up everything before the move (preferably after VMs have been shut down).

- Making sure the network configuration at the new site is correct.
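The precautions above can be written down as a simple go/no-go checklist. This is a sketch only: `PreMoveStatus` and `blocking_issues` are hypothetical names, and the values would have to be filled in by hand from your own checks (nothing here queries vCenter or vSAN).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PreMoveStatus:
    all_objects_healthy: bool        # all vSAN Objects/components report healthy
    vms_powered_off: bool            # every VM is shut down before hosts enter MM
    backup_completed: bool           # backup taken, ideally after VM shutdown
    new_site_network_verified: bool  # network configuration at the new site checked


def blocking_issues(status: PreMoveStatus) -> List[str]:
    """Return the list of precautions that are not yet satisfied."""
    issues = []
    if not status.all_objects_healthy:
        issues.append("Not all Objects are healthy")
    if not status.vms_powered_off:
        issues.append("Some VMs are still running")
    if not status.backup_completed:
        issues.append("Backup has not been completed")
    if not status.new_site_network_verified:
        issues.append("Network configuration at the new site is not verified")
    return issues


# Example: everything is ready except the new-site network check
status = PreMoveStatus(
    all_objects_healthy=True,
    vms_powered_off=True,
    backup_completed=True,
    new_site_network_verified=False,
)
for issue in blocking_issues(status):
    print("BLOCKED:", issue)
```

An empty list means all of the precautions listed above have been ticked off.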

Bob


BHagenSPI
Enthusiast

Thanks Bob; I've read and re-read the links from you and GreatWhiteTec, and as you mention, maintenance mode / no action sounds like the way to go. Since I have an external PSC, I'll just have to be sure to highlight both it and the vCenter Server and shut them down at the same time. 🙂 GreatWhiteTec, thank you for your input as well!

If for some reason tech support comes back to me with a different answer than maintenance mode / no action, I'll post here.

BHagenSPI
Enthusiast

As a follow-up, I heard back from VMware. All they could tell me was:

"There should be a small resync providing that all the hosts are up before the VMs and that when you shut the environment down, it was healthy.  We should expect a small resync but not a full rebuild."

So, once again, this forum was more useful than paid support. 😕 Thanks again Bob and GreatWhiteTec!

TheBobkin
Champion

To be fair, I wouldn't understand vSAN at the level that I do without working to provide those support services you speak of (alongside a lot of talented people!). But also consider that maybe you gave more detail here than in your SR and/or had more back and forth to be specific about your concerns (and more than one vSANer to help).

If contacting support (VMware or otherwise) and the answers to your questions are not specific enough then just ask (and ask the right things), but really at the admin-level of vSAN the answer you got is correct.

Bob
