VMware Cloud Community
Sharantyr3
Enthusiast

How to add new physical disks - best practice ?

Hi there,

---- Off topic, but explains my question ---

I'm currently testing vSAN on 4 nodes (all-flash, 2 disk groups each with 3 data disks) on Dell R740xd servers.

We received 8 more capacity disks (2 per ESXi host), so I hot-plugged the new disks into the servers.

At that time, one ESXi host was offline (powered off) for replacement of a failed internal drive (not claimed by vSAN, in use by the OS).

While 3 out of 4 nodes were up, everything was OK: the new drives were recognized by vSAN (but not auto-claimed).

I powered on the fourth node, waited for it to be up and running, and bam, vSAN went completely nuts. The vSAN "Configure" -> "Physical drives" tab was unresponsive in both the HTML5 and Flash clients.

The VM running on vSAN was still there and running, but I was completely unable to access or change anything in the vSAN configuration.

On the command line, on the 3 nodes that were powered on during the disk add, I could run "esxcli vsan cluster get", but on the fourth node this command never returned.

I tried restarting the vSAN services on all nodes and restarting vCenter, but everything was still stuck; even the HTML5 client would no longer load (though I could still access the Flash client).

I ended up hard powering off the problematic ESXi host (graceful shutdown was stuck), and after some time I regained control of vSAN and vCenter.

--- My question ---

So after all this, and considering we are going to deploy a stretched cluster configuration, what is the best recommended way to add new capacity / cache drives to a vSAN cluster ?

I couldn't find any information except a VMware Knowledge Base article on best practices for upgrading a vSAN cluster.

Should I add new drives only when all hosts are here ? Or no hot add at all ? Put host one by one in maintenance mode, with data evacuation, then add new drive ?

And what about stretched cluster considerations ?

Thanks for your help

2 Replies
TheBobkin
Champion

Hello Sharantyr3

Are you positive you replaced the faulty drive and how did you re-install ESXi? (assuming that's what you meant by "in use by OS")

From what you describe, hostd was probably kaput, likely as a knock-on effect of something else failing. In this case always use localcli instead of esxcli (localcli does not go through hostd), and if that does not return output either, use dmesg or Alt+F12 at the DCUI to see the vmkernel logging and figure out what is breaking.

"So after all this, and considering we are going to deploy a stretched cluster configuration, what is the best recommended way to add new capacity / cache drives to a vSAN cluster ?"

All of this information can be found on docs.vmware in the sub-sections here:

Device Management in a vSAN Cluster

e.g. Add a Capacity Device

"Should I add new drives only when all hosts are here ?"

No, this shouldn't matter.

"Or no hot add at all ?"

Hot-add capabilities are dependent on the controller in use supporting this feature (and usually the firmware in use).

"Put host one by one in maintenance mode, with data evacuation, then add new drive ?"

If hot-add is supported as per the above then this isn't necessary. If you do want or need to use MM or power off, you could expedite the process by checking that your backups are good and then using MM with the 'Ensure Data Accessibility' option (increase the clom repair delay timer if you think adding disks and rebooting will take longer than 60 minutes).
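
As a rough way to decide whether the repair delay needs raising before a maintenance window, here is a minimal Python sketch. The 60-minute figure is the one mentioned above; the 30-minute margin and the function name are assumptions made up for illustration, not VMware guidance.

```python
DEFAULT_REPAIR_DELAY_MIN = 60  # repair delay referred to above


def suggested_repair_delay(expected_downtime_min, margin_min=30):
    """Return a repair delay (in minutes) that covers the planned downtime.

    margin_min is an arbitrary buffer chosen for illustration only.
    """
    if expected_downtime_min < 0 or margin_min < 0:
        raise ValueError("durations must be non-negative")
    needed = expected_downtime_min + margin_min
    # Never go below the default; only raise the timer when required.
    return max(DEFAULT_REPAIR_DELAY_MIN, needed)


print(suggested_repair_delay(20))  # 60: the default already covers a short job
print(suggested_repair_delay(90))  # 120: a long job needs the timer raised
```

How the value is actually applied depends on the vSAN release (per-host advanced setting VSAN.ClomRepairDelay on older builds, a cluster-level repair timer on newer ones), so check the documentation for your version, and set it back afterwards.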

"And what about stretched cluster considerations ?"

No special considerations other than to ensure you are adding the storage evenly to each site (preferably homogeneous per node).
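
To sanity-check "evenly per site" before claiming disks, a small Python sketch like the following compares the planned capacity per site and the disk layout per node. The host names, site labels, and disk sizes are made-up illustration data, and capacity_summary is a hypothetical helper, not part of any VMware tool.

```python
from collections import defaultdict


def capacity_summary(plan):
    """plan maps host -> (site, [capacity disk sizes in GB]).

    Returns per-site totals, whether all sites have equal capacity,
    and whether every host carries the same set of disks.
    """
    per_site = defaultdict(int)
    layouts = set()
    for host, (site, disks) in plan.items():
        per_site[site] += sum(disks)
        layouts.add(tuple(sorted(disks)))
    sites_even = len(set(per_site.values())) <= 1
    nodes_homogeneous = len(layouts) <= 1
    return dict(per_site), sites_even, nodes_homogeneous


# Illustration data: 6 original capacity disks per host plus 2 new ones,
# with one host in site-b not yet expanded.
plan = {
    "esx01": ("site-a", [1920] * 8),
    "esx02": ("site-a", [1920] * 8),
    "esx03": ("site-b", [1920] * 8),
    "esx04": ("site-b", [1920] * 6),
}

totals, sites_even, nodes_homogeneous = capacity_summary(plan)
print(totals)             # {'site-a': 30720, 'site-b': 26880}
print(sites_even)         # False
print(nodes_homogeneous)  # False
```

A result like this just flags that one site (and one node) is behind; it says nothing about whether running unbalanced for a while is acceptable in your environment.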

Bob

Sharantyr3
Enthusiast

Hi,

The failed disk is not part of the problem; it belongs to an internal RAID 1 (on a different controller) and did not disturb the server at all (it failed a few days before my vSAN issues).

Thanks for the two links you provided, but I don't see a recommended way to do it in them.

So in "conclusion", if my HBA supports hot-add, I should not have any issues hot-adding disks to ESXi hosts running in production, and there is no need to put the host in maintenance mode or take any other kind of precaution?

About the stretched cluster: should I claim the new disks only once they have been added to all hosts in both sites, or can I expand site 1 now and site 2 later?
