VMware Cloud Community
tarun2602
Contributor

Unable to deploy an 8-node stretched cluster in home lab

I have a setup of 4 ESXi hosts nested on 2 different physical servers. When I create a 6-node stretched cluster with 1 witness, everything works fine, but as soon as I make it an 8-node cluster, vSAN partitions and the newly added hosts show up in a different partition. Please suggest how we can make this work.

All hosts have a single vmk0 used for both management and vSAN traffic.
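A quick way to compare what each side sees, assuming SSH access to the nested hosts, is to run the command below on an existing cluster member and on one of the newly added hosts, then compare the Sub-Cluster Member lines between the two partitions:

# Run on an existing member and on a newly added host, then compare the
# "Sub-Cluster Member Count" / "Sub-Cluster Member UUIDs" output of the two.
esxcli vsan cluster get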

5 Replies
Shen88
Hot Shot
Accepted Solution

@tarun2602,

 

I believe each nested ESXi host needs 16 GB of RAM for this to work. Please refer to this old post, which discusses the same situation as yours in detail:

https://communities.vmware.com/t5/VMware-vSAN-Discussions/VSAN-6-6-Stretched-in-Nested-Environment-N...
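If you want to double-check what the nested hosts actually have, a quick check from each nested ESXi host's shell is:

# Reports the physical memory visible to this (nested) ESXi host, in bytes
esxcli hardware memory get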


If you think your queries have been answered, mark this response as "Correct" or "Helpful" and consider giving kudos to show your appreciation!

Regards,
Shen
TheBobkin
Champion

@Shen88 - A nested vSAN node not having enough RAM wouldn't be a likely cause of it failing to join or becoming isolated from the cluster, and besides, you have no information on how much RAM these hosts have provisioned.

 

@tarun2602 - What have you checked and validated here?
Some basics to start (a rough command sketch follows below):
Are all nodes on the same ESXi build/version?
Does every node have a complete unicastagent list of all other cluster members?
Is the MTU consistent on all vSAN vmks (and on the backing vmnics and vSwitches if using 9000 MTU)?
Are you 100% sure you tagged vsan traffic (and not witness traffic) on vmk0 of all nodes?
If all of that is fine, validate vmkping between the vmks in question (with the packet size set to the MTU minus 28 bytes and the -d flag).
If that is also fine, then check the flow of UDP 12321 traffic from the cluster Leader to the newly added nodes; perhaps it is getting dropped/blocked between the nodes.
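A rough sketch of those checks from an ESXi shell; the vmk name, MTU/payload values, and the 10.0.0.12 peer address are placeholders to adjust for your setup:

# ESXi build/version on each node
vmware -vl

# Every node should list all other data nodes (and the witness) here
esxcli vsan cluster unicastagent list

# Confirm which vmk carries vsan vs. witness traffic, and check vmk MTUs
esxcli vsan network list
esxcli network ip interface list

# Path MTU test: payload = MTU minus 28 bytes (IP + ICMP headers), -d = don't fragment
# e.g. -s 1472 for a 1500 MTU, -s 8972 for a 9000 MTU
vmkping -I vmk0 -d -s 1472 10.0.0.12

# Watch for vSAN clustering (CMMDS) traffic on UDP 12321 reaching the new node
tcpdump-uw -i vmk0 udp port 12321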

tarun2602
Contributor

@Shen88 The issue was resolved as soon as we increased the memory on the hosts to 16 GB.

@TheBobkin All of the communication was tested and working; however, as per the post Shen shared, once the nested infrastructure reaches 9 members in total (hosts + witness), it creates a partition.

Shen88
Hot Shot

@tarun2602,

Glad to know this helped, thanks for the update.

 

If you think your queries have been answered, mark this response as "Correct" or "Helpful" and consider giving kudos to show your appreciation!

Regards,
Shen
TheBobkin
Champion

@Shen88, My doubt in my previous comment was based on having only ever seen limitations/issues on low-memory nested vSAN nodes in the form of Disk-Groups failing to mount or unmounting due to LSOM running out of memory; fair play to you for finding that old post (from 7 years ago, and one I was on 😀).


My educated guess is that RDT or CMMDS bumps up its required memory past a certain node count, since it needs to talk to more nodes. This is not something one would ever see in production, but it is good to know about anyway, so thank you.