VMware Cloud Community
pfuhli
Enthusiast

VSAN Hybrid Node Design Question

Hi there,

Does anybody know why there are no Ready Node configurations with more than 12TB of raw capacity?

We've been told that there is no technical reason for this; it's just that from a sales perspective these nodes would be too hard to sell...

So we decided to assemble our own VSAN node using certified components from Dell and ended up with:

Dell R730xd

768GB RAM

PERC H730

capacity tier: 21 x 1.2TB SAS disks

caching tier: 3 x SANDISK SX350-3200

Has any of you spotted a node like this in a VSAN environment?

Regards,

daniel

8 Replies
elerium
Hot Shot

I'm using Dell R730xd with 2 x P3700 1.6TB SSDs and 14 x WD Re 4TB SAS drives (2 disk groups per node). That gets ~51TB of raw storage per node.

If you're going with Dell, be aware of the following KBs, as the H730 controller is problematic.

https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=21363...

https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=21354...

https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=21096...

The H730 is also not yet qualified for VSAN 6.2, but hopefully soon.

Jasemccarty
Immortal

pfuhli,

Keep in mind that when you order Ready Nodes, you can request additional cache/capacity drives, provided the frame supports it.

I'd suggest revalidating that the controller supports the number of drives you wish to run, however, as not all controllers are supported with expanders or above a given number of drives. In some cases you might have to add another controller, which is probably better from an availability standpoint as well.

In short, if you see a Ready Node that you'd like to use, and you want to bump up the capacity, you can do that.

Jase McCarty - @jasemccarty
pfuhli
Enthusiast

Thanks elerium, we're already aware of these KBs.

Have you done any performance testing on your cluster?

We see very good numbers for IOPS and latency while we're only driving a few VMs per host. If we increase the number of load-generating VMs per host, our cluster becomes unstable.

That's why we are looking for peer customers with a similar configuration to learn about their experiences.

pfuhli
Enthusiast

Jasemccarty

Ah, ok - I didn't know that. The reason for asking is that we're looking for peer customers with a configuration similar to ours.

During the last weeks of troubleshooting with Dell and VMware BCS, it occurred to me that the node configuration we have chosen may be too fat in terms of scale-up.

Unfortunately we were still not able to determine the root cause of our problems.

So the question is: are there customers out there with VSAN nodes equipped the way ours are?

elerium
Hot Shot

I have done performance testing on my clusters. I have Exchange, file servers, various applications, and some SQL servers with medium workloads, plus hundreds of test/development VMs used by approx. 200 users daily. The highest-I/O servers are probably Exchange and our analytics/reporting servers. I haven't had any user complaints about performance; I've gotten positive comments about VMs taking 15 seconds to boot and everything being snappy. On my main production cluster I'm averaging 25 VMs per host, and 80 VMs per host in development. Unless I run some stress tests I don't see high latency or any performance issues.

To stress it hard, I have run IO Analyzer (10 VMs with various workloads) in combination with the built-in health proactive test "Stress Test" profile for 8 hours to really stress the clusters. Latency ends up around 200-600ms average after about an hour (when the SSD cache is exhausted) for the VMs that are participating in the stress test, and these VMs become very slow to operate. For all the other neighboring VMs, latency goes up from <10ms average to around 60-80ms; there is noticeable lag but everything runs. The cluster is always functional though: very slow, but I wouldn't say unstable, as nothing crashes or is completely unavailable.

elerium
Hot Shot

If I'm reading the specs you've posted correctly:

capacity tier: 21 x 1.2TB SAS disks

caching tier: 3 x SANDISK SX350-3200

Your VSAN is running at a ~76% SSD-to-capacity cache ratio? Do I have this right? My own cache ratios are much lower, at around 11%.

cache capacity / (raw capacity * FTT ratio)

3.2TB / ((7 x 1.2TB) * 0.5) = ~76% (per disk group, with FTT=1)
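
For anyone wanting to reproduce the arithmetic, here is a minimal sketch (plain Python, not a vSAN tool) of that per-disk-group ratio, assuming one 3.2TB SSD fronting 7 x 1.2TB SAS disks and FTT=1 (so usable capacity is half of raw):

# Hypothetical helper, not a VMware utility: cache-to-usable-capacity ratio
# for one hybrid VSAN disk group, assuming FTT=1 (usable = raw / (FTT + 1)).
def cache_ratio(ssd_tb, hdd_tb, hdd_count, ftt=1):
    raw_capacity = hdd_tb * hdd_count            # raw magnetic capacity in the disk group
    usable_capacity = raw_capacity / (ftt + 1)   # FTT=1 keeps two copies of every object
    return ssd_tb / usable_capacity

# elerium's numbers: 3.2TB SSD, 7 x 1.2TB SAS per disk group
print(f"{cache_ratio(3.2, 1.2, 7):.0%}")         # -> 76%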

With that much cache you're practically running all-flash, so I'd imagine performance would generally be very good. If you have a stress VM with a workload that exceeds the cache, you will see major performance drops once I/O starts hitting the magnetic disks.

I'm also assuming a 10Gb+ network for your setup?

pfuhli
Enthusiast

I'm not quite sure how to do the maths with regard to the FTT ratio.

capacity tier per host: 25.2TB (raw)

cache tier per host: 9.6TB (raw) [3 x 3.2TB]

That means ~30% cache from our perspective.

10GE NIC dedicated to VSAN.
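
As a side note on the maths, since the percentages in this thread differ: depending on whether you divide the cache by raw capacity, by raw plus cache, or by the FTT=1 usable capacity as elerium did, the same hardware gives quite different numbers. A minimal sketch (plain Python) using the assumed per-host figures of 9.6TB cache and 25.2TB raw capacity:

# Illustration only: the same per-host figures expressed as three different ratios.
cache_tb = 3 * 3.2       # 9.6TB of SSD cache per host
capacity_tb = 21 * 1.2   # 25.2TB of raw SAS capacity per host

print(f"cache / raw capacity:            {cache_tb / capacity_tb:.0%}")              # ~38%
print(f"cache / (raw + cache):           {cache_tb / (capacity_tb + cache_tb):.0%}") # ~28%
print(f"cache / usable (FTT=1, raw / 2): {cache_tb / (capacity_tb / 2):.0%}")        # ~76%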

srodenburg
Expert

In hybrid, each SSD is split into 70% read cache and 30% write cache.
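
Illustrated with the 3.2TB SX350 devices from the original post (a sketch of the hybrid 70/30 split only, not a sizing recommendation):

# Hybrid VSAN splits each caching device into 70% read cache and 30% write buffer.
ssd_tb = 3.2
print(f"read cache:   {ssd_tb * 0.7:.2f} TB")   # 2.24 TB per SSD
print(f"write buffer: {ssd_tb * 0.3:.2f} TB")   # 0.96 TB per SSD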
