Hi there,
does anybody know why there are no Ready Node configurations with more than 12TB of raw capacity?
We've been told that there is no technical reason for this; only that, from a sales perspective, these nodes would be too hard to sell...
So we decided to assemble our own VSAN node from certified Dell components and ended up with:
Dell R730xd
768GB RAM
PERC730
capacity tier: 21 x 1.2TB SAS disks
caching tier: 3 x SANDISK SX350-3200
Have any of you spotted a node like this in a VSAN environment?
Regards,
daniel
I'm using Dell R730xd with 2xP3700 1.6TB SSD, 14xWD Re 4TB SAS drives (2 disk groups per node). Gets ~51TB of raw storage per node.
If you're going with Dell be aware of the following KBs as the H730 controller is problematic.
The H730 is also not yet qualified for VSAN 6.2, but hopefully will be soon.
pfuhli,
Keep in mind that when you order Ready Nodes, you can request additional cache/capacity drives, provided the frame supports it.
I'd suggest revalidating that the controller supports the number of drives you wish to run, however, as not all controllers are supported with expanders or above a given drive count. In some cases you might have to add another controller, which is probably better from an availability standpoint as well.
In short, if you see a Ready Node that you'd like to use, and you want to bump up the capacity, you can do that.
Thanks elerium, we're already aware of these KBs.
Have you done any performance testing on your cluster?
We see very good IOPS and latency numbers while driving only a few VMs per host. If we increase the number of load-generating VMs per host, our cluster becomes unstable.
That's why we are looking for peer customers with a similar configuration, to learn from their experiences.
Jasemccarty
Ah, ok - I didn't know that. I'm asking because we're looking for peer customers with a configuration similar to ours.
During the last weeks of troubleshooting with Dell and VMware BCS, it occurred to me that the node configuration we chose may be too fat in terms of scale-up.
Unfortunately we have still not been able to determine the root cause of our problems.
So the question is: are there customers out there with VSAN nodes equipped the way ours are?
I have done performance testing on my clusters. I have Exchange, fileservers, various applications, some SQL servers with medium workloads and hundreds of test/development VMs used by approx 200 users daily. The highest I/O servers are probably Exchange and our analytics/reporting servers. I haven't had any user complaints about performance, I've gotten positive comments on VMs taking 15 seconds to boot and that everything is snappy. On my main production cluster I'm averaging 25 VMs per host, and 80 VMs per host in development. Unless I run some stress tests I don't see high latency or any performance issues.
To stress it hard, I have run IO Analyzer (10 VMs with various workloads) in combination with the built-in proactive health "Stress Test" profile for 8 hours to really stress the clusters. Latency ends up around 200-600ms average after about an hour (when the SSD cache is exhausted) for the VMs participating in the stressing, and these VMs become very slow to operate. For all the neighboring VMs, latency goes up from < 10ms average to around 60-80ms; there is noticeable lag but everything runs. The cluster is always functional, though: very slow, but I wouldn't call it unstable, as nothing crashes or becomes completely unavailable.
If I'm reading the specs you've posted:
capacity tier: 21 x 1.2TB SAS disks
caching tier: 3 x SANDISK SX350-3200
Your VSAN is running at a ~76% SSD-cache-to-capacity ratio? Do I have that right? My own cache ratios are much lower, at around 11%.
cache capacity / (capacity * FTT factor)
3.2TB / ((1.2TB * 7) * 0.5) = ~76%
With that much cache, you're practically running all-flash; I'd imagine performance would generally be very good. If you have a stress VM with a workload that exceeds the cache, you will see major performance drops once I/O starts hitting the magnetic disks.
I'm also assuming a 10Gb+ network for your setup?
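The ratio math above can be sketched quickly; this assumes one disk group of one 3.2TB SSD fronting seven 1.2TB SAS disks, with the 0.5 factor discounting raw capacity to usable capacity at FTT=1:

```python
# Sketch of the cache-ratio formula from this post (not an official
# VMware sizing tool): cache capacity / (capacity * FTT factor).
def cache_ratio(ssd_tb, disk_tb, disks_per_group, ftt_factor=0.5):
    usable_tb = disk_tb * disks_per_group * ftt_factor
    return ssd_tb / usable_tb

print(f"{cache_ratio(3.2, 1.2, 7):.0%}")  # ~76%
```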
I'm not quite sure how to do the maths with regards to the FTT ratio.
capacity tier per host: 25.2 TB (raw)
cache tier per host: 9.6TB (raw) [3 x 3.2TB]
That means ~30% cache from our perspective.
10GE NIC dedicated to VSAN.
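The per-host totals above work out as below; taking cache over total raw (cache + capacity) is one guess at where the ~30% figure comes from, since cache over capacity alone gives a higher number:

```python
# Per-host raw totals from the posted configuration.
capacity_tb = 21 * 1.2  # 25.2 TB raw capacity tier
cache_tb = 3 * 3.2      # 9.6 TB raw cache tier

# Two possible readings of the "~30% cache" figure:
print(f"cache / capacity:          {cache_tb / capacity_tb:.0%}")               # 38%
print(f"cache / (cache+capacity):  {cache_tb / (cache_tb + capacity_tb):.0%}")  # 28%
```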
In hybrid mode, each SSD is split into 70% read cache and 30% write cache.
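Applied to the 3.2TB SanDisk SSDs from this thread, that 70/30 split works out roughly as follows (a sketch of the arithmetic only, not vendor-published sizing):

```python
# Hybrid vSAN splits each caching SSD 70% read cache / 30% write buffer.
ssd_tb = 3.2  # SanDisk SX350-3200 raw size

read_cache_tb = ssd_tb * 0.70
write_buffer_tb = ssd_tb * 0.30
print(f"{read_cache_tb:.2f} TB read cache, {write_buffer_tb:.2f} TB write buffer per SSD")
# 2.24 TB read cache, 0.96 TB write buffer per SSD
```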