Can you please go to:
Cluster - Monitor - vSAN - Health
Do you have any red or yellow items? If so, can you please send me all of them?
Then, what connection are you using between your 3 hosts? You wrote 10Gb, but how many NICs are you using for the vSAN VMkernel adapter?
To be clear, you can use 1 VMkernel adapter for vMotion, vSAN and Management and enable all 3 services on that same adapter, or you can separate them onto different VMkernel adapters.
In our environment we use it this way:
We have a total of 8 Uplinks on our vDS Switch.
I use 2 VMkernel adapters with the vSAN service enabled, at 2x 10Gb per host.
With this setup, running esxcli vsan cluster unicastagent list on one of the 3 hosts (over SSH, e.g. with PuTTY) shows 2 IPs per host for the vSAN communication.
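A minimal sketch of how that check could be tallied, assuming you have copied the node UUID and IP address of each row from the esxcli vsan cluster unicastagent list output by hand. The function name ips_per_node, the (uuid, ip) tuple shape and all values below are hypothetical and are not the command's actual output format.

```python
from collections import defaultdict


def ips_per_node(entries):
    """Group vSAN unicast IP addresses by node UUID.

    entries: iterable of (node_uuid, ip_address) pairs transcribed
    from the unicastagent list (assumed input shape, not esxcli's format).
    """
    grouped = defaultdict(list)
    for node_uuid, ip in entries:
        grouped[node_uuid].append(ip)
    return dict(grouped)


# Hypothetical values for the two remote hosts seen from the third host.
entries = [
    ("node-a", "192.168.10.11"),
    ("node-a", "192.168.20.11"),
    ("node-b", "192.168.10.12"),
    ("node-b", "192.168.20.12"),
]

for node, ips in ips_per_node(entries).items():
    # With 2 vSAN VMkernel adapters per host, each node should list 2 IPs.
    print(node, len(ips), ips)
```

If a node shows only 1 IP here, the second vSAN VMkernel adapter on that host is probably not being advertised to the cluster.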
Our connection was also very poor, but that was because I ran all 3 services on only 1 VMkernel adapter, and that didn't help!
Try that, and make sure you use 10Gb all the way and have at least 4 uplinks for a fair speed, so that you can separate VM Network, vMotion and vSAN.
And send me those Red and Yellow states.
While it is very nice to have a decent home set-up such as that, are you comparing like-for-like hardware here?
3 nodes versus 4 nodes is a big enough difference to start with.
Is the Lenovo cluster hooked up to nearer $10k worth of switch and/or two switches, and over how many links?
The LSI SAS 9207-8i has a *relatively* low queue depth compared to most newer controllers and is not supported for vSAN 6.5, so I can't say whether the driver/firmware holds up performance-wise (and I never checked whether it was deemed unsupportable on 6.5 for performance/compatibility reasons or was EOL'd by LSI). I am going to assume you have it correctly in pass-through mode for the disks.
Try comparing performance tests with different IO profiles; potentially your home lab is only far behind in certain areas.
Make sure all drivers and firmware are up to scratch.
If you want to compare the performance of any cluster more granularly, I would advise using vSAN Observer:
Or a 3rd-party alternative: (don't mind the URL, the link is safe for work)
Hope this helps
I wouldn't advise multiple vmknics for vSAN as a first choice. There are caveats to configuring it this way, and redundancy/availability can be configured better lower in the stack (at the NIC level):
"Virtual SAN does not support multiple VMkernel adapters on the same subnet. You can use multiple VMkernel adapters on different subnets, such as another VLAN or separate physical fabric. Providing availability by using several VMkernel adapters has configuration costs including vSphere and the network infrastructure. Network availability by teaming physical network adapters is easier to achieve with less setup."
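The subnet rule in that quote can be sanity-checked before configuring anything. The following is only a sketch using Python's standard ipaddress module; the function name shared_subnets and the addresses are hypothetical, and it assumes you already know the address and prefix length of each vSAN VMkernel adapter on a host.

```python
import ipaddress
from collections import Counter


def shared_subnets(interfaces):
    """Return subnets used by more than one vSAN VMkernel adapter on a host.

    interfaces: list of "address/prefix" strings, e.g. "192.168.10.11/24".
    An empty result means every adapter sits on its own subnet.
    """
    networks = [ipaddress.ip_interface(i).network for i in interfaces]
    counts = Counter(networks)
    return sorted(str(net) for net, n in counts.items() if n > 1)


# Hypothetical host with two vSAN adapters on the same subnet (not supported).
print(shared_subnets(["192.168.10.11/24", "192.168.10.21/24"]))

# Hypothetical host with the adapters on separate subnets/VLANs.
print(shared_subnets(["192.168.10.11/24", "192.168.20.11/24"]))
```

This only checks the addressing; it says nothing about whether the VLANs or physical fabrics behind those subnets are actually separate.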
I have SexiGraf running on the cluster at home. The servers at work do have a dedicated 10GbE link for vSAN, while my house does not, but I am not running those links anywhere near capacity and vSAN is free to use them all when it needs to.
The switch at work is a Brocade Fabric VDX, while the one at my house is a Brocade ICX. A little different, but I definitely don't have a D-Link at home. The ICX is an enterprise-grade switch that can be used for the distribution or access layer. It has lots of capacity.
What Firmware version are your LSI 9207-8i cards on? By the look of that screenshot, that is not FW19 but something newer. I had major issues with FW20 and went back to FW19. Now all is good again. I had shitty performance and disks dropping out at random, especially at heavy load.
I don't know which donkey put FW20 on the HCL, but it should be removed.
About switches: I don't know your Brocade, but in general, switches with small per-port buffers will cause issues when going full monty with NFS, iSCSI, FCoE, vSAN, etc.
Storage over IP is very demanding, especially under prolonged load.
I am running FW 20. I will go back to FW 19 and see if that helps. I am still concerned that my cache S3700s are showing up as 3Gbps drives...
"S3700s are showing up as 3Gbps drives"
I had HGST 600GB 10k 6G SAS drives do that. With FW19, they are back on 6G again.
OK, I just downloaded FW19. I'll flash all three of my cards today. I am just running the inbox driver, is that OK?
Yes, that's OK. I use the standard 6.5 U1 driver too.
Went to FW 19 on all the LSI 9207 controllers. Performance was worse. I am now back on FW 20.00.07.00 on the 9207s.
Take a look at my NAA latency. Don't these seem really high for Intel DC SSD drives?
For anyone else seeing this problem: it was the lsiprovider vib I had installed to manage the LSI controllers. I found another thread talking about that, removed the vib, and the latencies are back under 1ms.
FYI: VMware has removed FW20 from the HCL and is back at FW19. I guess I was not the only one with problems.