VMware Cloud Community
MrPowerEdge
Contributor

vSAN 7.0 poor write performance and high latency with NVMe

Hi All,

Having some vSAN write performance issues, I would appreciate your thoughts.

The basic spec;

5x vSAN ready nodes, each with 2x AMD EPYC 7302 16-core processors, 2TB RAM, and 20x NVMe disks across 4 disk groups. Networking is 4x Mellanox 25GbE, with jumbo frames configured end-to-end.
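One quick sanity check worth doing on the jumbo-frame claim is an end-to-end vmkping from each host with a payload sized so the frame cannot be fragmented. This is a sketch only; the vmknic name (vmk1) and peer IP are placeholders for your environment.

```shell
# MTU 9000 minus the 20-byte IP header and 8-byte ICMP header
# leaves the largest payload that fits in one jumbo frame
payload=$((9000 - 20 - 8))
echo "$payload"   # 8972

# Run on each ESXi host against every peer's vSAN vmknic
# (vmk1 and 10.0.0.12 are placeholders):
# vmkping -I vmk1 -d -s "$payload" 10.0.0.12   # -d sets don't-fragment
```

If any host pair drops these pings, jumbo frames are not actually working end-to-end and writes will suffer far more than reads.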

When running any workload, including HCIBench, we are observing really poor write performance: 30+ minutes of sustained 30ms+ write latency (see screenshot below). Reads are through the roof at 400k+ IOPS, but writes sit between 20-40k IOPS depending on parameters. It also took 12 hours to consolidate a 10TB snapshot the other day!
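For comparison outside HCIBench, a small fio job run inside a test VM can isolate the guest IO path from the benchmark harness. A sketch of a job file; the device path, size, and runtime here are illustrative placeholders, not our actual test parameters:

```
[vsan-randwrite]
; /dev/sdb is a placeholder test device - do not point this at a disk with data
ioengine=libaio
direct=1
rw=randwrite
bs=4k
iodepth=32
numjobs=4
size=10g
runtime=300
time_based=1
filename=/dev/sdb
```

Watching clat percentiles from a job like this alongside the vSAN performance graphs helps show whether the latency is added in the guest, the vSAN layer, or the network.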


[Attached screenshot: Screenshot 2020-11-09 194510.jpg]

Things I have tried:

  • Disabled vSAN checksum - this gave only a ~2k IOPS improvement.
  • Followed the AMD tuning guide: NPS=1, which is the default and suits the workload.
  • Increased the stripe width from 1 to 2; this improved reads but made writes worse.
  • De-dupe and compression are not enabled.
  • Tried mirroring with FTT=0; some small improvement but nothing significant.
  • Fully patched, both firmware and software.

Notes:

  • vSAN health insights show no issues.
  • We really expected 60k+ write IOPS from this hardware.
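On the 60k+ expectation, it is worth separating frontend (guest-visible) from backend write IOPS. A rough back-of-envelope, assuming RAID-1 mirroring with FTT=1 and ignoring checksum and metadata overhead; all numbers here are illustrative:

```shell
frontend_iops=40000   # what the guest sees at the top of the range
replicas=2            # RAID-1 with FTT=1 writes two copies
backend_iops=$((frontend_iops * replicas))
echo "$backend_iops"  # 80000 backend writes/s the cluster must absorb
```

So even the "disappointing" 40k frontend writes already means ~80k backend writes landing on the disk groups, before any checksum or network round-trip cost.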

Any ideas would be appreciated - we really expected better from this setup.

ConradDL
Contributor

Hi Brian

We seem to be facing this exact issue with Splunk on vSAN. I see 512K blocks being thrown at vSAN, and our Linux admins confirmed that max_sectors_kb still defaults to 512, which seems to be the cause (will confirm next week).

I am curious though - we default pvscsi in our Puppet configs to the settings per https://kb.vmware.com/s/article/2053145:

vmw_pvscsi.cmd_per_lun=254

vmw_pvscsi.ring_pages=32

 

What made you move away from the large-scale IO pvscsi settings after setting max_sectors_kb to 64K?
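For anyone following along, the max_sectors_kb change in question looks roughly like this. A sketch only: sdb is a placeholder for the vSAN-backed device, and the 64K value is the one under discussion above.

```shell
# build the sysfs path for a device's max IO size attribute (value in KB)
queue_attr() { printf '/sys/block/%s/queue/max_sectors_kb\n' "$1"; }

echo "$(queue_attr sdb)"   # /sys/block/sdb/queue/max_sectors_kb

# On a live guest (sdb is a placeholder):
#   cat "$(queue_attr sdb)"          # show current limit, defaults to 512
#   echo 64 > "$(queue_attr sdb)"    # cap IOs at 64K (not persistent;
#                                    # use a udev rule to survive reboot)
```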
