VMware Cloud Community
MrPowerEdge
Contributor

vSAN 7.0 poor write performance and high latency with NVMe

Hi All,

Having some vSAN write performance issues, I would appreciate your thoughts.

The basic spec;

5x vSAN ready nodes, each with 2x AMD EPYC 7302 16-core processors, 2TB RAM, and 20x NVMe disks across 4 disk groups. Networking is 4x Mellanox 25GbE, with jumbo frames configured end-to-end.
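One quick sanity check worth doing on the jumbo-frame claim is an end-to-end vmkping from each host with a payload sized so the frame cannot be fragmented. This is a sketch only; the vmknic name (vmk1) and peer IP are placeholders for your environment.

```shell
# MTU 9000 minus the 20-byte IP header and 8-byte ICMP header
# leaves the largest payload that fits in one jumbo frame
payload=$((9000 - 20 - 8))
echo "$payload"   # 8972

# Run on each ESXi host against every peer's vSAN vmknic
# (vmk1 and 10.0.0.12 are placeholders):
# vmkping -I vmk1 -d -s "$payload" 10.0.0.12   # -d sets don't-fragment
```

If any host pair drops these pings, jumbo frames are not actually working end-to-end and writes will suffer far more than reads.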

When running any workload, including HCIBench, we are observing really poor write performance: 30+ minutes of sustained 30ms+ write latency (see screenshot below). Reads are through the roof at 400k+ IOPS, but writes sit between 20-40k IOPS depending on parameters. It also took 12 hours to consolidate a 10TB snapshot the other day!
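For comparison outside HCIBench, a small fio job run inside a test VM can isolate the guest IO path from the benchmark harness. A sketch of a job file; the device path, size, and runtime here are illustrative placeholders, not our actual test parameters:

```
[vsan-randwrite]
; /dev/sdb is a placeholder test device - do not point this at a disk with data
ioengine=libaio
direct=1
rw=randwrite
bs=4k
iodepth=32
numjobs=4
size=10g
runtime=300
time_based=1
filename=/dev/sdb
```

Watching clat percentiles from a job like this alongside the vSAN performance graphs helps show whether the latency is added in the guest, the vSAN layer, or the network.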


[Attached screenshot: Screenshot 2020-11-09 194510.jpg]

Things I have tried:

  • Disabled vSAN checksum - this gave only a ~2k IOPS improvement.
  • Followed the AMD tuning guide: NPS=1, which is the default and suits the workload.
  • Increased the stripe width from 1 to 2; this improved reads but made writes worse.
  • De-dupe and compression are not enabled.
  • Tried mirroring with FTT=0; some small improvement but nothing significant.
  • Fully patched, both firmware and software.

Notes:

  • vSAN health insights show no issues.
  • We really expected 60k+ write IOPS from this hardware.
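On the 60k+ expectation, it is worth separating frontend (guest-visible) from backend write IOPS. A rough back-of-envelope, assuming RAID-1 mirroring with FTT=1 and ignoring checksum and metadata overhead; all numbers here are illustrative:

```shell
frontend_iops=40000   # what the guest sees at the top of the range
replicas=2            # RAID-1 with FTT=1 writes two copies
backend_iops=$((frontend_iops * replicas))
echo "$backend_iops"  # 80000 backend writes/s the cluster must absorb
```

So even the "disappointing" 40k frontend writes already means ~80k backend writes landing on the disk groups, before any checksum or network round-trip cost.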

Any ideas would be appreciated - we really expected better from this setup.

ConradDL
Contributor

Hi Brian

We seem to be facing this exact issue with Splunk on vSAN. I see 512K blocks being thrown at vSAN, and our Linux admins confirmed that max_sectors_kb still defaults to 512, which seems to be the cause (will confirm next week).

I am curious though - we default pvscsi in our Puppet configs to the settings per https://kb.vmware.com/s/article/2053145:

vmw_pvscsi.cmd_per_lun=254

vmw_pvscsi.ring_pages=32

 

What made you move away from the large-scale IO pvscsi settings after setting max_sectors_kb to 64K?
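For anyone following along, the max_sectors_kb change in question looks roughly like this. A sketch only: sdb is a placeholder for the vSAN-backed device, and the 64K value is the one under discussion above.

```shell
# build the sysfs path for a device's max IO size attribute (value in KB)
queue_attr() { printf '/sys/block/%s/queue/max_sectors_kb\n' "$1"; }

echo "$(queue_attr sdb)"   # /sys/block/sdb/queue/max_sectors_kb

# On a live guest (sdb is a placeholder):
#   cat "$(queue_attr sdb)"          # show current limit, defaults to 512
#   echo 64 > "$(queue_attr sdb)"    # cap IOs at 64K (not persistent;
#                                    # use a udev rule to survive reboot)
```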
