VMware Cloud Community
MichaelGi
Enthusiast

vSAN Performance and Throughput with SQL Backups

We have a 4-node vSAN cluster with HP ProLiant Gen9 servers, P440 controllers, and dual 10 GbE NICs.  There are 3 hybrid disk groups in each server, each with one 200 GB enterprise SSD and 3-4 900 GB spinning disks.  We are migrating from a RAID 5 local-storage server in production.  During testing we are seeing a drop in throughput on vSAN: SQL backups take twice as long, with a throughput of around 100 MB/s, whereas on local storage we see around 200 MB/s.  The backup goes from one partition on the VMDK to another.  I'm looking for suggestions to get better performance out of vSAN.  I've tried different stripe widths, updating firmware, etc.  Any help would be greatly appreciated.
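
For anyone who wants to reproduce this outside of SQL, a timed sequential copy between the two partitions should show whether the slowdown is the backup itself or just raw copy speed on the VMDK. A minimal sketch (the paths and chunk size are placeholders, not our actual backup files):

```python
import os
import time

# Timed sequential copy between two partitions, independent of SQL.
# SRC/DST are placeholders; point them at the two partitions on the VMDK
# that the backup actually uses.
SRC = r"D:\backup_test\source.bak"
DST = r"E:\backup_test\copy.bak"
CHUNK = 4 * 1024 * 1024  # 4 MiB sequential chunks, similar to a backup stream

copied = 0
start = time.time()
with open(SRC, "rb") as src, open(DST, "wb") as dst:
    while True:
        buf = src.read(CHUNK)
        if not buf:
            break
        dst.write(buf)
        copied += len(buf)
    dst.flush()
    os.fsync(dst.fileno())  # make sure the writes actually hit the disk

elapsed = time.time() - start
print(f"Copied {copied / 1024**2:.0f} MiB in {elapsed:.1f} s "
      f"({copied / 1024**2 / elapsed:.0f} MB/s)")
```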

10 Replies
Linjo
Leadership

What version of vSAN? Did you try to disable software checksums?

// Linjo

MichaelGi
Enthusiast

vSAN 6.2.  Yes, checksum is disabled.

zdickinson
Expert

Good morning, not sure if there is going to be a "fix" for this.  Backups are all reads (obviously) and not likely to be in the cache, so you're relying on the read speed of the HDDs in a hybrid config.  My suggestion was going to be to increase the stripe width so there are more drives to read from, but it sounds like you have already done this.  I think this is just how it is.  Thank you, Zach.

MichaelGi
Enthusiast

Thank you for your response.  The databases back up to the local disk, and the performance is inconsistent.  It seems that when I first create the VMDK the performance is good, then it slows down over time.

zdickinson
Expert

Sounds like you're either filling up the cache, or you're starting to miss it, or both.  Can you take a look at those metrics and see if there is a correlation?  Thank you, Zach.
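
As a rough illustration of why a falling hit rate would look exactly like this, here is a back-of-the-envelope model; the SSD and HDD figures below are assumptions, not measurements from your cluster:

```python
# Back-of-the-envelope: effective sequential read throughput in a hybrid
# disk group as the read-cache hit rate drops. The SSD/HDD speeds are
# illustrative assumptions, not measurements from this cluster.
SSD_READ_MBPS = 500.0   # assumed cache-tier sequential read speed
HDD_READ_MBPS = 120.0   # assumed single-spindle sequential read speed

for hit_rate in (0.95, 0.80, 0.50, 0.20):
    # time per MB = hit fraction served from SSD + miss fraction served from HDD
    effective = 1.0 / (hit_rate / SSD_READ_MBPS + (1 - hit_rate) / HDD_READ_MBPS)
    print(f"hit rate {hit_rate:.0%}: ~{effective:.0f} MB/s effective read")
```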

elerium
Hot Shot

You may want to use a disk benchmarking tool such as CrystalDiskMark or Iometer and check the throughput of sequential reads and writes (probably use a 1 GB or 4 GB working set to simulate a backup). That way you can determine whether it's reads or writes that are bottlenecking (hard to tell since you're writing from one VMDK to another) and then go from there based on what you find. A write bottleneck would usually point at the SSD cache, a read bottleneck at the RAID controller/magnetic capacity drives.
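
If you'd rather not install anything in the guest, a quick Python sketch like the one below gives a comparable sequential number; the test path and 1 GB working set are assumptions, and the OS page cache can inflate the read figure:

```python
import os
import time

# Quick-and-dirty sequential throughput check, roughly what a 1 GB
# CrystalDiskMark sequential pass measures. Path and size are placeholders.
TEST_FILE = r"D:\vsan_seq_test.bin"
SIZE = 1 * 1024**3          # 1 GiB working set
CHUNK = 4 * 1024 * 1024     # 4 MiB sequential blocks
pattern = os.urandom(CHUNK)  # incompressible data

# Sequential write
start = time.time()
with open(TEST_FILE, "wb") as f:
    for _ in range(SIZE // CHUNK):
        f.write(pattern)
    f.flush()
    os.fsync(f.fileno())
write_mbps = SIZE / 1024**2 / (time.time() - start)

# Sequential read (the OS page cache may inflate this unless the file is
# larger than RAM or the cache is dropped first)
start = time.time()
with open(TEST_FILE, "rb") as f:
    while f.read(CHUNK):
        pass
read_mbps = SIZE / 1024**2 / (time.time() - start)

print(f"sequential write: ~{write_mbps:.0f} MB/s")
print(f"sequential read:  ~{read_mbps:.0f} MB/s")
os.remove(TEST_FILE)
```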

MichaelGi
Enthusiast

CrystalDiskMark shows 1,200 MB/s reads and 600 MB/s writes.

elerium
Hot Shot

Hmm, those seem like reasonable ballpark numbers for a hybrid deployment of your size, and they suggest raw disk read/write throughput isn't the problem. Do the backups use compression? If so, look at CPU utilization during the backup (a quick way to watch it is sketched at the end of this reply).

The backups are all local within the VM, yes? (No network I/O from source to target?) Just trying to rule out other possibilities.
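
A minimal way to sample CPU while the backup runs, assuming the third-party psutil package can be installed in the guest:

```python
import time
import psutil  # third-party package: pip install psutil

# Sample overall and per-core CPU utilization once per second while the
# SQL backup runs, to see whether compression is pegging a core.
print("time        total%  per-core%")
for _ in range(60):  # watch for ~60 seconds; adjust to the backup length
    per_core = psutil.cpu_percent(interval=1, percpu=True)
    total = sum(per_core) / len(per_core)
    stamp = time.strftime("%H:%M:%S")
    print(f"{stamp}  {total:6.1f}  {per_core}")
```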

depping
Leadership

zdickinson wrote: "Good morning, not sure if there is going to be a 'fix' for this.  Backups are all reads (obviously) and not likely to be in the cache, so you're relying on the read speed of the HDDs in a hybrid config.  My suggestion was going to be to increase the stripe width so there are more drives to read from, but it sounds like you have already done this.  I think this is just how it is.  Thank you, Zach."

Increasing the stripe width is indeed the key here. You are most likely hitting a single spindle per VM right now, which unfortunately limits the number of IOPS. Most customers who need better sequential read performance increase the stripe width to 3-5.
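
To put very rough numbers on that, a quick sketch; the per-spindle figure is an assumption, and the controller or network will cap the scaling at some point:

```python
# Back-of-the-envelope: aggregate sequential read throughput scales with
# the number of spindles a component is striped across, until the RAID
# controller or the network becomes the limit. Per-spindle speed is an
# assumed figure, not a measurement.
PER_SPINDLE_MBPS = 100.0  # assumed sequential read speed of one spindle

for stripe_width in (1, 2, 3, 4, 5):
    best_case = stripe_width * PER_SPINDLE_MBPS
    print(f"stripe width {stripe_width}: up to ~{best_case:.0f} MB/s "
          "sequential read (best case, on cache misses)")
```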

MichaelGi
Enthusiast

I appreciate everyone's responses.  The backups write to local disk, so it's a write operation as well.  It seems that when I change the stripe width I get good performance at first, then it gets worse over time.  I'm working with support on this as well.
