VMware Cloud Community
zdickinson
Expert

Stripe Width And Performance

Hello all, we have a 3 node vSAN setup for DR.  I was testing performance with IOMeter and it looked great with the default stripe width of 1.  I was getting around 350 MB/s read and write.  CPU was pegged on the VM I was testing with, so I think that was the bottleneck.  I then applied a policy with stripe width 7, waited for things to rebalance, and then re-ran the same test.  Now I was getting around 25 MB/s.  That was unexpected, as I thought more drives = more IOPS.  I have since re-applied the stripe width 1 policy, but of course that is just a minimum, so the objects are still spread out all over the place.  I believe vSAN 6 has an easy re-balance option in the GUI.  Is there something similar through the CLI in vSAN 5.5?  Can anyone comment on the performance difference?  Setup details below.  Thank you.
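In case it helps anyone searching later, the CLI poking I have done so far has been through the Ruby vSphere Console (RVC) that ships with vCenter.  A minimal sketch, assuming a vCenter at vcenter.example.com, a cluster named Cluster01, and a VM named TestVM (all placeholder names):

  rvc administrator@vcenter.example.com
  > cd /vcenter.example.com/Datacenter/computers/Cluster01
  > vsan.vm_object_info ../../vms/TestVM   # per-object component layout, shows where the stripes landed
  > vsan.resync_dashboard .                # shows any components still rebuilding/resyncing

As far as I can tell, 5.5 has no proactive rebalance command (that appears to be a 6.0 addition), so inspecting the layout and the resync status is about as far as the CLI goes.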

3 nodes of:

Dell R820

Quad Intel 2.2 GHz procs

256 GB RAM

Intel 10 GbE NICs

Dell 4032 for vSAN traffic

Broadcom 1 GbE for management and VM traffic

2 Micron P420m SSDs

14 Seagate 1.2 TB HDDs

LSI 9207-8i

All on the HCL (vSAN ready node).  Firmware and drivers match the HCL as closely as I could get.

13 Replies
TomHowarth
Leadership

To understand why your performance went down, have a read of this blog post by Cormac Hogan: VSAN Part 7 - Capabilities and VM Storage Policies | CormacHogan.com

Also, here is a quick link to a Cisco page that shows the ESXCLI commands for VSAN: Chapter 6 - Virtual SAN Command Line Commands - Cisco

Tom Howarth VCP / VCAP / vExpert
VMware Communities User Moderator
Blog: http://www.planetvm.net
Contributing author on VMware vSphere and Virtual Infrastructure Security: Securing ESX and the Virtual Environment
Contributing author on VCP VMware Certified Professional on VSphere 4 Study Guide: Exam VCP-410
zdickinson
Expert

Thank you for the links.  The Cormac article seems to say that more stripes should help when destaging from SSD to HDD on writes and when missing the cache on reads.  However, the bit about stripes existing in the same disk group, and thus using the same SSD, could be the issue.

I checked the Cisco article and didn't see anything relating to forcing the stripe width back to 1.  However, many of the commands were unfamiliar; I'll dig into the Essential vSAN Guide.
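For reference, the esxcli side only seems to cover the per-host default policies rather than re-striping existing objects, which is probably why nothing in that chapter does what I want.  A rough sketch of what is there, run from an ESXi shell (the values shown are just an example, not our production settings):

  esxcli vsan policy getdefault
  esxcli vsan policy setdefault -c vdisk -p '(("stripeWidth" i1) ("hostFailuresToTolerate" i1))'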

Thank you, Zach.

CHogan
VMware Employee

When you changed the stripe width back to 1, did VSAN begin rebuilding a new set of replicas for your VM?

You can check this with the RVC command vsan.resync_dashboard.

If this is indeed the case, then you have a bunch of rebuild traffic going on at the same time as the VM I/O tests that you are running.

This may lead to lower performance, depending on how much rebuilding is taking place.

There are some more details here, fyi: VSAN Part 35 – Considerations when dynamically changing policy | CormacHogan.com

http://cormachogan.com
zdickinson
Expert

Cormac, thank you for the reply.  Always good to see you on the forums.  This was my process:

1.) Run IOMeter: 200+ MB/s with 50% read, 50% random.

2.) Change stripe width to 7, monitor the process using vsan.resync_dashboard, and when finished...

3.) Run IOMeter: 50+ MB/s with 50% read, 50% random.

I now have two servers running IOMeter tests for 18 hours.  The only difference is that one has stripe width 1 and the other has stripe width 7.  Stripe width 1 = ~230 MB/s.  Stripe width 7 = ~50 MB/s.  I would have expected the opposite.  Thank you, Zach.

stripe_width_1.png
stripe_width_7.png

CHogan
VMware Employee

Are all 14 disks behind the one LSI 9207-8i controller? Or are you using something like a SAS expander to address that many disks?

I'm wondering if this is the reason why things are behaving so differently?

http://cormachogan.com
zdickinson
Expert

Interesting question.  Yes, all 14 disks are behind the one controller.  As I say that, I can see that each disk group of 7 should have had its own controller.  Is that where you were going?  Thank you, Zach.

CHogan
VMware Employee

I don't know if it is an issue or not.

Some controllers can only support 8 devices, others can support 16. I was just questioning whether this controller could support that many disks.

When controllers cannot support that many disks, components like SAS expanders are sometimes used to add more disks to a controller, and these can be problematic and perform poorly.

It was just a line of thought; I'm not sure whether it is relevant to your configuration, however.
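If you want to confirm what the host sees behind that controller, a quick sketch from the ESXi shell (vmhba1 is a placeholder adapter name, and mpt2sas is only what I would expect for an SAS2308-based card):

  esxcli storage core adapter list              # lists the HBAs and the driver each is using (e.g. mpt2sas)
  esxcli storage core path list | grep vmhba1   # the paths/devices reported behind that adapter
  # esxtop, then 'd' for the disk adapter view, shows the adapter queue depth (AQLEN) per vmhba

These won't tell you directly whether an expander is in the path, but they do show what is hanging off each adapter.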

Cormac

http://cormachogan.com
zdickinson
Expert

Thank you, Cormac, that was very helpful.  If I end up getting additional cards and am able to do more testing, I will update the thread.  For now, I consider it closed.  Have a great day!

depping
Leadership

Wondering what is slowing you down, the reads or the writes... I agree that you would expect reads to be faster, and writes shouldn't be a problem either. Strange.
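One way to split that out would be to fire up VSAN Observer from RVC during one of the runs and compare read vs write latency per host and per disk group.  A quick sketch (the cluster path is a placeholder):

  > vsan.observer /vcenter.example.com/Datacenter/computers/Cluster01 --run-webserver --force

and then browse to the observer page on the vCenter (port 8010 by default) while IOMeter is running.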

crosdorff
Enthusiast

The server with the 7 stripes is doing twice the number of IOPS!

Do you use the same access pattern for both tests?

depping
Leadership

crosdorff wrote:

The server with the 7 stripes is doing twice the number of iops!

Do you use the same access pattern for both tests?

Good point, you also have lower latency...

zdickinson
Expert

You're right, I posted the wrong screenshots.  The 7 stripe screenshot is from a 4 KB test, while the 1 stripe screenshot was from a 32 KB test.  That explains the higher IOPS for the 7 stripe and the higher throughput for the 1 stripe.  I thought I had my ducks in a row, but did not.  Sorry for the confusion.  I will do more testing and will post it if it seems useful.  Thank you, Zach.

zdickinson
Expert

I hope all is well.  I was able to do some more testing.  It seems that stripe width 4 is the optimal setting for our config.  See details below.  I looked into a second card to spread the I/O, but our backplane did not allow for it.  Thank you, Zach.

IOMeter test:

  Workers = 2

  Test File = 2 GB

  Outstanding IOs = 16

  Size = 32 KB

  Read = 50%

  Random = 50%

  Length = 72 hours

With stripe width 1 I saw around 180 MB/s and 5600 IOPS.  Performance increased with stripe width, peaking at width 4 with around 230 MB/s and 7200 IOPS.  Performance then decreased as stripe width increased further, bottoming out at width 7 with around 27 MB/s and 830 IOPS.
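For anyone who would rather reproduce this from a Linux guest, a roughly equivalent fio job is sketched below.  fio is not IOMeter, so this is only an approximation of the profile above (percentage_random stands in for IOMeter's 50% random setting, and 259200 seconds is the 72 hour run):

  fio --name=vsan-stripe-test --ioengine=libaio --direct=1 \
      --rw=randrw --rwmixread=50 --percentage_random=50 \
      --bs=32k --iodepth=16 --numjobs=2 --size=2g \
      --time_based --runtime=259200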

Stripe 1:

stripe_1.png

Stripe 4:

stripe_4.png

Stripe 7:

stripe_7.png
