Help me understand this VMware storage concept

BradI100 · ‎10-27-2011

Hi all,

I have been struggling to understand a difference in storage speed that occurs in my VMware environment.

To give an example, yesterday I shut down a VM and did a "cold" storage migration to different iSCSI storage. The VM moved quickly (for me), at an average write speed of 60MB/s to the new unit. But once I booted the same VM on the same host and storage, I found the speeds disappointing. Using Iometer on that VM, I could only get a max write speed of 5MB/s.

I've seen this play out in the same way with different storage units time and time again. So why the big difference? Does it suggest something is configured wrong?

JohnADCO · ‎10-27-2011

Write speed of 5MB/s with what Iometer test parameters, exactly?

Does the VM seem not peppy in general?

Do file copies seem OK from within the VM?

BradI100 · ‎10-27-2011

For Iometer settings I typically start with a 4k write test (0% read, 0% random) and 4 targets. Then I double the targets for each successive test until I get to 32 or so. (Does that sound like the right type of test?)

The copies within the VM are normally so-so, but sometimes the disk latency gets very bad and freezes up the VM for 2-3 seconds.

kjb007 · ‎10-27-2011

How much data was actually moved? How full was the vm, and how large was it?

Have you watched esxtop during a file copy or an Iometer run, to see how the latencies compare between the guest, kernel, and device?

-KjB

vExpert/VCP/VCAP vmwise.com / @vmwise -KjB

BradI100 · ‎10-27-2011

The VM is 30GB in size with 16GB in use.

I did not run esxtop but that's a good idea. I will do that and post results.

BradI100 · ‎10-27-2011

Here are my stats from esxtop.

For the VM:

VMNAME                          VDEVNAME  NVDISK  CMDS/s   READS/s  WRITES/s  MBREAD/s  MBWRTN/s  LAT/rd  LAT/wr
..........................Sec2  -         1       1487.30  0.00     1487.30   0.00      5.81      0.00    2.58

Disk adapter:

ADAPTR   PATH  NPTH  CMDS/s   READS/s  WRITES/s  MBREAD/s  MBWRTN/s  DAVG/cmd  KAVG/cmd  GAVG/cmd  QAVG/cmd
vmhba33  -     14    1269.02  0.00     1269.02   0.00      4.95      2.90      0.00      2.91      0.00

Disk device:

DEVICE        PATH/WORLD/PARTITION  DQLEN  WQLEN  ACTV  QUED  %USD  LOAD  CMDS/s   READS/s  WRITES/s  MBREAD/s  MBWRTN/s  DAVG/cmd  KAVG/cm
naa.6001a...  -                     128    -      4     0     3     0.03  1028.40  0.20     1028.21   0.00      4.01      3.71      0.0      0.0

These seem to be pretty standard overall, but I did notice that sometimes the read latency would hover at 300+, and at one point it hit 3100.

Do any of these set off alarm bells?

kjb007 · ‎10-28-2011

The numbers look pretty good. I tend to ignore spikes that are way out in left field, unless they occur regularly and/or for prolonged intervals.

Were you doing a copy at this time, or running Iometer? Looks like all writes.

-KjB

vExpert/VCP/VCAP vmwise.com / @vmwise -KjB

BradI100 · ‎10-28-2011

Thanks. This was an Iometer write test. But I still don't understand why a cold Storage vMotion can write at ~70MB/s when, as you see in my write test above, I only get ~6MB/s. Why is the speed difference so extreme?
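
One way to see where the ~6MB/s figure comes from is to do the arithmetic on the esxtop numbers above: throughput is just commands per second times the size of each command. The sketch below is a rough illustration, not anything esxtop or Iometer provides. It assumes the "4k" in the Iometer settings means 4 KiB per write and that esxtop's MB is 2^20 bytes; the function names are made up for this example.

```python
KIB = 1024
MIB = 1024 * 1024

def throughput_mb_s(iops, io_size_bytes):
    """Throughput in MB/s (2**20 bytes) for a command rate and I/O size."""
    return iops * io_size_bytes / MIB

def io_size_needed(target_mb_s, iops):
    """I/O size in bytes needed to reach target_mb_s at a fixed command rate."""
    return target_mb_s * MIB / iops

# WRITES/s values copied from the esxtop output in this thread.
writes_per_s = {"vm": 1487.30, "adapter": 1269.02, "device": 1028.21}

for level, iops in writes_per_s.items():
    mb_s = throughput_mb_s(iops, 4 * KIB)  # assumption: 4 KiB per write
    print(f"{level}: {mb_s:.2f} MB/s at 4 KiB per write")

# How large each write would have to be to reach 60 MB/s at the VM's
# command rate (hypothetical: assumes the command rate stays the same).
size = io_size_needed(60, writes_per_s["vm"])
print(f"{size / KIB:.1f} KiB per write for 60 MB/s")
```

Under those assumptions the computed values land very close to the MBWRTN/s column, which suggests the test is bound by the number of small commands completed per second rather than by raw bandwidth. A migration that issues much larger sequential writes (an assumption here, the thread does not state the copy's I/O size) would move far more data at a similar command rate.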

mcowger · ‎10-28-2011

How many outstanding IOs?

--Matt VCDX #52 blog.cowger.us

BradI100 · ‎10-28-2011

This was 32 outstanding IOs

BradI100 · ‎10-28-2011

Quick follow-up (forgive me for being sort of clueless about this testing). When I set the number of outstanding IOs to 20 and increased the workers to 10, I started writing at up to 30MB/s! Wow, we're getting someplace. OK, that leaves me with another question: does ESXi/vSphere cap storage bandwidth on a per-worker basis?

kjb007 · ‎10-28-2011

Not exactly. You are still limited by queues at every level of the IO path. The guest maintains disk queues, the host maintains LUN queues, and depending on your array, the storage maintains queues of its own.

You're getting more throughput because, although each worker still limits how many IOs it leaves outstanding, you now have multiple IO producers doing the same thing in parallel.

While one worker is waiting for a particular IO, another is able to perform more IO before it too has to wait.

If you're able to properly take a job and break it up into pieces, then the job will definitely run faster.

That's why Iometer is a good tool: you can test all of these things out. But the results can get mighty tricky to interpret, depending on how you set everything up.

-KjB

vExpert/VCP/VCAP vmwise.com / @vmwise -KjB
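
The point about workers and queues can be put in numbers with Little's Law: the average number of commands in flight equals the command rate times the average latency. The sketch below applies it to the esxtop sample posted earlier in the thread. It is a back-of-the-envelope illustration only; the function names are made up, and the last calculation assumes latency would stay the same under heavier load, which may well not hold.

```python
def iops_from_concurrency(in_flight, latency_ms):
    """Little's Law: command rate sustained by a given number of in-flight I/Os."""
    return in_flight / (latency_ms / 1000.0)

def in_flight_from_iops(iops, latency_ms):
    """Little's Law rearranged: average in-flight I/Os implied by rate and latency."""
    return iops * latency_ms / 1000.0

# Device row from the thread: ACTV = 4 commands active, DAVG/cmd = 3.71 ms.
device_iops = iops_from_concurrency(4, 3.71)
print(f"device: about {device_iops:.0f} commands/s from 4 in flight")

# VM row from the thread: 1487.30 writes/s at LAT/wr = 2.58 ms.
vm_in_flight = in_flight_from_iops(1487.30, 2.58)
print(f"vm: about {vm_in_flight:.1f} commands in flight")

# Hypothetical: 30 MB/s of 4 KiB writes, assuming latency stayed at 2.58 ms.
iops_for_30mb = 30 * 1024 * 1024 / 4096
needed = in_flight_from_iops(iops_for_30mb, 2.58)
print(f"30 MB/s at 4 KiB: about {needed:.0f} commands in flight needed")
```

For the device row this gives roughly 1,080 commands/s, close to the CMDS/s that esxtop reported, and the VM row works out to a little under 4 commands in flight on average. Read that way, adding workers raises throughput because it raises the number of commands actually in flight at once, up to whatever the guest, host, and array queues allow.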

BradI100 · ‎10-28-2011

This is very helpful - thanks!

What I think is a challenge with this storage unit is that it SEEMS to treat each server app as a single worker thread, no matter how many clients are connected. As an example, I have a SharePoint-type app that 40 clients use all day long. Still, the reads and writes never go above ~6MB/s.

So when one client is working on a big PDF they snag all of the storage bandwidth - thereby timing out other users.

I do not have this issue with my home-built StarWind server.

Do you know of any way to open up this pipe so this storage allocates better on a per user basis?

kjb007 · ‎10-28-2011

What type of device is it, and how many storage units/drives/LUNs are you presenting as datastores?

-KjB

vExpert/VCP/VCAP vmwise.com / @vmwise -KjB

BradI100 · ‎10-31-2011

It is a Drobo Elite. I am presenting 3 datastores from it on 2 different LUNs (one server has 2 hard drives). The other is a Windows 7 VM that keeps time and settings for a security system.

kjb007 · ‎10-31-2011

Do you have RAID configured, and if so, how is it configured? How many drives are part of the raid set?

-KjB

vExpert/VCP/VCAP vmwise.com / @vmwise -KjB
