VMware Cloud Community
coop2012
Contributor

ESX 4.1 - MS clustering disk performance

Hi all,

I have installed 2 virtual machines across 2 ESX 4.1 hosts, configured with MS cluster services.

Both virtual machines run MS Windows Server 2008 R2 and are configured as a cluster for SQL Server 2008 R2.

We are using an HP EVA4400 SAN for all storage.

These machines have 2 network adapters (VMXNET3), one for the LAN and the other for the cluster heartbeat. Both are configured to use the same physical ports.

Our problem at the moment is that we are having throughput issues with all disks that are clustered.

I've been using the SQLIO.exe tool to test. The commands:

sqlio.exe -kW -s10 -fsequential -o8 -b8 -LS -Fparam.txt

timeout /T 10

If I run an I/O test on a non-clustered, RDM-attached disk, I get the following results:

C:\Program Files (x86)\SQLIO>sqlio -kW -s10 -fsequential -o8 -b8 -LS -Fparam.txt

timeout /T 10

sqlio v1.5.SG

using system counter for latency timings, 3579545 counts per second

parameter file used: param.txt

        file l:\testfile.dat with 2 threads (0-1) using mask 0x0 (0)

2 threads writing for 10 secs to file l:\testfile.dat

        using 8KB sequential IOs

        enabling multiple I/Os per thread with 8 outstanding

using specified size: 100 MB for file: l:\testfile.dat

initialization done

CUMULATIVE DATA:

throughput metrics:

IOs/sec: 15642.80

MBs/sec:   122.20

latency metrics:

Min_Latency(ms): 0

Avg_Latency(ms): 0

Max_Latency(ms): 52

histogram:

ms: 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24+

%: 60 38  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0

If I then add the disk to the cluster, I get the following result:

C:\Program Files (x86)\SQLIO>sqlio -kW -s10 -fsequential -o8 -b8 -LS -Fparam.txt

timeout /T 10

sqlio v1.5.SG

using system counter for latency timings, 3579545 counts per second

parameter file used: param.txt

        file l:\testfile.dat with 2 threads (0-1) using mask 0x0 (0)

2 threads writing for 10 secs to file l:\testfile.dat

        using 8KB sequential IOs

        enabling multiple I/Os per thread with 8 outstanding

using specified size: 100 MB for file: l:\testfile.dat

initialization done

CUMULATIVE DATA:

throughput metrics:

IOs/sec:   943.03

MBs/sec:     7.36

latency metrics:

Min_Latency(ms): 0

Avg_Latency(ms): 16

Max_Latency(ms): 1056

histogram:

ms: 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24+

%: 63 34  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  2
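For anyone comparing the two runs, here is a minimal Python sketch that pulls the cumulative figures out of the SQLIO output and works out the drop. The function names `parse_sqlio` and `mb_per_sec` are made up for illustration, the sample text is just the metric lines copied from the two runs above, and the 8 KB block size comes from the `-b8` switch.

```python
import re

# Metric lines copied from the two SQLIO runs above.
NON_CLUSTERED = """
IOs/sec: 15642.80
MBs/sec:   122.20
Avg_Latency(ms): 0
Max_Latency(ms): 52
"""

CLUSTERED = """
IOs/sec:   943.03
MBs/sec:     7.36
Avg_Latency(ms): 16
Max_Latency(ms): 1056
"""

BLOCK_KB = 8  # -b8 on the sqlio command line

METRIC_RE = re.compile(
    r"\s*(IOs/sec|MBs/sec|Avg_Latency\(ms\)|Max_Latency\(ms\)):\s*([\d.]+)"
)


def parse_sqlio(text):
    """Return the cumulative SQLIO metrics found in text as a dict of floats."""
    metrics = {}
    for line in text.splitlines():
        match = METRIC_RE.match(line)
        if match:
            metrics[match.group(1)] = float(match.group(2))
    return metrics


def mb_per_sec(iops, block_kb=BLOCK_KB):
    """Throughput implied by an IOPS figure at a fixed block size."""
    return iops * block_kb / 1024


before = parse_sqlio(NON_CLUSTERED)
after = parse_sqlio(CLUSTERED)
slowdown = before["IOs/sec"] / after["IOs/sec"]

if __name__ == "__main__":
    print(f"non-clustered: {mb_per_sec(before['IOs/sec']):.2f} MB/s")
    print(f"clustered:     {mb_per_sec(after['IOs/sec']):.2f} MB/s")
    print(f"slowdown:      {slowdown:.1f}x")
```

The MB/s values reported by SQLIO are simply IOs/sec times the 8 KB block size, so the clustered disk is doing roughly one sixteenth of the work of the non-clustered one while average latency climbs from 0 ms to 16 ms.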

If we run this tool on all our other servers, we get adequate results. The problem only seems to occur when the disks are clustered.

Can anyone shed any light on this?

10 Replies
vGuy
Expert

A couple of quick thoughts:

--> ensure you're not using RR as the load balancing policy for your shared RDMs.

--> ensure you're using LSI Logic SAS as your SCSI adapter.

Reference and some additional guidelines: http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=103795...

coop2012
Contributor

Hi, thanks for your help.

When you say "ensure you're not using RR as the load balancing policy for your shared RDMs", where is this configured? On the SAN or on the VM?

vGuy
Expert

It can be configured from:

vCenter -> select host -> Configuration -> Storage Adapters -> select HBA -> scroll to the RDM LUN and right-click it -> Manage Paths -> Path Selection.

(This needs to be done for the LUNs used by the MSCS VMs, on all the hosts in the cluster.)

There is also a better description of the different multipath policies than I could provide here: http://kb.vmware.com/kb/1011340

coop2012
Contributor

Just had a look. They are all set to Round Robin. What should they be set to? Will changing them affect anything else?

beckham007fifa

RR is not supported; use either MRU or Fixed.

Regards, ABFS

vGuy
Expert

EVAs are ALUA-aware, so you have the option of using RR, MRU or FIXED_AP.

Since RR is not supported with MSCS VMs, the next preferred policy is MRU.

Note that FIXED in vSphere 4.x is not ALUA-aware and is not recommended for EVA arrays.

Update:

Regarding impact, you may want to change it only for the LUNs used by the MSCS VMs.

There should not be any impact on other VMs.

Good luck, and let us know how it goes.

coop2012
Contributor

Good news. I changed the policy to MRU and we instantly saw a difference in throughput. Looks like it has fixed our issue.

I have one more question. Can I change these policies on the RDMs that are attached to my live clustered servers without affecting them?

Thanks for your help.

Matt

vGuy
Expert

Glad to hear that!

If you are using cluster across boxes, it may be safer to change the RDM path policy first on the host running the passive cluster node.

Once that is done, you can fail over the resources onto the passive node and then modify the policy on the remaining host.
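To make the order explicit, here is a small Python sketch that only lists the steps; it does not talk to vCenter or the cluster. The function name `rollout_steps` and the host names `esx-a` and `esx-b` are hypothetical, and it assumes the cluster-across-boxes layout described above.

```python
def rollout_steps(hosts, active_host, policy="MRU"):
    """Return the ordered steps for changing the RDM path policy host by host.

    hosts:       ESX hosts that run the MSCS nodes (hypothetical names).
    active_host: the host currently running the active cluster node.
    """
    if active_host not in hosts:
        raise ValueError("active_host must be one of hosts")

    passive_hosts = [h for h in hosts if h != active_host]
    if not passive_hosts:
        raise ValueError("need at least one host running a passive node")

    # 1. Change the policy where only a passive node is running.
    steps = [
        f"Set RDM path policy to {policy} on {host} (passive node)"
        for host in passive_hosts
    ]
    # 2. Move the cluster resources off the host that has not been changed yet.
    steps.append(
        f"Fail over cluster resources from {active_host} to {passive_hosts[0]}"
    )
    # 3. Change the policy on the remaining host, which is now passive.
    steps.append(f"Set RDM path policy to {policy} on {active_host}")
    return steps


if __name__ == "__main__":
    plan = rollout_steps(["esx-a", "esx-b"], active_host="esx-a")
    for number, step in enumerate(plan, start=1):
        print(f"{number}. {step}")
```

The point of the ordering is that the path policy is only ever changed on a host whose cluster node is not currently owning the resources.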

coop2012
Contributor

Thanks, all working.

vGuy
Expert

Great!! And thanks for updating us with the results.
