VMware Cloud Community
infotechDM
Contributor

Poor performance with RDMs only when added to MSCS CAB cluster

Hey guys, thanks for taking a look at this issue ... I am looking for some ideas.

I have two virtual machines running Windows Server 2008 R2. They have MSCS set up and working as a CAB (cluster across boxes). I am using a VNX 7500, Fibre Channel drives, and physical-mode RDMs. This is on an eight-node ESXi 5.1 (build 1117900) implementation. Functionally, everything works just fine.

The problem is that performance is very poor, but only on the RDMs that have been added to the MSCS cluster. I can take the same drive, remove it from the cluster, run IOmeter against it, and it is fast. If I add it to the MSCS cluster, leaving it in Available Storage, it drops to about 1/15th of the IOPS. Delete the drive from MSCS and it goes back to performing as usual.

I've tried different SCSI controllers (Paravirtual vs. LSI Logic SAS) and it didn't seem to make a difference. My physical MSCS clusters don't exhibit this kind of performance issue, so I am wondering whether there is something goofy with the MSCS-on-virtual-machines configuration.

I have already implemented the fix from this KB article: http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=101610...

I was seeing the poor bus-rescan performance until I set the RDMs as perennially reserved, and it hasn't been an issue since I implemented that fix.
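For anyone hitting the same rescan issue, the perennially-reserved flag gets set per host from the ESXi shell, roughly like this (the naa ID below is just a placeholder; substitute the actual RDM device ID and repeat for each RDM on each host):

# Mark the RDM LUN as perennially reserved (placeholder device ID)
esxcli storage core device setconfig -d naa.60060160xxxxxxxx --perennially-reserved=true

# Verify the change took effect ("Is Perennially Reserved: true")
esxcli storage core device list -d naa.60060160xxxxxxxx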

Any help or suggestions is appreciated ...

Thanks,

-dave

Accepted Solutions
Scott_G1
Enthusiast

I recently upgraded my ESXi environment from 5.0 to 5.1 and have a bunch of systems with RDMs, and I had a problem as well. What I found was that the upgrade changed the path policy for all of the LUNs and RDMs to Round Robin, which caused a huge performance issue in my environment. I changed all of the paths to MRU and it solved the issue.
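If it helps, the policy can also be checked and changed per device from the ESXi shell, something along these lines (the naa ID is just a placeholder for your own LUN):

# Show the current path selection policy for the device (placeholder ID)
esxcli storage nmp device list --device naa.60060160xxxxxxxx

# Switch that device to Most Recently Used
esxcli storage nmp device set --device naa.60060160xxxxxxxx --psp VMW_PSP_MRU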

infotechDM
Contributor

Scott,

I have to admit, I was ready to call shenanigans on this idea, since everything I know says that Round Robin should give the best performance. And my initial tests bore that out ... at first. Using an "expected" IOmeter workload (1 GB workspace, 4 outstanding I/Os, 8 KB blocks, 100% sequential, 33% read / 67% write), I ran 10-minute tests with a 30-second ramp-up. The only things I changed between runs were whether the drive was in the cluster and the path policy.

At first, the numbers came back as I expected. I ran a test with Round Robin (RR), then Fixed, then Most Recently Used (MRU), and repeated that five times with a test LUN outside the cluster. Here are the averages:

RR      ~8500 IOPS, 66 MB/s, 2.0ms RT

Fixed   ~3900 IOPS, 31 MB/s, 3.8ms RT

MRU    ~4400 IOPS, 35 MB/s, 3.5ms RT

But then I ran the same tests again with the LUN in the cluster. Everything else stayed the same.

RR      ~1000 IOPS, 7.8 MB/s, 15.9ms RT

Fixed   ~3800 IOPS, 30 MB/s, 4.2ms RT

MRU    ~7200 IOPS, 58 MB/s, 2.1ms RT

RR was 1/8th as fast, and MRU was nearly 2x faster, just from adding the LUN to the cluster ... /boggle. I ran the whole thing again just to check (only 3 passes instead of 5, and 5-minute runs instead of 10) and the numbers were the same. So while I don't think I will be changing any of my non-MSCS-clustered RDM LUNs to MRU anytime soon, I have already changed my MSCS-clustered RDMs to MRU and the performance issues are gone. They aren't as fast as they would be outside the cluster, but that's fine; they are close enough.
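For the record, I flipped just the clustered RDMs over to MRU from the host shell, roughly like this (the naa IDs below are placeholders for my actual MSCS RDM LUNs; run on each host in the cluster):

# Placeholder device IDs for the MSCS-clustered RDMs only
for DEV in naa.60060160xxxxxx01 naa.60060160xxxxxx02; do
    # Set the path selection policy for this device to Most Recently Used
    esxcli storage nmp device set --device $DEV --psp VMW_PSP_MRU
done

The same change can be made per LUN in the vSphere Client under Manage Paths; the loop was just quicker across eight hosts.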

Thanks so much!  Great suggestion.

-dave
