Is CA turned on? Are you using synchronous or asynchronous replication? I'm not surprised. For us at least, the HP EVA8100 could not cut it as Enterprise Class storage.
No and yes.
We do use CA, but not on the LUNs which are part of the problem.
The FC and FATA LUNs which are part of the problem are simple single LUNs, size varies from 250GB to 1TB.
There are some LUNs which use CA in synchronous mode and there are a lot of LUNs used for other servers than ESX.
Just wondering if you could provide some more information. How many drives are in your FC group and FATA group? Also what RAID levels are on the LUNs you are copying from and to?
Thanks and kind regards.
It's 2 DGs in total: 1 FATA and 1 FC disk group.
Nearly all LUNs are configured as RAID 5; just a few are RAID 1, but those are not part of our problem.
The FC group is built on a total of 128 disks (300GB model).
The FATA group is built on 40 disks (1TB each).
This configuration exists twice; the problem occurs on both EVAs.
We are copying from several 250-500GB Raid5 FC LUNs (datastores) to several 500GB FATA LUNs (datastores).
In addition to our ESX LUNs, there are a lot of FC/FATA LUNs, which are used by different systems like SAP databases, physical Windows servers, Unix, etc.
Thanks for your help.
Okay, the first thing that is obvious is that not only are you copying from faster to slower disk, but you are also copying to a DG with a much lower spindle count. Let's assume your 300GB disks are 10K; the theoretical IOPS, assuming it is all configured as RAID 5 and you have a 60% read cycle, is ~6,800. Now if we look at your FATA DG with the same assumptions, the IOPS is ~1,400. As you can see, this quickly explains why it is 5x slower to copy from the FC DG to the FATA DG.
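The arithmetic above can be sketched quickly. This is a rough model, not a definitive one: the per-disk figures (~120 IOPS for a 10K FC disk, ~80 for a FATA disk) and the RAID 5 write penalty of 4 are assumptions chosen to roughly reproduce the numbers quoted above.

```python
def effective_iops(disks, per_disk_iops, read_fraction, write_penalty=4):
    """Theoretical host-visible IOPS for a disk group.

    Each host write costs `write_penalty` back-end I/Os
    (4 for RAID 5: read data, read parity, write data, write parity).
    """
    raw = disks * per_disk_iops
    return raw / (read_fraction + (1 - read_fraction) * write_penalty)

# Assumed per-disk figures: ~120 IOPS per 10K FC disk, ~80 per FATA disk.
fc = effective_iops(128, 120, read_fraction=0.6)    # ~6,980
fata = effective_iops(40, 80, read_fraction=0.6)    # ~1,455
print(round(fc), round(fata), round(fc / fata, 1))  # ratio is ~4.8x
```

The ratio of roughly 5:1 between the two disk groups matches the observed slowdown when copying from FC to FATA.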
Now while this does explain the across-DGs issue, it does not explain why there is no such large differential within a DG. One thing you may be experiencing is that the EVA 8x00s don't have very large write caches. If you look at an EVA8400 with 14GB of cache, you only have 832MB of mirrored write cache per controller. What might be happening is that when copying from FC to FATA you are filling the write cache faster than the disks can empty it, whereas copying from FATA to FATA you don't create this situation. Check your controller utilization when doing the Storage vMotions and compare FC-to-FATA with FATA-to-FATA. Are your vdisks owned by different controllers when going from FC to FATA versus FATA to FATA?
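The cache-fill hypothesis is easy to put numbers on. The 832MB mirrored write cache comes from the post above; the ingest and destage rates below are purely hypothetical placeholders to illustrate the shape of the problem, not measured EVA figures.

```python
CACHE_MB = 832  # mirrored write cache per controller (from the post above)

def seconds_until_cache_full(ingest_mb_s, drain_mb_s, cache_mb=CACHE_MB):
    """Time until the write cache fills; None if the disks keep up."""
    if drain_mb_s >= ingest_mb_s:
        return None  # cache never fills; writes stay at cache speed
    return cache_mb / (ingest_mb_s - drain_mb_s)

# Hypothetical rates: a Storage vMotion writing 300 MB/s into the cache,
# a FATA disk group destaging at 100 MB/s vs. an FC group at 350 MB/s.
print(seconds_until_cache_full(300, 100))  # cache full after ~4.2 s
print(seconds_until_cache_full(300, 350))  # None: the FC group keeps up
```

Once the cache is full, writes are throttled to the destage rate of the slow disk group, which is consistent with the FC-to-FATA slowdown described above.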
Another thing to keep in mind is that FATA drives really don't like 100% write workloads with RAID 5 configurations. RAID 5 also adds significant overhead on the controllers, so perhaps you could try the exercise with a RAID 1 FATA vdisk. This will give the FATA LUN a significant increase in IOPS, especially if you are Storage vMotioning from a RAID 5 vdisk.
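The RAID 1 suggestion can be quantified with the usual write-penalty reasoning: RAID 1 costs 2 back-end I/Os per host write versus 4 for RAID 5, so at a 100% write workload the effective IOPS of the FATA group doubles. The ~80 IOPS per FATA disk is again an assumption, not a measured value.

```python
def write_iops(disks, per_disk_iops, write_penalty):
    """Effective host write IOPS for a 100% write workload."""
    return disks * per_disk_iops / write_penalty

# 40 FATA disks at an assumed ~80 IOPS each.
fata_raid5 = write_iops(40, 80, write_penalty=4)  # 800.0
fata_raid1 = write_iops(40, 80, write_penalty=2)  # 1600.0
print(fata_raid5, fata_raid1)
```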
Not sure if this helps or hinders.
Another thing that might be causing it is the type of datamover being used. Are these LUNs both formatted with the same blocksize? If they aren't, the legacy datamover will be used, which means data travels all the way up the stack and then comes back down again. When the blocksizes are equal, the new datamover is used, which means the data takes a shortcut, and possibly even VAAI offloading can be used.
Thanks a lot for your help. The block sizes are different: we created the old 250GB FC LUNs (datastores) a few years ago under ESX 3.5, which did not offer the ability to grow a datastore as is possible now (using extents was not an option).
Nowadays we create datastores with a 4MB block size; a 1TB max size is enough.
Seems to me that we do not really have any "error", just some configuration issues which are the result of a growing system.
Looks like I will have to start a new project to move our vSphere farm forward.
Yes, this is more than likely the result of a combination of factors, but probably due to the difference in blocksizes and the use of the legacy datamover. I wrote about it a week ago or so, by the way:
Thanks for the link, a very nice piece of information.
I just checked our EVA: no VAAI for me. HP is working on it, it seems.
With or without VAAI, when a different blocksize is used, the legacy datamover is utilized, which slows things down substantially compared to the new datamover that is used when blocksizes are equal.
That's what I read in your article :-)
Time for a migration to a datastore structure where equal block sizes are configured.
This makes for interesting reading indeed: http://www.yellow-bricks.com/2011/02/18/blocksize-impact/
Looks like this may indeed provide a simpler explanation; however, I would be interested in your findings if you retry the exercise going from datastores with the same blocksize across DGs. It would be interesting to know just how much faster the new datamover is for this type of operation.
FYI - the EVA 8x00 is expected to have VAAI support Q3 2011 if you believe the HP marketing guys
SVmotion of a 15GB VM:

From                          | To                            | Duration (mm:ss)
FC datastore, 1MB blocksize   | FATA datastore, 4MB blocksize | 08:01
FATA datastore, 4MB blocksize | FC datastore, 1MB blocksize   | 12:49
FC datastore, 4MB blocksize   | FATA datastore, 4MB blocksize | 02:36
FATA datastore, 4MB blocksize | FC datastore, 4MB blocksize   | 02:24
This is a first result of my tests.
Not mentioned is the overall load on the SAN, as the tests were performed at different times.
The improvement after changing the blocksize is immense; I never expected this kind of performance boost.
I have run several more tests showing equal times for the migration, but the figures above should be enough to show what an equal-blocksize configuration does to SVmotion performance between different types of disks.
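Converting the measured durations from the test results above into speedup factors makes the effect concrete. The mm:ss values are taken directly from the table; only the small conversion helper is new.

```python
def seconds(mmss):
    """Convert a 'mm:ss' duration string to total seconds."""
    m, s = mmss.split(":")
    return int(m) * 60 + int(s)

# Durations from the SVmotion test results above.
fc_to_fata_mixed = seconds("08:01")   # 1MB -> 4MB blocksize
fata_to_fc_mixed = seconds("12:49")   # 4MB -> 1MB blocksize
fc_to_fata_equal = seconds("02:36")   # 4MB -> 4MB blocksize
fata_to_fc_equal = seconds("02:24")   # 4MB -> 4MB blocksize

print(round(fc_to_fata_mixed / fc_to_fata_equal, 1))  # ~3.1x faster
print(round(fata_to_fc_mixed / fata_to_fc_equal, 1))  # ~5.3x faster
```

So simply matching the blocksizes made the same FC/FATA migration roughly 3x to 5x faster, without any hardware change.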
I've posted the results of my tests and some thanks to the participants of the thread in my blog.