GoodForAllPaul
Contributor

Slow copy/paste vmdk files on SAN Volume

We have recently commissioned an ESXi 3.5 environment in an HA/DRS configuration, using HP c7000 Blade enclosures and BL460 blades with 4Gb HBAs connecting to a "DataCore" caching server in each geographic location. The ESX server writes to the local DataCore device, which writes to the local SAN and also transmits the write across a 2x1Gb circuit to the remote DataCore, which writes it to that SAN as well. Reads are served locally.

[GEO1]==================[GEO2]

ESXi 3.5 ===============ESXi 3.5   <-- HA / DRS

Datacore1 <==1Gb circuit==> Datacore2   <-- clustered

SAN(a) =================SAN(b)

The observations are that a 40Gb P2V takes 2h20min over a 1Gb local LAN segment. Then copying that vmdk file to another folder on the same volume takes 3h.

I am pretty sure 2-3 hours for this amount of data is excessively high, considering the 4Gb SAN fabric, high-speed SAS drives in the SAN, and the 8Gb caching servers in the middle.
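To put a rough number on "excessively high": a minimal sketch of the effective throughput implied by the figures in this post (40Gb in 2h20min for the P2V, 40Gb in 3h for the datastore copy). For comparison, a 4Gb FC link is on the order of 400 MB/s raw, so these rates are around 1% of the fabric's capacity.

```shell
# Back-of-the-envelope throughput for the two copies described above.
awk 'BEGIN {
  mb   = 40 * 1024          # 40 GB expressed in MB
  p2v  = 2*3600 + 20*60     # P2V duration: 2h20min in seconds
  copy = 3*3600             # datastore copy duration: 3h in seconds
  printf "P2V:  %.1f MB/s\n", mb/p2v
  printf "Copy: %.1f MB/s\n", mb/copy
}'
```

That works out to roughly 4.9 MB/s for the P2V and 3.8 MB/s for the copy.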

What I am trying to understand is: when you browse a datastore with the VIC and copy and paste a file, what is the data transfer path? If the copy essentially uses the VIC as a mediator, that will limit it to the speed of the network. Or perhaps the mechanism for moving data between folders on a VMFS volume is something other than native SAN protocol, in the way that I believe the P2V uses FTP.

4 Replies
Chuck8773
Hot Shot

Welcome to the forums!!!

One question for you: is the DataCore writing to the other geo site via synchronous replication? If so, your local writes will wait for the remote write to complete before completing locally.

To answer your question about copy and paste: when you copy and paste through the VIC, the copy process happens on the ESX host. The data does not flow through the VIC; it flows from the source to the destination through the ESX host.

Charles Killmer, VCP

If you found this or other information useful, please consider awarding points for "Correct" or "Helpful".

GoodForAllPaul
Contributor

Charles,

I am aware that the data writes will be delayed, as that is the nature of synchronous data distribution over a time/distance barrier. What I was wanting to know, I believe you have in part answered.

The VIC client does not play a part in the datastore copy/paste function. In my case, where I am simply copying and pasting the data within the same volume and not across different LUNs/logical volumes on the SAN, that would imply to me that the data movement should be handled by the SAN controllers. I wanted to make sure of that, as someone here who is also a VCP claimed it uses FTP.

I am creating a 10Gb SAN LUN on the local DataCore, allocated to the local ESX server, to test the difference between replicated SAN storage and unreplicated storage through a DataCore server. I will post my findings.

Paul

Chuck8773
Hot Shot

When ESX copies from one location on the LUN and pastes to a different location on the same LUN, the data flows from the SAN, to the host, and back to the SAN. The SAN doesn't know anything about the file structure, unless you use NFS. There are a few SAN vendors out there working on integrations with VMware where the host can tell the SAN what to copy and where to put it. This would remove the need for the data to travel to the ESX host before being pasted. It is not an ESX feature today.

I would investigate each network link for the bottleneck. Try removing the sync replication: does the problem go away? If it does, the replication link needs some attention if you need it to stay synchronous.

Also, the file copy process on the ESX side does not use FTP. It is just a file copy operation on a VMFS LUN.

Charles Killmer, VCP



Chuck8773 gives some great advice; the only thing left to try might be jumbo frames. The catch is that jumbo frames will only be a supported configuration once you upgrade to ESX 4.0; then enable them on the ESX host, on the TCP/IP switch, and on the DataCore Windows boxes. This would allow the network portion of the operation to run more efficiently. You can also monitor the SAN disks, keeping an eye on the disk queue length (the bottleneck might be at the disk level as well, if the copy is going from location A to location B with both on the same set of spindles).
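On ESX 4.0 the vSwitch MTU can be raised from the service console with esxcfg-vswitch. A minimal sketch, assuming a vSwitch named vSwitch0 (adjust to your setup); remember jumbo frames must match end to end on the ESX host, the physical switches, and the DataCore boxes' NICs, or you get fragmentation instead of a speedup:

```shell
# Assumed names: vSwitch0 and MTU 9000 -- adjust for your environment.
VSWITCH=vSwitch0
MTU=9000

# Guarded so it only runs on a host that actually has the ESX CLI.
if command -v esxcfg-vswitch >/dev/null 2>&1; then
  esxcfg-vswitch -m "$MTU" "$VSWITCH"   # raise the vSwitch MTU
  esxcfg-vswitch -l                     # list vSwitches to confirm the MTU
else
  echo "run on an ESX 4.0 host: esxcfg-vswitch -m $MTU $VSWITCH"
fi
```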

StarWind Software R&D http://www.starwindsoftware.com