RolandCH
Contributor

NAS latency problem during a small filecopy job

We have a latency problem with our NAS in a certain situation. Every two hours, a job copies 2.5 GB of data from a physical server to a VM over a 1 Gbit/s connection. The VM does nothing other than receive these files. The biggest file is 640 MB; the total is 2.5 GB. The NAS is connected to the ESXi 4 host via NFS.

The latency of the NAS is normally between 10 and 50 ms. While the VM mentioned above is receiving these files, the whole NAS and all the VMs on it see latencies between 700 and 1200 ms.

How can I improve this behavior, other than throttling the connection between the physical server and the VM?

Thanks for any help, Roland

9 Replies
kjb007
Immortal

Not sure the issue in this case is the network; it's more likely disk speed and spindle count. How many disks are in your NAS, and what are the speed and size of those disks?

-KjB

vExpert/VCP/VCAP vmwise.com / @vmwise -KjB
RolandCH
Contributor

The NAS has 6 disks in a RAID 5. Each disk is 1 TB at 7200 RPM.

Earlier, when we had all VMs on a 4-disk RAID 5 attached locally to the ESXi host, we didn't have this problem, although latency was generally higher.

kjb007
Immortal

I did a post about disk and latency a while back: http://vmwise.com/2010/06/11/my-vm-performance-is-terrible-what-is-wrong-with-vmware

A 7200 RPM drive can perform roughly 80 IOPS; times 6, that gives you 480, assuming none of the drives is a spare. Also take into account that RAID 5 takes 4 I/Os to perform each write, which leaves you fewer operations that can be done simultaneously.
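As a back-of-the-envelope (a minimal sketch in Python, using the approximations above rather than measured values):

# Rough RAID 5 capacity estimate.
# Assumptions: ~80 IOPS per 7200 RPM drive, no hot spare, and a
# RAID 5 write penalty of 4 back-end I/Os per write
# (read data, read parity, write data, write parity).
drives = 6
iops_per_drive = 80
raid5_write_penalty = 4

raw_iops = drives * iops_per_drive               # 480 aggregate back-end IOPS
max_write_iops = raw_iops / raid5_write_penalty  # ~120 front-end write IOPS

print(raw_iops, max_write_iops)

So a pure write workload tops out at around 120 IOPS on this array before queues start to build.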

You can increase the cache, if that's available, or you can add more and/or faster drives.

The key is to figure out how much I/O your VMs are doing normally, and then how much is added during the burst operations.
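For example, plugging in the numbers from this thread (a sketch; the 32 KB NFS write size is an assumption, not something measured here):

# How much I/O does the 2.5 GB burst demand?
# Assumptions: the copy arrives at close to 1 Gbit/s wire speed,
# and ESX issues the writes in ~32 KB chunks over NFS (a guess).
link_mb_per_s = 1000 / 8.0                 # 1 Gbit/s ~= 125 MB/s
io_size_kb = 32                            # assumed average write size
write_iops_needed = link_mb_per_s * 1024 / io_size_kb  # ~4000 front-end writes/s
backend_ios = write_iops_needed * 4        # RAID 5 write penalty
print(write_iops_needed, backend_ios)      # 4000.0 16000.0

If those assumptions are anywhere near right, the burst asks for far more than the ~480 back-end IOPS estimated above; the write cache can only absorb so much of that, which would fit the latency spike you're seeing.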

The network could be adding to this, but I would lean more heavily toward the disk. Are you connecting your ESX and physical hosts to your NAS through a switch or a hub? Depending on your switch, bandwidth could be shared by a group of ports, so check for that as well.

Hope that helps.

-KjB

vExpert/VCP/VCAP vmwise.com / @vmwise -KjB
RolandCH
Contributor

Thank you for your answer. I still have to read your article.

The NAS is connected to the ESXi host via a switch. The switch is fine; its backplane can deliver full speed to all ports simultaneously.

What confuses me is that we didn't have this problem before on a 4-disk RAID 5 (also with 1 TB, 7200 RPM disks).

kjb007
Immortal

Is that when they were local?  Did the local controller have caching?  Does the NAS?

-KjB

vExpert/VCP/VCAP vmwise.com / @vmwise -KjB
RolandCH
Contributor

Sorry, yesterday I was out of the office.

The answer to the first question is yes: we didn't have that problem on the local 4-disk RAID 5 (also with similar disks, 1 TB and 7200 RPM).

The local controller has write-back cache and read cache enabled (these two options are available).

The NAS has write cache enabled (only this option is available).

Roland

RolandCH
Contributor

I read your article. It doesn't help in this particular situation, but it's generally good to know.

So we still have that problem.

Roland

kjb007
Immortal

I would recommend loading IOmeter in one of your virtual machines and seeing how much throughput/I/O you can drive from the VM before you start having problems. That would at least give you a baseline for your storage, from the VM's perspective. If that baseline is higher than what you are trying to push from your physical server, then something else is going on that may be affecting your performance.
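If you want a quick sanity check alongside IOmeter, even a crude probe like this, run inside the VM against the NFS-backed disk, will show how write latency holds up under load (a hypothetical sketch, not a substitute for IOmeter's workload modeling):

import os, time

# Write 64 MB in 1 MB synced chunks and report throughput
# plus the worst single-write latency.
CHUNK = 1024 * 1024
COUNT = 64
buf = os.urandom(CHUNK)

worst = 0.0
start = time.time()
with open("probe.bin", "wb") as f:
    for _ in range(COUNT):
        t0 = time.time()
        f.write(buf)
        f.flush()
        os.fsync(f.fileno())
        worst = max(worst, time.time() - t0)
elapsed = time.time() - start
os.remove("probe.bin")

print("throughput: %.1f MB/s, worst write: %.0f ms"
      % (COUNT / elapsed, worst * 1000))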

-KjB

vExpert/VCP/VCAP vmwise.com / @vmwise -KjB
RolandCH
Contributor

I ran different tests with IOmeter on the NAS before we started to use it. Do I understand you right that you recommend simulating throughput/I/O with IOmeter within a VM, up to the level where the problem starts? That means I would have to limit what IOmeter is doing; an unlimited throughput test would be worse than the actual problem caused by transferring only 2.5 GB.

How do I limit what IOmeter is doing? Should I limit it with the settings under Burstiness, "Transfer Delay" and "Burst Length"?
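If I understand the burstiness controls right, the arithmetic would be roughly this (a sketch; the 5 ms per-I/O service time is just a guess on my part):

# Approximate IOPS produced by IOmeter's Burstiness settings.
# Assumption: "Burst Length" I/Os are issued back-to-back, then
# IOmeter waits "Transfer Delay" ms before the next burst.
burst_length = 10         # I/Os per burst
transfer_delay_ms = 100   # pause between bursts
service_ms = 5            # guessed time the array needs per I/O

cycle_ms = burst_length * service_ms + transfer_delay_ms
iops = burst_length * 1000.0 / cycle_ms
print("~%.0f IOPS" % iops)   # ~67 IOPS with these numbers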

Thank you very much for your help.

Roland
