fpicabia
Contributor

Concerned about latency with NFS and SAN

I've been looking for clues on tuning storage performance.  We're in higher education; it is August, and we have not yet run our SAN-based VMware systems under the full brunt of student load.  I'm already seeing signs we should be concerned.

We've run VMware ESX for over a year on local disks.  We implemented an EMC SAN and moved everything there in June of this year.

What clues are there?

For example, bash auto-completion can be laggy on a VMware system.  If I type just the letter h and then double-tab, it should display everything in my path beginning with h.  On several VMware systems this lags; on native-disk systems it is quick.  Once it is cached in memory I can't reproduce it.

Another example: copying a 16 GB directory from one location to another on a VM guest pushes the load average to 4 in Linux, and this is a dev system with no users putting load on it.

Looking at iostat -dxk, I can compare two DNS servers.  The non-VMware system always has await values in the single digits; the VMware system's await values are much higher.  Here are the averages displayed:

VMware host:

Device:   rrqm/s  wrqm/s   r/s   w/s  rkB/s  wkB/s  avgrq-sz  avgqu-sz   await  svctm  %util
sda         0.02   18.88  0.12  5.41   5.78  97.13     37.22      0.85  153.19   8.84   4.89

Native-disk host:

Device:   rrqm/s  wrqm/s   r/s   w/s  rkB/s  wkB/s  avgrq-sz  avgqu-sz   await  svctm  %util
sda         0.03    9.46  0.85  7.73  76.85  68.60     33.91      0.06    7.06   6.09   5.22
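For reference, a minimal way to capture comparable interval samples is below.  This is only a sketch: the first iostat report is a since-boot average (which is what the tables above show), and the column position assumes the classic sysstat layout above; newer sysstat versions split await into r_await/w_await.

  # Since-boot averages followed by five 60-second interval samples;
  # the interval samples are the ones worth comparing under real load.
  iostat -dxk 60 6

  # Watch just the await column (column 10 in the layout above) for sda.
  iostat -dxk 60 6 | awk '/^sda / {print "await=" $10}'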

I found an interesting blog article discussing tuning:

VMware I/O queues, micro-bursting, and multipathing - Virtual Geek

I wanted to check my QUED values in esxtop, but for our NFS shares on the SAN the tool displays only a dash for QUED, USD and LOAD.

I tried out visualEsxtop.  It also seems to have blind spots.  When I connect to any VM server, I get stats with zeros everywhere for Disk Partition, Disk Path, Disk World, and Disk Adapter.  Only the vSCSI tab displays live numbers.  In vSCSI I see LAT/wr go as high as 1000 periodically, but typically there are 10 hosts doing relatively light load: one with three-digit LAT/wr, four guests with two-digit LAT/wr, and the remainder less than that.
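One thing that may help where esxtop/visualEsxtop come up blank for NFS: vscsiStats on the ESXi host collects per-virtual-disk latency histograms above the datastore layer, so it works the same for NFS and block storage.  A rough sketch (the world group ID is a placeholder taken from the -l listing):

  # List running VMs and their world group IDs
  vscsiStats -l

  # Start collecting for one VM's virtual disks
  vscsiStats -s -w <worldGroupID>

  # After letting real I/O run for a while, print the latency histogram
  vscsiStats -p latency -w <worldGroupID>

  # Stop collection when finished
  vscsiStats -x -w <worldGroupID>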

So I have a couple of questions:

1. How can we see I/O stats when NFS datastores are in use on the SAN?

2. Am I right that the iostat await value shown for a lightly loaded DNS server is a hint that we will see performance problems when thousands of students return?

--FP

11 Replies

JPM300
Commander

Hey fpicabia,

How is your EMC SAN set up for NFS?  Are you using 1 Gb or 10 Gb connections?  How is your vSwitch networking set up for NFS?  Another thing to note: there is no native way to get multipathing with NFS in VMware yet.  Apparently it's coming, but as of ESXi 5.5 there is still no native multipathing.  That means you either need two 1 Gb/10 Gb connections in active/passive, or some people have tried a LAG/LACP setup with NFS to achieve it; I've never gone that route, and I would imagine it varies largely with the SAN providing the NFS.  Which EMC SAN model did you end up going with?  Some of the entry models like the EMC VNXe are VERY entry level and struggle with higher IOPS requirements.

When you run esxtop, what is your DAVG commonly at?  It shouldn't be spiking over 100 consistently; most people say it shouldn't even spike consistently over 32-50.
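A few read-only checks of the NFS networking from the ESXi shell may also be useful here.  This is a sketch assuming ESXi 5.x-style esxcli namespaces; vSwitch0 is just an example name:

  esxcli network vswitch standard list          # vSwitch uplinks and MTU
  esxcli network ip interface list              # VMkernel ports (including the NFS vmk) and their MTU
  esxcli network vswitch standard policy failover get --vswitch-name=vSwitch0   # active/standby NIC teaming
  esxcli storage nfs list                       # NFS mounts as the host sees them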

Let us know

fpicabia
Contributor

Hi JPM300,

Thanks for the response.  We've got a VNXe3150.

We have 10 Gb Ethernet, and Cacti graphs show very little of that capacity being used.  During the backup cycle, any port on the Cisco switch peaks at 100 Mbit/s; otherwise, during the day it normally uses less than 20 Mbit/s.

We're talking very low utilization on the systems running on VMware, at this point in the year anyway.

The fabric setup was done by a consultant and is supposed to provide some load balancing and failover.  Before the SAN was sized, we ran some IOPS tools from Dell to create a composite graph of the I/O needs.

In esxtop we could not get a value for DAVG.  The only column showing a value for NFS share devices was GAVG, which often hovered around 80, sometimes a little over 100.

--FP

JPM300
Commander

Hey fpicabia,

I'm curious how the consultant set up load balancing on NFS.  The only ways I can think of are two connections in an LACP bond or active/passive per controller.  Or maybe the load balancing was just that each controller has one NFS IP/set of shares and you're balancing it that way.

According to VMware, the GAVG shouldn't be as high as 80-100; see the following links:

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=100820...

ESXTOP - Yellow Bricks

So this could be why you're seeing performance issues creep up.  Unfortunately the EMC VNXe 3150s don't give a great amount of insight into what the disk subsystem is doing, metric-wise, unless you break into the tech maintenance mode at the CLI.  I have had to do this with EMC a few times; you may want to give EMC a quick call and see if they can dig up why your disk subsystem is seeing a high GAVG, especially since your network throughput isn't going over 100 Mbit/s.

Side question: if you run a stress test on a VM, can you push the network bandwidth past 100 Mbit/s?
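A quick-and-dirty way to answer that from inside a Linux guest on one of the NFS datastores might be a direct-I/O dd run (a sketch; the size and path are arbitrary, and some arrays optimize away zero writes, so treat the number as a rough floor):

  # 4 GB sequential write, bypassing the guest page cache so it actually hits the SAN
  dd if=/dev/zero of=/tmp/ddtest bs=1M count=4096 oflag=direct

  # Watch iostat in the guest (or GAVG on the host) while it runs, then clean up
  rm /tmp/ddtest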

Here is a link with some other suggestions for tweaking some of the finer-tuned settings:

http://community.spiceworks.com/topic/373945-vnxe-latency-problems-with-vmware?page=1

Also, what kind of RAID do you have set up in your disk pool?

Another thing we noticed when we tested the VNXe series: any time a request for a LUN has to go through the backplane to reach the disks, speed SUFFERED TERRIBLY.  Just something to keep in mind.

King_Robert
Hot Shot

Poor storage performance is generally the result of high I/O latency. vCenter or esxtop will report the various latencies at each level in the storage stack from the VM down to the storage hardware.  vCenter cannot provide information for the actual latency seen by the application since that includes the latency at the Guest OS and the application itself, and these items are not visible to vCenter. vCenter can report on the following storage stack I/O latencies in vSphere.

Storage Stack Components in a vSphere environment

[Figure: latency at each layer of the vSphere storage stack - GAVG is the sum of KAVG and DAVG]

GAVG (Guest Average Latency): total latency as seen from vSphere.

KAVG (Kernel Average Latency): time an I/O request spent waiting inside the vSphere storage stack.

QAVG (Queue Average Latency): time spent waiting in a queue inside the vSphere storage stack.

DAVG (Device Average Latency): latency coming from the physical hardware, HBA and storage device.

To provide some rough guidance, for most application workloads (typically 8K I/O size, 80% random, 80% read) we generally say anything greater than 20 to 30 ms of I/O latency may be a performance concern.  Of course, as with all things performance related, some applications are more sensitive to I/O latency than others, so the 20-30 ms figure is rough guidance rather than a hard rule.  So we expect that GAVG, or total latency as seen from vCenter, should be less than 20 to 30 ms.  As seen in the picture, GAVG is made up of KAVG and DAVG.  Ideally we would like all our I/O to get out onto the wire quickly and spend no significant amount of time just sitting in the vSphere storage stack, so we would like to see KAVG very low.  As a rough guideline, KAVG should usually be 0 ms, and anything greater than 2 ms may be an indicator of a performance issue.

So what are the rule-of-thumb indicators of bad storage performance?

• High Device Latency: Device Average Latency (DAVG) consistently greater than 20 to 30 ms may cause a performance problem for your typical application.

• High Kernel Latency: Kernel Average Latency (KAVG) should usually be 0 in an ideal environment, but anything greater than 2 ms may be a performance problem.
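One way to check these counters over time, rather than eyeballing the live esxtop screen, is batch mode.  A sketch (interval and sample count are arbitrary; <esxi-host> is a placeholder):

  # 30 samples at 15-second intervals, saved for offline review
  # (perfmon, a spreadsheet, or esxplot can open the CSV)
  esxtop -b -d 15 -n 30 > esxtop-storage.csv

  # Or remotely with the vSphere CLI equivalent
  resxtop --server <esxi-host> -b -d 15 -n 30 > esxtop-storage.csv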

So what can cause bad storage performance, and how do you address it?  Well, that is for next time…

And as a side note:  Check out your local VMUG (VMware User Group).  The VMUG community has more than 75,000 members with more than 180 local groups across 32 countries.  Many local area VMUGs have free user conferences that are a great opportunity to learn from in-depth technical sessions, demonstrations and exhibits and to network with other VMware customers and partners.  That is where I'll be this week, presenting Storage Troubleshooting and Performance Best Practices at the Denver VMUG.

JPM300
Commander

Good response, King!  That picture is actually one of the best I've seen for explaining the differences between the disk metrics.  Seeing as our OP is seeing a GAVG of over 80 ms, I would say he needs to dig deeper into the SAN, unless most of the time is coming from the KAVG.

fpicabia
Contributor

I think the load balancing was done via a round-robin setup, and I see something about an LACP channel for CIFS and NFS in my notes.

The best stress test we have at this point is the backups all running around the same time.  That reached the 100 Mbit/s, but I've seen cases before where Cacti misses actual peaks because it polls once every 5 minutes.  The copy of 16 GB of file structure was a bit of a stress test, but it didn't push the network for the SAN at all.  CPU on the guest showed up to 40% wait time.

The switch is a new 10G Cisco switch, with jumbo frames activated on both sides.

A consultant from the same company set this up as well.
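One quick way to confirm the jumbo frame path end to end from the ESXi shell is a don't-fragment ping sized for a 9000-byte MTU (a sketch; <nfs-server-ip> is a placeholder for the VNXe's NFS interface):

  # 8972 bytes of payload + 28 bytes of ICMP/IP headers = 9000 bytes on the wire
  vmkping -d -s 8972 <nfs-server-ip>

If that fails while a plain vmkping succeeds, jumbo frames are not actually clean end to end across the switch and SAN ports.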

The RAID setting isn't showing on any screen I'm looking at.  I'm reading this site:

http://www.emc.com/collateral/hardware/white-papers/h8178-vnxe-storage-systems-wp.pdf

I believe page 22 indicates we would have RAID 5 in a 4+1 configuration.  I don't see anything in Unisphere to confirm this setting.  I see one big Pool 1 with 70 disks, all SAS.

It is particularly frustrating that we can't get stats on the NFS datastores.  All the pages I see on the topic talk about viewing graphs in vSphere or getting numbers in esxtop, and none of those things work.

For example, if I go to Storage in the vSphere web client, select one of the NFS datastores, then Performance, I see a graph for "Storage I/O Control Normalized Latency" and it states: "Data is not collected for the current statistics level.  Increase the statistics level to view the graph."  If I make the time range Realtime, the graph displays and the value is a flatline zero.

We did go into an Administration screen (desktop vSphere client) to increase the logging level from 1 to 2 in hopes it would help, but there was no difference in the Performance screen for storage.

In vSphere, under the Manage tab and Settings, I see Storage I/O Control and it is disabled.  Attempting to enable it produces a message that we are not licensed.  Do we need that just to get stats from an NFS datastore?  If not, then why can't we see stats?

Another thing that fails is looking at a VMware server and selecting Monitor and Storage Reports.  It shows "The storage service is not initialized.  Try again later."  Later?  Like what?  After the full moon?

I don't understand what you said about a request for a LUN going through the backplane.  Does that mean disk local to the VMware system?

--FB

fpicabia
Contributor

Getting the stats you mention is half the battle.  The only one appearing in esxtop for an NFS datastore is GAVG, which can be three digits at times.

Maybe we can at least assume now that latency is an issue, but without more stats we can't yet pin down which layer it is coming from.

Looking at the blog article I linked in the OP, I see confusing things like:

If you find your array service time is long, or the array LUN queue (if your array has one) is a problem – you need to fix that before you look at queue depths and multipathing.  On EMC arrays – this can be done easily and is included as a basic array function.

Does anyone know what that means?

But again, I need all of the stats to know whether that solution might fit, and nothing I'm trying shows data like this from vSphere.  There is Monitor -> Performance -> Advanced, and then selecting Datastore under View on the right.  It shows read latency and write latency; max values are 2205 and 1454 on one of the NFS datastores, and the graph hovers around 200 ms over the last hour.

--FP

JPM300
Commander

Hey FB,

There should be a place somewhere in Unisphere that tells you what your disk pool's RAID level is.  It is probably RAID 5 or RAID 6.

What I meant by the backplane is this:

[Attached diagram: VNXe-Route-A.JPG]

When you create a LUN you have to assign it to a service processor (SP) on the SAN.  That SP will always serve requests to that LUN unless a failure occurs.  In the diagram I assigned all odd-numbered datastores to SPA and all even-numbered ones to SPB.  So if a request comes in for DatastoreSAS01, say to save a file for a VM running on DatastoreSAS01, the request takes the path that leads to SPA, and the file is written to one of the disk drawers where that VM's files are located.

However, in the event that SPA fails, SPB spins up another virtual service controller, so to speak, and takes over all of SPA's LUN responsibilities so there is no outage or downtime.  But to reach the drawers that SPA is connected to, you can see there is no direct connection; each SP's SAS cables go up to its own drawers.  Each drawer and SP has a backplane connecting them, so you can still reach the disks in the event of a failure.

In this picture I'm exaggerating a bit, as it wouldn't have to go up the entire shelf to get to the other side; it would simply take the best backplane path.  However, when testing the VNXe3100/3150 we found that transfer speeds dropped by more than 50% when you lost an SP: our speeds went from 100 Mbps transferring files between VMs to 30 Mbps.  It's something you can test pretty easily; just pull one of your SPs into maintenance mode during an outage window:

[Attached diagram: VNXe-Route-B.JPG]

Going back to my original comment: if for some reason your pathing was not working "correctly" and requests were having to traverse the backplane, reaching LUNs that SPA owns by way of SPB because that was the path taken, it could account for some of your latency.  In most cases, though, as long as the connections are set up correctly, VMware's multipathing selects the proper path, assuming your settings match what the vendor recommends, which you have already double-checked.
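For what it's worth, on the VMware side the SP/path ownership can be checked from the ESXi shell for block (iSCSI/FC) LUNs; NFS mounts won't appear here, so this is only a sketch for anyone with block datastores on the same array:

  # Owning paths and the path selection policy per device
  esxcli storage nmp device list

  # Every path with its state (active/standby/dead)
  esxcli storage core path list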

JPM300
Commander

If you have a support contract with EMC you can give them a call and open a case; they have other metrics they can collect on these boxes that are not visible through the GUI.  They have some CLI commands they can run to collect esxtop-like stats for troubleshooting.  Maybe give them a quick call and see what they can dig up.

fpicabia
Contributor

Thanks to JPM300 and King_Robert for your efforts in helping and for explaining the issues that need analysis.  You've gone all the way in providing detailed information and helpful diagrams.

I was hoping this was something I could identify and fix by tweaking VMware parameters, but it appears that with this entry-level SAN we will need to use EMC support to get the metrics, and perhaps also the analysis of what is going wrong.  If I learn what was going on, I'll come back to this thread with the news.

JPM300
Commander

Thanks fpicabia, and let us know how it works out, as I'm curious about the resolution. :)

Just out of curiosity, do any of your ESXi hosts have any SSD disks?  If you get the SAN working better but it still falls short of your performance expectations, you could look into a local caching option to spare the SAN some IOPS and drop the latency.  Anyhow, good luck!  Keep us posted.
