VMware Cloud Community
Marc_P
Enthusiast

Event: Device Performance has deteriorated. I/O Latency increased

Hi,

Since upgrading to vSphere 5 I have noticed the following error in our Events:

Device naa.60a980004335434f4334583057375634 performance has deteriorated. I/O latency increased from average value of 3824 microseconds to 253556 microseconds.

This is for different devices and not isolated to one.

I'm not really sure where to start looking, as the SAN is not being pushed hard; these messages even appear at 4am when nothing is happening.

We are using a NetApp 3020C SAN.

Any help or pointers appreciated.

64 Replies
ModenaAU
Enthusiast

I am also seeing these messages "performance has deteriorated...." on vSphere 5 update 1, with local disks. Except in my case there is a real I/O problem. This is a fresh build, with just one VM, copying a few GB of data and I can get I/O latency as high as 2500ms....yes, 2500ms, yes, 2.5 seconds!

In addition to these types of messages, vmkernel.log also has lots of suspicious-looking vscsi reset log entries...

The hardware vendor (Cisco, UCS C210) cannot find anything wrong, we have replaced the RAID card, all drivers and firmware check out as supported, VMware also cannot find anything wrong....

I see this across two distinct servers too, both on vSphere 5 update 1, so I can only assume a driver/firmware issue at this point, even though both Cisco and VMware say it is all supported.

dwilliam62
Enthusiast

That reservation issue occurs when you use SCSI-3 Persistent Reservations.  By default Linux doesn't use them (outside of clusters).  MSCS has used them since Windows 2003.

I run RH, Ubuntu, Mint, Debian, SuSE with RDMs using RR and Dell EQL MEM w/o any issues.

dwilliam62
Enthusiast

Hello,

If each IO had an average latency of 2.5 secs then the server/VM would completely stop.  Is that what's happening?

I would check the cache setting on the controller.  Sounds like it's set to WRITE-THROUGH instead of WRITE-BACK.   What's the status of the cache battery?   Some controllers will periodically drain the battery to ensure it actually has a full charge.
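
If the controller turns out to be an LSI MegaRAID-based card (common in the UCS C-series), something like the following MegaCli commands will show the current cache policy and battery state. This is only a sketch under that assumption; use your vendor's own tool if the card is different.

# Show the current cache policy for all virtual drives on all adapters
MegaCli -LDGetProp -Cache -LAll -aAll

# Show the battery backup unit status (charge level, state)
MegaCli -AdpBbuCmd -GetBbuStatus -aAll

# Switch all virtual drives to Write Back, or back to Write Through
MegaCli -LDSetProp WB -LAll -aAll
MegaCli -LDSetProp WT -LAll -aAll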

ModenaAU
Enthusiast

Hi Don, thanks for the input. The VM does not stop; I/O just slows from 100+ MB/sec down to as little as a few hundred KB/sec.

esxtop shows bad DAVG values going up/down anywhere from 50 - 600 and beyond.

The cache settings are configured on the virtual drive, and the Write Cache Policy is set to Write Through.

The adapter has no battery installed.

Standby...looking into changing the cache setting....

dwilliam62
Enthusiast

Re: cache..  Write Through is almost assuredly your issue.  You really need a battery-backed RAID controller card so you can run write back; that makes a HUGE difference on writes.  Since writes aren't cached, they tend to take priority over READs, and therefore reads get blocked by the writes. Also, without write cache the adapter can't do "scatter-gather" and bring random IO blocks together before writing to disk.  That greatly improves write performance, since when you go to one area of the disk you write out all the blocks for that address range; it helps sequentialize random IO loads.

If you can use a VM that's not production on a server with write back enabled (even without a battery), I think your errors will go away or drop significantly.

Then set it back to WT when using production VMs.

How many drives and what RAID level are you using on that card?

I suspect maybe Cisco offers another RAID card with battery?   

Regards,

Don

ModenaAU
Enthusiast

OK, I changed the cache policy to Always Write Back, and performance has gone through the roof. On a Linux guest I can now see consistent 450+ MB/sec writes and over 1000 IOPS, and the DAVG values are not going over 2. The worst recorded latency was 30ms.

Stressing a Windows guest as far as I can with multiple large file copies, the performance is less stellar, but still over 150 MB/sec, with DAVG up to 50 or so and latency maxing out at 80ms.

Now to get some batteries so I can leave it like this...

Thank you Don for pointing out what I had overlooked!

ModenaAU
Enthusiast

The config is 6 drives, 300GB SAS, single RAID 5.

Apparently the battery is an option.....wtf? Who makes a RAID battery an option? Also, just for grins, they don't tell you about this "option" when you order the server. Silly me for assuming a RAID card would come with a battery......

dwilliam62
Enthusiast

You are VERY welcome!!  Glad I could help out.

I don't recall the last RAID card that came w/o batteries. Until you get them I would not leave it in WB.  Very risky.

Windows copy is not very efficient; each copy is single-threaded.    Using Robocopy, or better yet RichCopy, yields better results.

Regards,

Dave_McD
Contributor

Thanks for the reply, this will really help. The only question is, how do I change the IOPS for FC? I can't see the option anywhere.

As for changing the SCSI controllers, I will have to schedule an outage etc. as these are production systems. However, you have shown me there is light at the end of the tunnel!

dwilliam62
Enthusiast

Earlier in this thread I posted a script to change the IOPS value.  There's no GUI option to do so.

#esxcli storage nmp device list

When you run the above command you'll get a list of your current devices and their path policy; for Round Robin volumes you'll see the default IOPS=1000.

I'm not sure what FC storage you are connecting to but it will have a VENDOR ID.  On EQL volumes that ID is EQLOGIC.  If yours is EMC then you need to change the line in the script from EQLOGIC to EMC.

# Make Round Robin the default PSP for the EqualLogic SATP
esxcli storage nmp satp set --default-psp=VMW_PSP_RR --satp=VMW_SATP_EQL

# Switch every existing EQLOGIC device to Round Robin and set IOPS=3
for i in `esxcli storage nmp device list | grep EQLOGIC | awk '{print $7}' | sed 's/(//g' | sed 's/)//g'` ; do
    esxcli storage nmp device set -d $i --psp=VMW_PSP_RR
    esxcli storage nmp psp roundrobin deviceconfig set -d $i -I 3 -t iops
done

After you run the script you should verify that the changes took effect.
#esxcli storage nmp device list

Regards,

Don

jdiaz1302
Contributor

I see that most of you just want a way to deactivate the messages, but in my case I am seeing degraded performance and packet loss on one of my VMs. It's not just the message; I have other symptoms.

dwilliam62
Enthusiast

Are you connecting to a Dell/Equallogic array?    That's what I'm most familiar with.

Common causes of performance issues that generate that alert are:    (Most will apply to all storage)

1.)  Delayed ACK is enabled.  (See the example commands after this list for checking items 1 and 2.)

2.)  Large Receive Offload (LRO) is enabled

3.)  MPIO pathing is set to FIXED 

4.)  MPIO is set to VMware Round Robin but the IOs per path is left at default of 1000.  Should be 3.

5.)  VMs with more than one VMDK (or RDM) are sharing one Virtual SCSI adapter.  Each VM can have up to four Virtual SCSI adapters.

6.)  iSCSI switch not configured correctly or not designed for iSCSI SAN use.
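
As a rough sketch of how items 1 and 2 can be checked or changed from the ESXi shell (confirm against your storage vendor's current best-practice guide first; the LRO change needs a host reboot to take effect):

# Check whether Delayed ACK is disabled on existing iSCSI sessions (0 = disabled)
vmkiscsid --dump-db | grep Delayed

# Disable LRO in the ESXi TCP/IP stack, then confirm the new value
esxcli system settings advanced set -o /Net/TcpipDefLROEnabled -i 0
esxcli system settings advanced list -o /Net/TcpipDefLROEnabled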

If this is a Dell array, please open a support case.   They can help you with this.

Regards,

irvingpop2
Enthusiast

Lately we've seen huge increases in performance with a few simple iSCSI tuning methods (NetApp FAS2040 - 4x 1GbE).   We've gone from latency alarms several times per day to none at all.  

I haven't seen these concisely documented anywhere, so here's what we did:

  1. Using bytes=8800 (with Jumbo frames) rather than an IOPS value (or the default)
  2. Make sure the active Path count matches the number of storage adapter NICs on your VM host or Storage system (whichever is less). 
    1. Previously we had iSCSI Dynamic Discovery which added all 4 NetApp paths for each storage adapter vmk (resulting in 16 paths per LUN);  this resulted in "Path Thrashing".   Changed to Static discovery and manually mapped only 1 iSCSI target per vmk.  
  3. Don't use LACP on either side.   LACP completely ruins RR MPIO.
  4. Fix VM alignment.   We had a handful of Windows 2003 and Linux guests with bad alignment.  They didn't do much IO so we ignored them in the past,  big mistake.  (NetApp's performance advisor really helped to nail this down)
  5. Stagger all scheduled tasks.   We found a number of IO-intensive tasks (AV updates, certain backups) all running at the same times in our environment.  

From this article:  http://blog.dave.vc/2011/07/esx-iscsi-round-robin-mpio-multipath-io.html

The command we used is:

esxcli storage nmp device list |grep ^naa.FIRST_8_OF_YOUR_SAN_HERE | while read device ; do
    esxcli storage nmp psp roundrobin deviceconfig set -B 8800 --type=bytes --device=${device}
done
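
To confirm the change took effect on a given device afterwards (the naa ID below is just a placeholder), check its Round Robin settings:

esxcli storage nmp psp roundrobin deviceconfig get --device=naa.xxxxxxxxxxxxxxxx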

Throughput results:

  • Original:  95 MB/s
  • IOPS=1 or IOPS=3:   110-120 MB/s
  • Bytes=8800:   191 MB/s   (hurray!)
    • 4KB IOPS also saw a 3x improvement over the original configuration

NOTE:   We also changed back from Software iSCSI to the Broadcom NetXtreme II "Hardware Dependent" driver now that the new June 2012 version supports Jumbo frames: https://my.vmware.com/group/vmware/details?downloadGroup=DT-ESXi50-Broadcom-bnx2x-17254v502&productI...

If I could do this all over again I would skip iSCSI altogether.   What a complete PITA it has been to get decent performance compared to spending a few grand more for FC.

boromicmfcu
Contributor

With ESXi 5, Delayed ACK keeps re-enabling itself on my hosts, resulting in high latency on my SAN. It is getting really annoying. Has anyone else experienced this problem? I am disabling it globally on the software iSCSI initiator. I believe a reboot is required when you disable it, so when it re-enables itself I am not sure if it takes effect until the next reboot or when it turns itself back on.

dwilliam62
Enthusiast

Are you at the current build of ESXi v5?

What I've been seeing is that if you just disable Delayed ACK, it doesn't update the database that stores the settings for each existing LUN; only NEW LUNs will inherit the new value.

You can check by going on the ESXi console and entering:

#vmkiscsid --dump-db | grep Delayed

All the values should be ="0" for disabled.

I find that removing the discovery address and removing the discovered targets in the "Static Discovery" tab cleans out the db. Then add the discovery address back in with Delayed ACK disabled, AND make sure the login_timeout value is set to 60 (the default is 5). Then do a rescan.

Go back to CLI and re-run #vmkiscsid --dump-db | grep Delayed to verify.

Also you should run #vmkiscsid --dump-db | grep login_timeout to check that setting as well.
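
For reference, a quick way to check both settings in one pass, plus an adapter-level view on builds that expose them as adapter parameters (vmhba37 is just a placeholder for your software iSCSI adapter):

# Check both values stored in the iSCSI database at once
vmkiscsid --dump-db | grep -E 'DelayedAck|login_timeout'

# Where exposed as adapter parameters, they can also be inspected per adapter
esxcli iscsi adapter param get -A vmhba37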

boromicmfcu
Contributor

I am at 5.0.0, 721882

I got 17 `node.conn[0].iscsi.DelayedAck`='x' results back with only 6 of them reporting a 0 and all the rest 1.

I have some scheduled maintenance this weekend, so I am going to install the latest ESXi patch and clean out the discovered addresses.

I also found this article: http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=200782... referencing recommendations for EqualLogic arrays and iSCSI logins. We use EqualLogic, the article recommends 15, and that is what it is currently set at.  I am not getting any initiator disconnect errors from the SAN.

Is 15 too conservative from your experience?

Thanks

dwilliam62
Enthusiast

The default in ESXi v5 is 5 seconds; in larger groups with many connections, that timeout will be too short.   Setting it to 60 covers all scenarios.

Also, VMware will be releasing a patch for 4.1 that will allow the login timeout to be extended from its 15-second default to 60.

Re: Delayed ACK.  I've seen that also.   Worst case, I've gone to Static Discovery and manually modified each target, then repeated on the other nodes.  😞   No fun if you have a lot of volumes.

chi201110141
Contributor

I too am seeing this on 2 of my 3 hosts.  One host is hardly doing anything (at the moment); the other 2 are coming up with these messages mainly out of hours.

All 3 are the same spec, using local storage: RAID 6, 16 drives.

ESXi v5.0.0, 469512.

Is there a fix?

iBahnEST
Contributor

@IrvingPOP2

I have been receiving these messages since we first built our solution, which consists of HP blades with 2x 1Gb NICs per server, a Cisco 3120G switch and a NetApp FAS2040. I've been researching this issue for a long time, and your post has given me hope that there might be a light at the end of the tunnel.  I'm planning on implementing some of your same steps, but I'm curious about a few things from your post:

Make sure the active Path count matches the number of storage adapter NICs on your VM host or Storage system (whichever is less)

We only have 2 links per server to attach to the network, but the FAS2040 has 4 NICs.  The FAS2040's NICs are set up using LACP (Dynamic Multimode VIFs).  Are 2 links per server enough for this configuration, or would you recommend more?

Previously we had iSCSI Dynamic Discovery which added all 4 NetApp paths for each storage adapter vmk (resulting in 16 paths per LUN); this resulted in "Path Thrashing". Changed to Static discovery and manually mapped only 1 iSCSI target per vmk.


When I configured my ESX hosts for Static Discovery, the next time I rebooted those hosts the iSCSI paths were gone.  Have you run into this issue?

Don't use LACP on either side.   LACP completely ruins RR MPIO.

NetApp’s documentation (TR-3802) discusses link aggregation, and LACP (Dynamic Multimode) looks like the best option on paper, as opposed to EtherChannel (Static Multimode), because EtherChannel is susceptible to a “black hole” condition.  I’m curious how you configured your storage and switch since you removed LACP.  Would you be so kind as to paste the configs from your NetApp and switch?

Lastly, out of all the changes that you made, which would you say was the most helpful?

Thanks!

irvingpop2
Enthusiast

iBahnEST,

It's been many months now since we implemented these changes, and we've learned many more lessons.   Let me summarize them:

Regarding the NetApp FAS2040:

Lower-end NetApps give poor throughput (MB/s) compared to "dumber" arrays.  However, they give much better IOPS, so the trade-off is yours to make.   Here's the summary of what I learned from my many conversations with NetApp:

    1. The FAS2040 has a really tiny NVMEM cache (512MB, but only 256MB usable at a time).    Your statit and sysstat output will show a huge amount of flushing to disk during writes because of "nvlog full" (see the sketch after this list).
    2. WAFL is spindle-greedy.   If your aggregate RAID groups are smaller than the recommended size (16-20 disks), your throughput will suffer badly (like 15 MB/s per disk).  A 2040 only has 12 disks (split among 2 controllers), so the RAID groups are far from optimal no matter what kind of disk you use.
    3. ONTAP 8 is RAM-greedy, especially with fancy features like Dedupe.    FAS2040 controllers only have 4GB of RAM each, and NetApp will tell you that only 1.5GB is left to work with once the OS is booted.  See the NetApp communities: people with 4GB RAM filers (2040, 3100) are getting crushed by the upgrade to 8.1 when dedupe is involved.   Remove Dedupe and don't go higher than ONTAP 8.0.4.
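
If you want to watch the nvlog-full behaviour yourself, here is a quick Data ONTAP 7-mode sketch (run on the filer console; interpret the columns against NetApp's docs for your version):

# 1-second interval, extended columns; back-to-back consistency points in the
# "CP ty" column mean the NVMEM log is filling faster than it can flush
sysstat -x 1

# statit gives per-disk detail: begin sampling, wait through a busy period, then end
statit -b
statit -e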

In our case, we shifted our backups (Netbackup direct-style off-host backup) from iSCSI to FC, thinking our iSCSI setup was still sub-optimal.   Sustained throughput (read only) still around 90-110 MB/s.       

For the math-challenged, that is still comparable to what a single iSCSI gigabit link can achieve with Jumbo frames enabled.

Regarding iSCSI

  • In summary, I would never use iSCSI on another production system.  Ever again.   The amount of effort required to tune and monitor is huge and you STILL get sub-par performance.  Just not worth it.  
    • For NetApps, use NFS.  Even NetApp will tell you that the performance will be much better.
  • The biggest performance improvements we got in iSCSI were:
    1. Reducing the number of iSCSI paths per LUN.   1-2 is enough, especially if you are storage throughput limited.
    2. 2 physical paths between VM host and storage doesn't mean 2 paths in total.   Because you map an iSCSI session per "path" per LUN, you will still have contention on your paths between the various LUNs.
    3. Definitely don't use LACP with iSCSI MPIO.    Remember that once a mac address pair has been assigned to an LACP channel it is stuck there until that channel goes down.  We found lots of link contention on both the NetApp and VM host side because LACP is dumb in the way it assigns and then never re-balances.   NetApp recommends LACP for NFS only.
    4. We went back from bytes=8800 to iops=1, as we found there were fewer latency spikes during business hours (the command is sketched just below).     Because of point #2 above, 2 iSCSI sessions will try to cram 8800 bytes down a single path (causing contention).
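
For reference, switching a device back from the bytes policy to iops=1 is the same deviceconfig command with the IOPS type (naa.xxxxxxxx is a placeholder; loop over your devices as in the earlier scripts):

# Set the Round Robin path-switching policy back to IOPS with a limit of 1
esxcli storage nmp psp roundrobin deviceconfig set --type=iops --iops=1 --device=naa.xxxxxxxxxxxxxxxx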

Regarding your static discovery question:   Are you getting the paths by Dynamic discovery and then removing the dynamic entries?    Best to remove all the dynamic stuff, reboot, and then add the entries manually.

I can share with you a NetApp rc section which simply shows all 4 gigabit interfaces configured for iSCSI only.   Ports going to 2 different switches:

ifconfig e0a 192.168.15.11 netmask 255.255.255.0 partner e0a mtusize 9000 trusted -wins up
ifconfig e0b 192.168.15.12 netmask 255.255.255.0 partner e0b mtusize 9000 trusted -wins up
ifconfig e0c 192.168.15.13 netmask 255.255.255.0 partner e0c mtusize 9000 trusted -wins up
ifconfig e0d 192.168.15.14 netmask 255.255.255.0 partner e0d mtusize 9000 trusted -wins up

Sorry for the lengthy post, hope that's helpful.
