      • 30. Re: Event: Device Performance has deteriorated. I/O Latency increased
        TrevorW201110141 Novice

        It's not about the volume, it's about the iSCSI paths (not the physical paths). I have a Dell MD3200i, two hosts, and two datastores. If I just power on VMs and let them sit idle, I get horrible latency. If one host accesses a datastore, I still get high latency. If both hosts access the datastore, the latency drops to normal.

         

        I have had my connections, my drivers, my setup, etc. all reviewed by numerous engineers. VMware wants to blame the vendor of the iSCSI product (i.e. Dell and its MD3200i). Dell and FalconStor have performed hundreds of tests, and it just doesn't make sense that both would have the same issue at the same time. I have gone so far as to do completely fresh installs of ESXi 5 with the latest updates, reloading and configuring everything.

         

        I don't think this is a bug that affects all ESXi 5 users. There is something about particular conditions for particular users with software iSCSI. I just think it's a VMware problem under certain conditions. It is driving me nuts.

        • 31. Re: Event: Device Performance has deteriorated. I/O Latency increased
          TrevorW201110141 Novice

          This "may" be premature, but I found some references to DelayedAck - a setting that can be setup at mutiple levels for software ISCSI. I edited the advanced settings for the software ISCSI adapter, turned off DelayedAck (at the highest level - all software ISCSI sources would not use it) and rebooted each host. So far (knock on wood) the latency issue has vanished and I am getting normal (low latency) performance.

           

          We will see what happens over the next few days.

          • 32. Re: Event: Device Performance has deteriorated. I/O Latency increased
            vxaxv17 Enthusiast

            Dell recommends that delayed ACK be disabled for most, if not all, of their iSCSI devices.
            Below is a message that was sent to me for a performance ticket I had open with Dell.

            I've disabled delayed ACK at the whole iSCSI initiator level, as I didn't want to have to do it for each connection.

             

            TCP Delayed ACK

            We recommend disabling TCP delayed ACK for most iSCSI SAN configurations.

            It helps tremendously with read performance in most cases.

             

            WINDOWS:

            On Windows the setting is called TcpAckFrequency and it is a Windows registry key.

             

            Use these steps to adjust Delayed Acknowledgements in Windows on an iSCSI interface:

             

            1. Start Registry Editor.

            2. Locate and then click the following registry subkey:

             

            HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\Interfaces\<Interface GUID>

             

            Verify you have the correct interface by matching the IP address in the interface table.

             

            3. On the Edit menu, point to New, and then click DWORD Value.

            4. Name the new value TcpAckFrequency, and assign it a value of 1.

            5. Quit Registry Editor.

            6. Restart Windows for this change to take effect.

             

            http://support.microsoft.com/kb/328890

            http://support.microsoft.com/kb/823764/EN-US  (Method 3)

            http://support.microsoft.com/kb/2020559
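
            For reference, the same change can be scripted instead of edited by hand. A minimal sketch using reg.exe (the <Interface GUID> is a placeholder; replace it with the GUID of the interface whose IP address matches your iSCSI NIC):

            reg add "HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\Interfaces\<Interface GUID>" /v TcpAckFrequency /t REG_DWORD /d 1 /f

            A reboot is still required afterwards, as in step 6 above.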

             

            ---------------------------------------------------------------------------------------------------------------------

            ESX

            For ESX the setting is called Delayed ACK, and it can be set in three places:

            1.  on the discovery address for iSCSI (recommended)

            2.  on a specific target

            3.  globally

             

            Configuring Delayed Ack in ESX 4.0, 4.1, and 5.x

             

            To implement this workaround in ESX 4.0, 4.1, and 5.x use the vSphere Client to disable delayed ACK.

             

            Disabling Delayed Ack in ESX 4.0, 4.1, and 5.x
            1. Log in to the vSphere Client and select the host.
            2. Navigate to the Configuration tab.
            3. Select Storage Adapters.
            4. Select the iSCSI vmhba to be modified.
            5. Click Properties.
            6. Modify the delayed Ack setting using the option that best matches your site's needs, as follows:

             

            Modify the delayed Ack setting on a discovery address (recommended).
            A. On a discovery address, select the Dynamic Discovery tab.
            B. Select the Server Address tab.
            C. Click Settings.
            D. Click Advanced.

             

            Modify the delayed Ack setting on a specific target.
            A. Select the Static Discovery tab.
            B. Select the target.
            C. Click Settings.
            D. Click Advanced.

             

            Modify the delayed Ack setting globally.
            A. Select the General tab.
            B. Click Advanced.

             

            (Note: if setting globally you can also use vmkiscsi-tool
            # vmkiscsi-tool vmhba41 -W -a delayed_ack=0)

             


            7. In the Advanced Settings dialog box, scroll down to the delayed Ack setting.
            8. Uncheck Inherit From parent. (Does not apply for Global modification of delayed Ack)
            9. Uncheck DelayedAck.
            10. Reboot the ESX host.

             

            Re-enabling Delayed ACK in ESX 4.0, 4.1, and 5.x
            1. Log in to the vSphere Client and select the host.
            2. Navigate to the Advanced Settings page as described in the preceding task "Disabling Delayed Ack in ESX 4.0, 4.1, and 5.x"
            3. Check Inherit From parent.
            4. Check DelayedAck.
            5. Reboot the ESX host.

             

            Checking the Current Setting of Delayed ACK in ESX 4.0, 4.1, and 5.x
            1. Log in to the vSphere Client and select the host.
            2. Navigate to the Advanced Settings page as described in the preceding task "Disabling Delayed Ack in ESX 4.0, 4.1, and 5.x."
            3. Observe the setting for DelayedAck.

             

            If the DelayedAck setting is checked, this option is enabled.
            If you perform this check after you change the delayed ACK setting but before you reboot the host, the result shows the new setting rather than the setting currently in effect.
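
            If you prefer to check this from the command line instead of the vSphere Client, on ESXi 5.x the following should dump the iSCSI configuration database and show the current value (syntax as I recall it from the VMware KB in the Source Material list below, so verify it against your build):

            # vmkiscsid --dump-db | grep Delayed

            The DelayedAck entries should show 0 when delayed ACK is disabled.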

             

            Source Material:

             

            http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1002598

             

            http://www.vmware.com/support/vsphere4/doc/vsp_esx40_vc40_rel_notes.html

             

            http://www.vmware.com/support/vsphere4/doc/vsp_esx40_u2_rel_notes.html

             

            http://virtualgeek.typepad.com/virtual_geek/2009/09/a-multivendor-post-on-using-iscsi-with-vmware-vsphere.html

            • 33. Re: Event: Device Performance has deteriorated. I/O Latency increased
              ansond Novice

              I tried setting the DelayedAck per previous post - however my particular instance is not using iSCSI targets - all of my drives are simple SATA drives directly connected to the host.   I continue to see the warning messages - and ESXi did complain about not finding the appropriate iSCSI stuff when I tried to force set my config...

               

              On a whim, I upgraded my ESXi host to the latest patchset that VMware has, 5.0 Update 1. It seems to work great as an update, but I still see the log entries even on this latest patchset.

               

              Doug

              • 34. Re: Event: Device Performance has deteriorated. I/O Latency increased
                TrevorW201110141 Novice

                I am continuing to update the progress/saga here. With the DelayedAck change, I do get substantially better performance (better latency). However, I still have two weird issues.

                 

                1) If I have at least one virtual machine actively doing something on an iSCSI datastore, I get this kind of performance:

                 

                [Image: 2025118_1.png]

                 

                 

                It is what I would expect with the hardware involved. However, I STILL get events in the event log that the "performance has deteriorated". The event lists a datastore, a time, and the values that triggered the event. The problem is that I was watching during that time: I was monitoring with esxtop and with IOMeter. There WAS NO LATENCY ON THAT DATASTORE AT THAT TIME! It did not show up in the vCenter performance log, nor in esxtop, nor in IOMeter. Clearly, there is a major bug in the code that triggers this event.

                 

                 

                2) Now, my SECOND issue. If I do NOT have at least one active virtual machine (reading and/or writing data), i.e. the VMs are powered on but essentially sitting idle, then I get significantly worse latency and many, many more events in the event log reporting latency errors. Here is a sample:

                 

                [Image: 2025118_2.png]

                • 35. Re: Event: Device Performance has deteriorated. I/O Latency increased
                  tranp63 Lurker

                  I recommend setting the NMP policy on all of your ESX hosts' datastores to 'Round Robin' to maximize throughput, because 'Round Robin' uses automatic path selection that rotates through all available paths and distributes the load across them. The default is the Fixed setting. Hopefully, this will resolve the errors relating to latency.

                   

                  "Device naa.60a980004335434f4334583057375634 performance has deteriorated. I/O latency increased from average value of 3824 microseconds to 253556 microseconds."

                   

                  [Image: Round Robin Policy.jpg]
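
                  If you prefer the command line, the equivalent per-device change can be made with esxcli. A minimal sketch using the device from the error message above (substitute your own naa identifiers):

                  esxcli storage nmp device set -d naa.60a980004335434f4334583057375634 --psp=VMW_PSP_RR
                  esxcli storage nmp device list -d naa.60a980004335434f4334583057375634

                  The second command just confirms that the Path Selection Policy now shows VMW_PSP_RR.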

                  • 36. Re: Event: Device Performance has deteriorated. I/O Latency increased
                    dwilliam62 Enthusiast

                    If you do use VMware Round Robin, you will need to change the IOPS-per-path value from 1000 to 3. Otherwise you will not get the full benefit of multiple NICs.

                     

                    For EqualLogic devices, you can use this script to set EQL volumes to Round Robin and also set the IOPS value. You can modify it for other vendors.

                     

                    This is a script you can run to set all EQL volumes to Round Robin and set the IOPS value to 3.

                     


                    esxcli storage nmp satp set --default-psp=VMW_PSP_RR --satp=VMW_SATP_EQL ; for i in `esxcli storage nmp device list | grep EQLOGIC|awk '{print $7}'|sed 's/(//g'|sed 's/)//g'` ; do esxcli storage nmp device set -d $i --psp=VMW_PSP_RR ; esxcli storage nmp psp roundrobin deviceconfig set -d $i -I 3 -t iops ; done

                     

                    After you run the script you should verify that the changes took effect.
                    #esxcli storage nmp device list
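
                    To confirm the IOPS limit on an individual device as well, you can query the Round Robin configuration directly (the device ID below is a placeholder; use one of the naa IDs returned by the list command):

                    #esxcli storage nmp psp roundrobin deviceconfig get -d <naa.id>

                    The output should show the IO operation limit you set (3 in the script above) and a limit type of iops.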

                     

                    This post from VMware, EMC, Dell, and HP explains a little bit about why the value should be changed.

                     

                    http://virtualgeek.typepad.com/virtual_geek/2009/01/a-multivendor-post-to-help-our-mutual-iscsi-customers-using-vmware.html

                     

                     

                    Another cause of the latency alerts is having multiple VMDKs (or Raw Device Mappings) on a single virtual SCSI controller. You can have up to four virtual SCSI controllers in each VM, and giving busy disks their own adapter greatly increases IO rates and concurrent IO flow. As with a real SCSI controller, a virtual controller only works with one VMDK (or RDM) at a time before moving on to the next one. With each disk on its own controller, the OS is able to get more IOs in flight at once. This is especially critical for SQL and Exchange, so the logs, database, and C: drive should each have their own virtual SCSI adapter.

                     

                    This website has info on how to do that. It also talks about the "Paravirtual" virtual SCSI adapter, which can also increase performance and reduce latency.

                     

                    http://blog.petecheslock.com/2009/06/03/how-to-add-vmware-paravirtual-scsi-pvscsi-adapters/

                     

                    Regards,

                     

                    Don

                    • 37. Re: Event: Device Performance has deteriorated. I/O Latency increased
                      Dave McD Enthusiast

                      I am having the same problem with FC storage, which has caused hosts to disconnect from the VC server. That has only happened since I installed SRM 5.

                      I checked my path profile and several were on Fixed instead of Round Robin, so I changed them. I am still getting the latency messages, although the hosts are not disconnecting.

                      The main culprit is an RDM attached to a Linux VM that actually has 7 RDMs attached.

                      All the RDMs are on the one datastore; what can I do to improve the performance? Should I consolidate the RDMs, or split them over different datastores?

                      • 38. Re: Event: Device Performance has deteriorated. I/O Latency increased
                        dwilliam62 Enthusiast

                        Make sure that the IOPS value on all your volumes isn't still at the default. The default is 1000, which won't fully leverage all available paths. For iSCSI I use 3; a similar low value should work well with Fibre Channel as well. The script I posted would need slight modification to work with FC.
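
                        For example, a rough FC adaptation of that script would filter the device list on your array's vendor or model string instead of EQLOGIC. This is only a sketch: the grep pattern is a placeholder, you should sanity-check which devices it matches before running the set commands, and the awk field number may need adjusting if the naa identifier isn't the seventh field in your array's display name.

                        for i in `esxcli storage nmp device list | grep -i <your array string> | awk '{print $7}' | sed 's/[()]//g'` ; do esxcli storage nmp device set -d $i --psp=VMW_PSP_RR ; esxcli storage nmp psp roundrobin deviceconfig set -d $i -I 3 -t iops ; done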

                         

                        Also, on that Linux VM, how many virtual SCSI controllers are there? I suspect only one: "SCSI Controller 0", with the drives at SCSI(0:0), SCSI(0:1), etc. under the Virtual Device Node box on the right-hand side.

                         

                        If so, you need to create additional SCSI controllers. You can have up to four virtual SCSI controllers per VM, so you'll need to double up a couple of RDMs in your case. But any VM that has multiple VMDKs or RDMs needs to have this done if it is doing any significant IO.

                         

                        Shut down the VM and edit its settings. Select the VMDK/RDM you want to move to its own controller and, under "Virtual Device Node", change the ID from SCSI(0:2) (for example) to SCSI(1:0) using the drop-down, scrolling the list until you see SCSI(1:0). Repeat until you have done this for all the busiest RDMs. You'll need to double up some: your boot drive at SCSI(0:0) should share a controller with the least busy RDM you have, which would be set at SCSI(0:1). The remaining doubled-up RDMs also need to be on different SCSI adapters, again pairing the next least busy RDMs, so they'd be at SCSI(1:1), SCSI(2:1), and so on.
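
                        For reference, after making that change the VM's .vmx file ends up with entries along these lines. This is only an illustrative sketch: the controller type and file names are made-up examples, and RDM disks will reference their mapping files rather than plain VMDKs.

                        scsi0.present = "TRUE"
                        scsi0.virtualDev = "lsilogic"
                        scsi0:0.fileName = "boot.vmdk"
                        scsi1.present = "TRUE"
                        scsi1.virtualDev = "lsilogic"
                        scsi1:0.fileName = "busy_rdm_1.vmdk"

                        The point is simply that each scsiN prefix is a separate virtual controller, and each scsiN:M disk hangs off it.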

                         

                        Then boot the VM.  You should notice a big difference.  

                         

                        If you have problems with this procedure, let me know. I have a draft of a doc that I put together on how to do this, including screenshots, etc.

                         

                        Regards,

                         

                        Don

                        • 39. Re: Event: Device Performance has deteriorated. I/O Latency increased
                          stainboy Novice

                          Just remember that, as with a Windows MSCS cluster using RDMs, if your Linux VM is placing reservations on those LUNs you might end up with problems when using RR...

                          • 40. Re: Event: Device Performance has deteriorated. I/O Latency increased
                            ModenaAU Novice

                            I am also seeing these messages "performance has deteriorated...." on vSphere 5 update 1, with local disks. Except in my case there is a real I/O problem. This is a fresh build, with just one VM, copying a few GB of data and I can get I/O latency as high as 2500ms....yes, 2500ms, yes, 2.5 seconds!

                             

                            In addition to these types of messages, vmkernel.log also has lots of suspicious-looking vscsi reset log entries...

                             

                            The hardware vendor (Cisco, UCS C210) cannot find anything wrong; we have replaced the RAID card, all drivers and firmware check out as supported, and VMware also cannot find anything wrong....

                             

                            I see this across two distinct servers too, both on vSphere 5 Update 1, so I can only assume a driver/firmware issue at this point, even though both Cisco and VMware say it is all supported.

                            • 41. Re: Event: Device Performance has deteriorated. I/O Latency increased
                              dwilliam62 Enthusiast

                              That reservation issue applies when you use SCSI-3 Persistent Reservations. By default Linux doesn't use them (outside of clusters). MSCS has used them since Windows 2003.

                               

                              I run RH, Ubuntu, Mint, Debian, SuSE with RDMs using RR and Dell EQL MEM w/o any issues.

                              • 42. Re: Event: Device Performance has deteriorated. I/O Latency increased
                                dwilliam62 Enthusiast

                                Hello,

                                 

                                If each IO had an average latency of 2.5 seconds then the server/VM would completely stop. Is that what's happening?

                                 

                                I would check the cache setting on the controller. It sounds like it's set to WRITE-THROUGH instead of WRITE-BACK. What's the status of the cache battery? Some controllers will periodically drain the battery to ensure it actually has a full charge.

                                • 43. Re: Event: Device Performance has deteriorated. I/O Latency increased
                                  ModenaAU Novice

                                  Hi Don, thanks for the input. The VM does not stop; I/O just slows from 100+ MB/sec down to as little as a few hundred KB/sec.

                                   

                                  esxtop shows bad DAVG values going up/down anywhere from 50 - 600 and beyond.

                                   

                                  The cache settings are configured on the virtual drive, and the Write Cache Policy is set to Write Through.

                                   

                                  The adapter has no battery installed.

                                   

                                  Standby...looking into changing the cache setting....

                                  • 44. Re: Event: Device Performance has deteriorated. I/O Latency increased
                                    dwilliam62 Enthusiast

                                    re: Cache.. Write Through is almost assuredly your issue. You really need a battery-backed RAID controller card so you can run write back; that makes a HUGE difference on writes. Also, since writes aren't cached, writes tend to get higher priority than READS, and therefore reads get blocked by the writes. And without write cache, the adapter can't do "scatter-gather" and bring random IO blocks together before writing to disk. That greatly improves write performance, since when you go to one area of the disk you write out all the blocks for that address range; it helps sequentialize random IO loads.

                                     

                                    If you can test with a non-production VM on a server with write back enabled (even without a battery), I think your errors will go away or drop significantly.

                                    Then set it back to WT when using production VMs.
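
                                    For what it's worth, on the LSI MegaRAID-based cards Cisco typically ships in the C-series you can usually check and change the cache policy from the command line with MegaCli. Treat this as a sketch: the exact binary name/path varies by platform, and write-back normally requires a healthy BBU unless you force it.

                                    MegaCli -LDGetProp -Cache -LAll -aAll
                                    MegaCli -LDSetProp WB -LAll -aAll

                                    The first command shows the current write policy per logical drive; the second switches it to write-back.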

                                     

                                    How many drives and what RAID level are you using on that card?

                                     

                                    I suspect maybe Cisco offers another RAID card with battery?   

                                     

                                    Regards,

                                     

                                    Don