5 Replies Latest reply on May 19, 2014 11:23 AM by lijet

    Device Performance has deteriorated (Storage device performance deteriorated..VMWare & NetApp)

    lijet Novice

      Hi friends,

      I have received some warning messages in my vSphere Client, as shown below (screenshot enclosed as well):

       

      Device naa.60a98000424758394c2b445a622f6e4f performance has deteriorated. I/O latency increased from average value of 1563 microseconds to 32018 microseconds.
      warning  5/7/2014 11:51:00 PM  ESXi-1.vsphere.local


      I am also enclosing the esxtop output from the d / u / v views for your reference.

      I would really appreciate it if someone could advise me on the cause and its solution.

       

      Many thanks in advance...        

       

       

      Attachments:
      I_O Latency Increased from Average Value of 1897 microseconds to 236720 microseconds .jpg
      esxtop_d._Host-1.jpg
      esxtop_U_host1.jpg
      esxtop_V_host1.jpg

        • 1. Re: Device Performance has deteriorated (Storage device performance deteriorated..VMWare & NetApp)
          Wh33ly Hot Shot

          There can be different solutions/problems, depending on the type of storage.

          For a quick overview, see the VMware KB article "Storage device performance deteriorated".

           

          I also noticed that the messages are almost a month old; I hope you can still find enough data for troubleshooting. The esxtop screens don't show anything shocking in my opinion, but don't forget this is just a snapshot from a single sample interval.

          Do you still see the messages? A latency spike is something different from storage performance decreasing over time.

           

          Personally, I saw similar messages when we had some SAN problems because a node was rebooting. This caused extra latency that was higher than average and triggered the alarm. So it might be explainable...

          If applicable, also ask your storage/SAN team to check whether they see anything strange on their side.

           

          If you still see the messages on a daily basis, I would run esxtop for longer and/or dump its output to a file to capture the counters over a time span (storage-related counters only would be enough). Then analyze it with esxplot, Excel, perfmon, or whatever tool you prefer to see whether these are only spikes.
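
          A minimal sketch of the "dump it to a file" approach described above: esxtop has a batch mode (for example `esxtop -b -d 5 -n 720 > esxtop_capture.csv`) that writes every counter as CSV. The Python below scans such a capture for latency samples above a threshold. The column label `Average Driver MilliSec/Command`, the 20 ms threshold, the file name, and the sample rows are all assumptions for illustration; check the header of your own capture before relying on them.

```python
import csv
import io


def find_latency_spikes(csv_text, column_hint="Average Driver MilliSec/Command",
                        threshold_ms=20.0):
    """Return (timestamp, column_name, value) for each sample above threshold_ms."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    # Column 0 holds the sample timestamp; keep only columns matching the hint.
    wanted = [i for i, name in enumerate(header) if i > 0 and column_hint in name]
    spikes = []
    for row in reader:
        for i in wanted:
            try:
                value = float(row[i])
            except (ValueError, IndexError):
                continue  # skip blank or malformed cells
            if value > threshold_ms:
                spikes.append((row[0], header[i], value))
    return spikes


# Tiny made-up capture (hypothetical host and device names), for illustration only.
SAMPLE = "\n".join([
    r'"(PDH-CSV 4.0) (UTC)(0)","\\esx1\Physical Disk Device(naa.example)\Average Driver MilliSec/Command"',
    '"05/07/2014 23:50:55","1.56"',
    '"05/07/2014 23:51:00","32.02"',
])

for timestamp, column, value in find_latency_spikes(SAMPLE):
    print(timestamp, value, column)
```

          For a real capture, read the file with `open("esxtop_capture.csv", newline="").read()` and pass the text to `find_latency_spikes`. Isolated rows in the result suggest spikes; long runs of consecutive rows suggest sustained degradation.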

          • 2. Re: Device Performance has deteriorated (Storage device performance deteriorated..VMWare & NetApp)
            virtual_knight Enthusiast
            VMware Employees, vExpert

            The message about deterioration was introduced with vSphere 5.x. Essentially, the logging is enabled to track minor I/O performance degradation that was previously not captured.

            The rule is that a 20x increase or decrease from the current average I/O latency is logged.

             

            In your case, the 1563-microsecond average increased about 20x in an instant.

            Note that the message reports microseconds, whereas esxtop and most other tools report milliseconds.

            So 32018 microseconds is around 32 milliseconds DAVG. The esxtop output you shared also shows around 0-10 ms, which is fairly reasonable (actually good numbers).
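
            The arithmetic above can be checked with a few lines of Python. This is only a sketch: the function names are made up for illustration, and the two value pairs are the ones quoted in this thread (the event message and the attached screenshot's file name).

```python
def us_to_ms(microseconds):
    """Convert microseconds (as shown in the vSphere event) to milliseconds (as shown in esxtop)."""
    return microseconds / 1000.0


def increase_ratio(baseline_us, observed_us):
    """How many times higher the observed latency is than the baseline average."""
    return observed_us / baseline_us


# (baseline, observed) pairs in microseconds, taken from the messages in this thread.
for baseline_us, observed_us in [(1563, 32018), (1897, 236720)]:
    print(f"{us_to_ms(observed_us):.1f} ms, {increase_ratio(baseline_us, observed_us):.1f}x")
# prints:
# 32.0 ms, 20.5x
# 236.7 ms, 124.8x
```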


            This is not a major cause for concern and could happen if there is a sudden surge in activity.


            If you are getting these messages very frequently, and particularly at random times (not coinciding with business hours or scheduled activities such as backups/AV scans), you will need to have the storage team investigate this end to end.

             

            Let me know if this helps

            • 3. Re: Device Performance has deteriorated (Storage device performance deteriorated..VMWare & NetApp)
              lijet Novice

              Thanks Wh33ly for your interest & reply


              I also noticed that the messages are almost a month old; I hope you can still find enough data for troubleshooting. = Yes, these are old screenshots, but the new ones are similar (see below):

              vmware-vcenter.jpg

              Do you still see the messages? = Yes, I still see messages similar to these.

              If you still see the messages on a daily basis = Yes, I see similar messages on the Events tab on a daily basis, but the time of occurrence differs.

              I would run esxtop for longer and/or dump its output to a file to capture the counters over a time span = Yes, I see similar messages on the Events tab on a daily basis, but the time of occurrence differs.

              (storage-related counters only would be enough). Then analyze it with esxplot, Excel, perfmon, or whatever tool you prefer to see whether these are only spikes

              = Yes, so shall I run "esxplot/Excel/perfmon" to get the required info,

              or what tool can I run to get this info?


              Thanks once again for your suggestion......


              ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

              @

              The message about deterioration was introduced with vSphere 5.x. Essentially, the logging is enabled to track minor I/O performance degradation that was previously not captured. = So can we disable this logging without affecting our performance monitoring for both hosts?

              The rule is that a 20x increase or decrease from the current average I/O latency is logged. = Sorry! I am not able to follow this math... but I will try. Sorry!

              So 32018 microseconds is around 32 milliseconds DAVG. The esxtop output you shared also shows around 0-10 ms, which is fairly reasonable (actually good numbers). = So do you mean there is no issue with these warnings and I should simply ignore them, even though I am getting them on a daily basis? Please advise.


              This is not a major cause for concern and could happen if there is a sudden surge in activity.

              If you are getting these messages very frequently, and particularly at random times (not coinciding with business hours or scheduled activities such as backups/AV scans), you will need to have the storage team investigate this end to end. = Yes, you are quite right; sometimes I get these messages when my antivirus scan runs, and sometimes not, so I cannot be sure what the reason is.

              Kindly advise how to do deeper monitoring of my NetApp 2240-2 to catch this error.

               

              Many thanks in advance...

              • 4. Re: Device Performance has deteriorated (Storage device performance deteriorated..VMWare & NetApp)
                virtual_knight Enthusiast
                vExpert, VMware Employees

                So can we disable this logging without affecting our performance monitoring for both hosts?

                Sorry! I am not able to follow this math... but I will try. Sorry!

                So do you mean there is no issue with these warnings and I should simply ignore them, even though I am getting them on a daily basis? Please advise.

                 

                Hi lijet,

                 

                I see that you are quite concerned. Let me try to answer the questions quoted above.

                1-Logging cannot be disabled.

                2-If the ESX host senses a 20-times increase in latency, it displays the message that performance has deteriorated, i.e. your latency increased from 1563 to 32018 (1563 * 20 approx).

                3-If you are aware of planned activities/tasks, the message can be ignored (you can also try to stagger the tasks outside business hours if feasible).

                4-You would need to engage the storage team to monitor SAN activity. Also, if you have VCOPS, there is a NetApp adapter that gives you insight into the storage.

                5-I would suggest watching the antivirus activity more closely; as I reply to this thread, my antivirus has kicked in and my laptop has slowed to a crawl :-)

                Check how serious the issue is when AV is not active.
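
                One way to act on points 3 and 5 is to line up the warning timestamps against the known schedule. The sketch below does that in Python; the window names and times and the second event are hypothetical placeholders, and only the first event timestamp comes from the message quoted in this thread.

```python
from datetime import datetime, time


def in_window(ts, start, end):
    """True if ts's time of day falls in [start, end); handles windows that cross midnight."""
    t = ts.time()
    if start <= end:
        return start <= t < end
    return t >= start or t < end


def split_events(events, windows):
    """Separate events that coincide with a scheduled window from those that do not."""
    explained, unexplained = [], []
    for ts in events:
        names = [name for name, (start, end) in windows.items()
                 if in_window(ts, start, end)]
        (explained if names else unexplained).append((ts, names))
    return explained, unexplained


# Hypothetical schedule; replace with your real AV scan and backup windows.
WINDOWS = {
    "av_scan": (time(23, 30), time(1, 0)),
    "backup": (time(2, 0), time(4, 0)),
}
# First timestamp is from the warning in this thread; the second is made up.
EVENTS = [datetime(2014, 5, 7, 23, 51, 0), datetime(2014, 5, 8, 14, 5, 0)]

explained, unexplained = split_events(EVENTS, WINDOWS)
print("coincide with scheduled work:", explained)
print("no scheduled work at that time:", unexplained)
```

                Events that end up in the second list are the ones worth handing to the storage team.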

                 

                Reference values: Your storage seems to have good performance, ~5-10 ms.

                Tips

                -Check the times when the issue occurs and monitor esxtop for a prolonged duration.

                -Check if the issue is seen on only one host or on multiple hosts (one host implies the HBA/cables or imbalanced/high-IOPS VMs concentrated on a single host; multiple hosts are indicative of a fabric/SAN-side issue).

                -Check if the issue is on one LUN or many LUNs (one LUN may mean some underlying disk/RAID issue or a high-IOPS VM concentrated on one LUN; many LUNs are indicative of a SAN/fabric issue).
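
                The last two tips amount to a small decision table, sketched below in Python. The function name and the wording of the hints are illustrative only; it simply restates the tips and is not a diagnostic tool.

```python
def triage(hosts_affected, luns_affected):
    """Map the scope of the warnings to the areas suggested in the tips above."""
    hints = []
    if hosts_affected == 1:
        hints.append("single host: check HBA/cables or high-IOPS VMs concentrated on that host")
    elif hosts_affected > 1:
        hints.append("multiple hosts: points toward the fabric/SAN side")
    if luns_affected == 1:
        hints.append("single LUN: check underlying disk/RAID or a high-IOPS VM on that LUN")
    elif luns_affected > 1:
        hints.append("multiple LUNs: points toward the SAN/fabric side")
    return hints


# Example: warnings seen on two hosts but only one LUN.
for hint in triage(hosts_affected=2, luns_affected=1):
    print(hint)
```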

                 

                 

                As you can see, there are many possibilities, and I have probably missed some. But hopefully you now have a direction in which to look.



                • 5. Re: Device Performance has deteriorated (Storage device performance deteriorated..VMWare & NetApp)
                  lijet Novice

                  Thanks once again for bearing with me, and sorry for the delayed reply:

                  1-Logging cannot be disabled. = Fair enough, not a problem.

                  2-If the ESX host senses a 20-times increase in latency, it displays the message that performance has deteriorated, i.e. your latency increased from 1563 to 32018 (1563 * 20 approx). = Yes, it seems to be 20 times... totally agreed.

                  3-If you are aware of planned activities/tasks, the message can be ignored (you can also try to stagger the tasks outside business hours if feasible). = No planned activities are going on; this is my main concern here. All servers run all the time, including some random backup jobs, but I do not specifically see these messages at those times!

                  4-You would need to engage the storage team to monitor SAN activity. Also, if you have VCOPS, there is a NetApp adapter that gives you insight into the storage. = Yes, correct. I have already opened a case with NetApp to look into this. Kindly advise: can this also happen if I am running out of space? I can see a message in the NetApp log as below:

                  "Unable to grow volume 'vol_os_datastore' to recover space: Request to grow volume 'vol_os_datastore' failed because there is not enough space in the aggregate. Either create 53.9GB of free space in the aggregate or select a growth of at most +5.34GB"


                  5-I would suggest watching the antivirus activity more closely; as I reply to this thread, my antivirus has kicked in and my laptop has slowed to a crawl :-)

                  Check how serious the issue is when AV is not active. = I looked into the AV scanning and tried modifying the schedule on each host/server, but with no luck. The warning even occurs at times different from my scheduled scan times, so something else seems to be affecting it, which I cannot figure out.


                  Reference values: Your storage seems to have good performance, ~5-10 ms. = Good to hear this, but I want to keep this performance at all times.

                  -Check the times when the issue occurs and monitor esxtop for a prolonged duration. = But how can I know when the issue is going to occur?

                  -Check if the issue is seen on only one host or on multiple hosts (one host implies the HBA/cables or imbalanced/high-IOPS VMs concentrated on a single host; multiple hosts are indicative of a fabric/SAN-side issue). = I can see this on both hosts.

                  -Check if the issue is on one LUN or many LUNs (one LUN may mean some underlying disk/RAID issue or a high-IOPS VM concentrated on one LUN; many LUNs are indicative of a SAN/fabric issue). = No, it is on one LUN only.


                  Kindly have a look at the attached NetApp screenshots and advise accordingly.


                  Many thanks in advance.