3 Replies Latest reply on Dec 4, 2019 11:46 PM by ManivelR

    iSCSI APD timeout issue in ESXi 6.7.0

    ManivelR Enthusiast

      Issue: APD timeout in ESXi 6.7.0

       

      Our setup:

       

      This is a pre-prod setup with 4 ESXi servers backed by Windows iSCSI storage.

      The iSCSI datastore is 10 TB (presented from a physical Windows iSCSI server) and is presented to all 4 ESXi servers.

       

      When I try to Storage vMotion a large VM (600 GB) from a local datastore to the iSCSI datastore, it gets interrupted partway through. After about 60% of the Storage vMotion, the ESXi servers go into a "not responding" state one by one because of the APD condition. To recover, we have to reboot the ESXi hosts one at a time, and this happens frequently whenever we do a Storage vMotion.

       

      See the logs below:

       

      2019-12-03T21:36:32.018Z: [APDCorrelator] 608631351756us: [vob.storage.apd.timeout] Device or filesystem with identifier [naa.604b53240003606301d5aa1ee87489b0] has entered the All Paths Down Timeout state after being in the All Paths Down state for 140 seconds. I/Os will now be fast failed.

      2019-12-03T21:36:32.018Z: [APDCorrelator] 608633935712us: [esx.problem.storage.apd.timeout] Device or filesystem with identifier [naa.604b53240003606301d5aa1ee87489b0] has entered the All Paths Down Timeout state after being in the All Paths Down state for 140 seconds. I/Os will now be fast failed.

      2019-12-03T21:36:47.993Z: [iscsiCorrelator] 608647326769us: [vob.iscsi.target.async.event] The target iqn.1991-05.com.microsoft:-mts-iscsi-win-lun3-target issued an async event for vmhba64 @ vmk1 with reason code 0000
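To compare hosts, it can help to pull the APD timeout events out of the log text and see which device timed out and after how many seconds. The following is only a rough sketch: the regular expression is written against the log lines quoted above, and the names `ApdEvent`, `parse_apd_events`, and `SAMPLE` are made up for illustration. Other ESXi builds may format these messages differently.

```python
import re
from typing import List, NamedTuple


class ApdEvent(NamedTuple):
    timestamp: str
    event_id: str
    device: str
    seconds_in_apd: int


# Pattern based only on the APDCorrelator lines quoted in this post.
# \s* after "us:" tolerates the event ID landing on a following line.
APD_RE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2}T[\d:.]+Z): \[APDCorrelator\] \d+us:\s*"
    r"\[(?P<event>[\w.]+)\] Device or filesystem with identifier "
    r"\[(?P<device>[^\]]+)\] has entered the All Paths Down Timeout state "
    r"after being in the All Paths Down state for (?P<secs>\d+) seconds"
)


def parse_apd_events(text: str) -> List[ApdEvent]:
    """Return every APD timeout event found in the given log text."""
    return [
        ApdEvent(m["ts"], m["event"], m["device"], int(m["secs"]))
        for m in APD_RE.finditer(text)
    ]


# Sample input copied from the log excerpt in this post.
SAMPLE = (
    "2019-12-03T21:36:32.018Z: [APDCorrelator] 608631351756us: "
    "[vob.storage.apd.timeout] Device or filesystem with identifier "
    "[naa.604b53240003606301d5aa1ee87489b0] has entered the All Paths Down "
    "Timeout state after being in the All Paths Down state for 140 seconds. "
    "I/Os will now be fast failed. "
    "2019-12-03T21:36:32.018Z: [APDCorrelator] 608633935712us: "
    "[esx.problem.storage.apd.timeout] Device or filesystem with identifier "
    "[naa.604b53240003606301d5aa1ee87489b0] has entered the All Paths Down "
    "Timeout state after being in the All Paths Down state for 140 seconds. "
    "I/Os will now be fast failed."
)

events = parse_apd_events(SAMPLE)
for event in events:
    print(event.timestamp, event.event_id, event.device, event.seconds_in_apd)
```

Running the same function over the log text collected from each of the 4 hosts would show whether they all lose the same device at about the same time.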

       

      To address this, we made two changes:

       

      1) We increased the APD timeout from 140 seconds to 300 seconds on all the ESXi hosts (see "Change Timeout Limits for Storage APD").

      2) We disabled "Delayed ACK" on all 4 ESXi hosts (see the VMware Knowledge Base).
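For the first change above, the APD timeout is the `Misc.APDTimeout` advanced setting, which can be set from the ESXi shell with `esxcli system settings advanced set`. The sketch below only builds the command strings so the same value can be applied consistently on every host. The host names are hypothetical, and the 20 to 99999 second range is an assumption recalled from the VMware documentation, so please verify it for your build. Delayed ACK is not covered here; it is changed in the iSCSI adapter's advanced options as described in the KB article.

```python
from typing import List

APD_TIMEOUT_OPTION = "/Misc/APDTimeout"

# Assumed valid range for the setting (verify against the VMware docs).
MIN_SECONDS, MAX_SECONDS = 20, 99999

# Hypothetical host names, for illustration only.
HOSTS = ["esxi-01", "esxi-02", "esxi-03", "esxi-04"]


def apd_timeout_commands(seconds: int) -> List[str]:
    """Build the esxcli commands to set and then verify the APD timeout."""
    if not MIN_SECONDS <= seconds <= MAX_SECONDS:
        raise ValueError(
            f"APD timeout must be between {MIN_SECONDS} and {MAX_SECONDS} seconds"
        )
    return [
        f"esxcli system settings advanced set -o {APD_TIMEOUT_OPTION} -i {seconds}",
        f"esxcli system settings advanced list -o {APD_TIMEOUT_OPTION}",
    ]


commands = apd_timeout_commands(300)
for host in HOSTS:
    print(f"# run on {host}")
    for command in commands:
        print(command)
```

The second command only lists the setting, so its output can be used to confirm that the new value took effect on each host.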

       

      After changing these two parameters, the setup has been stable for the last 12 hours and the Storage vMotion completed without any issues.

       

      Are there any other recommendations (with respect to the APD timeout value) from the ESXi/iSCSI side? Please suggest.

       

      Thanks,

      Manivel R