2 Replies. Latest reply on Nov 15, 2019 4:31 PM by lansol

    Repeated Datastore Disconnection Every 23 minutes

    lansol Lurker

      We are troubleshooting an issue where our VMs are hanging for 5-10 seconds every time our datastores disconnect and reconnect.

      The load on the server is minimal, with only light I/O. The disconnections happen every 23 minutes regardless of how much I/O there is.

       

      There are three datastores. If all guest VMs are powered off except for one VM whose VMDK files are on a single datastore, the controller still shows large I/O spikes, although it may not fully reset all datastores. During business hours, or at night during backups when there are constant read/write operations, the datastores always reset and come back. One of our databases is reporting application errors because of this.

       

      [Attached image: Disconnects.png]

       

      Dell PowerEdge R540 (no cluster)

      • PERC H730P Adapter (Embedded)
      • Firmware: 25.5.6.0009 (Latest)
      • 4 x 2TB 7.2k RAID 6
      • 2 x 200GB SSD RAID 1
      • 2 x 600GB 15k RAID 1
      • All volumes have Read-Ahead and Write-Back enabled

       

      ESXi 6.7.0 Update 3 Build-14320388 (A00)

      • No snapshots in place
      • Driver version 7.708.07.00-3vmw (original driver)
      • Driver version 7.710.07.00-1OEM.670.0.0.8169922 (*)
      • VMFS3.UseATSForHBOnVMFS5 is set to the default (1). We tried setting the value to 0, with no improvement.

      * This driver is supported only on Dell PowerEdge Servers R6525, C6525, R6515 and R7515.

      * When we contacted Dell, they recommended installing this driver as a test because its changelog indicated it addressed our symptoms. It made no improvement. Changelog entries:

        • SCGCQ02033302: Resolves issue in which non-RAID drives may not be listed during OS installation or in vSphere.

        • SCGCQ02189085: Fixed an issue that could cause an IO timeout and controller reset under certain workloads.

       

      As mentioned above, this happens EVERY 23 minutes.
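
      For anyone who wants to confirm a fixed interval like this from their own logs, here is a minimal sketch. It assumes the disconnect times have already been pulled out of the host logs as ISO-style strings; the sample timestamps, the function names and the one-minute tolerance are made up for illustration and are not taken from this server.

```python
from datetime import datetime
from statistics import mean

def disconnect_intervals(timestamps):
    """Return the gaps, in minutes, between consecutive event timestamps."""
    times = sorted(datetime.strptime(t, "%Y-%m-%dT%H:%M:%S") for t in timestamps)
    return [(b - a).total_seconds() / 60 for a, b in zip(times, times[1:])]

def looks_periodic(intervals, tolerance_min=1.0):
    """True if every gap lies within tolerance_min of the mean gap."""
    if not intervals:
        return False
    avg = mean(intervals)
    return all(abs(gap - avg) <= tolerance_min for gap in intervals)

# Hypothetical timestamps, for illustration only (not real log data).
sample = [
    "2019-11-01T09:00:05",
    "2019-11-01T09:23:04",
    "2019-11-01T09:46:06",
    "2019-11-01T10:09:05",
]

gaps = disconnect_intervals(sample)
print([round(g, 1) for g in gaps], looks_periodic(gaps))
```

      A steady gap that does not move with I/O load points toward something running on a timer rather than toward the workload.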

       

      If anyone is able to offer assistance I would be forever grateful.

        • 1. Re: Repeated Datastore Disconnection Every 23 minutes
          daphnissov Guru
          Community Warriors, vExpert

          As a test, I would disable write-back cache and see if it makes any difference. Your overall latency may be higher, but maybe there's something in that controller's microcode that is generating this.

          • 2. Re: Repeated Datastore Disconnection Every 23 minutes
            lansol Lurker

            Thanks for the reply. We didn't try changing the write-back policy.

            Switching between firmware and driver versions released in 2019 didn't seem to make any difference.

             

            Astoundingly, the solution was to downgrade the iDRAC firmware on our server to a version from December 2018. Apparently, the iDRAC polls the storage controller on a schedule, and that polling was causing the datastore disconnections.

            Dell supposedly will have an updated iDRAC firmware out in December 2019 that should fix this issue.
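
            If you suspect a similar scheduled poll on your own hardware, one rough way to test the idea is to check whether each disconnect falls shortly after a poll. The sketch below assumes you can get timestamps for both kinds of event from somewhere; this thread does not say where iDRAC polling is logged, so the poll times, the 60-second window and the function name are all hypothetical.

```python
from datetime import datetime, timedelta

def disconnects_after_polls(poll_times, disconnect_times, window_s=60):
    """Return the disconnects that occur within window_s seconds after any poll."""
    window = timedelta(seconds=window_s)
    return [
        d for d in disconnect_times
        if any(timedelta(0) <= d - p <= window for p in poll_times)
    ]

# Hypothetical data: polls every 23 minutes, each followed by a disconnect
# 10 seconds later, plus one disconnect unrelated to any poll.
start = datetime(2019, 11, 1, 9, 0, 0)
polls = [start + timedelta(minutes=23 * i) for i in range(4)]
disconnects = [p + timedelta(seconds=10) for p in polls]
disconnects.append(start + timedelta(minutes=12))

matched = disconnects_after_polls(polls, disconnects)
print(f"{len(matched)} of {len(disconnects)} disconnects follow a poll")
```

            A high match rate only suggests a link; in our case it was the firmware downgrade that confirmed it.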