6 Replies Latest reply on Oct 2, 2018 11:41 AM by peetz

    How to identify hard disk that shows "Predictive Failure" (HPE server)

    peetz Master

      Greetings,

       

      I have a vSphere 6.5 based hybrid-mode vSAN Cluster using HPE ProLiant DL380 Gen9 nodes.

      In the hardware status of one of the hosts, one hard disk is shown with "Predictive Failure" status. The shell command

         esxcli ssacli cmd -q "ctrl slot=0 pd all show"

      outputs this:

       

      Smart Array P840ar in Slot 0 (Embedded)

       

         HBA Drives

       

            physicaldrive 1I:3:1 (port 1I:box 3:bay 1, SAS HDD, 1.2 TB, OK)

            physicaldrive 1I:3:2 (port 1I:box 3:bay 2, SAS HDD, 1.2 TB, OK)

            physicaldrive 1I:3:3 (port 1I:box 3:bay 3, SAS HDD, 1.2 TB, Predictive Failure)

            physicaldrive 1I:3:4 (port 1I:box 3:bay 4, SAS HDD, 1.2 TB, OK)

            physicaldrive 1I:3:5 (port 1I:box 3:bay 5, SAS HDD, 1.2 TB, OK)

            physicaldrive 1I:3:6 (port 1I:box 3:bay 6, SAS HDD, 1.2 TB, OK)

            physicaldrive 1I:3:7 (port 1I:box 3:bay 7, SAS HDD, 1.2 TB, OK)

            physicaldrive 1I:3:8 (port 1I:box 3:bay 8, SAS HDD, 1.2 TB, OK)

            physicaldrive 2I:2:1 (port 2I:box 2:bay 1, SAS HDD, 1.2 TB, OK)

            physicaldrive 2I:2:2 (port 2I:box 2:bay 2, SAS HDD, 1.2 TB, OK)

            physicaldrive 2I:2:3 (port 2I:box 2:bay 3, SAS HDD, 1.2 TB, OK)

            physicaldrive 2I:2:4 (port 2I:box 2:bay 4, SAS HDD, 1.2 TB, OK)

            physicaldrive 2I:2:5 (port 2I:box 2:bay 5, SAS HDD, 1.2 TB, OK)

            physicaldrive 2I:2:6 (port 2I:box 2:bay 6, SAS HDD, 1.2 TB, OK)

            physicaldrive 2I:2:7 (port 2I:box 2:bay 7, SAS SSD, 400 GB, OK)

            physicaldrive 2I:2:8 (port 2I:box 2:bay 8, SAS SSD, 400 GB, OK)
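
      With sixteen drives in the listing, the one that is not "OK" is easy to overlook. The following is a minimal Python sketch, not part of any HPE or VMware tooling, that filters pasted ssacli output for drives whose status is anything other than OK. The function names and the regular expression are illustrative assumptions based only on the line format shown above.

```python
import re

# Matches lines such as:
#   physicaldrive 1I:3:3 (port 1I:box 3:bay 3, SAS HDD, 1.2 TB, Predictive Failure)
# The pattern is an assumption derived from the listing above.
PD_LINE = re.compile(
    r"physicaldrive\s+(?P<id>\S+)\s+"
    r"\(port (?P<port>[^:]+):box (?P<box>\d+):bay (?P<bay>\d+),\s*"
    r"(?P<type>[^,]+),\s*(?P<size>[^,]+),\s*(?P<status>[^)]+)\)"
)


def parse_physical_drives(text):
    """Return one dict per physicaldrive line found in the pasted output."""
    return [m.groupdict() for m in PD_LINE.finditer(text)]


def drives_not_ok(text):
    """Return only the drives whose status field is something other than OK."""
    return [d for d in parse_physical_drives(text) if d["status"].strip() != "OK"]


# Three lines copied from the listing above.
sample = """\
      physicaldrive 1I:3:2 (port 1I:box 3:bay 2, SAS HDD, 1.2 TB, OK)
      physicaldrive 1I:3:3 (port 1I:box 3:bay 3, SAS HDD, 1.2 TB, Predictive Failure)
      physicaldrive 2I:2:7 (port 2I:box 2:bay 7, SAS SSD, 400 GB, OK)
"""

for drive in drives_not_ok(sample):
    print(drive["id"], drive["status"])
```

      For the sample lines this prints the ID and status of drive 1I:3:3 only. It identifies the port/box/bay of the suspect drive, not its ESXi device name.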

       

      However, it looks like ESXi has not (yet) identified the disk as "bad". The vSAN status is still "OK" for all disks.

       

      Now, I have a hard time identifying which of the vSAN disks needs to be decommissioned and replaced, because I do not know the NAA ID of the bad disk.

      The output of

        esxcli storage core device list

      doesn't tell me anything useful.

      The output of

        esxcli storage core path list

      looks better. For a single disk it outputs something like this:

       

      sas.5001438040e9a460-sas.1438040e9a460-naa.5000c5009f66edeb

         UID: sas.5001438040e9a460-sas.1438040e9a460-naa.5000c5009f66edeb

         Runtime Name: vmhba2:C2:T8:L0

         Device: naa.5000c5009f66edeb

         Device Display Name: Local HP Disk (naa.5000c5009f66edeb)

         Adapter: vmhba2

         Channel: 2

         Target: 8

         LUN: 0

         Plugin: NMP

         State: active

         Transport: sas

         Adapter Identifier: sas.5001438040e9a460

         Target Identifier: sas.1438040e9a460

         Adapter Transport Details: 5001438040e9a460

         Target Transport Details: 1438040e9a460

         Maximum IO Size: 4194304

       

      In fact, all hard disks are shown on adapter vmhba2 and channel 2. Only the target number differs, counting from 0 to 15.
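
      To get an overview of which target number belongs to which device, the path list output can be reduced to a small table. The sketch below is a hedged Python example that parses pasted `esxcli storage core path list` output. It assumes that field lines have the form "Key: Value" with a key made of letters and spaces, and that any other non-empty line starts a new record. The helper names are hypothetical.

```python
import re

# A field line looks like "   Runtime Name: vmhba2:C2:T8:L0".
# Assumption: keys contain only letters and spaces, so the record
# header (the UID-like line) does not match this pattern.
FIELD = re.compile(r"^\s*([A-Za-z][A-Za-z ]*):\s*(.*)$")


def parse_path_list(text):
    """Split the pasted path list into one dict of fields per path."""
    records = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines between fields or records
        match = FIELD.match(line)
        if match is None:
            records.append({})  # a header line starts a new record
        elif records:
            records[-1][match.group(1)] = match.group(2).strip()
    return records


def targets_to_devices(text, adapter):
    """Map target number to device name for all paths on one adapter."""
    mapping = {}
    for rec in parse_path_list(text):
        if rec.get("Adapter") == adapter and "Target" in rec and "Device" in rec:
            mapping[int(rec["Target"])] = rec["Device"]
    return mapping


# Part of the single record quoted above.
sample = """\
sas.5001438040e9a460-sas.1438040e9a460-naa.5000c5009f66edeb
   UID: sas.5001438040e9a460-sas.1438040e9a460-naa.5000c5009f66edeb
   Runtime Name: vmhba2:C2:T8:L0
   Device: naa.5000c5009f66edeb
   Adapter: vmhba2
   Channel: 2
   Target: 8
   LUN: 0
"""

print(targets_to_devices(sample, "vmhba2"))
```

      This only lists target numbers next to device names. It does not establish which port/box/bay corresponds to which target, so it narrows the question without answering it.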

       

      Now, how do I match the port/box/bay notation of the ssacli tool to the adapter/channel/target notation of esxcli to find the actual ESXi device ID of the bad disk?

       

      Thank you for any pointers ...

       

      Andreas