VMware Cloud Community
quantam
Enthusiast

Disks degraded in ESXi

I have ESXi 6.0 U3 installed on an HP DL380 G9. I configured the RAID through the HP provisioning tool itself. After installing ESXi and presenting the disks to it, the disks show as degraded:

naa.600508b1001ce2641dcf3c827c33d0df

   Display Name: HP Serial Attached SCSI Disk (naa.600508b1001ce2641dcf3c827c33d0df)

   Has Settable Display Name: true

   Size: 1144609

   Device Type: Direct-Access

   Multipath Plugin: NMP

   Devfs Path: /vmfs/devices/disks/naa.600508b1001ce2641dcf3c827c33d0df

   Vendor: HP

   Model: LOGICAL VOLUME

   Revision: 5.04

   SCSI Level: 5

   Is Pseudo: false

  Status: degraded

   Is RDM Capable: true

   Is Local: false

   Is Removable: false

   Is SSD: false

   Is VVOL PE: false

   Is Offline: false

   Is Perennially Reserved: false

   Queue Full Sample Size: 0

   Queue Full Threshold: 0

   Thin Provisioning Status: unknown

   Attached Filters:

   VAAI Status: unknown

   Other UIDs: vml.0200010000600508b1001ce2641dcf3c827c33d0df4c4f47494341

   Is Shared Clusterwide: true

   Is Local SAS Device: false

   Is SAS: true

   Is USB: false

   Is Boot USB Device: false

   Is Boot Device: true

   Device Max Queue Depth: 1024

   No of outstanding IOs with competing worlds: 32

   Drive Type: logical

   RAID Level: RAID1

   Number of Physical Drives: 2

   Protection Enabled: false

   PI Activated: false

   PI Type: 0

   PI Protection Mask: NO PROTECTION

   Supported Guard Types: NO GUARD SUPPORT

   DIX Enabled: false

   DIX Guard Type: NO GUARD SUPPORT

   Emulated DIX/DIF Enabled: false

May I know the reason for this, and how can I fix it?

1 Solution

Accepted Solutions
quantam
Enthusiast

We opened cases with both HP and VMware. HP replied as follows:

This field for the device is specifically reserved for indicating the path status of the device. If there is only one path to the target, the status is "degraded".

On the new server, the host client shows extended information as 'Normal, Degraded' instead of just 'Normal'.

This does not have any effect on the functioning of the LUN.

VMware said that because the disk is counted as a remote disk, ESXi expects two paths to it; if a second path is not found, it marks the device as degraded. To resolve this, tag the device as a local disk. If you have vCenter, you can simply right-click the device and click `Mark as Local`. Without vCenter, use the following procedure over SSH:

esxcli storage nmp satp rule add -s VMW_SATP_LOCAL --device diskid --option="enable_local"   (replace diskid with the disk identifier)

esxcli storage core claiming reclaim -d diskid   (replace diskid with the disk identifier)

esxcli storage core claimrule load

esxcli storage core claimrule run

This worked for me: all my local disks now show as Normal with `Status: on`, and the `Is Local` field reads true.
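After the reclaim, `esxcli storage core device list -d diskid` should show `Status: on` and `Is Local: true`. As a rough illustration of checking that programmatically, here is a small, hypothetical parser over that output (field names are taken from the listing posted earlier in this thread):

```python
def parse_device_info(text):
    """Parse `esxcli storage core device list` output into a dict.

    Illustrative helper only; it simply splits each indented
    "Key: value" line. Field names match the output in this thread.
    """
    info = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition(": ")
        if sep:
            info[key.strip()] = value.strip()
    return info

# Sample output after the fix (abbreviated)
sample = """\
naa.600508b1001ce2641dcf3c827c33d0df
   Status: on
   Is Local: true
"""
info = parse_device_info(sample)
assert info["Status"] == "on" and info["Is Local"] == "true"
```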

Let me know if you need further help.


19 Replies
UmeshAhuja
Commander

Hi,

It seems your device is active on a single path... please cross-check.

The field for the device is specifically reserved for indicating the path status of the device.

--> When the device has more than one path to the target (storage array), the path status is "on".

--> When all the paths to the target are down (either off or dead), the device status is "dead".

--> If there is only one path to the target, the status is "degraded".

--> When a device is unmapped from the storage array while ESXi was using it, the device status is "not connected".

--> If ESXi fails to recognize the state of the device (none of the above scenarios apply), the device status is "unknown".
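The rules above can be sketched as a small function (illustrative only, not VMware code; the per-path state strings are assumptions):

```python
def device_status(path_states):
    """Map per-path states to the ESXi device Status field.

    Illustrative sketch of the rules described above. `path_states`
    is a hypothetical list of states such as "active", "off", or
    "dead" -- not an actual VMware API.
    """
    if not path_states:
        # device was unmapped from the array while in use
        return "not connected"
    live = [s for s in path_states if s not in ("off", "dead")]
    if not live:
        return "dead"        # all paths to the target are down
    if len(path_states) == 1:
        return "degraded"    # only one path to the target
    return "on"              # more than one path, at least one live
```

For example, a device with a single active path reports "degraded", which matches the status seen on the new server.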

Thanks n Regards
Umesh Ahuja

If your query resolved then please consider awarding points by correct or helpful marking.
quantam
Enthusiast

Dear Umesh,

Yes, I understand that. But how do I fix it? That's the question. These are SCSI disks (built-in drives) in the server.


We do have an old setup similar to the one we are installing now, but it doesn't show a degraded status.

Please advise.

UmeshAhuja
Commander

Hi,

Can you post screenshots of the storage from both servers?

Thanks n Regards
Umesh Ahuja

quantam
Enthusiast

OLD:

Old.JPG

New:

New.JPG

UmeshAhuja
Commander

Hi,

Thanks for posting the screenshots, but I meant: can you post a screenshot of the Path Status on both servers?

Thanks n Regards
Umesh Ahuja

quantam
Enthusiast

Hi Umesh,


Here you go:

OLD:

Path_status old.JPG

New:

Path_status new.JPG

Here you can see the preferred path in OLD has a *, whereas in NEW it doesn't. I tried to right-click the paths and mark them preferred, but it has no effect. I rescanned the disk and tried again, but it still doesn't take.

I also came across this link:

VMware vSphere 6 on Dell EMC PowerEdge Servers Release Notes

This link describes it as a known problem in Update 3. My old box is running Update 2 and my new box is running Update 3.

Please advise.

quantam
Enthusiast

Below is an updated picture of the NEW server; in the picture above I was trying to change the Path Selection policy to see if it helped.

NEW:

Path_status new.JPG

admin
Immortal

Let me know: did you install the ESXi 6.0 U3 HP Customized ESXi image?

Regards,

Randhir

quantam
Enthusiast

Yes, that's the one I used. Why, is anything wrong with it?

UmeshAhuja
Commander

Hi,

It seems to be a problem with the customized image (the 6.0 U3 HP Customized ESXi Image).

I have checked many HP sites and searched Google, but there is no exact or conclusive evidence documenting such incidents. However, most comments on various sites state that it is an issue with the customized image.

Can you try another image, such as the 6.0 U2 HP Customized ESXi Image?

Thanks n Regards
Umesh Ahuja

quantam
Enthusiast

That would not be possible, as the servers are in production or due to go live within a short span of time. Anyway, I have raised my concerns with both HP and VMware support. I will post back here when I have a solution.


Thanks.

Romeroal
Contributor

Hi,

We have the same problem with the customized image for Cisco (the 6.0 U3 Cisco Customized ESXi Image) on four ESXi hosts with a recent clean installation.

We are going to open a VMware support case to check this warning. If I receive news, I'll post here.

Dave_the_Wave
Hot Shot

Not sure if this answer will help a G9, but any time I had storage issues with ProLiants (G2 to G6), I would check all HPE hardware logs (like the ones in iLO), and the problem was always a bad drive (a drive can read/write and still be "bad"), a bad Smart Array controller (especially the onboard ones), or even a bad cache battery.

ProLiant Essentials on a Windows box will give accurate feedback you can trust; I don't have enough experience with HPE ESXi to believe what it does or doesn't tell me.

MBreidenbach0
Hot Shot

If you used the HPE ESXi image, you can check the RAID status via the CLI. The exact name and path of the ssacli tool depend on the ESXi version; for example, it may be called hpssacli and the directory may differ. The following commands will show problems such as a defective cache battery:

[root@dl380g702:/opt/smartstorageadmin/ssacli/bin] ./ssacli ctrl all show status

Smart Array P410i in Slot 0 (Embedded)

   Controller Status: OK

   Cache Status: OK

   Battery/Capacitor Status: OK

[root@dl380g702:/opt/smartstorageadmin/ssacli/bin] ./ssacli ctrl all show config detail

Smart Array P410i in Slot 0 (Embedded)

   Bus Interface: PCI

   Slot: 0

   Serial Number: 50014380175866C0

   Cache Serial Number: PBCDF0CRH1I2U3

   Controller Status: OK

   Hardware Revision: C

   Firmware Version: 6.64-0

   Rebuild Priority: Medium

   Expand Priority: Medium

   Surface Scan Delay: 3 secs

   Surface Scan Mode: Idle

   Parallel Surface Scan Supported: No

   Queue Depth: Automatic

   Monitor and Performance Delay: 60  min

   Elevator Sort: Enabled

   Degraded Performance Optimization: Disabled

   Inconsistency Repair Policy: Disabled

   Wait for Cache Room: Disabled

   Surface Analysis Inconsistency Notification: Disabled

   Post Prompt Timeout: 15 secs

   Cache Board Present: True

   Cache Status: OK

   Cache Ratio: 25% Read / 75% Write

   Drive Write Cache: Disabled

   Total Cache Size: 1024 MB

   Total Cache Memory Available: 912 MB

   No-Battery Write Cache: Disabled

   Cache Backup Power Source: Capacitors

   Battery/Capacitor Count: 1

   Battery/Capacitor Status: OK

   SATA NCQ Supported: True

   Number of Ports: 2 Internal only

   Driver Name: HPE HPSA

   Driver Version: 6.0.0.124-1OEM

   PCI Address (Domain:Bus:Device.Function): 0000:05:00.0

   Port Max Phy Rate Limiting Supported: False

   Host Serial Number: CZ214503R3

   Sanitize Erase Supported: False

   Primary Boot Volume: logicaldrive 1 (600508B1001C41E272242D6A37FC83B0)

   Secondary Boot Volume: None

   Port Name: 1I

         Port ID: 0

         Port Connection Number: 0

         SAS Address: 50014380175866C0

         Port Location: Internal

   Port Name: 2I

         Port ID: 1

         Port Connection Number: 1

         SAS Address: 50014380175866C4

         Port Location: Internal

   Internal Drive Cage at Port 1I, Box 1, OK

      Power Supply Status: Not Redundant

      Drive Bays: 4

      Port: 1I

      Box: 1

      Location: Internal

   Physical Drives

      physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SAS HDD, 450 GB, OK)

      physicaldrive 1I:1:2 (port 1I:box 1:bay 2, SAS HDD, 450 GB, OK)

      physicaldrive 1I:1:3 (port 1I:box 1:bay 3, SAS HDD, 450 GB, OK)

      physicaldrive 1I:1:4 (port 1I:box 1:bay 4, SAS HDD, 450 GB, OK)

   Internal Drive Cage at Port 2I, Box 1, OK

      Power Supply Status: Not Redundant

      Drive Bays: 4

      Port: 2I

      Box: 1

      Location: Internal

   Physical Drives

      None attached

   Array: A

      Interface Type: SAS

      Unused Space: 0  MB (0.0%)

      Used Space: 1.6 TB (100.0%)

      Status: OK

      Array Type: Data

      Logical Drive: 1

         Size: 1.2 TB

         Fault Tolerance: 5

         Heads: 255

         Sectors Per Track: 32

         Cylinders: 65535

         Strip Size: 256 KB

         Full Stripe Size: 768 KB

         Status: OK

         Caching:  Enabled

         Parity Initialization Status: Initialization Completed

         Unique Identifier: 600508B1001C41E272242D6A37FC83B0

         Boot Volume: primary

         Logical Drive Label: ADC1EA4550014380175866C0CBA7

         Drive Type: Data

         LD Acceleration Method: Controller Cache

      physicaldrive 1I:1:1

         Port: 1I

         Box: 1

         Bay: 1

         Status: OK

         Drive Type: Data Drive

         Interface Type: SAS

         Size: 450 GB

         Drive exposed to OS: False

         Logical/Physical Block Size: 512/512

         Rotational Speed: 10000

         Firmware Revision: HPD0

         Serial Number: 53R0A038FTM71322

         WWID: 50000394C831AC96

         Model: HP      EG0450FCSPK

         Current Temperature (C): 33

         Maximum Temperature (C): 48

         PHY Count: 2

         PHY Transfer Rate: 6.0Gbps, Unknown

         Sanitize Erase Supported: False

         Shingled Magnetic Recording Support: None

      physicaldrive 1I:1:2

         Port: 1I

         Box: 1

         Bay: 2

         Status: OK

         Drive Type: Data Drive

         Interface Type: SAS

         Size: 450 GB

         Drive exposed to OS: False

         Logical/Physical Block Size: 512/512

         Rotational Speed: 10000

         Firmware Revision: HPD0

         Serial Number: 6330A02JFTM71323

         WWID: 50000394D8020F7A

         Model: HP      EG0450FCSPK

         Current Temperature (C): 34

         Maximum Temperature (C): 48

         PHY Count: 2

         PHY Transfer Rate: 6.0Gbps, Unknown

         Sanitize Erase Supported: False

         Shingled Magnetic Recording Support: None

      physicaldrive 1I:1:3

         Port: 1I

         Box: 1

         Bay: 3

         Status: OK

         Drive Type: Data Drive

         Interface Type: SAS

         Size: 450 GB

         Drive exposed to OS: False

         Logical/Physical Block Size: 512/512

         Rotational Speed: 10000

         Firmware Revision: HPD0

         Serial Number: 53V0A0CIFTM71322

         WWID: 50000394C83A42FE

         Model: HP      EG0450FCSPK

         Current Temperature (C): 33

         Maximum Temperature (C): 52

         PHY Count: 2

         PHY Transfer Rate: 6.0Gbps, Unknown

         Sanitize Erase Supported: False

         Shingled Magnetic Recording Support: None

      physicaldrive 1I:1:4

         Port: 1I

         Box: 1

         Bay: 4

         Status: OK

         Drive Type: Data Drive

         Interface Type: SAS

         Size: 450 GB

         Drive exposed to OS: False

         Logical/Physical Block Size: 512/512

         Rotational Speed: 10000

         Firmware Revision: HPD0

         Serial Number: 53R0A014FTM71322

         WWID: 50000394C8313D2E

         Model: HP      EG0450FCSPK

         Current Temperature (C): 36

         Maximum Temperature (C): 52

         PHY Count: 2

         PHY Transfer Rate: 6.0Gbps, Unknown

         Sanitize Erase Supported: False

         Shingled Magnetic Recording Support: None

   SEP (Vendor ID PMCSIERA, Model  SRC 8x6G) 250

      Device Number: 250

      Firmware Version: RevC

      WWID: 50014380175866CF

      Vendor ID: PMCSIERA

      Model: SRC 8x6G

[root@dl380g702:/opt/smartstorageadmin/ssacli/bin]
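To spot anything unhealthy in output like the above without reading every line, one could scan just the status fields. A minimal sketch (hypothetical helper, keyed to the field names shown above):

```python
def failing_components(ssacli_output):
    """Return (field, value) pairs for status fields that are not OK.

    Rough sketch for scanning `ssacli ctrl all show config detail`
    output; it only considers the status fields shown above.
    """
    problems = []
    keys = ("Controller Status", "Cache Status",
            "Battery/Capacitor Status", "Status")
    for line in ssacli_output.splitlines():
        line = line.strip()
        for key in keys:
            prefix = key + ": "
            if line.startswith(prefix):
                value = line[len(prefix):].strip()
                if value != "OK":
                    problems.append((key, value))
                break  # each line matches at most one field
    return problems
```

With healthy output like the listing above, this returns an empty list; a failed cache battery would show up as ("Battery/Capacitor Status", "Failed").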


quantam
Enthusiast

All my logs and the iLO show up as normal, and there are no amber lights on physical inspection. So the disks are all good; the issue was ESXi not recognizing the disk as local, so you need to tag it. Check my detailed answer for the procedure I used to sort it out.

sistemi_step
Contributor

Thanks a lot, but before I try: can I execute these commands on a live production environment? Is there any risk of data corruption or denial of service?

kgreig
Contributor

From my experience I would say: proceed with caution! While this procedure resolved the degraded disk state (following a quick refresh), the presented LUN then disappeared from the Datastore tab and could not be added back! I had to re-create the underlying storage pool and reset the associated iSCSI target before reconnecting the storage pool to ESXi.

Although this whole process wipes the VMFS volume, once it's reconnected the datastore does at least maintain a healthy status of "Normal". So be sure to back up any data beforehand!

josegiron
Contributor

Perfect, I did it via the commands and it worked. Thanks!
