I have ESXi 6.0 U3 installed on a DL380 G9. I configured the RAID from the HP provisioning tool itself. After installing ESXi and presenting the disks to it, the disks show as degraded:
naa.600508b1001ce2641dcf3c827c33d0df
Display Name: HP Serial Attached SCSI Disk (naa.600508b1001ce2641dcf3c827c33d0df)
Has Settable Display Name: true
Size: 1144609
Device Type: Direct-Access
Multipath Plugin: NMP
Devfs Path: /vmfs/devices/disks/naa.600508b1001ce2641dcf3c827c33d0df
Vendor: HP
Model: LOGICAL VOLUME
Revision: 5.04
SCSI Level: 5
Is Pseudo: false
Status: degraded
Is RDM Capable: true
Is Local: false
Is Removable: false
Is SSD: false
Is VVOL PE: false
Is Offline: false
Is Perennially Reserved: false
Queue Full Sample Size: 0
Queue Full Threshold: 0
Thin Provisioning Status: unknown
Attached Filters:
VAAI Status: unknown
Other UIDs: vml.0200010000600508b1001ce2641dcf3c827c33d0df4c4f47494341
Is Shared Clusterwide: true
Is Local SAS Device: false
Is SAS: true
Is USB: false
Is Boot USB Device: false
Is Boot Device: true
Device Max Queue Depth: 1024
No of outstanding IOs with competing worlds: 32
Drive Type: logical
RAID Level: RAID1
Number of Physical Drives: 2
Protection Enabled: false
PI Activated: false
PI Type: 0
PI Protection Mask: NO PROTECTION
Supported Guard Types: NO GUARD SUPPORT
DIX Enabled: false
DIX Guard Type: NO GUARD SUPPORT
Emulated DIX/DIF Enabled: false
May I know the reason for this, and how to fix it?
We had opened a case with both HP and VMware. HP replied as follows:
This field for the device is specifically reserved for indicating the path status of the device. If there is only one path to the target then the status is "degraded".
The host client shows extended information; in the case of the new server it reads ‘Normal, Degraded’ instead of just ‘Normal’.
This does not have any effect on the functioning of the LUN.
VMware said that because this is counted as a remote disk, it expects two paths to it; if a second path is not found, it marks the device as degraded. To resolve this we need to tag the disk as local. If you have vCenter, you can simply right-click the device and click `Mark as Local`. Without vCenter, use the following procedure over SSH:
esxcli storage nmp satp rule add -s VMW_SATP_LOCAL --device diskid --option="enable_local" (replace the diskid with the disk identifier)
esxcli storage core claiming reclaim -d diskid (replace the diskid with the disk identifier)
esxcli storage core claimrule load
esxcli storage core claimrule run
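To confirm the change took effect after the reclaim, the device's claiming SATP and local flag can be checked with standard esxcli commands; the naa identifier below is the one from the original post, so substitute your own:

```shell
# Show which SATP now claims the device (should be VMW_SATP_LOCAL)
esxcli storage nmp device list -d naa.600508b1001ce2641dcf3c827c33d0df

# Check that the device is now flagged as local
esxcli storage core device list -d naa.600508b1001ce2641dcf3c827c33d0df | grep "Is Local"
```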
This worked for me: all my local disks now show as Normal with `Status: ON`, and the `Is Local` field reads true.
Let me know if you need further help.
Hi,
It seems your device is active on a single path. Please cross-check.
The field for the device is specifically reserved for indicating the path status of the device.
--> When the device has more than one path to the target (storage array), the path status is "on".
--> When all the paths to the target are down (either off or dead), the device status is "dead".
--> If there is only one path to the target, the status is "degraded".
--> When a device is unmapped from the storage array while ESXi was using it, the device status is "not connected".
--> If ESXi fails to recognize the state of the device (none of the above scenarios apply), the device status is "unknown".
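To see how many paths a device actually has, and hence which of the above states applies, the paths can be listed over SSH. This is standard esxcli; the device identifier from the original post is used as an example:

```shell
# List all paths to the device; a local Smart Array logical volume
# will typically show exactly one path, which explains "degraded"
esxcli storage core path list -d naa.600508b1001ce2641dcf3c827c33d0df
```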
Dear Umesh,
Yes, I do understand that. But how do I fix it? That's the question. These are SCSI disks (built-in drives) in the server.
We do have an old setup similar to the one we are installing now, but it doesn't show me a degraded status.
Please advise.
Hi,
Can you post screenshots of the storage from both servers?
OLD:
New:
Hi,
Thanks for posting the screenshots, but I meant: can you post screenshots of the Path Status on both servers?
Hi Umesh,
Here you go:
OLD:
New:
Here you can see that the preferred path in OLD has a *, whereas in NEW it doesn't. I tried to right-click the paths and mark one as preferred, but it has no effect. I rescanned the disk and tried again, but it still won't take.
I also came across this link:
VMware vSphere 6 on Dell EMC PowerEdge Servers Release Notes
That link describes this as a known problem in Update 3. My old box is running Update 2 and my new box is running Update 3.
Please advise.
Below is an updated picture of the NEW server; in the picture above I was trying to change the Path Selection policy to see if it helped.
NEW:
Let me know: did you install the ESXi 6.0 U3 HP Customized ESXi Image?
Regards,
Randhir
Yes, that's the one I used. Why, is anything wrong with it?
Hi,
It seems to be a problem with the customized image (6.0 U3 HP Customized ESXi Image).
I have checked many HP sites and Google, but there is no exact or proper evidence documenting such incidents. However, most comments on various sites state that it is an issue with the customized image.
Can you try another image, like the 6.0 U2 HP Customized ESXi Image?
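As a side note, if you want to verify which image a host was installed from, the installed image profile name (which includes the OEM customization, e.g. HPE or Cisco) can be read over SSH with a standard esxcli command:

```shell
# Print the installed image profile; OEM-customized builds carry
# the vendor name in the profile string
esxcli software profile get
```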
That would not be possible, as the servers are in production or ready to go within a short span of time. Anyway, I have raised my concerns with both HP and VMware support. I will post back here if I have a solution.
Thanks.
Hi,
We have the same problem with the customized image for Cisco (6.0 U3 Cisco Customized ESXi Image) on four ESXi hosts with a recent clean installation.
We are going to open a VMware support case to check this warning. If I receive news, I'll post here.
Not sure if this answer will help a G9, but any time I had storage issues with ProLiants (G2 to G6), I would check all HPE hardware logs (like the ones in the iLO), and the problem was always a bad drive (a drive can read/write and still be "bad"), a bad Smart Array controller (especially the onboard ones), or even a bad cache battery.
ProLiant Essentials on a Windows box will give accurate feedback you can trust; I don't have enough experience with HPE ESXi to believe what it does or doesn't tell me.
If you used the HPE ESXi image, you can check the RAID status via the CLI. The exact name and path of the ssacli tool depend on the ESXi version; for example, it may be called hpssacli and live in a different directory. The following commands will show problems such as a defective cache battery:
[root@dl380g702:/opt/smartstorageadmin/ssacli/bin] ./ssacli ctrl all show status
Smart Array P410i in Slot 0 (Embedded)
Controller Status: OK
Cache Status: OK
Battery/Capacitor Status: OK
[root@dl380g702:/opt/smartstorageadmin/ssacli/bin] ./ssacli ctrl all show config detail
Smart Array P410i in Slot 0 (Embedded)
Bus Interface: PCI
Slot: 0
Serial Number: 50014380175866C0
Cache Serial Number: PBCDF0CRH1I2U3
Controller Status: OK
Hardware Revision: C
Firmware Version: 6.64-0
Rebuild Priority: Medium
Expand Priority: Medium
Surface Scan Delay: 3 secs
Surface Scan Mode: Idle
Parallel Surface Scan Supported: No
Queue Depth: Automatic
Monitor and Performance Delay: 60 min
Elevator Sort: Enabled
Degraded Performance Optimization: Disabled
Inconsistency Repair Policy: Disabled
Wait for Cache Room: Disabled
Surface Analysis Inconsistency Notification: Disabled
Post Prompt Timeout: 15 secs
Cache Board Present: True
Cache Status: OK
Cache Ratio: 25% Read / 75% Write
Drive Write Cache: Disabled
Total Cache Size: 1024 MB
Total Cache Memory Available: 912 MB
No-Battery Write Cache: Disabled
Cache Backup Power Source: Capacitors
Battery/Capacitor Count: 1
Battery/Capacitor Status: OK
SATA NCQ Supported: True
Number of Ports: 2 Internal only
Driver Name: HPE HPSA
Driver Version: 6.0.0.124-1OEM
PCI Address (Domain:Bus:Device.Function): 0000:05:00.0
Port Max Phy Rate Limiting Supported: False
Host Serial Number: CZ214503R3
Sanitize Erase Supported: False
Primary Boot Volume: logicaldrive 1 (600508B1001C41E272242D6A37FC83B0)
Secondary Boot Volume: None
Port Name: 1I
Port ID: 0
Port Connection Number: 0
SAS Address: 50014380175866C0
Port Location: Internal
Port Name: 2I
Port ID: 1
Port Connection Number: 1
SAS Address: 50014380175866C4
Port Location: Internal
Internal Drive Cage at Port 1I, Box 1, OK
Power Supply Status: Not Redundant
Drive Bays: 4
Port: 1I
Box: 1
Location: Internal
Physical Drives
physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SAS HDD, 450 GB, OK)
physicaldrive 1I:1:2 (port 1I:box 1:bay 2, SAS HDD, 450 GB, OK)
physicaldrive 1I:1:3 (port 1I:box 1:bay 3, SAS HDD, 450 GB, OK)
physicaldrive 1I:1:4 (port 1I:box 1:bay 4, SAS HDD, 450 GB, OK)
Internal Drive Cage at Port 2I, Box 1, OK
Power Supply Status: Not Redundant
Drive Bays: 4
Port: 2I
Box: 1
Location: Internal
Physical Drives
None attached
Array: A
Interface Type: SAS
Unused Space: 0 MB (0.0%)
Used Space: 1.6 TB (100.0%)
Status: OK
Array Type: Data
Logical Drive: 1
Size: 1.2 TB
Fault Tolerance: 5
Heads: 255
Sectors Per Track: 32
Cylinders: 65535
Strip Size: 256 KB
Full Stripe Size: 768 KB
Status: OK
Caching: Enabled
Parity Initialization Status: Initialization Completed
Unique Identifier: 600508B1001C41E272242D6A37FC83B0
Boot Volume: primary
Logical Drive Label: ADC1EA4550014380175866C0CBA7
Drive Type: Data
LD Acceleration Method: Controller Cache
physicaldrive 1I:1:1
Port: 1I
Box: 1
Bay: 1
Status: OK
Drive Type: Data Drive
Interface Type: SAS
Size: 450 GB
Drive exposed to OS: False
Logical/Physical Block Size: 512/512
Rotational Speed: 10000
Firmware Revision: HPD0
Serial Number: 53R0A038FTM71322
WWID: 50000394C831AC96
Model: HP EG0450FCSPK
Current Temperature (C): 33
Maximum Temperature (C): 48
PHY Count: 2
PHY Transfer Rate: 6.0Gbps, Unknown
Sanitize Erase Supported: False
Shingled Magnetic Recording Support: None
physicaldrive 1I:1:2
Port: 1I
Box: 1
Bay: 2
Status: OK
Drive Type: Data Drive
Interface Type: SAS
Size: 450 GB
Drive exposed to OS: False
Logical/Physical Block Size: 512/512
Rotational Speed: 10000
Firmware Revision: HPD0
Serial Number: 6330A02JFTM71323
WWID: 50000394D8020F7A
Model: HP EG0450FCSPK
Current Temperature (C): 34
Maximum Temperature (C): 48
PHY Count: 2
PHY Transfer Rate: 6.0Gbps, Unknown
Sanitize Erase Supported: False
Shingled Magnetic Recording Support: None
physicaldrive 1I:1:3
Port: 1I
Box: 1
Bay: 3
Status: OK
Drive Type: Data Drive
Interface Type: SAS
Size: 450 GB
Drive exposed to OS: False
Logical/Physical Block Size: 512/512
Rotational Speed: 10000
Firmware Revision: HPD0
Serial Number: 53V0A0CIFTM71322
WWID: 50000394C83A42FE
Model: HP EG0450FCSPK
Current Temperature (C): 33
Maximum Temperature (C): 52
PHY Count: 2
PHY Transfer Rate: 6.0Gbps, Unknown
Sanitize Erase Supported: False
Shingled Magnetic Recording Support: None
physicaldrive 1I:1:4
Port: 1I
Box: 1
Bay: 4
Status: OK
Drive Type: Data Drive
Interface Type: SAS
Size: 450 GB
Drive exposed to OS: False
Logical/Physical Block Size: 512/512
Rotational Speed: 10000
Firmware Revision: HPD0
Serial Number: 53R0A014FTM71322
WWID: 50000394C8313D2E
Model: HP EG0450FCSPK
Current Temperature (C): 36
Maximum Temperature (C): 52
PHY Count: 2
PHY Transfer Rate: 6.0Gbps, Unknown
Sanitize Erase Supported: False
Shingled Magnetic Recording Support: None
SEP (Vendor ID PMCSIERA, Model SRC 8x6G) 250
Device Number: 250
Firmware Version: RevC
WWID: 50014380175866CF
Vendor ID: PMCSIERA
Model: SRC 8x6G
[root@dl380g702:/opt/smartstorageadmin/ssacli/bin]
All my logs and the iLO show up as normal, and there are no amber lights on physical inspection. So the disks are all good; it was a matter of ESXi not recognizing the disk as local, so you need to tag it. Check my detailed answer above for the procedure I used to sort it out.
Thanks a lot, but before I try... can I execute these commands in an online production environment? Are there any data corruption or denial-of-service risks?
From my experience I would say: proceed with caution! While this procedure resolved the degraded disk state (after a quick refresh), the presented LUN then disappeared from the Datastore tab and could not be added back! I had to re-create the underlying storage pool and reset the associated iSCSI target before reconnecting the storage pool to ESXi.
Although this whole process wipes the VMFS volume, once it's reconnected the datastore does at least maintain a healthy status of "Normal". So be aware, and take steps to back up any data beforehand!
Perfect, I did it via the commands and it worked.
Thanks.