I have ESXi 6.0 U3 installed on a DL380 G9. I configured the RAID from the HP provisioning tool itself. After installing ESXi and presenting the disks to it, the disks show as degraded:
naa.600508b1001ce2641dcf3c827c33d0df
Display Name: HP Serial Attached SCSI Disk (naa.600508b1001ce2641dcf3c827c33d0df)
Has Settable Display Name: true
Size: 1144609
Device Type: Direct-Access
Multipath Plugin: NMP
Devfs Path: /vmfs/devices/disks/naa.600508b1001ce2641dcf3c827c33d0df
Vendor: HP
Model: LOGICAL VOLUME
Revision: 5.04
SCSI Level: 5
Is Pseudo: false
Status: degraded
Is RDM Capable: true
Is Local: false
Is Removable: false
Is SSD: false
Is VVOL PE: false
Is Offline: false
Is Perennially Reserved: false
Queue Full Sample Size: 0
Queue Full Threshold: 0
Thin Provisioning Status: unknown
Attached Filters:
VAAI Status: unknown
Other UIDs: vml.0200010000600508b1001ce2641dcf3c827c33d0df4c4f47494341
Is Shared Clusterwide: true
Is Local SAS Device: false
Is SAS: true
Is USB: false
Is Boot USB Device: false
Is Boot Device: true
Device Max Queue Depth: 1024
No of outstanding IOs with competing worlds: 32
Drive Type: logical
RAID Level: RAID1
Number of Physical Drives: 2
Protection Enabled: false
PI Activated: false
PI Type: 0
PI Protection Mask: NO PROTECTION
Supported Guard Types: NO GUARD SUPPORT
DIX Enabled: false
DIX Guard Type: NO GUARD SUPPORT
Emulated DIX/DIF Enabled: false
May I know the reason for this, and how to fix it?
We had opened a case with both HP and VMware. HP replied as follows:
This field for the device is specifically reserved for indicating the path status of the device. If there is only one path to the target then the status is "degraded".
The host client shows extended information; in the case of the new server it reads ‘Normal, Degraded’ instead of just ‘Normal’.
This does not have any effect on the functioning of the LUN.
VMware said that because this is counted as a remote disk, it expects two paths to it; if a second path is not found, it marks the device as degraded. To resolve this we need to tag the disk as local. If you have vCenter, you can simply right-click the device and click `Mark as Local`. Without vCenter, use the following procedure over SSH:
esxcli storage nmp satp rule add -s VMW_SATP_LOCAL --device diskid --option="enable_local" (replace the diskid with the disk identifier)
esxcli storage core claiming reclaim -d diskid (replace the diskid with the disk identifier)
esxcli storage core claimrule load
esxcli storage core claimrule run
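To confirm the change took effect after the reclaim, the device's claiming SATP and local flag can be checked with standard esxcli commands; the naa identifier below is the one from the original post, so substitute your own:

```shell
# Show which SATP now claims the device (should be VMW_SATP_LOCAL)
esxcli storage nmp device list -d naa.600508b1001ce2641dcf3c827c33d0df

# Check that the device is now flagged as local
esxcli storage core device list -d naa.600508b1001ce2641dcf3c827c33d0df | grep "Is Local"
```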
This worked for me: all my local disks now show as Normal with `Status: ON`, and the `Is Local` field reads true.
Let me know if you need further help.
Hi,
It seems your device is active on a single path. Please cross-check.
The field for the device is specifically reserved for indicating the path status of the device.
--> When the device has more than one path to the target (storage array), the path status is "on".
--> When all the paths to the target are down (either off or dead), the device status is "dead".
--> If there is only one path to the target, the status is "degraded".
--> When a device is unmapped from the storage array while ESXi was using it, the device status is "not connected".
--> If ESXi fails to recognize the state of the device (none of the above scenarios apply), the device status is "unknown".
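To see how many paths a device actually has, and hence which of the above states applies, the paths can be listed over SSH. This is standard esxcli; the device identifier from the original post is used as an example:

```shell
# List all paths to the device; a local Smart Array logical volume
# will typically show exactly one path, which explains "degraded"
esxcli storage core path list -d naa.600508b1001ce2641dcf3c827c33d0df
```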
Dear Umesh,
Yes, I do understand that. But how do I fix it? That's the question. These are SCSI disks (built-in drives) in the server.
We do have an old setup similar to the one we are installing now, but it doesn't show me a degraded status.
Please advise.
Hi,
Can you post screenshots of the storage from both servers?
OLD:
New:
Hi,
Thanks for posting the screenshots, but I meant: can you post screenshots of the Path Status on both servers?
Hi Umesh,
Here you go:
OLD:
New:
Here you can see that the preferred path in OLD has a *, whereas in NEW it doesn't. I tried to right-click the paths and mark one as preferred, but it has no effect. I rescanned the disk and tried again, but it still won't take.
I also came across this link:
VMware vSphere 6 on Dell EMC PowerEdge Servers Release Notes
That link describes this as a known problem in Update 3. My old box is running Update 2 and my new box is running Update 3.
Please advise.
Below is an updated picture of the NEW server; in the picture above I was trying to change the Path Selection policy to see if it helped.
NEW:
Let me know: did you install the ESXi 6.0 U3 HP Customized ESXi Image?
Regards,
Randhir
Yes, that's the one I used. Why, is anything wrong with it?
Hi,
It seems to be a problem with the customized image (6.0 U3 HP Customized ESXi Image).
I have checked many HP sites and Google, but there is no exact or proper evidence documenting such incidents. However, most comments on various sites state that it is an issue with the customized image.
Can you try another image, like the 6.0 U2 HP Customized ESXi Image?
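As a side note, if you want to verify which image a host was installed from, the installed image profile name (which includes the OEM customization, e.g. HPE or Cisco) can be read over SSH with a standard esxcli command:

```shell
# Print the installed image profile; OEM-customized builds carry
# the vendor name in the profile string
esxcli software profile get
```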
That would not be possible, as the servers are in production or ready to go within a short span of time. Anyway, I have raised my concerns with both HP and VMware support. I will post back here if I have a solution.
Thanks.
Hi,
We have the same problem with the customized image for Cisco (6.0 U3 Cisco Customized ESXi Image) on four ESXi hosts with a recent clean installation.
We are going to open a VMware support case to check this warning. If I receive news, I'll post here.
Not sure if this answer will help a G9, but any time I had storage issues with ProLiants (G2 to G6), I would check all HPE hardware logs (like the ones in the iLO), and the problem was always a bad drive (a drive can read/write and still be "bad"), a bad Smart Array controller (especially the onboard ones), or even a bad cache battery.
ProLiant Essentials on a Windows box will give accurate feedback you can trust; I don't have enough experience with HPE ESXi to believe what it does or doesn't tell me.
If you used the HPE ESXi image, you can check the RAID status via the CLI. The exact name and path of the ssacli tool depend on the ESXi version; for example, it may be called hpssacli and live in a different directory. The following commands will show problems such as a defective cache battery:
[root@dl380g702:/opt/smartstorageadmin/ssacli/bin] ./ssacli ctrl all show status
Smart Array P410i in Slot 0 (Embedded)
Controller Status: OK
Cache Status: OK
Battery/Capacitor Status: OK
[root@dl380g702:/opt/smartstorageadmin/ssacli/bin] ./ssacli ctrl all show config detail
Smart Array P410i in Slot 0 (Embedded)
Bus Interface: PCI
Slot: 0
Serial Number: 50014380175866C0
Cache Serial Number: PBCDF0CRH1I2U3
Controller Status: OK
Hardware Revision: C
Firmware Version: 6.64-0
Rebuild Priority: Medium
Expand Priority: Medium
Surface Scan Delay: 3 secs
Surface Scan Mode: Idle
Parallel Surface Scan Supported: No
Queue Depth: Automatic
Monitor and Performance Delay: 60 min
Elevator Sort: Enabled
Degraded Performance Optimization: Disabled
Inconsistency Repair Policy: Disabled
Wait for Cache Room: Disabled
Surface Analysis Inconsistency Notification: Disabled
Post Prompt Timeout: 15 secs
Cache Board Present: True
Cache Status: OK
Cache Ratio: 25% Read / 75% Write
Drive Write Cache: Disabled
Total Cache Size: 1024 MB
Total Cache Memory Available: 912 MB
No-Battery Write Cache: Disabled
Cache Backup Power Source: Capacitors
Battery/Capacitor Count: 1
Battery/Capacitor Status: OK
SATA NCQ Supported: True
Number of Ports: 2 Internal only
Driver Name: HPE HPSA
Driver Version: 6.0.0.124-1OEM
PCI Address (Domain:Bus:Device.Function): 0000:05:00.0
Port Max Phy Rate Limiting Supported: False
Host Serial Number: CZ214503R3
Sanitize Erase Supported: False
Primary Boot Volume: logicaldrive 1 (600508B1001C41E272242D6A37FC83B0)
Secondary Boot Volume: None
Port Name: 1I
Port ID: 0
Port Connection Number: 0
SAS Address: 50014380175866C0
Port Location: Internal
Port Name: 2I
Port ID: 1
Port Connection Number: 1
SAS Address: 50014380175866C4
Port Location: Internal
Internal Drive Cage at Port 1I, Box 1, OK
Power Supply Status: Not Redundant
Drive Bays: 4
Port: 1I
Box: 1
Location: Internal
Physical Drives
physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SAS HDD, 450 GB, OK)
physicaldrive 1I:1:2 (port 1I:box 1:bay 2, SAS HDD, 450 GB, OK)
physicaldrive 1I:1:3 (port 1I:box 1:bay 3, SAS HDD, 450 GB, OK)
physicaldrive 1I:1:4 (port 1I:box 1:bay 4, SAS HDD, 450 GB, OK)
Internal Drive Cage at Port 2I, Box 1, OK
Power Supply Status: Not Redundant
Drive Bays: 4
Port: 2I
Box: 1
Location: Internal
Physical Drives
None attached
Array: A
Interface Type: SAS
Unused Space: 0 MB (0.0%)
Used Space: 1.6 TB (100.0%)
Status: OK
Array Type: Data
Logical Drive: 1
Size: 1.2 TB
Fault Tolerance: 5
Heads: 255
Sectors Per Track: 32
Cylinders: 65535
Strip Size: 256 KB
Full Stripe Size: 768 KB
Status: OK
Caching: Enabled
Parity Initialization Status: Initialization Completed
Unique Identifier: 600508B1001C41E272242D6A37FC83B0
Boot Volume: primary
Logical Drive Label: ADC1EA4550014380175866C0CBA7
Drive Type: Data
LD Acceleration Method: Controller Cache
physicaldrive 1I:1:1
Port: 1I
Box: 1
Bay: 1
Status: OK
Drive Type: Data Drive
Interface Type: SAS
Size: 450 GB
Drive exposed to OS: False
Logical/Physical Block Size: 512/512
Rotational Speed: 10000
Firmware Revision: HPD0
Serial Number: 53R0A038FTM71322
WWID: 50000394C831AC96
Model: HP EG0450FCSPK
Current Temperature (C): 33
Maximum Temperature (C): 48
PHY Count: 2
PHY Transfer Rate: 6.0Gbps, Unknown
Sanitize Erase Supported: False
Shingled Magnetic Recording Support: None
physicaldrive 1I:1:2
Port: 1I
Box: 1
Bay: 2
Status: OK
Drive Type: Data Drive
Interface Type: SAS
Size: 450 GB
Drive exposed to OS: False
Logical/Physical Block Size: 512/512
Rotational Speed: 10000
Firmware Revision: HPD0
Serial Number: 6330A02JFTM71323
WWID: 50000394D8020F7A
Model: HP EG0450FCSPK
Current Temperature (C): 34
Maximum Temperature (C): 48
PHY Count: 2
PHY Transfer Rate: 6.0Gbps, Unknown
Sanitize Erase Supported: False
Shingled Magnetic Recording Support: None
physicaldrive 1I:1:3
Port: 1I
Box: 1
Bay: 3
Status: OK
Drive Type: Data Drive
Interface Type: SAS
Size: 450 GB
Drive exposed to OS: False
Logical/Physical Block Size: 512/512
Rotational Speed: 10000
Firmware Revision: HPD0
Serial Number: 53V0A0CIFTM71322
WWID: 50000394C83A42FE
Model: HP EG0450FCSPK
Current Temperature (C): 33
Maximum Temperature (C): 52
PHY Count: 2
PHY Transfer Rate: 6.0Gbps, Unknown
Sanitize Erase Supported: False
Shingled Magnetic Recording Support: None
physicaldrive 1I:1:4
Port: 1I
Box: 1
Bay: 4
Status: OK
Drive Type: Data Drive
Interface Type: SAS
Size: 450 GB
Drive exposed to OS: False
Logical/Physical Block Size: 512/512
Rotational Speed: 10000
Firmware Revision: HPD0
Serial Number: 53R0A014FTM71322
WWID: 50000394C8313D2E
Model: HP EG0450FCSPK
Current Temperature (C): 36
Maximum Temperature (C): 52
PHY Count: 2
PHY Transfer Rate: 6.0Gbps, Unknown
Sanitize Erase Supported: False
Shingled Magnetic Recording Support: None
SEP (Vendor ID PMCSIERA, Model SRC 8x6G) 250
Device Number: 250
Firmware Version: RevC
WWID: 50014380175866CF
Vendor ID: PMCSIERA
Model: SRC 8x6G
[root@dl380g702:/opt/smartstorageadmin/ssacli/bin]
All my logs and the iLO show up as normal, and there are no amber lights on physical inspection. So the disks are all good; it was a matter of ESXi not recognizing the disk as local, so you need to tag it. Check my detailed answer above for the procedure I used to sort it out.
Thanks a lot, but before I try... can I execute these commands in an online production environment? Are there any data corruption or denial-of-service risks?
From my experience I would say: proceed with caution! While this procedure resolved the degraded disk state (after a quick refresh), the presented LUN then disappeared from the Datastore tab and could not be added back! I had to re-create the underlying storage pool and reset the associated iSCSI target before reconnecting the storage pool to ESXi.
Although this whole process wipes the VMFS volume, once it's reconnected the datastore does at least maintain a healthy status of "Normal". So be aware, and take steps to back up any data beforehand!
Perfect, I did it via the commands and it worked.
Thanks.