Hi,
It first slows down, and then no server running on the datastore is accessible. Clone tasks on the datastore run into timeouts. The VMs can't be restarted or shut down. The ESX host does not respond and the datastore can't be browsed. The only way out is to reset or restart the host via iLO. There are no errors whatsoever in iLO, vCenter or ESX monitoring. The iSCSI datastores are still accessible.
We have reinstalled ESX and reconfigured the RAID arrays. The machines run on the backup Dell machine with the same ESX version without any problems.
There is just one datastore, 2 TB in size. Three VMs run on the host, including vCenter. The other two are the DC and the app server. The app server is under normal load. We had no problems before. The problem appeared when we moved from 6.5 to some 6.7 update.
Attached you will find the HDD performance stats (in German), which may point to the storage driver. Also attached is the firmware info.
Since we have tried everything else we could think of, we have narrowed the problem down to storage.
Here are the further specs:
Is anybody having similar issues?
Thank you in advance for any help in resolving this issue.
Kind Regards
Sardar
Hello.
Slow access to internal storage usually comes down to the firmware levels of the internal disk controller and the disks. The driver version also plays a role.
In your case we see an HPE Smart Array P408i-a SR Gen10 with firmware 3.53.
The disks have HPD3 firmware. You need to know the part number and model of the disks to check for new firmware levels.
As you are using version 7.0 Update 2 you should try the latest levels available:
Firmware 4.11 and Driver smartpqi version 70.4150.0.119.
https://www.vmware.com/resources/compatibility/detail.php?deviceCategory=io&productid=43704
Firmware link:
https://support.hpe.com/hpesc/public/swd/detail?swItemId=MTX-f39382f2be4d450e987c26819e
Hello,
Thank you for your reply.
I will go through the specifics you mentioned and post back.
Kind Regards
Khan
Hi,
We installed the newest available SPP 2021.10.0 for the ProLiant server.
With that we now have the latest firmware 4.11. This didn't change the situation.
I have now also updated the smartpqi driver to the latest version. Attached you will find the newest version.
I will report whether this solves my problem. It takes a day or two to tell.
P.S.: I applied the correct firmware image from HP.
Kind Regards
Sardar
Hello.
VMware driver in link:
Hello.
Have you updated the firmware of the disks?
Does the controller have cache and battery?
What are the read write cache values?
Do you have HPE Smart Storage Administrator (HPE SSA) CLI for VMware 7.0 installed?
attached link:
If you installed VMware vSphere with the HPE custom image, then you already have the right utility.
It should be located in one of these directories:
/opt/smartstorageadmin/ssacli/bin/ssacli
/opt/hp/hpssacli/bin/hpssacli
Hi Enrique,
No, I have not updated the firmware of the disks. To be honest, I don't know how to do that. Can I do that from iLO?
I have not seen it physically, but as far as iLO shows, it is supposed to have cache and a battery. I have attached some screenshots.
We have installed the custom HPE image. Where and how should I look for the tools and subdirectories?
I am not an advanced user, as you can probably guess. I would be thankful for any further help.
P.S.: Is there anywhere in the logs to see why the ESX behaves as shown in the image "DISK-IO.jpg"?
Kind Regards
Sardar
Hello.
According to what you have sent us:
Disk model EG001200JWJNQ with HPD4 firmware is at its latest level (2021). The disks originally had HPD3 firmware and were upgraded when you applied the latest SPP 2021.
Controller FW version 4.11, this is OK.
If you installed the custom image HPE VMware ESXi 7.0 Update 2 Build-17867351, it dates from May 2021 and includes the HPE Agentless Management Bundle for ESXi 7.0.
HPE has reported problems with AMS (Agentless Management Service) in versions 6.7 and 7.0 and recommends installing the latest version (Nov 2021).
Here is a link to the latest version:
https://support.hpe.com/hpesc/public/swd/detail?swItemId=MTX_ef3c7b0fd13e4ee486e0263676#tab-history
Follow these instructions to install it:
1. Power off any virtual machines that are running on the host and place the host into maintenance mode.
2. Log in to the ESXi host with an SSH session as root (you must enable SSH access first).
3. Copy the file xx.zip to an internal ESXi volume.
4. Run the command:
# esxcli software component apply -d <ESXi local path><component.zip>
5. After the component is installed, reboot the ESXi host for the updates to take effect.
6. Log in to the ESXi host, take it out of maintenance mode and verify its operation.
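Steps 1, 4 and 6 above could be scripted roughly as follows. This is only a sketch: the function names `run` and `install_component` and the `DRY_RUN` variable are made up for illustration, the depot path in the usage line is a placeholder, and it assumes the VMs are already powered off. With `DRY_RUN` set, the commands are only printed, which is a safe way to check them before touching the host.

```shell
#!/bin/sh
# Print the command instead of running it when DRY_RUN is set (hypothetical helper).
run() {
    if [ -n "$DRY_RUN" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# install_component <full path to component zip>  (hypothetical helper)
install_component() {
    depot="$1"
    # The depot should be given as a full path on a local ESXi volume.
    case "$depot" in
        /*) ;;
        *) echo "depot path must be absolute: $depot" >&2; return 1 ;;
    esac
    # Step 1: enter maintenance mode (VMs must already be powered off).
    run esxcli system maintenanceMode set --enable true || return 1
    # Step 4: apply the component from the zip file.
    run esxcli software component apply -d "$depot" || return 1
    # Steps 5 and 6 are left to the operator.
    echo "Reboot the host, then run: esxcli system maintenanceMode set --enable false"
}

# Usage (placeholder path):
#   DRY_RUN=1 install_component /vmfs/volumes/datastore1/component.zip
```

The reboot is deliberately not part of the function, so that the result of the apply step can be read before the host goes down.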
Since you have the custom HPE image, the utility to manage and monitor the smart Array controller should be in one of these directories.
/opt/smartstorageadmin/ssacli/bin/ssacli
/opt/hp/hpssacli/bin/hpssacli
At these links you will find more information about HPE SSACLI and its commands.
https://be-virtual.net/hpe-storage-controller-management-ssacli/
https://kb.gtkc.net/hp-smart-array-cli-commands/
Run the following commands and attach the results to this post:
ssacli ctrl all show config
ssacli ctrl all show detail
ssacli ctrl all show config detail
ssacli ctrl all show status
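To collect all four outputs in one file for attaching, something like the following could be used. It is a sketch, not an HPE tool: `find_ssacli` and `collect_ssacli` are hypothetical helper names, the report path is a placeholder, and the candidate locations are the ones mentioned in this thread.

```shell
#!/bin/sh
# Print the first candidate path that exists and is executable.
find_ssacli() {
    for candidate in "$@"; do
        if [ -x "$candidate" ]; then
            echo "$candidate"
            return 0
        fi
    done
    return 1
}

# collect_ssacli <cli binary> <output file>
# Runs the four "show" commands against all controllers and saves the output.
collect_ssacli() {
    cli="$1"
    out="$2"
    for sub in "show config" "show detail" "show config detail" "show status"; do
        echo "### ctrl all $sub"
        # $sub is left unquoted on purpose so it splits into separate arguments.
        "$cli" ctrl all $sub
    done > "$out"
}

# Usage on the host (the report path is a placeholder):
#   cli=$(find_ssacli /opt/smartstorageadmin/ssacli/bin/ssacli /opt/hp/hpssacli/bin/hpssacli) \
#       && collect_ssacli "$cli" /tmp/ssacli-report.txt
```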
Hi,
Thanks a lot for the instructions. I will give it a shot this week and post back the results.
Kind Regards
Khan
Hi,
Have you checked the logs for any hints or error messages?
In my personal experience, driver or firmware issues can cause such problems, but most of the time there is another reason.
The vmkernel, vmkwarning or vobd logs should contain messages from the time when the I/Os got stuck.
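A quick first pass over those three logs might look like the sketch below. It assumes the logs are in their usual place under /var/log on the host. The keyword list is only a guess at what stalled I/O could leave behind, not an official list of messages, and `scan_logs` is a made-up helper name.

```shell
#!/bin/sh
# Keywords are an assumption; adjust them to what the logs actually contain.
PATTERN='latency|timed out|timeout|abort|reset|smartpqi'

# scan_logs <extended regex> <log file>...
# Prints matching lines as file:line:text, skipping files that cannot be read.
scan_logs() {
    pattern="$1"
    shift
    for f in "$@"; do
        [ -r "$f" ] || continue
        grep -i -E -H -n "$pattern" "$f"
    done
    return 0
}

# Usage on the host:
#   scan_logs "$PATTERN" /var/log/vmkernel.log /var/log/vmkwarning.log /var/log/vobd.log | tail -n 50
```

Comparing the timestamps of the hits with the time the datastore stopped responding is usually more telling than the raw number of matches.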
Just my 2 cents.
Ralf
Hi,
Apologies for the delay. Here are the results (you will also find them as an attachment):
[root@ESX7:~] /opt/smartstorageadmin/ssacli/bin/ssacli ctrl all show config
HPE Smart Array P408i-a SR Gen10 in Slot 0 (Embedded) (sn: PEYHC0DRHCXC65)
Internal Drive Cage at Port 1I, Box 3, OK
Internal Drive Cage at Port 2I, Box 0, OK
Port Name: 1I (Mixed)
Port Name: 2I (Mixed)
Array A (SAS, Unused Space: 1 MB)
logicaldrive 1 (2.18 TB, RAID 1+0, OK)
physicaldrive 1I:3:1 (port 1I:box 3:bay 1, SAS HDD, 1.2 TB, OK)
physicaldrive 1I:3:2 (port 1I:box 3:bay 2, SAS HDD, 1.2 TB, OK)
physicaldrive 1I:3:3 (port 1I:box 3:bay 3, SAS HDD, 1.2 TB, OK)
physicaldrive 1I:3:4 (port 1I:box 3:bay 4, SAS HDD, 1.2 TB, OK)
SEP (Vendor ID HPE, Model Smart Adapter) 379 (WWID: 51402EC0144D1C28)
[root@ESX7:~] /opt/smartstorageadmin/ssacli/bin/ssacli ctrl all show detail
HPE Smart Array P408i-a SR Gen10 in Slot 0 (Embedded)
Bus Interface: PCI
Slot: 0
Serial Number: PEYHC0DRHCXC65
RAID 6 Status: Enabled
Controller Status: OK
Hardware Revision: B
Firmware Version: 4.11
Firmware Supports Online Firmware Activation: True
Driver Supports Online Firmware Activation: False
Rebuild Priority: High
Expand Priority: Medium
Surface Scan Delay: 3 secs
Surface Scan Mode: Idle
Parallel Surface Scan Supported: Yes
Current Parallel Surface Scan Count: 1
Max Parallel Surface Scan Count: 16
Queue Depth: Automatic
Monitor and Performance Delay: 60 min
Elevator Sort: Enabled
Degraded Performance Optimization: Disabled
Inconsistency Repair Policy: Disabled
Write Cache Bypass Threshold Size: 1040 KiB
Wait for Cache Room: Disabled
Surface Analysis Inconsistency Notification: Disabled
Post Prompt Timeout: 0 secs
Cache Board Present: True
Cache Status: OK
Cache Ratio: 10% Read / 90% Write
Configured Drive Write Cache Policy: Default
Unconfigured Drive Write Cache Policy: Default
Total Cache Size: 2.0
Total Cache Memory Available: 1.8
Battery Backed Cache Size: 1.8
No-Battery Write Cache: Disabled
SSD Caching RAID5 WriteBack Enabled: True
SSD Caching Version: 2
Cache Backup Power Source: Batteries
Battery/Capacitor Count: 1
Battery/Capacitor Status: OK
SATA NCQ Supported: True
Spare Activation Mode: Activate on physical drive failure (default)
Controller Temperature (C): 68
Capacitor Temperature (C): 57
Number of Ports: 2 Internal only
Encryption: Not Set
Express Local Encryption: False
Driver Name: smartpqi
Driver Version: VMware 70.4150.0.119
PCI Address (Domain:Bus:Device.Function): 0000:65:00.0
Negotiated PCIe Data Rate: PCIe 3.0 x8 (7880 MB/s)
Controller Mode: Mixed
Port Max Phy Rate Limiting Supported: False
Latency Scheduler Setting: Disabled
Current Power Mode: MaxPerformance
Survival Mode: Enabled
Host Serial Number: CZJ94704DV
Sanitize Erase Supported: True
Sanitize Lock: None
Sensor ID: 0
Location: Capacitor
Current Value (C): 57
Max Value Since Power On: 59
Sensor ID: 1
Location: ASIC
Current Value (C): 68
Max Value Since Power On: 70
Sensor ID: 2
Location: Unknown
Current Value (C): 53
Max Value Since Power On: 56
Primary Boot Volume: None
Secondary Boot Volume: None
[root@ESX7:~] /opt/smartstorageadmin/ssacli/bin/ssacli ctrl all show config detail
HPE Smart Array P408i-a SR Gen10 in Slot 0 (Embedded)
Bus Interface: PCI
Slot: 0
Serial Number: PEYHC0DRHCXC65
RAID 6 Status: Enabled
Controller Status: OK
Hardware Revision: B
Firmware Version: 4.11
Firmware Supports Online Firmware Activation: True
Driver Supports Online Firmware Activation: False
Rebuild Priority: High
Expand Priority: Medium
Surface Scan Delay: 3 secs
Surface Scan Mode: Idle
Parallel Surface Scan Supported: Yes
Current Parallel Surface Scan Count: 1
Max Parallel Surface Scan Count: 16
Queue Depth: Automatic
Monitor and Performance Delay: 60 min
Elevator Sort: Enabled
Degraded Performance Optimization: Disabled
Inconsistency Repair Policy: Disabled
Write Cache Bypass Threshold Size: 1040 KiB
Wait for Cache Room: Disabled
Surface Analysis Inconsistency Notification: Disabled
Post Prompt Timeout: 0 secs
Cache Board Present: True
Cache Status: OK
Cache Ratio: 10% Read / 90% Write
Configured Drive Write Cache Policy: Default
Unconfigured Drive Write Cache Policy: Default
Total Cache Size: 2.0
Total Cache Memory Available: 1.8
Battery Backed Cache Size: 1.8
No-Battery Write Cache: Disabled
SSD Caching RAID5 WriteBack Enabled: True
SSD Caching Version: 2
Cache Backup Power Source: Batteries
Battery/Capacitor Count: 1
Battery/Capacitor Status: OK
SATA NCQ Supported: True
Spare Activation Mode: Activate on physical drive failure (default)
Controller Temperature (C): 67
Capacitor Temperature (C): 57
Number of Ports: 2 Internal only
Encryption: Not Set
Express Local Encryption: False
Driver Name: smartpqi
Driver Version: VMware 70.4150.0.119
PCI Address (Domain:Bus:Device.Function): 0000:65:00.0
Negotiated PCIe Data Rate: PCIe 3.0 x8 (7880 MB/s)
Controller Mode: Mixed
Port Max Phy Rate Limiting Supported: False
Latency Scheduler Setting: Disabled
Current Power Mode: MaxPerformance
Survival Mode: Enabled
Host Serial Number: CZJ94704DV
Sanitize Erase Supported: True
Sanitize Lock: None
Sensor ID: 0
Location: Capacitor
Current Value (C): 57
Max Value Since Power On: 59
Sensor ID: 1
Location: ASIC
Current Value (C): 67
Max Value Since Power On: 70
Sensor ID: 2
Location: Unknown
Current Value (C): 52
Max Value Since Power On: 56
Primary Boot Volume: None
Secondary Boot Volume: None
Internal Drive Cage at Port 1I, Box 3, OK
Drive Bays: 4
Port: 1I
Box: 3
Location: Internal
Physical Drives
physicaldrive 1I:3:1 (port 1I:box 3:bay 1, SAS HDD, 1.2 TB, OK)
physicaldrive 1I:3:2 (port 1I:box 3:bay 2, SAS HDD, 1.2 TB, OK)
physicaldrive 1I:3:3 (port 1I:box 3:bay 3, SAS HDD, 1.2 TB, OK)
physicaldrive 1I:3:4 (port 1I:box 3:bay 4, SAS HDD, 1.2 TB, OK)
Internal Drive Cage at Port 2I, Box 0, OK
Drive Bays: 4
Port: 2I
Box: 0
Location: Internal
Physical Drives
None attached
Port Name: 1I
Port ID: 0
Port Mode: Mixed
Port Connection Number: 0
SAS Address: 51402EC0144D1C20
Port Location: Internal
Port Name: 2I
Port ID: 1
Port Mode: Mixed
Port Connection Number: 1
SAS Address: 51402EC0144D1C24
Port Location: Internal
Array: A
Interface Type: SAS
Unused Space: 1 MB (0.00%)
Used Space: 4.37 TB (100.00%)
Status: OK
MultiDomain Status: OK
Array Type: Data
Smart Path: disable
Logical Drive: 1
Size: 2.18 TB
Fault Tolerance: 1+0
Heads: 255
Sectors Per Track: 32
Cylinders: 65535
Strip Size: 256 KB
Full Stripe Size: 512 KB
Status: OK
Unrecoverable Media Errors: None
MultiDomain Status: OK
Caching: Enabled
Unique Identifier: 600508B1001C35C6E76289131F3BF536
Logical Drive Label: Logical Drive 1
Mirror Group 1:
physicaldrive 1I:3:1 (port 1I:box 3:bay 1, SAS HDD, 1.2 TB, OK)
physicaldrive 1I:3:2 (port 1I:box 3:bay 2, SAS HDD, 1.2 TB, OK)
Mirror Group 2:
physicaldrive 1I:3:3 (port 1I:box 3:bay 3, SAS HDD, 1.2 TB, OK)
physicaldrive 1I:3:4 (port 1I:box 3:bay 4, SAS HDD, 1.2 TB, OK)
Drive Type: Data
LD Acceleration Method: Controller Cache
physicaldrive 1I:3:1
Port: 1I
Box: 3
Bay: 1
Status: OK
Drive Type: Data Drive
Interface Type: SAS
Size: 1.2 TB
Drive exposed to OS: False
Logical/Physical Block Size: 512/512
Rotational Speed: 10500
Firmware Revision: HPD4
Serial Number: WFK54QCF
WWID: 5000C5009A62B229
Model: HPE EG001200JWJNQ
Current Temperature (C): 53
Maximum Temperature (C): 56
PHY Count: 2
PHY Transfer Rate: 12.0Gbps, Unknown
PHY Physical Link Rate: 12.0Gbps, Unknown
PHY Maximum Link Rate: 12.0Gbps, 12.0Gbps
Drive Authentication Status: OK
Carrier Application Version: 11
Carrier Bootloader Version: 6
Sanitize Erase Supported: True
Sanitize Estimated Max Erase Time: 1 hour(s), 55 minute(s)
Unrestricted Sanitize Supported: True
Shingled Magnetic Recording Support: None
Drive Unique ID: 5000C5009A62B22B
Self Encrypting Drive: False
physicaldrive 1I:3:2
Port: 1I
Box: 3
Bay: 2
Status: OK
Drive Type: Data Drive
Interface Type: SAS
Size: 1.2 TB
Drive exposed to OS: False
Logical/Physical Block Size: 512/512
Rotational Speed: 10500
Firmware Revision: HPD4
Serial Number: WFK58NSL
WWID: 5000C5009A5A6399
Model: HPE EG001200JWJNQ
Current Temperature (C): 58
Maximum Temperature (C): 61
PHY Count: 2
PHY Transfer Rate: 12.0Gbps, Unknown
PHY Physical Link Rate: 12.0Gbps, Unknown
PHY Maximum Link Rate: 12.0Gbps, 12.0Gbps
Drive Authentication Status: OK
Carrier Application Version: 11
Carrier Bootloader Version: 6
Sanitize Erase Supported: True
Sanitize Estimated Max Erase Time: 1 hour(s), 55 minute(s)
Unrestricted Sanitize Supported: True
Shingled Magnetic Recording Support: None
Drive Unique ID: 5000C5009A5A639B
Self Encrypting Drive: False
physicaldrive 1I:3:3
Port: 1I
Box: 3
Bay: 3
Status: OK
Drive Type: Data Drive
Interface Type: SAS
Size: 1.2 TB
Drive exposed to OS: False
Logical/Physical Block Size: 512/512
Rotational Speed: 10500
Firmware Revision: HPD4
Serial Number: WFK54QE4
WWID: 5000C5009A62B001
Model: HPE EG001200JWJNQ
Current Temperature (C): 57
Maximum Temperature (C): 59
PHY Count: 2
PHY Transfer Rate: 12.0Gbps, Unknown
PHY Physical Link Rate: 12.0Gbps, Unknown
PHY Maximum Link Rate: 12.0Gbps, 12.0Gbps
Drive Authentication Status: OK
Carrier Application Version: 11
Carrier Bootloader Version: 6
Sanitize Erase Supported: True
Sanitize Estimated Max Erase Time: 1 hour(s), 55 minute(s)
Unrestricted Sanitize Supported: True
Shingled Magnetic Recording Support: None
Drive Unique ID: 5000C5009A62B003
Self Encrypting Drive: False
physicaldrive 1I:3:4
Port: 1I
Box: 3
Bay: 4
Status: OK
Drive Type: Data Drive
Interface Type: SAS
Size: 1.2 TB
Drive exposed to OS: False
Logical/Physical Block Size: 512/512
Rotational Speed: 10500
Firmware Revision: HPD4
Serial Number: WFK54QEN
WWID: 5000C5009A62AFA1
Model: HPE EG001200JWJNQ
Current Temperature (C): 51
Maximum Temperature (C): 55
PHY Count: 2
PHY Transfer Rate: 12.0Gbps, Unknown
PHY Physical Link Rate: 12.0Gbps, Unknown
PHY Maximum Link Rate: 12.0Gbps, 12.0Gbps
Drive Authentication Status: OK
Carrier Application Version: 11
Carrier Bootloader Version: 6
Sanitize Erase Supported: True
Sanitize Estimated Max Erase Time: 1 hour(s), 55 minute(s)
Unrestricted Sanitize Supported: True
Shingled Magnetic Recording Support: None
Drive Unique ID: 5000C5009A62AFA3
Self Encrypting Drive: False
SEP (Vendor ID HPE, Model Smart Adapter) 379
Device Number: 379
Firmware Version: 4.11
WWID: 51402EC0144D1C28
Vendor ID: HPE
Model: Smart Adapter
[root@ESX7:~] /opt/smartstorageadmin/ssacli/bin/ssacli ctrl all show status
HPE Smart Array P408i-a SR Gen10 in Slot 0 (Embedded)
Controller Status: OK
Cache Status: OK
Battery/Capacitor Status: OK
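Since the dumps above repeat many fields, a small filter can make them easier to compare between runs. The sketch below assumes the output was saved to a text file. The field names are taken from the output above, and `summarize_ssacli` is a hypothetical helper name.

```shell
#!/bin/sh
# summarize_ssacli <saved ssacli report>
# Prints the distinct values of a few firmware, driver, cache and temperature fields.
summarize_ssacli() {
    grep -E '^[[:space:]]*(Firmware Version|Firmware Revision|Driver Version|Cache Ratio|Cache Status|Battery/Capacitor Status|Controller Temperature \(C\)|Current Temperature \(C\)):' "$1" \
        | sed 's/^[[:space:]]*//' \
        | sort -u
}

# Usage (placeholder path):
#   summarize_ssacli /tmp/ssacli-report.txt
```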
Hi Ralf,
Thanks for the tip. I have checked the logs, but not thoroughly. Maybe it's time to do that. Do you know what (or which log file) I should look for in particular?
Kind Regards
Khan
Hi Enrique,
I have uploaded the logs. Is there anything suspicious?
Kind Regards
Sardar
Hi Sardar,
It looks like you still have an issue here.
I have put together a list of questions that should give us a better picture of the current situation, along with some recommendations on what should be checked now.