VMware Cloud Community
PLP-Lackner
Contributor

ESXi v7.0.3 - P440ar RAID - high disk latency

I have a ProLiant DL360 Gen9 with a P440ar RAID controller, configured as RAID 6 across 4 SSDs.
Suddenly I am seeing very high disk latency, even when no VM is running.

I have already tried swapping the controller and the Smart Storage Battery, but that did not help.
I am also on the latest driver updates.

VMware ESXi 7.0.3 build-20036589
VMware ESXi 7.0 Update 3
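
(For reference, a minimal sketch of how the latency can be confirmed from the ESXi shell with the standard tools only - nothing here is specific to this host except the P440ar itself:)

# Interactive: esxtop, then 'd' for adapters or 'u' for devices;
# DAVG = device latency, KAVG = kernel latency, GAVG = guest latency
esxtop

# Batch snapshot for later analysis (10 samples, 5 seconds apart)
esxtop -b -n 10 -d 5 > /tmp/esxtop-latency.csv

# Per-device I/O statistics
esxcli storage core device stats get

# Devices and paths behind the controller (vmhba numbering may differ)
esxcli storage core path list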

 

 


Smart Array P440ar in Slot 0 (Embedded)
Bus Interface: PCI
Slot: 0
Serial Number: PDNLH0BRH753PJ
Cache Serial Number: PDNLH0BRH753PJ
RAID 6 Status: Enabled
Controller Status: OK
Hardware Revision: B
Firmware Version: 7.00
Firmware Supports Online Firmware Activation: False
Rebuild Priority: High
Expand Priority: Medium
Surface Scan Delay: 3 secs
Surface Scan Mode: Idle
Parallel Surface Scan Supported: Yes
Current Parallel Surface Scan Count: 1
Max Parallel Surface Scan Count: 16
Queue Depth: Automatic
Monitor and Performance Delay: 60 min
Elevator Sort: Enabled
Degraded Performance Optimization: Disabled
Inconsistency Repair Policy: Disabled
Wait for Cache Room: Disabled
Surface Analysis Inconsistency Notification: Disabled
Post Prompt Timeout: 15 secs
Cache Board Present: True
Cache Status: Not Configured
Drive Write Cache: Disabled
Total Cache Size: 2.0
Total Cache Memory Available: 1.8
Battery Backed Cache Size: 1.8
No-Battery Write Cache: Disabled
SSD Caching RAID5 WriteBack Enabled: True
SSD Caching Version: 2
Cache Backup Power Source: Batteries
Battery/Capacitor Count: 1
Battery/Capacitor Status: OK
SATA NCQ Supported: True
Spare Activation Mode: Activate on physical drive failure (default)
Controller Temperature (C): 49
Cache Module Temperature (C): 43
Number of Ports: 2 Internal only
Encryption: Not Set
Express Local Encryption: False
Driver Name: nhpsa
Driver Version: 70.0051.0.100-4vmw
PCI Address (Domain:Bus:Device.Function): 0000:03:00.0
Negotiated PCIe Data Rate: PCIe 3.0 x8 (7880 MB/s)
Controller Mode: RAID
Pending Controller Mode: RAID
Port Max Phy Rate Limiting Supported: False
Latency Scheduler Setting: Disabled
Current Power Mode: MaxPerformance
Survival Mode: Enabled
Host Serial Number: CZJ6410R6R
Sanitize Erase Supported: True
Primary Boot Volume: None
Secondary Boot Volume: None
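
(Side note on the output above: "Cache Status: Not Configured" together with "Drive Write Cache: Disabled" suggests that RAID 6 parity writes are not being absorbed by the 2.0 GB cache module. A hedged sketch, assuming the HPE ssacli utility is installed on the host - the install path and the logical drive number below are assumptions, adjust them to your setup:)

# Show the current cache configuration of the embedded controller
/opt/smartstorageadmin/ssacli/bin/ssacli ctrl slot=0 show detail

# Example: split the controller cache 10% read / 90% write
/opt/smartstorageadmin/ssacli/bin/ssacli ctrl slot=0 modify cacheratio=10/90

# Enable controller caching on the logical drive (assuming it is logicaldrive 1)
/opt/smartstorageadmin/ssacli/bin/ssacli ctrl slot=0 logicaldrive 1 modify caching=enable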

 

 

Smart Array P440ar in Slot 0 (Embedded)

Array A

physicaldrive 1I:1:1
Port: 1I
Box: 1
Bay: 1
Status: OK
Drive Type: Data Drive
Interface Type: Solid State SATA
Size: 4 TB
Drive exposed to OS: False
Logical/Physical Block Size: 512/512
Firmware Revision: SVQ02B6Q
Serial Number: S5STNF0TB04130E
WWID: 31402EC001E1BE70
Model: ATA Samsung SSD 870
SATA NCQ Capable: True
SATA NCQ Enabled: True
Current Temperature (C): 34
Maximum Temperature (C): 46
SSD Smart Trip Wearout: Not Supported
PHY Count: 1
PHY Transfer Rate: 6.0Gbps
PHY Physical Link Rate: Unknown
PHY Maximum Link Rate: Unknown
Drive Authentication Status: OK
Carrier Application Version: 11
Carrier Bootloader Version: 6
Sanitize Erase Supported: False
Shingled Magnetic Recording Support: None

physicaldrive 1I:1:2
Port: 1I
Box: 1
Bay: 2
Status: OK
Drive Type: Data Drive
Interface Type: Solid State SATA
Size: 4 TB
Drive exposed to OS: False
Logical/Physical Block Size: 512/512
Firmware Revision: SVQ02B6Q
Serial Number: S5STNF0TB04132T
WWID: 31402EC001E1BE71
Model: ATA Samsung SSD 870
SATA NCQ Capable: True
SATA NCQ Enabled: True
Current Temperature (C): 36
Maximum Temperature (C): 47
SSD Smart Trip Wearout: Not Supported
PHY Count: 1
PHY Transfer Rate: 6.0Gbps
PHY Physical Link Rate: Unknown
PHY Maximum Link Rate: Unknown
Drive Authentication Status: OK
Carrier Application Version: 11
Carrier Bootloader Version: 6
Sanitize Erase Supported: False
Shingled Magnetic Recording Support: None

physicaldrive 1I:1:3
Port: 1I
Box: 1
Bay: 3
Status: OK
Drive Type: Data Drive
Interface Type: Solid State SATA
Size: 4 TB
Drive exposed to OS: False
Logical/Physical Block Size: 512/512
Firmware Revision: SVQ02B6Q
Serial Number: S5STNF0TB04152F
WWID: 31402EC001E1BE72
Model: ATA Samsung SSD 870
SATA NCQ Capable: True
SATA NCQ Enabled: True
Current Temperature (C): 32
Maximum Temperature (C): 44
SSD Smart Trip Wearout: Not Supported
PHY Count: 1
PHY Transfer Rate: 6.0Gbps
PHY Physical Link Rate: Unknown
PHY Maximum Link Rate: Unknown
Drive Authentication Status: OK
Carrier Application Version: 11
Carrier Bootloader Version: 6
Sanitize Erase Supported: False
Shingled Magnetic Recording Support: None

physicaldrive 1I:1:4
Port: 1I
Box: 1
Bay: 4
Status: OK
Drive Type: Data Drive
Interface Type: Solid State SATA
Size: 4 TB
Drive exposed to OS: False
Logical/Physical Block Size: 512/512
Firmware Revision: SVQ02B6Q
Serial Number: S5STNF0TB04149L
WWID: 31402EC001E1BE73
Model: ATA Samsung SSD 870
SATA NCQ Capable: True
SATA NCQ Enabled: True
Current Temperature (C): 29
Maximum Temperature (C): 45
SSD Smart Trip Wearout: Not Supported
PHY Count: 1
PHY Transfer Rate: 6.0Gbps
PHY Physical Link Rate: Unknown
PHY Maximum Link Rate: Unknown
Drive Authentication Status: OK
Carrier Application Version: 11
Carrier Bootloader Version: 6
Sanitize Erase Supported: False
Shingled Magnetic Recording Support: None

NateNateNAte
Hot Shot

At first glance, it looks like (maybe) a runaway process.

Is this machine part of a host cluster, or is it stand-alone? Assuming no VMs are running, was this a new host added to a cluster, and is DRS enabled? I'm asking for context to see how the infrastructure is connected and how this server fits into your setup. That would help track down the best place to address the problem.
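
(If you want to rule the runaway-process idea in or out quickly, a minimal check from the ESXi shell - standard commands, no assumptions about the host:)

# CPU view: press 'c', then 'e' to expand a world group and look for anything spinning
esxtop

# List the user worlds currently running on the host
esxcli system process list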

PLP-Lackner
Contributor

I have now deactivated SSD Smart Path in the HP settings.
Now the latency is much better.
I don't think any hardware was or is damaged, because I already tried swapping out some parts.

But I don't know why this suddenly started happening.
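
(For reference, the same change can be made from the ESXi shell if the HPE ssacli utility is installed - the install path below is the usual location for the HPE VIB and is an assumption, as is the array letter, which matches the ssacli output above:)

# Disable HPE SSD Smart Path on array A of the embedded P440ar
/opt/smartstorageadmin/ssacli/bin/ssacli ctrl slot=0 array A modify ssdsmartpath=disable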

 

 

NateNateNAte
Hot Shot

Interesting that the Smart Path setting is what reduced the latency. If it came down to that configuration even after you changed out HW/parts, then I would be looking at the firmware.

As to why it happened - that would definitely be a case to send to HP. It reminds me of some problems I used to have with hardware that had an unknown software bug which allowed a buffer overwrite to fill up an SQL server overnight. We had to get the hardware vendor involved to do a firmware update. It was not a common bug or occurrence; it only happened when a certain configuration was applied. We basically did a pentest without realizing we were doing a pentest.

I'm glad you got some resolution, though. It's up to you whether you want to open a case with HP - or VMware for that matter (just to cover all bases).
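
(If you do go down the firmware/driver route, a quick way to collect the versions to compare against the HPE and VMware compatibility lists - standard esxcli/vmkload_mod commands:)

# Installed nhpsa driver VIB and its version
esxcli software vib list | grep -i nhpsa

# Which driver each storage adapter is bound to
esxcli storage core adapter list

# Details of the loaded nhpsa module
vmkload_mod -s nhpsa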

GaganpreetSingh
Contributor

Hello Pal, 

Greetings for the Day.

Hope you are doing well. 

I see your issue, and it is hard to believe that a host with no VMs running can have high disk latency.

Let's dig deeper and look at the storage-related sense codes in your environment.

Please run the following command and share a screenshot of the output:

 

egrep "H:0x" /var/run/logs/vmkernel.log | grep -v "H:0x0 D:0x2 P:0x0 Sense Data: 0x5 0x20 0x0" | grep -v "H:0x0 D:0x2 P:0x0 Sense Data: 0x5 0x24 0x0" | grep -v "Error" | awk '{print $1, $5, $13, $15, $16, $17, $21, $22, $23}' | sed 's/,/ /g' | sed 's/ / || /g'| grep -v vmhba | sort -u -k6

Alternatively, change into the log directory first (cd /var/run/log), run the command above from there, and then run this one:

 

grep -ve " 0x85" -e " 0x4d" -e " 0x1a" -e " 0x12" vmkernel.log | grep "sense data" | less
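
(The opcodes excluded here - 0x85 ATA PASS-THROUGH, 0x4d LOG SENSE, 0x1a MODE SENSE and 0x12 INQUIRY - are routine query commands whose occasional failures are normally harmless, so whatever remains is the sense data worth looking at.)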

 

Share the output and let's try to help you.

Happy to help.

 

Regards

Gaganpreet Singh 

 
