VMware Cloud Community
PLP-Lackner
Contributor

ESXi 7.0.3 - P440ar RAID - high disk latency

I have a ProLiant DL360 Gen9 with a P440ar RAID controller, configured as RAID 6 across 4 SSDs.
Suddenly I am seeing very high disk latency, even when no VM is running.

I have already tried replacing the controller and the Smart Storage Battery, but the problem persists.
I have also installed the latest driver updates.

VMware ESXi 7.0.3 build-20036589
VMware ESXi 7.0 Update 3

Smart Array P440ar in Slot 0 (Embedded)
Bus Interface: PCI
Slot: 0
Serial Number: PDNLH0BRH753PJ
Cache Serial Number: PDNLH0BRH753PJ
RAID 6 Status: Enabled
Controller Status: OK
Hardware Revision: B
Firmware Version: 7.00
Firmware Supports Online Firmware Activation: False
Rebuild Priority: High
Expand Priority: Medium
Surface Scan Delay: 3 secs
Surface Scan Mode: Idle
Parallel Surface Scan Supported: Yes
Current Parallel Surface Scan Count: 1
Max Parallel Surface Scan Count: 16
Queue Depth: Automatic
Monitor and Performance Delay: 60 min
Elevator Sort: Enabled
Degraded Performance Optimization: Disabled
Inconsistency Repair Policy: Disabled
Wait for Cache Room: Disabled
Surface Analysis Inconsistency Notification: Disabled
Post Prompt Timeout: 15 secs
Cache Board Present: True
Cache Status: Not Configured
Drive Write Cache: Disabled
Total Cache Size: 2.0
Total Cache Memory Available: 1.8
Battery Backed Cache Size: 1.8
No-Battery Write Cache: Disabled
SSD Caching RAID5 WriteBack Enabled: True
SSD Caching Version: 2
Cache Backup Power Source: Batteries
Battery/Capacitor Count: 1
Battery/Capacitor Status: OK
SATA NCQ Supported: True
Spare Activation Mode: Activate on physical drive failure (default)
Controller Temperature (C): 49
Cache Module Temperature (C): 43
Number of Ports: 2 Internal only
Encryption: Not Set
Express Local Encryption: False
Driver Name: nhpsa
Driver Version: 70.0051.0.100-4vmw
PCI Address (Domain:bus:Device.Function): 0000:03:00.0
Negotiated PCIe Data Rate: PCIe 3.0 x8 (7880 MB/s)
Controller Mode: RAID
Pending Controller Mode: RAID
Port Max Phy Rate Limiting Supported: False
Latency Scheduler Setting: Disabled
Current Power Mode: MaxPerformance
Survival Mode: Enabled
Host Serial Number: CZJ6410R6R
Sanitize Erase Supported: True
Primary Boot Volume: None
Secondary Boot Volume: None

Smart Array P440ar in Slot 0 (Embedded)

Array A

physicaldrive 1I:1:1
Port: 1I
Box: 1
Bay: 1
Status: OK
Drive Type: Data Drive
Interface Type: Solid State SATA
Size: 4 TB
Drive exposed to OS: False
Logical/Physical Block Size: 512/512
Firmware Revision: SVQ02B6Q
Serial Number: S5STNF0TB04130E
WWID: 31402EC001E1BE70
Model: ATA Samsung SSD 870
SATA NCQ Capable: True
SATA NCQ Enabled: True
Current Temperature (C): 34
Maximum Temperature (C): 46
SSD Smart Trip Wearout: Not Supported
PHY Count: 1
PHY Transfer Rate: 6.0Gbps
PHY Physical Link Rate: Unknown
PHY Maximum Link Rate: Unknown
Drive Authentication Status: OK
Carrier Application Version: 11
Carrier Bootloader Version: 6
Sanitize Erase Supported: False
Shingled Magnetic Recording Support: None

physicaldrive 1I:1:2
Port: 1I
Box: 1
Bay: 2
Status: OK
Drive Type: Data Drive
Interface Type: Solid State SATA
Size: 4 TB
Drive exposed to OS: False
Logical/Physical Block Size: 512/512
Firmware Revision: SVQ02B6Q
Serial Number: S5STNF0TB04132T
WWID: 31402EC001E1BE71
Model: ATA Samsung SSD 870
SATA NCQ Capable: True
SATA NCQ Enabled: True
Current Temperature (C): 36
Maximum Temperature (C): 47
SSD Smart Trip Wearout: Not Supported
PHY Count: 1
PHY Transfer Rate: 6.0Gbps
PHY Physical Link Rate: Unknown
PHY Maximum Link Rate: Unknown
Drive Authentication Status: OK
Carrier Application Version: 11
Carrier Bootloader Version: 6
Sanitize Erase Supported: False
Shingled Magnetic Recording Support: None

physicaldrive 1I:1:3
Port: 1I
Box: 1
Bay: 3
Status: OK
Drive Type: Data Drive
Interface Type: Solid State SATA
Size: 4 TB
Drive exposed to OS: False
Logical/Physical Block Size: 512/512
Firmware Revision: SVQ02B6Q
Serial Number: S5STNF0TB04152F
WWID: 31402EC001E1BE72
Model: ATA Samsung SSD 870
SATA NCQ Capable: True
SATA NCQ Enabled: True
Current Temperature (C): 32
Maximum Temperature (C): 44
SSD Smart Trip Wearout: Not Supported
PHY Count: 1
PHY Transfer Rate: 6.0Gbps
PHY Physical Link Rate: Unknown
PHY Maximum Link Rate: Unknown
Drive Authentication Status: OK
Carrier Application Version: 11
Carrier Bootloader Version: 6
Sanitize Erase Supported: False
Shingled Magnetic Recording Support: None

physicaldrive 1I:1:4
Port: 1I
Box: 1
Bay: 4
Status: OK
Drive Type: Data Drive
Interface Type: Solid State SATA
Size: 4 TB
Drive exposed to OS: False
Logical/Physical Block Size: 512/512
Firmware Revision: SVQ02B6Q
Serial Number: S5STNF0TB04149L
WWID: 31402EC001E1BE73
Model: ATA Samsung SSD 870
SATA NCQ Capable: True
SATA NCQ Enabled: True
Current Temperature (C): 29
Maximum Temperature (C): 45
SSD Smart Trip Wearout: Not Supported
PHY Count: 1
PHY Transfer Rate: 6.0Gbps
PHY Physical Link Rate: Unknown
PHY Maximum Link Rate: Unknown
Drive Authentication Status: OK
Carrier Application Version: 11
Carrier Bootloader Version: 6
Sanitize Erase Supported: False
Shingled Magnetic Recording Support: None
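
For anyone comparing controller dumps like the one above, here is a minimal Python sketch that parses the "Key: Value" lines into a dictionary and prints a few cache-related fields. The field list (CACHE_FIELDS) and the function names are my own picks for illustration, not an official HPE checklist, and the sample text is copied from the controller output in this post. It only surfaces values for review; it does not diagnose anything by itself.

```python
# Hypothetical helper: pull selected fields out of a "Key: Value" dump.

# Fields chosen for illustration only (an assumption, not a vendor checklist).
CACHE_FIELDS = [
    "Cache Status",
    "Drive Write Cache",
    "No-Battery Write Cache",
    "Battery/Capacitor Status",
]


def parse_key_values(text):
    """Parse 'Key: Value' lines into a dict, splitting on the first ': '."""
    settings = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition(": ")
        if sep:  # skip lines without a 'Key: Value' shape
            settings[key] = value
    return settings


def cache_summary(text):
    """Return the selected cache-related fields, marking absent ones."""
    settings = parse_key_values(text)
    return {field: settings.get(field, "<missing>") for field in CACHE_FIELDS}


# Sample lines copied from the controller output in this post.
sample = """\
Cache Board Present: True
Cache Status: Not Configured
Drive Write Cache: Disabled
Total Cache Size: 2.0
No-Battery Write Cache: Disabled
Battery/Capacitor Status: OK
"""

for field, value in cache_summary(sample).items():
    print(f"{field}: {value}")
```

Saving the full controller output to a text file and passing its contents to cache_summary() in place of the sample makes it easier to compare the same fields before and after a settings change.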

NateNateNAte
Enthusiast

At first glance, this looks like it could be a runaway process.

Is this machine part of a host cluster, or is it stand-alone? Assuming no VMs are running, was this host newly added to a cluster, and is DRS enabled? I'm asking for context on how the infrastructure is connected and how this server fits into your setup. That would help track down the best place to address the problem.

PLP-Lackner
Contributor

I have now deactivated SSD Smart Path in the HP settings.
The latency is now much better.
I don't think any hardware was or is damaged, because I tried swapping some parts.

But I don't know why it suddenly happened.
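
The post does not say which tool was used to change the setting. As a hedged sketch, assuming the HPE ssacli command-line utility is installed and that its "ssdsmartpath" array option applies to this controller, the command could be assembled as below. Slot 0 and Array A come from the output earlier in this thread; the function name is hypothetical, and the exact syntax should be checked against "ssacli help" on the host before running anything. The sketch only builds and prints the command; it does not execute it.

```python
import shlex


def smartpath_command(slot, array, enable, ssacli="ssacli"):
    """Build (but do not run) an ssacli command that toggles SSD Smart Path.

    Assumption: the 'ssdsmartpath=enable|disable' array option is available
    in the installed ssacli version. Verify with 'ssacli help' first.
    """
    state = "enable" if enable else "disable"
    return [
        ssacli,
        "ctrl", f"slot={slot}",
        "array", array,
        "modify", f"ssdsmartpath={state}",
    ]


# Slot 0 and Array A are taken from the controller output in this thread.
cmd = smartpath_command(slot=0, array="A", enable=False)
print(shlex.join(cmd))
```

Keeping the command as a list means it can later be passed to subprocess.run() without shell quoting issues, once the syntax has been confirmed on the actual host.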

 

 

NateNateNAte
Enthusiast

Interesting that disabling SSD Smart Path reduced the latency. If the problem persisted with that configuration even after you swapped out hardware, then I would look at the firmware.

As to why it happened: that would definitely be a case to send to HP. It reminds me of a problem I once had with hardware that had an unknown software bug, which allowed a buffer overwrite to fill up an SQL server overnight. We had to get the hardware vendor involved to do a firmware update. It was not a common bug or occurrence; it only happened when a certain configuration was applied. In effect, we did a pentest without realizing we were doing a pentest.

I'm glad you got some resolution, though. It's up to you whether you want to open a case with HP, or with VMware for that matter (just to cover all bases).
