VMware Cloud Community
SMADale
Contributor

Chronic RAID Failure on HP ML350 using B140i RAID

I have four identical HP ML350 Gen9 servers, each with:
16 GB of memory
RAID 10 of 2 TB drives, with two hot spares

When using ESXi 6.5, I have had RAID failures on all three servers.
All three servers are running the latest firmware and all updates I can locate from HP.
The ESXi install is the HP distribution.


Typical example:
Yesterday I was creating two Windows Server 2012 VMs (stored on the RAID 10) and installing the OS from an ISO also stored on the RAID 10.
Each VM was thick provisioned with a 160 GB or 200 GB primary disk.
Provisioning completed without issue.

I then powered on both VMs to install the operating systems; the installs reached 40-48% and then stopped.

There were no obvious errors, and the VMs were completely unresponsive to shutdown requests.

When I looked at the storage, it was gone.

I checked the storage health in iLO, and it reported a failed RAID.

Storage Info

  • Controller on System Board
      Controller Status: OK
      Serial Number: N/A
      Model: Dynamic Smart Array B140i Controller
      Firmware Version: 4.50
      Controller Type: HPE Smart Array

      • Drive Enclosure Port 1I Box 3
          Status: OK
          Drive Bays: 4

      • Drive Enclosure Port 2I Box 3
          Status: OK
          Drive Bays: 4

      • Logical Drive 01
          Status: Failed
          Capacity: 3725 GiB
          Fault Tolerance: RAID 1/RAID 1+0
          Logical Drive Type: Data LUN
          Encryption Status: Not Encrypted

      • Physical Drives (all: HDD, 2000 GB, Configured, Not Encrypted)

          Location              Status   Serial Number   Model         Firmware
          Port 1I Box 3 Bay 1   OK       Y67HK2LXF1BA    MB2000GDUNV   HPG4
          Port 1I Box 3 Bay 2   Failed   Y652K472F1BA    MB2000GDUNV   HPG4
          Port 1I Box 3 Bay 3   OK       N4G6ZSTY        MB2000GFEMH   HPG2
          Port 1I Box 3 Bay 4   Failed   N4G70M0Y        MB2000GFEMH   HPG2
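Whether two failed drives are enough to take down a four-drive RAID 1+0 depends on which bays the controller mirrors together, and the iLO page does not show that. The Python sketch below is a hypothetical illustration only, not HPE tooling: `raid10_survives` and `raid10_usable_gib` are made-up helper names, and both mirror-pair layouts are assumptions. It also checks that the reported 3725 GiB matches four 2000 GB drives in RAID 1+0 (half the raw decimal capacity, expressed in binary GiB).

```python
# Hypothetical sketch: sanity checks on the iLO report. The mirror-pair layouts
# are ASSUMPTIONS; iLO does not show which bays the B140i mirrors together.


def raid10_survives(mirror_pairs, failed_bays):
    """True if every mirror pair still has at least one healthy member."""
    failed = set(failed_bays)
    return all(not set(pair) <= failed for pair in mirror_pairs)


def raid10_usable_gib(drive_count, drive_gb):
    """Usable RAID 1+0 capacity in GiB: half the raw capacity.
    Drive sizes are decimal GB (10**9 bytes); iLO reports binary GiB (2**30 bytes)."""
    return drive_count * drive_gb * 10**9 / 2 / 2**30


failed = {2, 4}  # bays marked Failed in the iLO output

# Assumed layout A: bays 1+2 and 3+4 mirrored -> each pair keeps one member
print(raid10_survives([(1, 2), (3, 4)], failed))  # True
# Assumed layout B: bays 1+3 and 2+4 mirrored -> the 2+4 pair is lost entirely
print(raid10_survives([(1, 3), (2, 4)], failed))  # False

print(int(raid10_usable_gib(4, 2000)))  # 3725, matching Logical Drive 01
```

Under something like layout B, losing bays 2 and 4 together fails the whole logical drive; the sketch says nothing about why the controller marked those drives failed in the first place.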

Interestingly, when I reboot the hardware, it reports those failed drives as healthy. This is also the same failure I see on my other servers.

I have also tested the drives individually and they are good.

Thoughts?

I have created a VMware support bundle from the server if anybody wants to view it.

5 Replies
Mc20piece
Contributor

I am having this exact issue on vSphere 6.5 with an HPE DL380 Gen9: twelve 2 TB drives in RAID 6 on a P440ar controller. Every couple of weeks the storage disappears. In my case iLO also shows the storage as healthy, and a reboot brings the volume/datastore back in vSphere.

SMADale
Contributor

Unfortunately, that is not the case for me. I have to recreate the array, which destroys everything.

Mc20piece
Contributor

It's odd; this server had been rock stable before moving to 6.5.

SMADale
Contributor

The saga continues.

Yesterday HP sent out two new drives.
Today I rebuilt the RAID and created a few VMs. About two hours later the RAID failed. This time it was not the new drives that failed; it was the existing ones. Argh!

EnGiJack
Contributor

Hi,

Same problem here with an HPE ML110 and a B140i.

Four WD Red Pro 4 TB drives configured as a single RAID 10 LUN.

ESXi 6.5 is installed from the HP ISO on a microSD card. Everything works fine until I stress the storage with a backup; then the LUN fails, with two mirrored disks (of the four in total, as in your case) marked as failed. When I reboot the host and go into the Smart Array page, all the drives are OK; I re-enable the LUN and it works fine... until the next crash. It looks like a software bug, and a huge one!

I tried updating ESXi with all the latest patches using Update Manager, but no luck: same problem.

Did you find a solution?

Thanks a lot.

Best Regards.
