VMware Cloud Community
Markb111889
Contributor

PSOD Caused (?) by disk in predicted failure state

ESXi 4.1; Dell R710; RAID-1 (2 15kRPM SAS).  Random and unrepeated PSOD which Dell says was possibly caused by one of the RAID devices going into a predicted failure state.  Two questions:

1.  Anyone else experience anything similar?

2.  If Dell's hypothesis is correct, is there any way to configure ESXi to be "more tolerant" of this hardware state?  (Or would RAID-5 with a hot spare help?)

Appreciate your input.


Accepted Solutions
EdWilts
Expert

Markb111889 wrote:

ESXi 4.1; Dell R710; RAID-1 (2 15kRPM SAS).  Random and unrepeated PSOD which Dell says was possibly caused by one of the RAID devices going into a predicted failure state.  Two questions:

1.  Anyone else experience anything similar?

2.  If Dell's hypothesis is correct, is there any way to configure ESXi to be "more tolerant" of this hardware state?  (Or would RAID-5 with a hot spare help?)

Just talked to my coworker and he says another division in our company has seen exactly this.

Dell's hypothesis is correct and their hardware is busted.  There is no way that a predicted failure state should cause a system to crash, since ESXi wouldn't even know about the state except through what the RAID controller reports to it.

It sounds to me that the drive's failure is beyond "predicted" - it's actually failed and doing nasty things to the RAID controller.

RAID-5 with a hot spare would not help - you're already mirrored, which will give you better redundancy than RAID-5.  What the RAID controller should have done is eject the drive and recover gracefully.
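
To put rough numbers on the RAID-1 versus RAID-5 point above, here is a minimal sketch using only the textbook definitions of the two levels. The function names and the four-drive RAID-5 layout (three active drives plus one hot spare) are assumptions made for illustration; they are not taken from the poster's actual configuration or from any Dell or VMware tool.

```python
def raid1_summary(drives: int = 2) -> dict:
    """Textbook RAID-1: every drive holds a full copy of the data."""
    if drives < 2:
        raise ValueError("RAID-1 needs at least 2 drives")
    return {
        "usable_drives": 1,                 # capacity of a single drive
        "failures_tolerated": drives - 1,   # data survives until the last copy dies
    }


def raid5_summary(active_drives: int, hot_spares: int = 0) -> dict:
    """Textbook RAID-5: one drive's worth of parity spread across the set."""
    if active_drives < 3:
        raise ValueError("RAID-5 needs at least 3 active drives")
    return {
        "usable_drives": active_drives - 1,
        # A hot spare only shortens the time spent degraded by starting a
        # rebuild automatically; it does not raise the number of
        # simultaneous failures the array can survive.
        "failures_tolerated": 1,
        "hot_spares": hot_spares,
    }


# Hypothetical comparison: the existing 2-drive mirror versus a
# 3-drive RAID-5 with one hot spare (an assumed layout).
mirror = raid1_summary(2)
raid5 = raid5_summary(3, hot_spares=1)
print("RAID-1:", mirror)
print("RAID-5 + hot spare:", raid5)
```

Under these assumptions both layouts survive exactly one failed drive, so moving to RAID-5 would add capacity but no extra tolerance for a single drive going bad. Neither layout protects against a controller that mishandles the failing drive.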

.../Ed (VCP4, VCP5)

Markb111889
Contributor

Thanks, Ed.
