VMware Cloud Community
MrVmware9423
Expert

PSOD issue on ESX 4.1 ("Following Arrays have Missing or Rebuilding or Failed Members and are degraded/critical")

Dear Team,

We are getting a PSOD on one of our ESX hosts:

pastedImage_1.png

After a hard reboot we get the error below. The host then boots, but every 30 minutes it either reboots or hits a PSOD again.

pastedImage_2.png

We need your urgent assistance with this.

regards

Mr VMware

Accepted Solutions
MrVmware9423
Expert

***ANSWER

After replacing the HDD in Slot 5 and recreating the RAID array, the issue is reported as resolved. The complete details follow.

Dear Team,

We logged a case with the hardware vendor; the details are as follows.

As per the logs, the HDD in Slot 5 is not being detected and the array is Critical.

State...........................Critical

Bad stripes.....................Yes

A medium error was found on the drive in Slot 5, so we are ordering an HDD for replacement (replace the HDD after confirming the barcode).

We suggest re-creating Logical Drive 2 (after a data backup), as bad stripes were found on it.

After recreating the array, update all the codes to the latest versions (after a complete data backup):

BIOS/Firmware/Driver

E-mail | Name                                         | Installed Version | New Version    | Severity     | Reboot
       | IBM BIOS Flash Update ( readme )             | 1.04 (GGE127A)    | 1.19 (GGE149A) | Non-Critical | Required
       | IBM Preboot Diagnostics Flash Update         | (GGYT21A)         | 1.12 (GGYT39A) | Non-Critical | Required
       | Baseboard Management Controller Flash Update | 1.24 (GGBT38A)    | 1.49 (GGBT61A) | Suggested    | Required


------------------------------------------------------------------------------

1)      If there is a hardware problem, why do we not see any amber indication on the server?
The HDD had medium errors, and medium errors do not light the amber LED on the drive.


2)      What caused the RAID configuration issue, and why did the RAID configuration need to be re-created?

Bad stripes (in other words, virtual bad sectors) were found on Logical Drive 2. Re-creating the logical drive is the only way to prevent further bad stripes, which could lead to data loss.


3)      Please share the root cause of the above issues.

If the HDD contains medium errors, data loss may be detected or, in very rare circumstances, incorrect data may be read. The array may be in a critical state, rebuilding, or optimal after a rebuild completes.
Typically the problem occurs when one of the drives in the mirror is marked defunct and the surviving drive retains an uncorrected medium error. When this happens, there is no way to recover the missing data in that location. There is also a very remote possibility that incorrect data may be read from that location.
Hence we proactively replaced the HDD that had the medium error, to avoid data loss.
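
To illustrate the vendor's explanation, here is a minimal Python sketch of a two-drive mirror in which one member is defunct and the survivor has an unreadable block. It is a toy model only: the names (rebuild_mirror, Drive) are hypothetical, a block with a medium error is simply represented as None, and nothing here reflects how the actual RAID controller firmware works.

```python
from typing import Optional

# A drive is modelled as a list of blocks.
# None stands for a block with an uncorrected medium error (assumption of this sketch).
Drive = list[Optional[bytes]]


def rebuild_mirror(survivor: Drive) -> tuple[Drive, list[int]]:
    """Copy the surviving mirror member onto a replacement drive.

    Returns the rebuilt drive and the indices of the stripes that could not
    be copied because the survivor was unreadable there (the bad stripes).
    """
    rebuilt: Drive = []
    bad_stripes: list[int] = []
    for index, block in enumerate(survivor):
        if block is None:
            # The other mirror member is defunct, so no second copy exists
            # and the data at this location cannot be recovered.
            bad_stripes.append(index)
        rebuilt.append(block)
    return rebuilt, bad_stripes


survivor = [b"vm-data-0", None, b"vm-data-2"]
rebuilt, bad = rebuild_mirror(survivor)
print(bad)  # [1]
```

The rebuild itself completes, which matches the vendor's note that the array may even show as optimal afterwards, but the stripe at index 1 stays bad. That is why re-creating the logical drive after a backup was recommended.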


4)      Please share the firmware details before and after the update.

Below is the firmware comparison:


Before:

BIOS/Firmware/Driver

E-mail | Name                                         | Installed Version | New Version    | Severity     | Reboot
[X]    | IBM BIOS Flash Update ( readme )             | 1.04 (GGE127A)    | 1.19 (GGE149A) | Non-Critical | Required
[X]    | IBM Preboot Diagnostics Flash Update         | (GGYT21A)         | 1.12 (GGYT39A) | Non-Critical | Required
[X]    | Baseboard Management Controller Flash Update | 1.24 (GGBT38A)    | 1.49 (GGBT61A) | Suggested    | Required



After:

BIOS/Firmware/Driver

E-mail | Name                                         | Installed Version | New Version    | Severity     | Reboot
[ ]    | IBM BIOS Flash Update ( readme )             | 1.19 (GGE149A)    | 1.19 (GGE149A) | Not Required | Required
[ ]    | IBM Preboot Diagnostics Flash Update         | (GGYT39A)         | 1.12 (GGYT39A) | Not Required | Required
[ ]    | Baseboard Management Controller Flash Update | 1.49 (GGBT61A)    | 1.49 (GGBT61A) | Not Required | Required
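
For anyone who wants to check such a report programmatically, the comparison above boils down to "installed build ID differs from new build ID". The following is a minimal Python sketch under that assumption. The FirmwareRow class and pending_updates function are hypothetical names made up for this example, and the build IDs are copied by hand from the tables above, not read from any IBM tool.

```python
from dataclasses import dataclass


@dataclass
class FirmwareRow:
    name: str
    installed_build: str  # build ID in the "Installed Version" column
    new_build: str        # build ID in the "New Version" column


def pending_updates(rows: list[FirmwareRow]) -> list[str]:
    """Return the names of components whose installed build differs from the new build."""
    return [row.name for row in rows if row.installed_build != row.new_build]


before = [
    FirmwareRow("IBM BIOS Flash Update", "GGE127A", "GGE149A"),
    FirmwareRow("IBM Preboot Diagnostics Flash Update", "GGYT21A", "GGYT39A"),
    FirmwareRow("Baseboard Management Controller Flash Update", "GGBT38A", "GGBT61A"),
]
after = [
    FirmwareRow("IBM BIOS Flash Update", "GGE149A", "GGE149A"),
    FirmwareRow("IBM Preboot Diagnostics Flash Update", "GGYT39A", "GGYT39A"),
    FirmwareRow("Baseboard Management Controller Flash Update", "GGBT61A", "GGBT61A"),
]

print(len(pending_updates(before)))  # 3
print(pending_updates(after))        # []
```

Comparing build IDs instead of version numbers is a deliberate choice here, because the Preboot Diagnostics row shows only a build ID in the "Installed Version" column.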


 


abhilashhb
VMware Employee

Have you tried enabling/restoring the RAID array in the Array Configuration Utility during boot? You can reach that option by pressing Ctrl+A while the host boots.

Abhilash B
LinkedIn : https://www.linkedin.com/in/abhilashhb/
