VMware Cloud Community
morpheleon
Contributor

ESX 3.5 - EXT-3 Journal aborting

I have an IBM x3655 server with a ServeRAID 8k card and 6x500GB drives in RAID 5, which comes to about 2.2TB of usable space. Yesterday I updated the ServeRAID card to firmware 5.20.15412, which IBM released on 1/15/2008.
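For anyone checking the capacity figure, here is a rough sketch of the arithmetic. It assumes "500GB" means 500 x 10^9 bytes per drive and that the array size is being reported in binary units; the function name is just for illustration.

```python
def raid5_usable_bytes(drive_count: int, drive_bytes: int) -> int:
    """Usable capacity of a RAID 5 array: one drive's worth goes to parity."""
    if drive_count < 3:
        raise ValueError("RAID 5 needs at least 3 drives")
    return (drive_count - 1) * drive_bytes


# Assumption: vendor "500GB" = 500 * 10**9 bytes.
usable = raid5_usable_bytes(6, 500 * 10**9)

# Convert to binary terabytes (TiB), which is how many tools report size.
usable_tib = usable / 2**40
print(f"{usable} bytes is roughly {usable_tib:.2f} TiB")
```

Under those assumptions the array is 2.5 decimal TB, which lands a little above 2.2 when expressed in binary units.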

The ESX install seemed to go fine: it created the VMFS volume without complaint, and I can create plenty of guest VMs without running into any problem so far.

The problem I'm hitting is that overnight (though not every night) one of the EXT-3 file systems runs into trouble and reports "Journal has aborted" errors along with sector I/O errors. This effectively kills the entire system: the console is unresponsive (I can only switch between the Alt-F1 and Alt-F11 screens), and there are no log entries in /var/log/messages.

Has anyone else seen this type of behavior before? I've googled a lot and found plenty of similar issues on other Linux distributions, but nothing specific to ESX and nothing that fits the same circumstances.

I appreciate any insight that anyone can provide! I've got 6 more of these servers to get working, and right now this is a show stopper.

5 Replies
mcowger
Immortal

This usually means (at least in my experience) that the underlying storage has gone offline.

--Matt

--Matt VCDX #52 blog.cowger.us
morpheleon
Contributor

Matt,

Thanks for the tip. Any idea how/why a local RAID5 array would go offline like this? Is there some sort of timeout built into ESX or the aacraid_esx30 driver?

mcowger
Immortal

Not to my knowledge, but honestly, we don't use local storage for ESX at all, so it isn't something I would have investigated.

Do the vmkernel logs tell you anything about SCSI timeouts?
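If it helps, here is a minimal sketch of how one might sift a copy of the log for suspicious entries. It assumes the log has been copied off the host (the default path `/var/log/vmkernel` below is an assumption, as is running Python 3 on a workstation rather than on the service console), and the keyword list is only a guess at what might be relevant, not a list of actual ESX or aacraid messages.

```python
import re
from typing import Iterable, List, Tuple

# Assumed keywords only; adjust to whatever the log actually contains.
SUSPECT = re.compile(
    r"scsi|timeout|timed out|abort|reset|i/o error|aacraid",
    re.IGNORECASE,
)


def find_suspect_lines(
    lines: Iterable[str], pattern: "re.Pattern[str]" = SUSPECT
) -> List[Tuple[int, str]]:
    """Return (line_number, text) for every line matching the pattern."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if pattern.search(line)
    ]


def scan_log(path: str = "/var/log/vmkernel") -> List[Tuple[int, str]]:
    """Scan a log file; the default path is an assumption, not a given."""
    with open(path, errors="replace") as handle:
        return find_suspect_lines(handle)
```

Calling `scan_log("vmkernel-copy.log")` (a hypothetical file name) and printing the result gives a short list to read through, which is easier than paging through the whole log after a hang.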

--Matt
jccoca
Hot Shot

Same issue here with an x460 and a ServeRAID 8i. VMware support says it is a hardware problem, but when I boot into troubleshooting mode it doesn't happen. I'm now going back to 3.0.2 to run some tests.

morpheleon
Contributor

I resolved my issue by applying two firmware updates to the IBM server: one for the ServeRAID card itself and one for the SATA backplane chipset. It seems the system cannot handle SATA drives at 3.0 Gb/s, and the backplane firmware update drops the link speed to 1.5 Gb/s.

Perhaps the same issue exists on your system.
