RHEL 3 Guest OS crashing

sthistle · 01-29-2007

We have an ESX 3.0.1 host with two HBAs. Whenever we perform maintenance on one of the HBA links (IBM DS4800 SAN), it takes out the RHEL guest: the guest apparently loses connectivity to its disks. The same thing happens whenever we VMotion the guest. The errors in the messages file are shown below. One of the guests has an older version of VMware Tools installed, and the other has no Tools installed at all. I will upgrade both anyway:

Jan 25 20:34:07 hostname kernel: SCSI error : <0 0 1 0> return code = 0x20008
Jan 25 20:34:07 hostname kernel: end_request: I/O error, dev sdb, sector 45937537
Jan 25 20:34:07 hostname kernel: Buffer I/O error on device dm-2, logical block 5742144
Jan 25 20:34:07 hostname kernel: lost page write due to I/O error on dm-2
Jan 25 20:34:09 hostname kernel: SCSI error : <0 0 0 0> return code = 0x20008
Jan 25 20:34:09 hostname kernel: end_request: I/O error, dev sda, sector 1943005
Jan 25 20:34:09 hostname kernel: Buffer I/O error on device dm-0, logical block 216722
Jan 25 20:34:09 hostname kernel: lost page write due to I/O error on dm-0
Jan 25 20:34:09 hostname kernel: SCSI error : <0 0 1 0> return code = 0x20008
Jan 25 20:34:09 hostname kernel: end_request: I/O error, dev sdb, sector 68682121
Jan 25 20:34:09 hostname kernel: EXT3-fs error (device dm-2): read_inode_bitmap: Cannot read inode bitmap - block_group = 262, inode_bitmap = 8585217
Jan 25 20:34:09 hostname kernel: Aborting journal on device dm-2.
Jan 25 20:34:11 hostname kernel: SCSI error : <0 0 1 0> return code = 0x20008
Jan 25 20:34:18 hostname kernel: end_request: I/O error, dev sdb, sector 12769
Jan 25 20:34:18 hostname kernel: Buffer I/O error on device dm-2, logical block 1548
Jan 25 20:34:18 hostname kernel: lost page write due to I/O error on dm-2
Jan 25 20:34:18 hostname kernel: ext3_abort called.
Jan 25 20:34:18 hostname kernel: EXT3-fs error (device dm-2): ext3_journal_start_sb: Detected aborted journal
Jan 25 20:34:18 hostname kernel: Remounting filesystem read-only
Jan 25 20:34:18 hostname kernel: SCSI error : <0 0 1 0> return code = 0x20008
Jan 25 20:34:18 hostname kernel: end_request: I/O error, dev sdb, sector 385
Jan 25 20:34:18 hostname kernel: Buffer I/O error on device dm-2, logical block 0
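The return code in these messages is the SCSI midlayer's 32-bit result, which the Linux 2.6 kernel packs as driver byte, host byte, message byte and status byte. A minimal Python sketch, assuming that layout and only a handful of the constants from the kernel's SCSI headers, decodes it and tallies the errors in a log excerpt; `decode_scsi_result`, `summarize` and `SAMPLE` are hypothetical names for illustration, not part of any existing tool. Under that reading, 0x20008 is host byte DID_BUS_BUSY with SCSI status BUSY, which fits the path-failover scenario described above better than a media fault.

```python
import re

# Subset of host-byte (DID_*) and SAM status-byte names from the Linux SCSI
# headers; only the values relevant to this thread are included (assumption:
# standard 2.6-era result layout).
HOST_BYTES = {0x00: "DID_OK", 0x01: "DID_NO_CONNECT", 0x02: "DID_BUS_BUSY",
              0x03: "DID_TIME_OUT", 0x04: "DID_BAD_TARGET", 0x05: "DID_ABORT",
              0x07: "DID_ERROR", 0x08: "DID_RESET"}
STATUS_BYTES = {0x00: "GOOD", 0x02: "CHECK_CONDITION", 0x08: "BUSY",
                0x18: "RESERVATION_CONFLICT", 0x28: "TASK_SET_FULL"}

# "SCSI error : <host channel id lun> return code = 0x..."
SCSI_ERR = re.compile(r"SCSI error : <(\d+) (\d+) (\d+) (\d+)> return code = (0x[0-9a-fA-F]+)")
EXT3_ERR = re.compile(r"EXT3-fs error \(device ([^)\s]+)\)")

SAMPLE = [
    "Jan 25 20:34:07 hostname kernel: SCSI error : <0 0 1 0> return code = 0x20008",
    "Jan 25 20:34:09 hostname kernel: SCSI error : <0 0 0 0> return code = 0x20008",
    "Jan 25 20:34:09 hostname kernel: EXT3-fs error (device dm-2): read_inode_bitmap: "
    "Cannot read inode bitmap - block_group = 262, inode_bitmap = 8585217",
    "Jan 25 20:34:18 hostname kernel: Remounting filesystem read-only",
]


def decode_scsi_result(code: int) -> dict:
    """Split a 32-bit SCSI result into driver, host, message and status bytes."""
    host = (code >> 16) & 0xFF
    status = code & 0xFF
    return {
        "driver": (code >> 24) & 0xFF,
        "host": HOST_BYTES.get(host, hex(host)),
        "msg": (code >> 8) & 0xFF,
        "status": STATUS_BYTES.get(status, hex(status)),
    }


def summarize(lines):
    """Count SCSI errors per (address, code), list ext3 devices with errors,
    and report whether any filesystem was remounted read-only."""
    scsi_counts = {}
    ext3_devices = []
    read_only = False
    for line in lines:
        m = SCSI_ERR.search(line)
        if m:
            addr = tuple(int(x) for x in m.groups()[:4])
            key = (addr, int(m.group(5), 16))
            scsi_counts[key] = scsi_counts.get(key, 0) + 1
        m = EXT3_ERR.search(line)
        if m and m.group(1) not in ext3_devices:
            ext3_devices.append(m.group(1))
        if "Remounting filesystem read-only" in line:
            read_only = True
    return scsi_counts, ext3_devices, read_only


if __name__ == "__main__":
    print(decode_scsi_result(0x20008))
    print(summarize(SAMPLE))
```

To run it against a real log, pass the lines of /var/log/messages to `summarize` instead of `SAMPLE`.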

sthistle · 01-29-2007

Correction: that should be RHEL 4 guest OS, not version 3.

Thanks

kevde · 01-31-2007

Hello

We have exactly the same problem with our Hitachi SAN.

bertdb · 01-31-2007

Read this article:

http://www.tuxyturvy.com/blog/index.php?/archives/31-VMware-ESX-and-ext3-journal-aborts.html

I bet that's your situation as well.

kevde · 01-31-2007

Yes, you're right. This thread also provides some patched Red Hat kernels: http://www.vmware.com/community/thread.jspa?threadID=58121

kevde · 02-01-2007

This VMware KB article seems to give a solution:

http://kb.vmware.com/vmtnkb/search.do?cmd=displayKC&docType=kc&externalId=51306&sliceId=SAL_Public

I have applied the new driver. We will see over the next few days whether it works.

Wilf · 05-02-2007

I'm seeing the same aborted journal on RHEL 5 as I did on RHEL 4 (which was fixed by the replacement VMware driver RPM), but the VMware KB doesn't cover RHEL 5 yet. Does anyone have a quick fix that doesn't require building a new kernel?

twwlogin · 05-17-2007

I just posted a comment to RH bug #197158 asking for a forward-port of the fix to RHEL5. We're having the same problem. We downloaded the source for the latest RHEL5 kernel, patched it, and are now running the patched kernel. We will know in two days whether it worked.

djflux · 08-09-2007

I'm having the same problem on a RHEL5 guest. I've downloaded the kernel source RPM from here and built it with mock for i686:

http://people.redhat.com/coldwell/kernel/bugs/225177/

As soon as the build is complete and I have installed and tested it, I will report back.

Thanks,

Flux.
