VMware
99 Replies. Last post: Mar 8, 2008 10:19 PM by Damin

ESX 3.0.1 - Linux Guests go ReadOnly posted: Oct 12, 2006 7:45 PM

Damin, Enthusiast, 100 posts since Jan 17, 2006
Hello,
My hopes that the 3.0.1 release would solve this problem have been dashed. At the suggestion of VMware support, we ripped out our QLogic HBAs, switched to software iSCSI, and upgraded our 3.0.0 machine to 3.0.1.

Same problem. Under heavy load, the Linux guests' filesystems are remounted read-only.

Consoles indicate the following:
SCSI Error : <0 0 0 0> return code = 0x20008
end_request: I/O error, dev sda, sector 4928181
Aborting journal on device dm-0
ext3_abort called.
EXT3-fs error (device dm-0): ext3_journal_start_sb: Detected aborted journal
Remounting filesystem read-only.

All guests use the LSI Logic driver and have the latest VMware Tools installed.

This is very clearly a VMware issue, and we've now proven that it IS NOT related to the QLogic QLA4010 HBAs, since we migrated to software iSCSI and are getting the same problem.
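Because the failure shows up as a silent remount, it helps to notice it quickly inside the guest. A minimal sketch in Python, assuming a Linux guest that exposes /proc/mounts and ext3 filesystems as in the log above; the function name read_only_mounts and the warning text are hypothetical, and a filesystem deliberately mounted ro (for example a read-only /boot) would be reported too:

```python
def read_only_mounts(mounts_text, fstypes=("ext3",)):
    """Return mount points of the given filesystem types that are mounted read-only.

    mounts_text is expected in /proc/mounts format, one mount per line:
    device mountpoint fstype options dump pass
    """
    hits = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        _device, mountpoint, fstype, options = fields[:4]
        # After an aborted journal, the options column starts with "ro" instead of "rw".
        if fstype in fstypes and "ro" in options.split(","):
            hits.append(mountpoint)
    return hits


if __name__ == "__main__":
    try:
        with open("/proc/mounts") as f:
            mounts = f.read()
    except OSError:
        print("could not read /proc/mounts (not running on Linux?)")
    else:
        for mp in read_only_mounts(mounts):
            print(f"WARNING: {mp} is mounted read-only")
```

Run periodically (from cron, say), this only reports the problem after the fact; it does not prevent the journal abort.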

Re: ESX 3.0.1 - Linux Guests go ReadOnly

1. Oct 13, 2006 8:10 AM in response to: Damin
jonhutchings, Hot Shot, 252 posts since Mar 29, 2006
It's similar to issues with Red Hat SCSI drivers in a multipath environment, where a SAN switch reset confuses the multipath driver and causes the filesystem to flag a corrupt journal. Is it possible that under heavy load your iSCSI network is dropping packets or performing badly in some way? That could expose issues in the underlying iSCSI stack, which in turn surface in the filesystem. It might be worth checking the health of the network carrying your iSCSI I/O to see whether something is underperforming or failing under load.

I totally agree that this problem should not be happening, though...


Re: ESX 3.0.1 - Linux Guests go ReadOnly

2. Oct 13, 2006 9:29 AM in response to: Damin
paithal, Hot Shot, 97 posts since Feb 17, 2006
Are there any errors reported in ESX's /var/log/vmkernel?

Also, make sure network connectivity is healthy and there are no intermittent failures. If the path to storage is down for an extended time (for Red Hat guests typically about 1 minute, I believe), the filesystem can go read-only. Which target are you using?

Is this a shared network, or is it dedicated to iSCSI? Is any VLAN configuration involved? Is any traffic shaping or bandwidth restriction applied on the physical switch carrying the iSCSI traffic?
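paithal's point is that the guest only tolerates a storage outage for a limited time. On Linux 2.6 kernels that expose it, one related setting is the per-device SCSI command timeout, found in sysfs at /sys/block/<dev>/device/timeout (in seconds). A minimal read-only sketch, assuming that attribute exists in the guest; the device name sda comes from the log in the first post, the helper names are hypothetical, and treating this value as the whole story is an assumption, since SCSI mid-layer retries also add to the total time before an error reaches ext3:

```python
def timeout_path(dev="sda", sysfs_root="/sys"):
    """Build the sysfs path of a SCSI disk's command timeout attribute (assumed layout)."""
    return f"{sysfs_root}/block/{dev}/device/timeout"


def parse_timeout(text):
    """Parse the attribute's contents, an integer number of seconds; None if malformed."""
    try:
        return int(text.strip())
    except ValueError:
        return None


def read_timeout(dev="sda", sysfs_root="/sys"):
    """Return the timeout in seconds, or None if the attribute cannot be read."""
    try:
        with open(timeout_path(dev, sysfs_root)) as f:
            return parse_timeout(f.read())
    except OSError:
        return None  # attribute missing (older kernel, no such device) or unreadable


if __name__ == "__main__":
    secs = read_timeout("sda")
    print("sda timeout:", "unknown" if secs is None else f"{secs} s")
```

Comparing this value with how long the vmkernel log shows the iSCSI path being unavailable is one way to check whether an outage could outlast what the guest tolerates.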

Re: ESX 3.0.1 - Linux Guests go ReadOnly

3. Oct 15, 2006 11:44 AM in response to: Damin
CTeague, Novice, 8 posts since Apr 14, 2006
Wow, I'm having this same problem and didn't even have to search to find someone else with it!

I'm running CentOS 4.3 (four guests) on ESX 3.0 and 3.0.1 with VMware Tools installed.

I'm seeing exactly the same read-only problem. It has happened on three occasions now, and a reboot fixed it temporarily. The puzzling part is that none of these guests are under heavy load (Apache web servers and MySQL servers, currently with little web or database traffic since they're still in heavy development).

I'm using Dell PowerEdge 2850 hardware (two dual-core 2.8 GHz CPUs, 8 GB RAM, 2x QLogic PCI-E HBAs) on a Sun-based SAN. The SAN is managed by another technical department at my school, so I can't give many details right now.

Any ideas what could be causing this?

Re: ESX 3.0.1 - Linux Guests go ReadOnly

4. Oct 15, 2006 1:25 PM in response to: Damin
manuel.wenger, Enthusiast, 98 posts since Feb 17, 2005
There's another thread discussing the same problem, in case you missed it:

http://www.vmware.com/community/thread.jspa?threadID=58081

Re: ESX 3.0.1 - Linux Guests go ReadOnly

7. Oct 17, 2006 2:05 PM in response to: Damin
paithal, Hot Shot, 97 posts since Feb 17, 2006
Who did you give the vmkernel log to? Did you file an SR? Do you have the SR number?

Re: ESX 3.0.1 - Linux Guests go ReadOnly

9. Oct 21, 2006 10:42 AM in response to: Damin
paithal, Hot Shot, 97 posts since Feb 17, 2006
Can you post the SR number? We will take a look at the vm-support logs.

Re: ESX 3.0.1 - Linux Guests go ReadOnly

11. Oct 22, 2006 12:21 AM in response to: Damin
manuel.wenger, Enthusiast, 98 posts since Feb 17, 2005
I'm wondering whether this is a "known issue", as in "we will/must do something to fix it", or expected behaviour.

Re: ESX 3.0.1 - Linux Guests go ReadOnly

13. Oct 22, 2006 8:50 PM in response to: Damin
tsightler, Hot Shot, 177 posts since Sep 30, 2005
I've been battling this issue for weeks so it's good to know I'm not alone.

In researching the issue tonight, I found the following bug, which looks like it might be related:

https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=197158

It looks like this problem may be caused by a fairly recent (in the last few months) change in the upstream LSI Logic driver. For RHEL4 systems this code was changed between U2 and U3. I wonder if we could simply recompile the mptscsih driver from U2 for use in U3 and newer as a temporary solution. Has anybody tried that?

Basically, when a SCSI command fails with MPI_SCSI_STATUS_BUSY, the newer driver adds an extra DID_BUS_BUSY status, which causes the SCSI mid-layer to try only 5 times before reporting a timeout to the upper layers. With previous versions, which lacked this extra return status, the SCSI mid-layer would retry indefinitely.

If this is really the issue it would be pretty simple to patch and compile a new mptscsih driver to temporarily work around this issue. Maybe I'll give that a try tomorrow.

Later,
Tom
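Tom's explanation of the retry difference can be illustrated with a toy model. This is not the mptscsih or SCSI mid-layer code: the function name simulate, the assumption that each BUSY reply consumes exactly one try, and the fixed budget of 5 tries are simplifications taken from the description above. It shows how a target that stays busy a little longer than the budget turns a pause into an I/O error once DID_BUS_BUSY is added:

```python
MAX_TRIES = 5  # retry budget quoted in the post; a simplification, not read from the kernel


def simulate(busy_replies, counts_against_budget, max_tries=MAX_TRIES):
    """Toy model of one SCSI command whose target answers BUSY `busy_replies` times.

    counts_against_budget=True models the newer driver (BUSY also reported as
    DID_BUS_BUSY, so every BUSY uses up a try); False models the older driver
    (the command is simply requeued and nothing is counted).
    Returns (result, tries) where result is "ok" or "error".
    """
    tries = 0
    while True:
        tries += 1
        if tries > busy_replies:
            return "ok", tries  # the target finally accepted the command
        # The target answered BUSY on this try.
        if counts_against_budget and tries >= max_tries:
            return "error", tries  # budget exhausted: upper layers see an I/O error


if __name__ == "__main__":
    for busy in (3, 10):
        for counted in (False, True):
            label = "newer driver" if counted else "older driver"
            print(f"{busy} BUSY replies, {label}: {simulate(busy, counted)}")
```

In this model the older-driver path never returns "error", which is the indefinite retrying described above; the trade-off is that a path that never recovers would stall I/O rather than fail it.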

Re: ESX 3.0.1 - Linux Guests go ReadOnly

14. Oct 23, 2006 3:31 PM in response to: Damin
tsightler, Hot Shot, 177 posts since Sep 30, 2005
OK, today I decided to try patching the mptscsih driver to revert its behavior to that of RHEL4 U2 and earlier. So far, this seems to be working. I've run significant tests on several systems today, including a set of my most troublesome boxes connected to a lowly AX150i, which previously would fail within 5-10 minutes. So far the boxes have survived the day without issue.

There have been several iSCSI timeouts on the hosts (as noted in the vmkernel logs), which would normally have propagated as SCSI timeouts in the guest, but with the patched driver the Linux system seems to just pause and then return to normal operation.

I've posted the patched files and some crude instructions at http://www.tuxyturvy.com/blog/index.php?/archives/31-VMware-ESX-and-ext3-journal-aborts.html so others can give it a try if they want to.

I'm not willing to call this 100% fixed yet; perhaps the systems are just behaving today, but so far it looks good.

Later,
Tom
