VMware
99 Replies. Last post: Mar 8, 2008 10:19 PM by Damin

Re: ESX 3.0.1 - Linux Guests go ReadOnly

90. Dec 5, 2007 5:32 AM in response to: socius
sriramrajan (Novice, 7 posts since Nov 1, 2006)


We ran about 20-25 RHEL 3 and 4 VMs using VMware ESX 3.1 and 3.2

Never had a kernel panic. The read-only issue is now fixed with RHEL 5 also.

I have seen kernel panics when using the Red Hat cluster services on VMware (free server) but that may not be related to this issue. Those are mainly with GFS or CLVMD.

Sriram

Re: ESX 3.0.1 - Linux Guests go ReadOnly

91. Dec 5, 2007 5:40 AM in response to: sriramrajan
socius (Novice, 13 posts since May 3, 2007)
The errors look like the attached screenshots in this post:

The first picture is before attempted reboot and the second is after.


Re: ESX 3.0.1 - Linux Guests go ReadOnly

92. Dec 6, 2007 3:13 PM in response to: socius
tsightler (Hot Shot, 177 posts since Sep 30, 2005)
socius wrote:
The errors look like the attached screenshots in this post:

The first picture is before attempted reboot and the second is after.


I have never seen any of our filesystems corrupted significantly, but some did experience minor corruption and I would suspect it's possible that bad things could happen if the error hit at a critical time. Do you happen to install your systems with only a single partition (or perhaps just root and boot partitions)? That would certainly make it more susceptible to corrupting the root volume.

You can probably boot a CD in rescue mode and run fsck on the root filesystem to get the system booting again.

Later,
Tom

Re: ESX 3.0.1 - Linux Guests go ReadOnly

93. Dec 6, 2007 10:58 PM in response to: tsightler
socius (Novice, 13 posts since May 3, 2007)

Yes, they are indeed installed with a single partition and then cloned from the same installation. What surprised me is that we got this problem with file system corruption on all of the ones that got mounted as read only.

Unfortunately, rescue mode did not help fix the file systems. We managed to get some of our data back, but the VMs still won't boot. We're naturally reluctant to install any more Linux VMs until we know that this won't happen again, which is why I was curious whether anybody else had experienced this.

Re: ESX 3.0.1 - Linux Guests go ReadOnly

95. Dec 7, 2007 6:49 AM in response to: Damin
socius (Novice, 13 posts since May 3, 2007)

I'm not really an expert on SAN infrastructure so maybe these are not the answers you were looking for, but I know we are using HP EVA 8000.

The ESX hosts in question are running on HP DL385 G1 hardware and I think the HBAs are Emulex LP1050DC.

Re: ESX 3.0.1 - Linux Guests go ReadOnly

96. Dec 7, 2007 7:10 AM in response to: socius
tsightler (Hot Shot, 177 posts since Sep 30, 2005)
socius wrote:

Yes, they are indeed installed with a single partition and then cloned from the same installation. What surprised me is that we got this problem with file system corruption on all of the ones that got mounted as read only.


That is pretty unusual. You mentioned RHEL4 U3 and there was a known bug that caused additional corruption on ext3 with some kernels from that release. I don't remember the specifics, but it would cause a similar "ext3 journal abort" even on normal systems on the SAN. These issues are fixed in more recent releases (U5 and above).

Also, even though I've noticed many distros have stopped suggesting multiple filesystems for root, /var, and /usr, I still think this is a good practice. We continue to use separate filesystems for volumes that get significant writes, as this significantly decreases the chance of the root or /usr filesystem becoming corrupted.
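To see how a given guest is laid out, something like the following could be run inside it. This is only a rough sketch: it assumes the usual `/proc/mounts` field order (device, mount point, type, options), and the function name `mount_layout` and the sample text are made up for illustration.

```python
def mount_layout(mounts_text, wanted=("/", "/usr", "/var")):
    """Map each wanted path to (device, is_read_only), or None if it is
    not a separate mount point (i.e. it lives on a parent filesystem)."""
    layout = {path: None for path in wanted}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        device, mount_point, options = fields[0], fields[1], fields[3]
        if mount_point in layout:
            # Later entries for the same mount point overwrite earlier ones.
            layout[mount_point] = (device, "ro" in options.split(","))
    return layout


# Hand-written sample in /proc/mounts format (hypothetical devices).
SAMPLE = """\
/dev/sda2 / ext3 ro 0 0
/dev/sda1 /boot ext3 rw 0 0
/dev/sda3 /var ext3 rw 0 0
"""

for path, info in mount_layout(SAMPLE).items():
    print(path, info)
```

On a live system the text would come from `open("/proc/mounts").read()`. A path that maps to `None` shares a filesystem with its parent, which is the single-partition layout described above.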

socius wrote:

Unfortunately, rescue mode did not help fix the file systems. We managed to get some of our data back, but the VMs still won't boot. We're naturally reluctant to install any more Linux VMs until we know that this won't happen again, which is why I was curious whether anybody else had experienced this.

The error you are getting is critical, but shouldn't be unrecoverable except in the worst cases. There are multiple copies of the group descriptors on disk and you should be able to locate them with dumpe2fs. Worst case you should be able to get your data by using debugfs. I've been doing this a long time, and I've never seen this be unrecoverable, but it is a touchy recovery and it's certainly possible that you have an unrecoverable situation that I've never run across. Just because I haven't seen it doesn't mean it doesn't exist.
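As a rough illustration of the "locate the backup copies" step, the sketch below pulls the backup superblock and group descriptor locations out of dumpe2fs output and builds check-only e2fsck commands for them. The sample text is a hand-written approximation of dumpe2fs output, not taken from the affected systems, and the helper names and the `/dev/sdX1` device are placeholders.

```python
import re

# Matches lines such as:
#   "  Backup superblock at 32768, Group descriptors at 32769-32769"
BACKUP_RE = re.compile(
    r"Backup superblock at (\d+), Group descriptors at (\d+)-(\d+)"
)


def find_backups(dumpe2fs_output):
    """Return (superblock, first_gd_block, last_gd_block) for each backup."""
    return [tuple(int(n) for n in m.groups())
            for m in BACKUP_RE.finditer(dumpe2fs_output)]


def fsck_commands(device, backups):
    """Build e2fsck argument lists that use each backup superblock.

    -n opens the filesystem read-only and answers "no" to every prompt,
    so these commands only report problems; they do not repair anything.
    """
    return [["e2fsck", "-n", "-b", str(sb), device] for sb, _, _ in backups]


# Hand-written sample resembling dumpe2fs output (assumed values).
SAMPLE = """\
Group 0: (Blocks 0-32767)
  Primary superblock at 0, Group descriptors at 1-1
Group 1: (Blocks 32768-65535)
  Backup superblock at 32768, Group descriptors at 32769-32769
Group 3: (Blocks 98304-131071)
  Backup superblock at 98304, Group descriptors at 98305-98305
"""

for cmd in fsck_commands("/dev/sdX1", find_backups(SAMPLE)):
    print(" ".join(cmd))
```

The commands are only printed here, never executed. Any real repair attempt is safer against a copy of the virtual disk than against the original.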

I can certainly understand being reluctant to reinstall Linux VMs. I'd be that way too if I were in your situation. I will tell you that I personally run over 20 RHEL4 (and now a few RHEL5) VMs hosting critical production systems and have never seen more than minor corruption, and none since the problems with journal aborts were corrected. Of course, backups are always important, and you'll want to test your setup thoroughly, but I believe it is possible to have great success with Linux on VMware.

Later,
Tom
