VMware
99 Replies. Last post: Mar 8, 2008 10:19 PM by Damin

Re: ESX 3.0.1 - Linux Guests go ReadOnly

60. Feb 26, 2007 6:41 AM in response to: Ops admin
tsightler (Hot Shot, 177 posts since Sep 30, 2005)
We have 20+ other systems that are running the stock
stuff "2.6.9-22.ELsmp #1 SMP" and "mptlinux-3.02.18".
We need to update the kernel to apply the patch.

Does anyone know if we need to apply the patch to the
stock kernel systems? I am just wondering if the
problem won't show itself with what we have on the
older systems.


For RHEL4, the problem only occurs with U3 and above. The 2.6.9-22.EL kernel is, I believe, the kernel from RHEL4 U2, so it doesn't have the problem. I'm not 100% sure, but I believe the first "released" kernel to include this problem is 2.6.9-34.EL. Any non-development kernel older than that should be safe.

I actually think the update that broke this was added in the U3 development chain around kernel 2.6.9-22.25.EL, but it's been a long time, so I may not have the exact release. For example, I have some 2.6.9-28.EL U3 beta kernels which definitely have the "broken" driver.

Later,
Tom


Re: ESX 3.0.1 - Linux Guests go ReadOnly

61. Feb 28, 2007 7:46 AM in response to: Ops admin
cuzic4n (Lurker, 3 posts since Jan 18, 2006)
So, to summarize: should we update the kernel to 2.6.9-42.0.8 AND apply the VMware patch from http://kb.vmware.com/vmtnkb/search.do?cmd=displayKC&docType=kc&externalId=51306&sliceId=SAL_Public
OR do we just update the kernel and we're good to go, since the problem really was in RHEL4?

I was set on just moving to the latest kernel, but then "Ops admin" replied two posts before this one saying that he was on the latest kernel and still had problems.

Re: ESX 3.0.1 - Linux Guests go ReadOnly

62. Mar 1, 2007 2:27 PM in response to: cuzic4n
tsightler (Hot Shot, 177 posts since Sep 30, 2005)
The "latest kernel" does not fix the problem, at least as of 2.6.9-42.0.8, so, yes, if you are going to update to the latest kernel then you also need either VMware's fix or my fix.

On the other hand, if you are on kernel 2.6.9-22.0.2 or earlier, and you're not planning to upgrade your kernel, then you don't need to do anything as those kernels don't have the problem.

Later,
Tom
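The version boundaries given in this thread can be turned into a quick check. Below is a minimal Python sketch, assuming the kernel string comes from `uname -r` (read here via `platform.release()`). It encodes only what tsightler reports: released kernels up to 2.6.9-22.0.2 are not affected, released kernels 2.6.9-34 through 2.6.9-42.0.8 are affected, and U3 development kernels from roughly 2.6.9-22.25.EL (such as 2.6.9-28.EL) are likely affected. Anything else is reported as unknown. The function names are hypothetical, and the ranges are one poster's recollection rather than an official Red Hat or VMware list.

```python
import platform

# Hypothetical helper: classify a RHEL4 kernel release string (as printed by
# `uname -r`) using only the version ranges reported in this thread.
PREFIX = "2.6.9-"


def parse_release(uname_r: str) -> tuple:
    """Return the numeric part after '2.6.9-', e.g. '42.0.8.ELsmp' -> (42, 0, 8)."""
    if not uname_r.startswith(PREFIX):
        raise ValueError(f"not a RHEL4 2.6.9 kernel: {uname_r!r}")
    parts = []
    for piece in uname_r[len(PREFIX):].split("."):
        if not piece.isdigit():
            break  # stop at the 'EL' / 'ELsmp' suffix
        parts.append(int(piece))
    if not parts:
        raise ValueError(f"no numeric release in {uname_r!r}")
    return tuple(parts)


def classify(uname_r: str) -> str:
    rel = parse_release(uname_r)
    if rel <= (22, 0, 2):
        return "not affected"  # U2-era and older released kernels
    if (34,) <= rel <= (42, 0, 8):
        return "affected"  # released U3/U4 kernels: needs VMware's or tsightler's fix
    if (22, 25) <= rel < (34,):
        return "likely affected"  # U3 development/beta kernels such as 2.6.9-28.EL
    return "unknown"  # not covered by the reports in this thread


if __name__ == "__main__":
    try:
        print(classify(platform.release()))
    except ValueError as err:
        print(err)
```

Tuple comparison does the version ordering here, so `(42,)` (2.6.9-42.EL) sorts before `(42, 0, 8)` and falls inside the affected range.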

Re: ESX 3.0.1 - Linux Guests go ReadOnly

63. Mar 6, 2007 3:04 PM in response to: CMCC
sjc_dogs (Lurker, 1 post since Jan 26, 2006)
Hi,

I am having problems on RHEL4U3, RHEL3U8 and SUSE9 SP3 VMs. This is a NAS/SAN system, and I am running the Connectathon test suite, an NFS test for the NAS side. When running the special tests over NFS version 3 using UDP, it gets stuck at the step that says "write/read 30 MB file". On RHEL3U8 and SUSE9 SP3 it stalls for 5-7 minutes and then continues fine. On RHEL4U3 it stays stuck forever.

Could my problem be related to the one discussed here? Please let me know.

Re: ESX 3.0.1 - Linux Guests go ReadOnly

64. Apr 1, 2007 9:29 AM in response to: sjc_dogs
tsightler (Hot Shot, 177 posts since Sep 30, 2005)
Has anyone tested RHEL4 U5 Beta? Kernels newer than 2.6.9-45 are supposed to have a more "correct" fix for this issue built in but I haven't had time to test. I'm going to try to find time this week, but I'd love to hear if anyone else has given them a spin.

Later,
Tom

Re: ESX 3.0.1 - Linux Guests go ReadOnly

65. May 5, 2007 1:16 PM in response to: tsightler
soleblazer (Hot Shot, 228 posts since May 25, 2006)
Update 5 is out now. Does anyone know if it solves the issue?

Re: ESX 3.0.1 - Linux Guests go ReadOnly

66. May 7, 2007 12:03 PM in response to: soleblazer
tsightler (Hot Shot, 177 posts since Sep 30, 2005)
U5 does include a patch that is supposed to correct this issue. I've only got one VM running the new kernel so far, but I hammered on it this weekend and it passed without any special patches, so it looks promising. I've got some systems on a much slower EMC AX150i that were always great at showing the problem. I'll try to upgrade those in the next few days and see how they behave.

Interestingly, the problem still seems to affect RHEL5 (admittedly not supported by VMware yet anyway, but we've got a few test systems running). I'm hoping to update my page with a workaround for RHEL5 soon.

Later,
Tom

Re: ESX 3.0.1 - Linux Guests go ReadOnly

67. May 7, 2007 12:43 PM in response to: tsightler
jatwell (Enthusiast, 23 posts since May 23, 2006)
I would be interested in knowing if the RHEL4 U5 kernel has the fix too.

Re: ESX 3.0.1 - Linux Guests go ReadOnly

68. May 8, 2007 8:53 PM in response to: jatwell
tsightler (Hot Shot, 177 posts since Sep 30, 2005)
Well, as stated above, RHEL4 U5 does include a patch which is supposed to correct the issue. It's not the same patch that is on my website (which is really just a hack) but is supposed to be a more "correct" fix for the problem. So far testing is looking very good.

I've now upgraded two VMs to RHEL4 U5: one connected to an EMC AX150i (iSCSI), on which the issue was always easy to reproduce with RHEL4 U3 and U4 in just a couple of hours (sometimes minutes), and another connected to an EMC CX700 (Fibre Channel), which I could usually make fail in 8-10 hours.

I've now hammered on both of these for almost 48 hours with no signs of problems, so the "fix" in RHEL4 U5 looks pretty good. But I'm conservative: I'll need a few weeks of production runtime without errors, or reports from others, before I consider it completely safe.

Later,
Tom

Message was edited by:
tsightler

Re: ESX 3.0.1 - Linux Guests go ReadOnly

69. May 22, 2007 5:49 PM in response to: tsightler
ibm-sorcerer (Novice, 5 posts since Apr 6, 2006)
I also have the issue showing up on 2 of my RHEL5 systems. So we are all in the same boat...

Re: ESX 3.0.1 - Linux Guests go ReadOnly

70. Jun 1, 2007 1:02 PM in response to: ibm-sorcerer
jatwell (Enthusiast, 23 posts since May 23, 2006)
So now, do we consider this fixed? I'm just wondering whether it would be wise to recompile the driver for the new kernel, or whether the fix already included in the new kernel definitely resolves this issue.

After updating to the newer kernel I have not seen this issue, but I only have 2 Linux VMs. Soon I plan on helping someone roll out 20-30, so I'd like to get this right and not run into the issue down the road on heavily used production machines.

Re: ESX 3.0.1 - Linux Guests go ReadOnly

72. Jun 4, 2007 5:15 PM in response to: Damin
sriramrajan (Novice, 7 posts since Nov 1, 2006)
Has anyone tried this with RHEL 5?

My kernel version is 2.6.18-8.1.1.el5.

I am trying to compile the module from source and am getting the following errors.

In file included from /home/sri/mptscsi_vmware/mptscsi-rhel-3.02.62.01/mptbase.c:50:
include/linux/config.h:6:2: warning: #warning Including config.h is deprecated.
/home/sri/mptscsi_vmware/mptscsi-rhel-3.02.62.01/mptbase.c: In function 'mpt_suspend':
/home/sri/mptscsi_vmware/mptscsi-rhel-3.02.62.01/mptbase.c:1646: error: switch quantity not an integer
/home/sri/mptscsi_vmware/mptscsi-rhel-3.02.62.01/mptbase.c: In function 'MakeIocReady':
/home/sri/mptscsi_vmware/mptscsi-rhel-3.02.62.01/mptbase.c:2510: error: implicit declaration of function 'crashdump_mode'
make[2]: *** [/home/sri/mptscsi_vmware/mptscsi-rhel-3.02.62.01/mptbase.o] Error 1
make[1]: *** [_module_/home/sri/mptscsi_vmware/mptscsi-rhel-3.02.62.01] Error 2
make[1]: Leaving directory `/usr/src/kernels/2.6.18-8.1.4.el5-i686'

Sriram

Re: ESX 3.0.1 - Linux Guests go ReadOnly

73. Jun 6, 2007 12:42 AM in response to: Damin
stan-whitfield (Novice, 16 posts since Jul 22, 2006)
I am so glad I found this thread! We've had this problem with SLES 10 VMs using the LSI Logic adapter on 3.0.0 (we'll be upgrading to 3.0.1 soon). After researching this issue from the Linux kernel side, I found that the mptbase LSI Logic SCSI driver code in the 2.6.x kernel is buggy on its own, unrelated to VMware.

http://lkml.org/lkml/2005/10/11/224

I have 2 SLES 10 VMs and 1 SLES 9 VM, all on 2.6.x kernels. The two SLES 10 VMs were built with the LSI Logic adapter and driver, and the SLES 9 VM was built with the BusLogic adapter. Both SLES 10 VMs experience the massive SCSI errors and the read-only remounts. The SLES 9 VM using BusLogic does not.

I'm going to convert to Buslogic on the SLES 10 VMs and see if that helps. I think I saw a procedure for this documented in this thread. If not I'll figure it out, implement it, and I'll post back with results.

Oh, btw, we're all FC SAN at my shop: IBM and Nexsan arrays, QLogic switches, no multipathing. We have a mix of VMFS volumes with multiple VMs stored on each VMFS. We map RDMs to individual VMs that need lots of storage. As a general rule, for any VM that will need over 10GB (e.g. a database server), we create a 10GB virtual disk on a shared VMFS volume and install the base OS on it. We then map an RDM to the VM, format it with Reiser3, and mount it into the filesystem.

I do see some SCSI errors in our W2K3 VM event logs but we're not having huge issues as a result of them, not like with these Linux VMs. However, if the Buslogic adapter/driver works better in Windows, I'll look into swapping those over as well.
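One way to notice the read-only remounts reported in this thread is to look for disk filesystems whose mount options include `ro`. The minimal Python sketch below parses `/proc/mounts` (fields: device, mount point, type, options, dump, pass); the function name and the set of filesystem types it checks are assumptions for illustration. A filesystem that was deliberately mounted read-only is reported as well, so treat a hit as a prompt to check `dmesg`, not as proof of this bug.

```python
# Hypothetical helper: list mount points whose options include "ro",
# parsing lines in /proc/mounts format: device mountpoint fstype options dump pass.
DISK_FSTYPES = {"ext2", "ext3", "ext4", "reiserfs", "xfs"}  # assumed set to watch


def read_only_mounts(mounts_text: str, fstypes=DISK_FSTYPES) -> list:
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        _, mountpoint, fstype, options = fields[:4]
        if fstype in fstypes and "ro" in options.split(","):
            # /proc/mounts escapes spaces in paths as \040; decode that case
            found.append(mountpoint.replace("\\040", " "))
    return found


if __name__ == "__main__":
    try:
        with open("/proc/mounts") as f:
            mounts = f.read()
    except OSError as err:
        print(f"cannot read /proc/mounts: {err}")
    else:
        for mp in read_only_mounts(mounts):
            print(f"read-only: {mp}")
```

Splitting the options on commas avoids false matches on options that merely contain the letters "ro".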

Re: ESX 3.0.1 - Linux Guests go ReadOnly

74. Jun 6, 2007 3:34 AM in response to: stan-whitfield
stan-whitfield (Novice, 16 posts since Jul 22, 2006)
Turns out swapping to BusLogic isn't as straightforward as I thought it might be. I can't seem to select a new SCSI module from YaST if the hardware isn't "present" when I do a scan...

So, I downloaded the LSI Logic SLES 10 driver patch linked in this KB:

http://kb.vmware.com/selfservice/microsites/search.do?cmd=displayKC&externalId=51306

However, the darn MD5 sum doesn't match. All the other files linked on that KB page have MD5 sums that check out, with the exception of the SLES 10 download. I'm wary of using it because of this. And darn it, I've been up all night working on this and want to have it done before I go into the office today!! I've got half a notion to just throw caution to the wind regarding the MD5 mismatch, assuming it's a VMware documentation glitch: something like they uploaded a new file and forgot to change the MD5 sum in the text of the KB article.

I've downloaded it twice to two different machines, one Linux and one Windows, and the sum doesn't match what's listed in the KB. Each time I downloaded the file the checksum was the same, so I doubt there's a problem with the download. VMware, please square this situation. It's annoying.

Update: apparently someone was listening, as after a few more repeated downloads, the MD5 sums now match. Weird...very weird.
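Checking a download against a published MD5 sum, as described above, can be scripted so the comparison is repeatable. This is a minimal Python sketch using the standard library's `hashlib.md5`; the script name `check_md5.py` and its argument order are assumptions. On Linux, `md5sum FILE` prints the same digest. A matching MD5 shows the file is the one the KB article lists; it does not by itself prove who published it.

```python
import hashlib
import sys


def md5_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path: str, expected: str) -> bool:
    # Normalize whitespace and case; published sums may be upper or lower case.
    return md5_of(path) == expected.strip().lower()


if __name__ == "__main__":
    # Hypothetical usage: python check_md5.py <downloaded-file> <md5-from-kb-article>
    if len(sys.argv) == 3:
        ok = matches(sys.argv[1], sys.argv[2])
        print("MD5 OK" if ok else "MD5 MISMATCH")
        sys.exit(0 if ok else 1)
    print("usage: check_md5.py FILE EXPECTED_MD5")
```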

Message was edited by:
stan-whitfield
