VMware Cloud Community
fletch00
Enthusiast

Solaris 10 64 bit VM bad_set_user_regs panic (and other Sol10 issues!)

Hi all, last night one of our Solaris 10 U4 (latest kernel 127112-07) 64-bit VMs panicked with:

genunix: [ID 726438 kern.notice] bad_set_user_regs: rp=fffffe80000adac0 rp->r_cs=ffffffff81baa000; how did that happen?

The only other reference I have found to this was a VMware user's post to the SunManager list saying that booting into 32-bit mode solved it for him.

Has anyone else experienced this, or the other issues I'm having with Solaris 10 in a VM?

1) Under high I/O (this is a sendmail/SpamAssassin/Mailman server), the VM logs SCSI timeout disconnects and then hangs until someone reboots it. I have since reduced the I/O by eliminating backup traffic and installing a local DNS caching server. The VM has crashed this way both on a NetApp NFS datastore and, more recently, on local 10K RPM disk.

2) When fully patched with the Sun recommended patches (using PCA, "Patch Check Advanced"), the VM will not boot past "configuring devices"; it's a brick. I had to revert to a pre-patch snapshot.

3) The VM (stock Sol10 U4) will not boot at all on a Dell 1950 with dual quad-core E5440 CPUs: it gets into a reboot loop after the GRUB menu. I had to selectively apply just the kernel patch to get it to boot, and applying more patches caused issue #2.

4) Now these kernel panics. I've had a Sun Gold support case open at Sev1 (critical/down) for 3 days with no response from Sun (just escalated to a manager now), and a VMware Gold case open for 5 days with no better progress.

I am seriously thinking of moving this server to a CentOS 5 VM!

Any insight or feedback is welcome.

Thanks

VCP5 VSP5 VTSP5 vExpert http://vmadmin.info
fletch00

Forgot to mention: this is ESX 3.0.2 build 61618.

(I am testing VC 2.5 and ESX 3.5 now)

Thanks

fletch00

Although somewhat dated, this is an interesting history of the MPT issues with Solaris:

http://wotho.ethz.ch/ESX_solaris/Install_Solaris_on_ESX.html

mikepodoherty
Expert

Hmm, your experience is the opposite of mine. We have Solaris 10 x86 64-bit supporting Oracle 10g on IBM LS20 blades, and apart from an issue with automatic failover for the SAN, they haven't had problems. We are currently installing the January patch cluster and the January Oracle patches with no issues.

Getting support from Sun for virtual Solaris servers has been problematic, and since we bought VMware from IBM, IBM has provided our VMware support. IBM has also helped track down the SAN failover issue, possibly because the SAN is an IBM DS4000 series.

I know I posted the fix for the automatic failover issue in the community.

I don't use the Sun Management Console - I prefer the KDE console and use Webmin if I need a graphic interface.

I can't help with the problems you described, but I did want to tell you what my experience has been.

Mike

fletch00

I've isolated the patch that prevents the VM from booting: 125082-14 SunOS 5.10_x86: mpt driver patch

I applied it to a test VM (after taking a pre-patch snapshot), and on reboot it hung at "configuring devices".

So I'll be looking for a compatible version of this patch from Sun.

fletch00

I downloaded and tried all the available 125082 revisions, from the latest (-14) down through -13 and -10. The last bootable one is -08, which is the revision I'm seeing the mpt SCSI timeout crashes with.

Now what?

I can reproduce this at will - so I will focus the Sun support resources on this.

Thanks

mikepodoherty

I looked at the documentation on the mpt problems. I'm using VMware 3.0.1 and haven't run into this issue. Which version of VMware are you using?

fletch00

ESX 3.0.2 61618

To clarify, I tried all versions of the mpt driver patch:

125082-14: hangs on boot at "configuring devices"

125082-13: hangs on boot at "configuring devices"

125082-10: hangs on boot at "configuring devices"

125082-08: boots, but crashes with mpt SCSI timeout target disconnected messages
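When rolling back and forth between revisions like this, it helps to confirm which revision of 125082 a guest is actually running. Below is a minimal Python sketch that filters `showrev -p` output, assuming the usual Solaris 10 line shape `Patch: NNNNNN-RR Obsoletes: ...`. The helper `installed_revision` and the `ESX302_BOOTS` table (which only restates the results listed above) are hypothetical, not part of any Sun tool.

```python
import re
from typing import Optional

# Assumed `showrev -p` line shape: "Patch: 125082-14 Obsoletes: ... Packages: ..."
PATCH_LINE = re.compile(r"^Patch:\s+(\d{6})-(\d{2})\b")

# Hypothetical summary of the results reported in this thread (ESX 3.0.2 host):
# True means the VM boots with that revision of patch 125082.
ESX302_BOOTS = {8: True, 10: False, 13: False, 14: False}


def installed_revision(showrev_output: str, patch_id: str) -> Optional[int]:
    """Return the highest revision of patch_id listed in showrev -p output, or None."""
    revisions = []
    for line in showrev_output.splitlines():
        match = PATCH_LINE.match(line.strip())
        if match and match.group(1) == patch_id:
            revisions.append(int(match.group(2)))
    return max(revisions) if revisions else None


# Made-up sample text in the assumed format.
sample = "Patch: 125082-08 Obsoletes: Requires: Incompatibles: Packages:\n"
rev = installed_revision(sample, "125082")
print(rev, "boots under ESX 3.0.2:", ESX302_BOOTS.get(rev))
```

In practice the input would come from running `showrev -p` on the guest and pasting or piping the text in.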

fletch00

I mentioned I was testing ESX 3.5. Today I applied patch 125082-14, the same patch that makes a brick of the VM under ESX 3.0.2, to the same VM booted on the new ESX 3.5 server. It booted without any issues.

So I am left to conclude that there is something in the new ESX 3.5 that makes it compatible with the latest MPT driver from Sun:

> /usr/sbin/modinfo | egrep mpt

33 fffffffffbb81ac0 35620 169 1 mpt (MPT HBA Driver v1.69)
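To check the loaded driver version across several VMs, the `modinfo` line above can be parsed. This minimal Python sketch assumes the column layout shown (Id, Loadaddr, Size, Info, Rev, Module Name, then the description in parentheses) and a `vN.NN` version string inside the description; `parse_modinfo_line` and `ModInfo` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional


class ModInfo(NamedTuple):
    module_id: int
    name: str
    description: str
    version: Optional[str]


# Id, load address (hex), size (hex), info, rev, module name, "(description)"
MODINFO_LINE = re.compile(
    r"^\s*(\d+)\s+[0-9a-fA-F]+\s+[0-9a-fA-F]+\s+\S+\s+\d+\s+(\S+)\s+\((.*)\)\s*$"
)
# Version such as "v1.69" embedded in the description (assumed convention).
VERSION = re.compile(r"\bv(\d+(?:\.\d+)+)\b")


def parse_modinfo_line(line: str) -> Optional[ModInfo]:
    """Parse one modinfo output line; return None for headers or other lines."""
    match = MODINFO_LINE.match(line)
    if not match:
        return None
    description = match.group(3)
    version = VERSION.search(description)
    return ModInfo(int(match.group(1)), match.group(2), description,
                   version.group(1) if version else None)


info = parse_modinfo_line("33 fffffffffbb81ac0 35620 169 1 mpt (MPT HBA Driver v1.69)")
print(info.name, info.version)
```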

fletch00

I wanted to post a closing message for this thread.

I resolved the system contention on this Solaris VM.

It turns out the VMware settings in the .vmx file for this Solaris VM were not optimal:

> memsize = "2048" (old file)
> sched.mem.max = "256" (old file)

> (If sched.mem.max is smaller than memsize, the balloon driver can start consuming memory, especially if the guest operating system application has peaky memory usage. However, this setting can cause the balloon driver to retain its hold on memory continuously, even if the guest operating system requires it again. This causes the guest operating system to start swapping and slow down considerably.)

In hindsight, the vmware-memctld process consuming so much CPU was a red flag for this.

Once the two settings were brought into line (by using VC and setting the VM's memory resource limit to "unlimited"), the VM functioned 100x better (responsiveness, workload throughput, etc.).
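A mismatch like this is easy to check for mechanically across many .vmx files. Below is a minimal Python sketch, assuming simple `key = "value"` lines, case-insensitive keys, and that memsize and sched.mem.max are both in MB (as the 2048/256 figures suggest). `parse_vmx` and `memory_limit_warning` are hypothetical helpers, not VMware tools.

```python
import re
from typing import Dict, Optional

# Matches simple `key = "value"` lines; comments and other lines are skipped.
VMX_LINE = re.compile(r'^\s*([A-Za-z0-9_.:-]+)\s*=\s*"([^"]*)"\s*$')


def parse_vmx(text: str) -> Dict[str, str]:
    """Collect key/value pairs, lower-casing keys (keys assumed case-insensitive)."""
    settings: Dict[str, str] = {}
    for line in text.splitlines():
        match = VMX_LINE.match(line)
        if match:
            settings[match.group(1).lower()] = match.group(2)
    return settings


def memory_limit_warning(settings: Dict[str, str]) -> Optional[str]:
    """Warn when sched.mem.max is set below memsize (both assumed to be MB)."""
    memsize = settings.get("memsize", "")
    limit = settings.get("sched.mem.max", "")
    # A missing or non-numeric limit is treated as "no limit" in this sketch.
    if not (memsize.isdigit() and limit.isdigit()):
        return None
    if int(limit) < int(memsize):
        return (f"sched.mem.max={limit} is below memsize={memsize}: "
                "the balloon driver may hold guest memory and force swapping")
    return None


old_vmx = 'memsize = "2048"\nsched.mem.max = "256"\n'
print(memory_limit_warning(parse_vmx(old_vmx)))
```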

Thanks
