VMware Cloud Community
jasper9890
Enthusiast

Solaris 10 unstable in ESX?

Has anyone had problems with Solaris 10 machines being pretty unstable and periodically rebooting with a core dump? I've had this problem on and off over the past year, but recently it's been getting worse: two nights ago, 5 machines did it at about the same time. VMware support found nothing in the console logs, and there is nothing in the system logs, no SAN events, no switch problems, and so on. The machines it happens on most are MySQL servers with a fairly heavy I/O load, so I'm wondering if that could be part of it.

Anyone else dealing with this?

7 Replies
fletch00
Enthusiast

Yes. I migrated a previously stable Solaris 10 VM (a Mail/SpamAssassin/Mailman server) with a lot of I/O from an AMD ESX 3.0.2 host to a Dell 1950 with a quad-core Xeon E5440.

I applied the latest Sun kernel patch (127112-07) prior to migration (it would not actually boot on the Xeon without it).

It was up for 3 hours then spontaneously went off the air (not pingable).

There were no logs on the VM, ESX, or NetApp (it runs off NFS) to indicate what brought the VM down. Other VMs on the same ESX host, including a Solaris 10 VM (same patch rev but next to zero I/O), were fine.

Thanks for any insight. I've opened a support case on this.

VCP5 VSP5 VTSP5 vExpert http://vmadmin.info
fletch00
Enthusiast

This sounds relevant:

Large Network Transfers Can Cause a Solaris 10 Virtual Machine's Network Connection to Stop Responding

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2176

except that the Sun bug (6302632) was "fixed" in Dec 2006.

fletch00
Enthusiast

It happened again at 02:32 am. This time I found the culprit on the VM console:

"SCSI Timeout disconnected..."

It's correlated with our Legato backups generating I/O on the NetApp central storage.

This VM seems to be affected worse than the others since it's generating a lot of I/O itself all the time.

VMware supplied a Linux patch when this happened to 3 Linux hosts last week. I am looking for a Solaris patch that does the same.

thanks

thomasbaca
Contributor

Any news on this front? We're also getting SCSI timeouts on ESX 3.0.2 w/ Solaris 10 guests.

thanks,

-tom

fletch00
Enthusiast

The Solaris 10 patches could be applied under ESX 3.5 (the SCSI driver patch rev was incompatible with 3.0.2, but OK on 3.5).

And the performance issue was resolved by setting the memory resource to unlimited in VC, so that memsize and max.mem matched in the .vmx file.
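
For anyone who wants to check their own VMs for the same mismatch, here is a minimal sketch of a script that reads .vmx-style text and reports when the configured memory size and the memory limit differ. It is only an illustration: the key name `sched.mem.max` is an assumption on my part (the post above calls it "max.mem"), so pass the exact key from your own .vmx file via `limit_key` if it differs. The function names are made up for this example.

```python
import re

# Matches .vmx-style lines of the form: key = "value"
_VMX_LINE = re.compile(r'^\s*([^#\s=]+)\s*=\s*"(.*)"\s*$')


def parse_vmx(text):
    """Parse .vmx-style text into a dict with lowercased keys."""
    settings = {}
    for line in text.splitlines():
        match = _VMX_LINE.match(line)
        if match:
            settings[match.group(1).lower()] = match.group(2).strip()
    return settings


def memory_limit_mismatch(settings, size_key="memsize", limit_key="sched.mem.max"):
    """Return (size, limit) as strings if both keys are set and differ, else None.

    limit_key defaults to an ASSUMED key name; adjust it to match your file.
    """
    size = settings.get(size_key.lower())
    limit = settings.get(limit_key.lower())
    if size is None or limit is None:
        return None  # nothing to compare
    if size != limit:
        return (size, limit)
    return None


if __name__ == "__main__":
    # Hypothetical sample values, not taken from a real VM.
    sample = '''
    memsize = "4096"
    sched.mem.max = "2048"
    '''
    print(memory_limit_mismatch(parse_vmx(sample)))
```

To use it on a real file, read the .vmx contents with `open(path).read()` and pass the text to `parse_vmx`. A `None` result means either the values match or one of the keys is absent.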

thomasbaca
Contributor

Thanks for the response. Which particular patches were you dealing with?

-tom

fletch00
Enthusiast

As we progressed on the VMware Solaris MPT compatibility path, we discovered that while the latest 125082-14 will not boot on ESX 3.0.2, it boots fine on ESX 3.5.

See my VMware forums thread:

http://communities.vmware.com/message/865022#865022

So I am left to conclude that there is something in the new ESX 3.5 that makes it compatible with the latest MPT driver from Sun.
