VMware Cloud Community
mattjk
Enthusiast

BIG bug in ESX 3.5 Update 2 - If you're using 3.5u2 read this now! - A general system error occurred: Internal Error

The express patches have been posted. This thread is long.

Please post technical experiences here and non-technical feedback here. --JohnTroyer

Hi all,

We've just encountered a serious bug with our ESX cluster - serious enough that I thought I should post about it here as a prior warning for others running ESX 3.5 Update 2.

The VMware tech support person we spoke to wouldn't 100% confirm whether this was or would be affecting all ESX 3.5u2 installs, but he strongly hinted that it was widespread. For others' sake I hope I'm wrong and it's limited.

The bug:

Starting this morning, we could neither power on nor VMotion any of our virtual machines. The VI Client threw the error "A general system error occurred: Internal Error".

Further digging led us to messages like these in /var/log/vmware/hostd.log, and in the log file of any virtual machine we tried to power on or VMotion:

Aug 12 10:40:10.792: vmx| This product has expired.

Aug 12 10:40:10.792: vmx| Be sure that your host machine's date and time are set correctly.

Aug 12 10:40:10.792: vmx| There is a more recent version available at the VMware Web site: "http://www.vmware.com/info?id=4".
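If you want to check a batch of logs for the same message, here is a rough Python sketch. It only looks for the "This product has expired." text quoted above. It assumes you have copied the log files to a workstation with a recent Python 3 (the Service Console's own Python is too old for this syntax), and the function names are just made up for illustration.

```python
import sys

# Marker text copied from the vmx log lines quoted in this post.
EXPIRED_MARKER = "This product has expired."


def find_expiry_lines(lines):
    """Return the lines (without trailing newline) that contain the expiry message."""
    return [line.rstrip("\n") for line in lines if EXPIRED_MARKER in line]


def scan_log(path):
    """Scan one log file; undecodable bytes are replaced instead of raising."""
    with open(path, errors="replace") as handle:
        return find_expiry_lines(handle)


if __name__ == "__main__":
    # Usage: python scan_logs.py hostd.log vmware.log ...
    for log_path in sys.argv[1:]:
        for hit in scan_log(log_path):
            print(f"{log_path}: {hit}")
```

Any hit means the host or VM log recorded the expiry check failing; no hits does not prove the host is safe, only that the message is not in the files you scanned.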

A call to tech support confirmed this as a known problem with a temporary workaround.

The work-around:

Turn off NTP (if you're using it), and then manually set the date of all ESX 3.5u2 hosts back to 10th of August. This can be done either through the VI Client (Host -> Configuration -> Time Configuration) or by typing date -s "08/10/2008" at the Service Console command line on the ESX hosts.

As soon as the date was reset to the 10th - problem solved.

Note that running VMs were operating fine; this only seems to affect initial VM power-on (including from a suspended state) and VMotion.

So, it sounds like a serious licensing bug has crept into 3.5u2. Further testing shows that the problem begins as soon as the date hits 12th August - 10th is fine, 11th is fine, 12th and the problem appears.
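To make the observed behaviour concrete, here is a tiny Python sketch of the rule our testing suggests. The cutoff of 12 August 2008 comes only from what we saw (10th fine, 11th fine, 12th fails); the actual check inside ESX is not something I have seen, and I am assuming later dates fail as well.

```python
from datetime import date

# Assumption: first failing host date, taken from the testing described above.
FIRST_BAD_DATE = date(2008, 8, 12)


def power_on_expected_to_fail(host_date):
    """True if a 3.5u2 host with this date would be expected to refuse power-on."""
    return host_date >= FIRST_BAD_DATE


# The workaround date of 10 August falls before the cutoff.
print(power_on_expected_to_fail(date(2008, 8, 10)))
print(power_on_expected_to_fail(date(2008, 8, 12)))
```

This is why setting the host clock back to the 10th works: it simply puts the host on the "good" side of the cutoff again.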

There wasn't any real reference to similar problems in the forums as far as I could see, but it's quite possible we're seeing this before most of the rest of the world as we're in Australia, and therefore the date here ticked over to the 12th "before" those in Europe, America, etc.
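For anyone wondering when their own hosts will tick over, here is a small Python sketch that converts local midnight on 12 August 2008 to UTC for a few example locations. The locations and their fixed August UTC offsets are my own illustrative picks, and it assumes the host clock is set to local time for that zone.

```python
from datetime import datetime, timedelta, timezone

# Example locations with their UTC offsets in August 2008 (illustrative picks).
OFFSETS = {
    "Auckland (NZST)": 12,
    "Sydney (AEST)": 10,
    "London (BST)": 1,
    "New York (EDT)": -4,
    "Denver (MDT)": -6,
}


def rollover_utc(offset_hours):
    """UTC instant at which local midnight on 12 Aug 2008 occurs for this offset."""
    local_zone = timezone(timedelta(hours=offset_hours))
    local_midnight = datetime(2008, 8, 12, tzinfo=local_zone)
    return local_midnight.astimezone(timezone.utc)


# Print locations in the order they reach the 12th.
for name, hours in sorted(OFFSETS.items(), key=lambda item: rollover_utc(item[1])):
    print(f"{name:18} {rollover_utc(hours):%Y-%m-%d %H:%M} UTC")
```

Sydney reaches the 12th at 14:00 UTC on the 11th, nine hours ahead of London, which fits with us hitting this before most of the forum.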

Hope this helps others... took us a couple of hours to get this far - at least we can power on VMs again though!

Cheers,

Matt Kilham

Stratton Car Finance

Message was edited by: JohnTroyer to add new thread links.

Cheers, Matt

704 Replies
Michelle_Laveri
Virtuoso

I'm adding myself to this thread mainly so I get updates when it is resolved. I've had about 5 emails from various people already, mainly asking me to blog about this issue to get the word out, which I have done this morning. I would have to say changing the date/time on the ESX host doesn't seem like much of a workaround - what about clock drift and the time on all the virtual machines...

Regards

Mike Laverick

Regards
Michelle Laverick
@m_laverick
http://www.michellelaverick.com

LB1
Contributor

I am still concerned about the availability of this patch.

Currently the servers handling the kb.vmware.com area are so slammed that they are undergoing a kind of Denial of Service (the site is basically unavailable).

What will happen with the update and patch servers?

When it's available, will these servers be trampled too?

I hope someone has set a few engineers off on a task to beef up the mirror server numbers.

krival96
Contributor

You are right, Bakafish.com. They have to test it properly before they release any patches.

kris http://www.vdi.co.nz

LeoKurz2
Enthusiast

Does anyone have an idea or procedure for rolling back to U1 (apart from throwing in a CD and doing a reinstall)? Would removing certain RPMs work?

__Leo

APN_NZ
Contributor

We in New Zealand were first to experience this date/lic issue.

Setting the date back does not affect the time on the virtual machines, as long as the VMware Tools are not set to sync with the ESX host. Our VMs sync time from the domain and not the ESX host, so they were unaffected. Do take care when changing the date/time.

I would suggest doing the date change via the Service Console and not through the GUI. Once all your VMs have booted, set the time back.

Good Luck

/RR

smithers0105
Contributor

Just to add to the list...

We coincidentally performed our first upgrade from 3.0.2 to 3.5u2 this morning (Aug 12th, 10am UK time) and I noticed straight away while the upgraded host was still in maintenance mode that the licensing tab within VI client gave the general system error msg. Fortunately, this was only in our test cluster, but obviously, I will leave that host in maintenance until the fix appears.

Hope you guys with production hosts running on update 2 get things sorted quickly.

hhedeshian
Contributor

Aug 12. 4:09AM MDT can confirm the problem here as well. Also, cannot access the KB article "We're sorry, but this Document is not currently available". Fortunately I can revert to trial mode with 40+ days left...

LudoS
Contributor

Hi again,

In my previous post I wasn't clear enough with the workaround I proposed:

> If you want to be proactive, I suggest you find the ESX server hosting your least critical VMs (maybe one hosting only dev servers, the one with the fewest VMs, or the one where customers are the most tolerant) and stop all VMs on it (an orderly shutdown is always better than a crash), restart them elsewhere on 3.5U2, and downgrade the empty box to 3.5U1. You can then use VMotion to move VMs from another ESX 3.5U2 server (it is working, I tested it) and perform a "rolling downgrade" of your infrastructure. Re-installing ESX should not take more than one hour, and you will be in a much better situation when VMware releases the fix, regardless of the form it takes: ISO, RPM ...

When I wrote "restart them elsewhere on 3.5U2" I meant "restart them elsewhere on 3.5U2 where you applied the workaround, i.e moving back the date" if you really can't avoid it.

I strongly suggest that anyone not wanting to wait 36 hours or more for a fix apply the workaround I described, because time keeping is really a critical problem for a number of applications (right now I can think of databases, SAP, even Kerberos...). A "not so planned" orderly shutdown is far better than data inconsistencies in your environment.

P.S. You can leverage this opportunity to apply Microsoft patches on your VMs (or Red Hat, or...) :0 :0

Hope this helps, best regards,

Ludovic

KyawH
Enthusiast

Mike,

I agree with you. Changing the clock is NOT a workaround. A workaround is an alternative solution, not a temporary, buying-time measure like this one. There are so many OSes/applications out there relying on the time... who knows what will be impacted by this time-changing therapy and, most importantly, how long it will last?

tomsommer
Contributor

LB@SGI:

Maybe their Knowledge Base is hosted on a VM box that suddenly won't boot!

java_cat33
Virtuoso

Just managed to access the KB again... here is what it currently displays, for those who can't access it...

totgate
Contributor

Well, as a matter of fact, we have a big Hyper-V installation at a customer's factory and it's running like clockwork. So maybe, just maybe, if VMware doesn't fix this really soon you could end up being quite lonely in here. 36 hrs is not acceptable for a fix when a problem of this magnitude arises. So please don't tell me to be calm while we and our customers are losing thousands of Skr every hour just because they are more interested in license protection stuff than in offering their customers software that works and is properly tested. The only ones having problems with all the "copy protection" stuff are "we", the ones who are paying for our software. Does that seem right to you? I really think that VMware is showing a total lack of respect for their customers when they are "fixing" things like this. I don't think their stock will be very high in the near future.

/Tobbe

skemp
VMware Employee

Unfortunately, there is no rollback option for updates.

winetou
Contributor

I have the same problem. Changing the date solved it, but I hope it's only a temporary solution. Still waiting for the patch - one without a restart / maintenance mode...

Regards from Poland

maishsk
Expert

I agree with you; however, I did not update to this "Update 2" level manually.

Apparently, major update releases are put in place automatically by Update Manager!

I was unaware that Update Manager would do this. I thought it only put out interim patches, and not major OS updates!

We have (now HAD) Update Manager running the updates monthly.

Apparently this got put in the update sequence for installation and I didn't see it.

I have Change Control documents in the pipeline for U2, and had expected to install in phases.

(the right way)

But now, I'm apparently all updated. (Except for VirtualCenter and VCB).

Out of all of this whole thread - to me - this is the most worrying issue.

Is there no way to tell Update manager that updates have to be authorized?

Now this - for me - is unacceptable...


Maish

Systems Administrator & Virtualization Architect

Maish Saidel-Keesing • @maishsk • http://technodrone.blogspot.com • VMTN Moderator • vExpert • Co-author of VMware vSphere Design

wallbreaker
Contributor

Changing the date solved it for us too. But statistics in the Infrastructure Client are down.

Regards from France.

KyawH
Enthusiast

UNACCEPTABLE-

How about this? Open your eyes and read carefully-

Problem:

product expired on 12/08/2008 is in the code somewhere.

Solution:

Find/search through the freaking codes for the freaking date and change it to bloody 2020 or something and release it now. Don't change any other freaking things in the code!!!

Then you will have more freaking time to find and fix whatever it broke other than the freaking date.

Get the customer up and running now. That should be your goal.

You have to recognize that YOU ARE IN A WAR!

TIME IS CRITICAL!!!

totgate
Contributor

Exactly my opinion....:x

/Tobbe

ricdavis
Contributor

It seems to occur if you boot a VM with the tools sync off. I've not seen the problem with running machines, and have just finished migrating guests from tools time sync to NTP. Thankfully, you can reboot the OS inside a running VM with the host date/time set correctly; you just can't start a VM from scratch.

LB1
Contributor

I am still researching where the U2 portion of the updates entered Update Manager.

Perhaps they were only in the non-critical areas, and are "filterable" if I only apply CRITICAL patches.

(I STILL hate that this was anywhere in Update Manager.)

I LIKE that UM can apply this upgrade, but I would have liked to see it in a 3rd update type.

Something other than CRITICAL and NON-CRITICAL, for example, a "release update" level or something like that.

Something that I would have left in an "unchecked" state in Update Manager.
