VMware Cloud Community
mattjk
Enthusiast

BIG bug in ESX 3.5 Update 2 - If you're using 3.5u2 read this now! - A general system error occurred: Internal Error

The express patches have been posted. This thread is long.

Please post technical experiences here and non-technical feedback here. --JohnTroyer

Hi all,

We've just encountered a serious bug with our ESX cluster - serious enough that I thought I should post about it here as an early warning for others running ESX 3.5 Update 2.

The VMware tech support person we spoke to wouldn't confirm 100% whether this affects all ESX 3.5u2 installs, but he strongly implied that it was widespread. For others' sake I hope I'm wrong and it's limited.

The bug:

Starting this morning, we could neither power on nor VMotion any of our Virtual Machines. The VI Client threw the error "A general system error occurred: Internal Error".

Further digging led us to messages like this one in /var/log/vmware/hostd.log, and in the log file for any virtual machine we tried to power on or VMotion:

Aug 12 10:40:10.792: vmx| This product has expired.

Aug 12 10:40:10.792: vmx| Be sure that your host machine's date and time are set correctly.

Aug 12 10:40:10.792: vmx| There is a more recent version available at the VMware Web site: "http://www.vmware.com/info?id=4".
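If you want to check whether your own hosts are hitting this, scanning the logs for that first message is enough. This is a minimal Python sketch, not a VMware tool: it assumes you've copied hostd.log (or a VM's log file) to a machine with Python 3, and the name `find_expiry_lines` is hypothetical.

```python
from typing import Iterable, List, Tuple

# The first line of the expiry error quoted above.
EXPIRY_MESSAGE = "This product has expired."

def find_expiry_lines(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, line) for every log line containing the expiry message."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if EXPIRY_MESSAGE in line:
            hits.append((number, line.rstrip("\n")))
    return hits
```

Pass it an open file, e.g. `find_expiry_lines(open("hostd.log"))`; an empty list means the message never showed up in that log.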

A call to tech support confirmed this as a known problem with a temporary workaround.

The work-around:

Turn off NTP (if you're using it), and then manually set the date of all ESX 3.5u2 hosts back to 10th of August. This can be done either through the VI Client (Host -> Configuration -> Time Configuration) or by typing date -s "08/10/2008" at the Service Console command line on the ESX hosts.

As soon as the date was reset to the 10th - problem solved.

Note that running VMs were operating fine; this only seems to affect initial VM power-on (including from suspended state) and VMotion.

So, it sounds like a serious licensing bug has crept into 3.5u2. Further testing shows that the problem begins as soon as the date reaches 12th August: the 10th is fine, the 11th is fine, and on the 12th the problem appears.
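The observed boundary boils down to a plain date comparison. This sketch only models what was reported in this thread (the 10th and 11th work, the 12th onwards fails); it is not VMware's licensing code, and the names `FIRST_FAILING_DATE` and `power_on_blocked` are hypothetical. It also shows why rolling the clock back to the 10th (or any earlier date) gets VMs powering on again.

```python
from datetime import date

# Assumption: first failing host-local date, as observed in this thread.
FIRST_FAILING_DATE = date(2008, 8, 12)

def power_on_blocked(host_date: date) -> bool:
    """True if a 3.5u2 host whose clock reads host_date would refuse power-on/VMotion."""
    return host_date >= FIRST_FAILING_DATE

for day in (10, 11, 12, 13):
    d = date(2008, 8, day)
    print(d.isoformat(), "blocked" if power_on_blocked(d) else "ok")
```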

There wasn't any real reference to similar problems in the forums as far as I could see, but it's quite possible we're seeing this before most of the rest of the world as we're in Australia, and therefore the date here ticked over to the 12th "before" those in Europe, America, etc.
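Assuming the check uses the host's local date (the hypothesis above, not something VMware has confirmed here), you can see the ordering by converting local midnight on 12 August into UTC. The cities and fixed UTC offsets are illustrative assumptions for August 2008 (eastern Australia on standard time, the northern hemisphere on summer time).

```python
from datetime import datetime, timedelta, timezone

# Illustrative fixed offsets for August 2008 (assumptions, not tz-database lookups).
OFFSETS_HOURS = {
    "Sydney (UTC+10)": 10,
    "London (UTC+1, BST)": 1,
    "New York (UTC-4, EDT)": -4,
    "San Francisco (UTC-7, PDT)": -7,
}

def local_midnight_in_utc(offset_hours: int) -> datetime:
    """UTC instant at which the local date becomes 12 August 2008."""
    tz = timezone(timedelta(hours=offset_hours))
    return datetime(2008, 8, 12, tzinfo=tz).astimezone(timezone.utc)

# Earliest first: the further east of UTC, the sooner the 12th arrives.
for place, offset in sorted(OFFSETS_HOURS.items(), key=lambda kv: local_midnight_in_utc(kv[1])):
    print(f"{place}: 12 Aug begins at {local_midnight_in_utc(offset):%Y-%m-%d %H:%M} UTC")
```

Sydney comes out first (11 Aug 14:00 UTC), roughly nine hours ahead of London.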

Hope this helps others... took us a couple of hours to get this far - at least we can power on VMs again though!

Cheers,

Matt Kilham

Stratton Car Finance

Message was edited by: JohnTroyer to add new thread links.

Cheers, Matt
704 Replies
interix
Enthusiast

I'll have to agree with CLowe: VMware has been stable (I've been working with it since 1.x). Create an update policy and stick to it. I will also agree with anyone who says there was no way to prepare for this type of issue... use a workaround and be patient. The issue has been identified and they will resolve it.

Our farm: 100+ hosts, with 1 affected.

Kevin_Gao
Hot Shot

Sorry to hear that Micro...but at least it's a test VM. I just talked to a friend who works for a group of lawyers' offices. He's not really affected because all his VMs are up and working.

Everyone's just hoping that there's no "incidents" between now and the time the permanent fix comes out.

gi-minni
Contributor

We have dozens of clustered ESX servers in production with hundreds of VMs on them. All our stuff is alerted because we have mission-critical applications, but you cannot change anything: s*it happens and no one is perfect.

I know that VMware and all of us together will learn our lessons from this sad day.

Of course our production ESX server farms are not updated frequently or automatically, and we are in good shape, but shouting will not change anything; it will only make things worse. Now it is better to have a cool, thinking head than a red and angry one.

bluedrake
Contributor

Well, just an update: we reinstalled one of our production ESX servers with 3.5 Update 1 and could VMotion servers to that ESX server, so we're now going to reinstall the other 3 ESX servers and save ourselves from having to worry about downtime.

BenConrad
Expert

That paragraph is certainly not clear, as it implies that U1 is the same as U2: "ESX350-Update01 (ESX350-Update02)" could be read as meaning they are the same or similar.

Anyway, it looks like VMware has pulled the U2 release and all patches from July.

Ben

Kevin_Gao
Hot Shot

Thanks for the calm & logical comments! Good change from the flames / rants.

That's a good idea actually - just reinstall...it only takes 30 minutes to get an ESX host up and running. :)

RParker
Immortal

> can I upgrade my Virtual Center server to 2.5 U2?

YES.

> It seems this bug only affects ESX itself.

That is correct. You can manage ESX U1, U2 and other ESX versions; only ESX U2 has this problem. VC will be fine.

JoeCasanova
Contributor

Wow, 24 pages!

ESXi 3.5 U2 here.

I was working late last night in the middle of a transition from MS Virtual Server 2005 to VMware ESXi 3.5 and caught the bug an hour or so shy of midnight. I came here and found the answer. I rolled back to August 10th and all is OK. I bookmarked this page and expected an answer from VMware by now, but I didn't expect to see the thread grow to 24 pages!

I'm eagerly awaiting a permanent solution so I can continue my work without worry!

JPerf
Contributor

I think that in the future I will never install any release marked "U2",

only U1 and then U3, U4,...

BTW

The U2 (the aeroplane) had the same problem on May 01 1960...well I never!

JPerf

RickPollock
Enthusiast

Yes, I am very disappointed in this bug. How did this get past QA? It's a good thing I haven't deployed this in production yet. Looks like it's time to spin up Hyper-V....

I would be lucky to still have my job if this happened in production. OMG ... VMware, get it together. First the bogus download and now this? It almost can't get any worse!

BryanMcC
Expert

Just goes to show you... Like I always say... unless an update fixes a specific bug you're hitting, you should always wait before applying it. And even then: testing/validation/acceptance!

And tsk tsk to all of you who applied this to all of your production ESX hosts.

If you've been installing the latest and greatest from VMware, you may have known that something like this could happen. My VC 2.5 and ESX 3.5 install was buggy as all hell. I finally applied Update 1 a few months ago and lost many of the annoying bugs.

At least shares are holding steady. :)

Just figured I would throw in my 2 cents.

Help me help you by scoring points.

Kevin_Gao
Hot Shot

If you shut down your production mission-critical VMs (e.g. Exchange, DCs, etc.) the previous night, then you'll definitely be in trouble. Otherwise you're not in bad shape, as VMs that were already powered on will continue to work.

jpratt_at_norwi
Enthusiast

Wow. Normally we are "bleeding-edge" but have been too busy to upgrade from U1 to U2.

So.... just to let it be known....

HEY VMWARE - YOU JUST LOST ANOTHER "BLEEDING-EDGE" SHOP. Maybe you will STOP including experimental code in production-release builds now?... PLEASE??

regards,

j

RParker
Immortal

> But it's a production environment, and all we lose is DRS and HA. We don't power down VMs in our production environment, and even if we did, turning back time on 1 host is no big deal as long as you have proper time keeping on your VMs.

First, did you read the posts? I don't think you did. This affects EVERY ESX 3.5 U2 server in the world.

Secondly, proper time keeping is PRECISELY what is preventing the VMs from powering on. The fact that the servers' clocks ARE up to date is why the VMs can't power on. The DATE has a drop-dead time/date of 8/12/08 (12:00 AM), so after that time NO VM will be able to power on. That's the issue.

Third, you lose the ability to load balance via VMotion, which could also cause serious issues with ESX servers.

Fourth, the workaround (which is FAR from ideal) is to change the date on the ESX server. The problem is that if you have the VMs set to time sync and the VMs start, the BIOS will have a date/time in the past. So you have to manually update each and every VM to ensure it is not going to sync with the server.
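Before rolling a host's clock back, it helps to know which VMs would actually pick up the old time. Here's a minimal sketch that reads a copied .vmx file and reports whether VMware Tools periodic time sync (`tools.syncTime`) is on. It assumes the usual `key = "value"` .vmx layout and treats a missing key as sync disabled; both are assumptions, and the helper names are hypothetical.

```python
import re
from typing import Dict

# Matches .vmx lines of the form:  key = "value"
VMX_LINE = re.compile(r'^\s*([A-Za-z0-9_.:-]+)\s*=\s*"(.*)"\s*$')

def parse_vmx(text: str) -> Dict[str, str]:
    """Map lower-cased .vmx keys to their (unquoted) values."""
    settings = {}
    for line in text.splitlines():
        match = VMX_LINE.match(line)
        if match:
            settings[match.group(1).lower()] = match.group(2)
    return settings

def tools_time_sync_enabled(vmx_text: str) -> bool:
    """True if tools.syncTime is set to TRUE (or 1); a missing key counts as off (assumption)."""
    value = parse_vmx(vmx_text).get("tools.synctime", "FALSE")
    return value.strip().upper() in ("TRUE", "1")
```

Run it over each VM's .vmx to build the list of guests you need to touch before changing the host date.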

This is a HUGE bug and not one to be simply dismissed by your posting. It's a LOT more serious than you evidently understand. The media, or whoever, is going to have a field day with this, BECAUSE it DOES affect servers handling billions today...

So READ the details next time before you ASSUME you understand what's at stake. Maybe you don't power off your VMs, but the rest of the world doesn't share your opinion...

Speedbmp
Enthusiast

rlabhart
Contributor

Dispatch,

The version number you have is not affected. The affected versions should be:

ESX3.5, Build 103908

ESX3.5i, Build 103909

Your build, 82663, is ESX3.5 (U1) with patches up to April 10th.

VMware updates the build number based on your installed patch level. (You're safely(?) well behind on patches....)
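For anyone unsure which build they're on, here's a small check against the two builds listed above. It assumes you've captured the version string from the host (on ESX, `vmware -v` prints something like `VMware ESX Server 3.5.0 build-103908`); the helper names are hypothetical, and the affected list is only what's quoted in this thread.

```python
import re
from typing import Optional

# Builds quoted in this thread as affected by the 12 August expiry bug.
AFFECTED_BUILDS = {
    103908: "ESX 3.5 Update 2",
    103909: "ESX 3.5i Update 2",
}

def build_from_version_string(text: str) -> Optional[int]:
    """Extract the number after 'build-' from a version string, or None if absent."""
    match = re.search(r"build-(\d+)", text)
    return int(match.group(1)) if match else None

def is_affected(build: int) -> bool:
    return build in AFFECTED_BUILDS
```

For example, the build 82663 mentioned above parses and checks as not affected.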

Phil_White
Enthusiast

Someone needs to mention that you shouldn't change the clock on an ESX server that hosts a domain controller...not everyone (unfortunately) is aware of the consequences.
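One of those consequences is Kerberos: by default, Windows domains reject authentication when client and DC clocks differ by more than 5 minutes, so a DC that inherits a host clock rolled back two days will be far outside that window. A minimal sketch of the comparison (the 5-minute figure is Windows' default Kerberos clock-skew policy; the function name is hypothetical):

```python
from datetime import datetime, timedelta

# Windows' default "maximum tolerance for computer clock synchronization".
KERBEROS_MAX_SKEW = timedelta(minutes=5)

def skew_exceeds_kerberos(dc_time: datetime, client_time: datetime,
                          tolerance: timedelta = KERBEROS_MAX_SKEW) -> bool:
    """True if the clock difference is larger than the Kerberos tolerance."""
    return abs(dc_time - client_time) > tolerance

# A DC rolled back to 10 Aug while clients are still on 12 Aug: two days of skew.
print(skew_exceeds_kerberos(datetime(2008, 8, 10, 9, 0), datetime(2008, 8, 12, 9, 0)))
```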

Kevin_Gao
Hot Shot

Yes, good thing that in our environment all Windows boxes sync to the DCs (VMware Tools sync is disabled). We've also got an NTP server that points to the outside world...we don't rely on the ESX hosts for NTP. :)

It's unfortunate for those who did, though, because they can't easily use the NTP time fix to power on their VMs. :(

Phil_White
Enthusiast

What's weird for my server farm is that when VMotioning a VM, for some reason it would pick up the ESX server time even with Tools sync disabled. We found out the hard way that ESX will force the VM to grab the ESX time as a "feature". We couldn't figure out how to make it NOT do that, so we have to keep the ESX server time synced with an NTP server.
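If you want to try suppressing those one-off syncs instead of relying on NTP alone: VMware has a KB article on disabling time synchronization that lists extra .vmx keys beyond `tools.syncTime`, which reportedly cover events such as VMotion, snapshot operations, resume and Tools startup. The key names below are recalled from that article and are an assumption here, so check them against the KB for your ESX build, and only edit a .vmx while the VM is powered off. The function name is hypothetical.

```python
# ASSUMPTION: key names recalled from VMware's KB on disabling time sync; verify before use.
NO_SYNC_KEYS = (
    "tools.syncTime",
    "time.synchronize.continue",
    "time.synchronize.restore",
    "time.synchronize.resume.disk",
    "time.synchronize.shrink",
    "time.synchronize.tools.startup",
)

def disable_event_time_sync(vmx_text: str) -> str:
    """Return a copy of the .vmx text with every key in NO_SYNC_KEYS set to "FALSE"."""
    wanted = {key.lower() for key in NO_SYNC_KEYS}
    out = []
    for line in vmx_text.splitlines():
        key = line.split("=", 1)[0].strip().lower()
        if key in wanted:
            continue  # drop any existing value; it is re-added below
        out.append(line)
    out.extend(f'{key} = "FALSE"' for key in NO_SYNC_KEYS)
    return "\n".join(out) + "\n"
```

Keep a backup of the original .vmx so you can revert if the VM misbehaves.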

kstailey
Contributor

Is there anywhere we can register to get a call back or E-mail indication when this issue becomes resolved?

It would beat polling news forums to find out.

-


So funny: when ESX 3i became free there was much less news than there is about this today.
