mattjk
Enthusiast

BIG bug in ESX 3.5 Update 2 - If you're using 3.5u2 read this now! - A general system error occurred: Internal Error

The express patches have been posted. This thread is long.

Please post technical experiences here and non-technical feedback here. --JohnTroyer

Hi all,

We've just encountered a serious bug with our ESX cluster - serious enough that I thought I should post about it here as a prior warning for others running ESX 3.5 Update 2.

The VMware tech support person we spoke to wouldn't 100% confirm whether this was or would be affecting all ESX 3.5u2 installs, but he strongly hinted that it was widespread. For others' sake I hope I'm wrong and it's limited.

The bug:

Starting this morning, we could not power on or VMotion any of our virtual machines. The VI Client threw the error "A general system error occurred: Internal Error".

Further digging led us to messages like this one in /var/log/vmware/hostd.log, and in the log file of any virtual machine we tried to power on or VMotion:

Aug 12 10:40:10.792: vmx| This product has expired.

Aug 12 10:40:10.792: vmx| Be sure that your host machine's date and time are set correctly.

Aug 12 10:40:10.792: vmx| There is a more recent version available at the VMware Web site: "http://www.vmware.com/info?id=4".
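
For anyone who wants to check a host or VM log for the same symptom, here is a minimal Python 3 sketch that looks for the expiry message quoted above. The function names are made up for illustration, and it assumes you run it against a copy of the log on a workstation, since the Service Console may not have a recent Python.

```python
# Marker text taken from the log excerpt above.
EXPIRY_MARKER = "This product has expired."


def find_expiry_lines(lines):
    """Return the lines (without trailing newline) that contain the expiry message."""
    return [line.rstrip("\n") for line in lines if EXPIRY_MARKER in line]


def scan_log(path):
    """Scan one log file; undecodable bytes are replaced rather than raising."""
    with open(path, errors="replace") as handle:
        return find_expiry_lines(handle)
```

Calling `scan_log` on hostd.log (or a copy of it) returns the matching lines; an empty list only means the message was not found in that particular file.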

A call to tech support confirmed this as a known problem with a temporary workaround.

The work-around:

Turn off NTP (if you're using it), and then manually set the date of all ESX 3.5u2 hosts back to 10th of August. This can be done either through the VI Client (Host -> Configuration -> Time Configuration) or by typing date -s "08/10/2008" at the Service Console command line on the ESX hosts.

As soon as the date was reset to the 10th - problem solved.

Note that running VMs kept operating fine; this only seems to affect initial VM power-on (including from a suspended state) and VMotion.

So, it sounds like a serious licensing bug has crept into 3.5u2. Further testing shows that the problem begins as soon as the date hits 12th August - 10th is fine, 11th is fine, 12th and the problem appears.
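
The date behaviour described here can be written down as a tiny check. This is only a sketch of what the testing in this post showed, not VMware's licensing logic: the constant and the function name are illustrative, and treating every date from the 12th onward as affected is an assumption, since the post only reports results for the 10th, 11th and 12th.

```python
from datetime import date

# First date on which power-on and VMotion failed in the testing described above.
FIRST_BAD_DATE = date(2008, 8, 12)


def power_on_expected_to_fail(host_date):
    """True if the host clock is on or after the first failing date (assumed cutoff)."""
    return host_date >= FIRST_BAD_DATE
```

With this rule a host clock set to 10 August 2008, as in the workaround, falls on the safe side of the cutoff.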

There wasn't any real reference to similar problems in the forums as far as I could see, but it's quite possible we're seeing this before most of the rest of the world as we're in Australia, and therefore the date here ticked over to the 12th "before" those in Europe, America, etc.

Hope this helps others... took us a couple of hours to get this far - at least we can power on VMs again though!

Cheers,

Matt Kilham

Stratton Car Finance

Message was edited by: JohnTroyer to add new thread links.

Cheers, Matt
704 Replies
bgallardo
Contributor

Same problem in Spain at 8:00 PM LOCAL.

All my 3.5 hosts are set back to 10/08. WAITING FOR A FIX!!!

Maintenance mode for fix? Reboot? Update manager? NO PLEASE !!! Production environment!!!!!!!!!

An RPM would be the solution for us.

sradnidge
Enthusiast

Impeccable timing as well... it's Microsoft Patch Tuesday today, isn't it? Good luck with not rebooting VMs over the next few days; a number of those patches are critical. Also just want to throw London into the mix as being affected. Nice little global community in this thread :)

java_cat33
Virtuoso

Yeah I had the same issue accessing the KB too - however finally managed to access it a few minutes ago..... you're not missing much.

scott28tt
VMware Employee

We have this issue in the UK too; I updated 1 x VC and 7 x ESX hosts to Update 2 yesterday. The workaround has got things working for now, until we get the fix.

Scott.


-------------------------------------------------------------------------------------------------------------------------------------------------------------

Although I am a VMware employee I contribute to VMware Communities voluntarily (ie. not in any official capacity)
VMware Training & Certification blog
Erik_Zandboer
Expert

I am wondering about the KB workaround: 1) does it say to turn back time, but more important 2) does it tell you to put the time right again before your domain controllers die??

BTW: From the Netherlands, affected as well :(

Visit my blog at http://www.vmdamentals.com
admin
Immortal

Dear Customers,

An issue has been uncovered with ESX/ESXi 3.5 Update 2 that causes the product license to expire on August 12. VMware engineering has isolated the root cause of this issue and will reissue the various upgrade media including the ESX 3.5 Update 2 ISO, ESXi 3.5 Update 2 ISO, ESX 3.5 Update 2 upgrade tar and zip files in the next 36 hours (by noon, August 13, PST). They will be available from the page: http://www.vmware.com/download/vi. Until then, we advise against upgrading to ESX/ESXi 3.5 Update 2.

The Update patch bundles will be released separately later in the week.

The issue is being tracked on KB 1006716 on

We sincerely apologize for any inconvenience that has been caused.

The VMware ESX Product Team

KyawH
Enthusiast

It's very disappointing!!! It affects our customers who lost $$$$ because of the down time.

I wouldn't be surprised if VMware's share price drops below the IPO price of $29 very soon.

Erik_Zandboer
Expert

No one is jumping for new ISOs and upgrade TARs; most importantly we are looking for a quick fix aka patch for the problem (and I don't mean putting the date of the hosts back)!!

Visit my blog at http://www.vmdamentals.com
FrancWest
Contributor

What a shame, and still no word on a fix. You would think they'd be working on it around the clock.

Franc.

LucD
Leadership

Just spent 3 hours tracing the problem before I stumbled on the hostd.log and then on this thread.

Can't access the KB article either.


Blog: lucd.info  Twitter: @LucD22  Co-author PowerCLI Reference

LudoS
Contributor

Hi,

Luckily only 2 out of our 18 servers (the two I received last week) are running 3.5u2... The rest of the infrastructure is on 3.5U1 (95350).

It seems that the bug affects only the destination server (i.e. if the target is 3.5U2), because I was able to move the 3+4 VMs I had on the 3.5U2 servers back to 3.5U1 servers using VMotion, then put the 3.5U2 servers in maintenance mode and take them out of the DRS/HA cluster.

I prefer not to imagine what could have happened if I had finished the upgrade as planned and the 200+ VMs we have had been affected :-(.

What really pisses me off is the lack of communication from VMware directly, as they have full lists of customers who registered licenses and downloaded the binaries!!! For cases such as this one I would have expected some kind of direct notification, so that those of us who were hit later could benefit from the lessons learned by our colleagues in Australia who were hit first.

Thanks to all early publishers for sharing their experiences and keeping us informed.

BR Ludovic

admin
Immortal

Hi FrancWest,

Everyone is mobilized here at VMware. mjlin, who posted in this thread several hours ago, is the product manager. Support knows what is going on. Someone else has posted our first communication here on this thread (patch should be available within 36 hours). Unfortunately I also can't access the kb, but I assume that posted message is from the kb.

I know we're preparing additional communication, so check that kb and expect more from us as we have more information. I'm sorry we weren't able to reach out to everyone directly yet.

John

deploylinux
Enthusiast

Not to be argumentative, but 36 hours is an unreasonable time to make customers wait. ESX is supposed to be an enterprise product. Enterprise products usually have 4-hour SLAs. No one expects VMware to fix, recompile, and distribute ESX patches in under 4 hours... but there is a huge gap between 4 hours and 36.

joergriether
Hot Shot

There must be a licensing backdoor for emergencies that simply allows everything. NOW would be the right time to give that information out to this community. Another thing: earlier you mentioned you'll provide ISOs and ZIPs within 36 hours, but not a patch for fixing EXISTING machines in that timeframe. Also, you didn't mention at all how you're going to accomplish this. I guess a reboot or maintenance mode is not an option for many users here. This is a production environment, a critical environment; I suggest you take critical measures.

Joerg

Lars_Wolff
Contributor

I hope applying the future patch will not require a reboot! If I am not able to use VMotion, I will have to apply the patch at night...

Cheers

Lars

ESXDevil
Enthusiast

Hi all

I have the same issue here in Switzerland. Changing the date solved the problem. Shame on you, VMware! I thought I had misconfigured the ESX hosts, and I spent several hours searching for a way to fix this superb error message: "a general system error has occurred: Internal error".

Thanks to the others in this community, who helped me a lot.

ezhosting
Contributor

Yes me too.

I think you should prioritize a PATCH for those stuck in this hell, instead of creating new isos.

mimo17
Contributor

Lucky you if you can reboot in the night. We have 24h operation!

I asked support whether a trial license would work as a workaround, but the poor support girl did not understand.

Any suggestions from the forum?

hughs
Contributor

I can't believe this has happened. I'm trialling ESXi before we spend £50k on Virtualising all of our servers. I'm hardly going to go down the VMware route now am I? Idiots.

Bakafish_com
Contributor

As a former VMware engineer I know a bit about the build process, and unfortunately this being tied into the licensing code means that any fix likely touches a lot of different places in the code base. There probably isn't going to be a simple patch, but something along the lines of a 3.5u2v2 bundle with all new components. This is one of the reasons why licensing that hobbles applications sucks. I wish there was a better way for companies to protect their interests without leaving their customers (and therefore themselves) at so much risk. The awful thing is that this bug wasn't reflected correctly in the UI, so we had to really dig to find the answer. I found this thread after about 3 hours of searching through logs and googling, when there were only a few posts on it. Let's hope the updated bundles are available soon. Expect Maintenance Mode/Reboot.

Representing Japan.
