VMware Cloud Community
mattjk
Enthusiast

BIG bug in ESX 3.5 Update 2 - If you're using 3.5u2 read this now! - A general system error occurred: Internal Error

The express patches have been posted. This thread is long.

Please post technical experiences here and non-technical feedback here. --JohnTroyer

Hi all,

We've just encountered a serious bug with our ESX cluster - serious enough that I thought I should post about it here as a prior warning for others running ESX 3.5 Update 2.

The VMware tech support person we spoke to wouldn't 100% confirm whether this was / would be affecting all ESX 3.5u2 installs, but he strongly implied that it was widespread. For others' sake I hope I'm wrong and it's limited.

The bug:

Starting this morning, we could neither power on nor VMotion any of our Virtual Machines. The VI Client threw the error "A general system error occurred: Internal Error".

Further digging led us to messages like this one in /var/log/vmware/hostd.log, and in the log file for any virtual machine we tried to power on or VMotion:

Aug 12 10:40:10.792: vmx| This product has expired.

Aug 12 10:40:10.792: vmx| Be sure that your host machine's date and time are set correctly.

Aug 12 10:40:10.792: vmx| There is a more recent version available at the VMware Web site: "http://www.vmware.com/info?id=4".
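For anyone who wants to check a host quickly, here is a minimal sketch of scanning log lines for the expiry message quoted above. It assumes only the message text shown in the excerpt; the function name is made up for illustration and is not part of any VMware tooling.

```python
# Marker text taken from the vmx log excerpt above.
EXPIRY_MARKER = "This product has expired."


def find_expiry_lines(lines):
    """Return the log lines (without trailing newlines) that contain the expiry message."""
    return [line.rstrip("\n") for line in lines if EXPIRY_MARKER in line]
```

It could be fed an open file, for example `find_expiry_lines(open("/var/log/vmware/hostd.log", errors="replace"))`, or the log file of an individual virtual machine.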

A call to tech support confirmed this as a known problem with a temporary workaround.

The work-around:

Turn off NTP (if you're using it), and then manually set the date of all ESX 3.5u2 hosts back to 10th of August. This can be done either through the VI Client (Host -> Configuration -> Time Configuration) or by typing date -s "08/10/2008" at the Service Console command line on the ESX hosts.

As soon as the date was reset to the 10th - problem solved.

Note that running VMs were operating fine; this only seems to affect initial VM power-on (including from suspended state) and VMotion.

So, it sounds like a serious licensing bug has crept into 3.5u2. Further testing shows that the problem begins as soon as the date hits 12th August - 10th is fine, 11th is fine, 12th and the problem appears.
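The date boundary described above can be written down as a small check. This is only a sketch of the observed behaviour, not of how ESX evaluates its license: the cutoff comes from the testing in this post, and it assumes the failure persists on every later date, which was not tested here. The names are hypothetical.

```python
from datetime import date

# First host date on which power-on and VMotion were seen to fail (from the testing above).
FIRST_BAD_DATE = date(2008, 8, 12)


def power_on_expected_to_fail(host_date):
    """Return True if a 3.5u2 host set to host_date would be expected to hit the bug.

    Assumption: every date on or after the cutoff behaves like the 12th.
    """
    return host_date >= FIRST_BAD_DATE
```

For example, `power_on_expected_to_fail(date(2008, 8, 10))` is False, which matches the workaround of setting the host date back to the 10th.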

There wasn't any real reference to similar problems in the forums as far as I could see, but it's quite possible we're seeing this before most of the rest of the world as we're in Australia, and therefore the date here ticked over to the 12th "before" those in Europe, America, etc.

Hope this helps others... took us a couple of hours to get this far - at least we can power on VMs again though!

Cheers,

Matt Kilham

Stratton Car Finance

Message was edited by: JohnTroyer to add new thread links.

Cheers, Matt
704 Replies
froaderick
Contributor

Agreed.

Has there been any update as to which of the updates within U2 caused the problem?

EDTRIANA
Contributor

I understand, but this is supposed to come from VMware and not from third parties...

Systems Integration Engineer Rogers Wireless Inc Montreal, Canada

jamieorth
Expert

This is from a Computerworld.com article -

"'Updated product bits with correct licensing will be made available for download as soon as possible,' the spokesman said, adding in a follow-up e-mail that VMware expects to have a patch available later today."

I hope the today part is correct.

EDTRIANA
Contributor

At 11:00 am there was nothing posted there, I can assure you of that. I called VMware and a technical person just sent me this link, can you believe it??

The fix was not even coming from VMware??? This is very, very scary...

-----Original Message-----

From: Indrajit Mahanta (c)

Sent: Tuesday, August 12, 2008 10:49 AM

To: Eduardo Triana

Subject: Please refer to the link below for ESX server update 2

http://www.vdi.co.nz/?p=18

Systems Integration Engineer Rogers Wireless Inc Montreal, Canada

RParker
Immortal

> its just a real shame that the "bug" has nothing to do with the functioning of ESX... it's all because of licensing and is why I am so critical of this particular error

Apparently you didn't read the posts either. It is an integral part of ESX; licensing or not, they ALL work TOGETHER. It doesn't matter which piece caused the problem, it's all part of the same CODE. It's not licensing, licensing is fine, it's within ESX itself. And how can you even say it has nothing to do with functioning?

You can't do DRS, VMotion, or power on VMs, and that's pretty much ALL ESX does! It's 100% function.

If you change the date and you have VMware Tools time sync on, you have effectively made an entire environment useless. VMs can't log in to AD unless their time is within 5 minutes, and some databases depend CRUCIALLY on time/date for their updates. This is a MAJOR hit; it's all about function, and nothing to do with licensing.

EDTRIANA
Contributor

May I ask how many VMs you have in production? You don't sound like you have a big environment like we do... This is not good for us: we will have to be on site today and tomorrow until this issue gets resolved.

Systems Integration Engineer Rogers Wireless Inc Montreal, Canada

alex555550
Enthusiast

Lol, I wouldn't want to be on the phones at their support desk :)

janradstaake
Contributor

hmm,

KB site is down for maintenance for now.....

" Temporary Maintenance - Knowledge Base

The Knowledge Base is currently unavailable while we make important user improvements and upgrades to the site. We apologize for any inconvenience this may cause. "

What I find even worse: nothing about this mega-big issue is mentioned on the home page www.vmware.com... at first sight it is only on the support page.

Jan

dispatch
Contributor

Can anyone confirm this is limited to ESX 3.5 U2 and not U1? The KB article says it affects ESX 3.5.x....

I'm running ESX 3.5.0 82663 but my paranoia wants to confirm I'll be OK come tomorrow...

Thanks in advance.

rob_nance
Contributor

> Please calm down everybody and let the vmware people do the needed work to resolve this nasty issue. We all know that if the time pressure is high the quality of work dramatically degrade. This is a severe problem without excuses, but no one is perfect in software development. If we look in the past VMware has delivered very complex and high quality code. Let us conduct a technical based discussion and not a flame war against them. We are all sitting in the same boat.
>
> My 2 cent ...

Well, I am lucky, this only affected a monitoring VM. There are people with mission critical services down, so yes, it is a big deal, and people have every right to be upset. True, PEOPLE make mistakes, but that's why you should have whole teams of people looking at the code, so that something one person misses still gets caught. VMware is not a person, they are a company providing a mission critical product, one with a terribly designed licensing system where, if licensing fails, everything shuts down. Hopefully the result of this will be VMware fixing this so that a license failure only cripples the product instead of breaking it.

RParker
Immortal

> When a vm is powered on it gets its time from the esx server. There is no other possibility. Even without vmtools.

That is not true. If you don't have the tools installed, or don't have the sync option enabled, the guest syncs its time with the normal Windows time sources. That's why it's an OPTION.

Kevin_Gao
Hot Shot

I don't know... but the easiest thing for you to do is shut down one of your test VMs and see if you can start it back up again. If you can, then you're probably NOT affected.

xprez
Contributor

We did a major upgrade on all ESX hosts last week, typically...

But it's a production environment, and all we lose is DRS and HA. We don't power down VMs in our production environment, and even if we did, turning back time on one host is no big deal as long as you have proper timekeeping on your VMs.

So whatever "time bomb" the media is making this out to be, it has no effect on our servers, which are handling billions today ;)

dpomeroy
Champion

Eric,

I could not agree more. Over the last few years it seems VMware has become so focused on releasing new products and new features that QA has taken a big hit. I understand they are under constant pressure to stay two steps ahead of Microsoft, Xen, and others, but not at the expense of stability and reliability. One of the major things that helped ESX spread in the Enterprise data center was its reliability and stability, and they need to make it priority #1 if they expect us large companies to continue to put our critical applications on top of their virtualization layer.

Don Pomeroy

VMware Communities User Moderator

Phil_White
Enthusiast

Unfortunately all of my test vm's are shut off as I was conserving resources during my update the other night to 3.5u2 :smileyangry:

dbuchanan
Contributor

This is limited to 3.5 U2 only; it does not affect 3.5 U1.

Dan L. Buchanan | Microsoft Engineer

Barclaycard | Business Technology Group | Infrastructure Engineering

Telephone (302) 255-8970 | Mobile (302) 507-6297

COMPANY CONFIDENTIAL

CLowe
Contributor

I agree. Everyone take a deep breath...walk around the block once or twice. Take another look around and rationally evaluate your position.

I have to say I have never seen so many people go from 0 to full-blown rage so fast. It makes me think that they either 1) were just looking for ANY excuse to start a rant, 2) are MS/Xen fans, again looking for ANY excuse, 3) love starting flame wars from sheer boredom, or 4) are so perfect in their own lives that seeing anything less from anyone else is a cause for rage.

Is this a big problem? Absolutely. Is it the end of the world? Depends on how shaky your world already was. A mistake was made (as if any of us had never made one), but a fix is on the way and will hopefully correct the issue. I am more concerned with how frequently problems like this occur, which would show an inability to get things right, than I am with the fact that one happened in the first place. In our environment we have about 70 VMs, and this problem (provided it is corrected in a day or two) almost rates as a non-event for us.

Phil_White
Enthusiast

Actually, if I would use my brain :) it's a good thing my test VM is off. I tried starting it and no go, crap :(

dispatch
Contributor

Dan,

Thanks for the confirmation. Is there a definitive way to ascertain what version I am running? (Does 3.5.0, 82663 = U1?)
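One way to approach the version question is to pull the version and build number out of whatever string the host reports and compare the build against a table of known releases. The sketch below is hedged accordingly: it assumes the Service Console command `vmware -v` prints something like "VMware ESX Server 3.5.0 build-82663", and the build-to-release table is an assumption recalled from memory that should be verified against VMware's release notes before relying on it. The function names are hypothetical.

```python
import re

# ASSUMPTION: build numbers recalled from memory, not confirmed in this thread.
# Verify against VMware's release notes before trusting the mapping.
KNOWN_BUILDS = {
    82663: "ESX 3.5 Update 1",
    103908: "ESX 3.5 Update 2",
}

# A dotted version (e.g. 3.5.0), any non-digit separator, then the build number.
VERSION_PATTERN = re.compile(r"(\d+\.\d+\.\d+)\D+(\d+)")


def parse_version(text):
    """Extract (version, build) from a string such as 'VMware ESX Server 3.5.0 build-82663'."""
    match = VERSION_PATTERN.search(text)
    if match is None:
        raise ValueError(f"no version/build found in {text!r}")
    return match.group(1), int(match.group(2))


def describe_release(text):
    """Map a version string to a release name, or report it as unknown."""
    version, build = parse_version(text)
    return KNOWN_BUILDS.get(build, f"unknown release (version {version}, build {build})")
```

The same parser accepts the "3.5.0, 82663" form quoted in the question, so the string shown in the VI Client can be pasted in directly.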
