VMware Cloud Community
mattjk
Enthusiast

BIG bug in ESX 3.5 Update 2 - If you're using 3.5u2 read this now! - A general system error occurred: Internal Error

The express patches have been posted. This thread is long.

Please post technical experiences here and non-technical feedback here. --JohnTroyer

Hi all,

We've just encountered a serious bug with our ESX cluster - serious enough that I thought I should post about it here as a prior warning for others running ESX 3.5 Update 2.

The VMware tech support person we spoke to wouldn't 100% confirm whether this was, or would be, affecting all ESX 3.5u2 installs, but he strongly hinted that it was widespread. For others' sake I hope I'm wrong and it's limited.

The bug:

Starting this morning, we could neither power on nor VMotion any of our virtual machines. The VI Client threw the error "A general system error occurred: Internal Error".

Further digging led us to messages like this one in /var/log/vmware/hostd.log, and in the log file for any virtual machine we tried to power on or VMotion:

Aug 12 10:40:10.792: vmx| This product has expired.

Aug 12 10:40:10.792: vmx| Be sure that your host machine's date and time are set correctly.

Aug 12 10:40:10.792: vmx| There is a more recent version available at the VMware Web site: "http://www.vmware.com/info?id=4".
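For anyone checking several hosts, here is a minimal sketch of scanning log lines for the expiry message. It assumes the logs have been copied to a workstation with a current Python, since it is not meant to run on the Service Console itself. The function name `find_expiry_lines` and the constant `EXPIRY_MARKER` are made up for illustration.

```python
# Marker text taken from the vmx log lines quoted in this post.
EXPIRY_MARKER = "This product has expired."


def find_expiry_lines(lines):
    """Return the log lines that contain the expiry message, without newlines."""
    return [line.rstrip("\n") for line in lines if EXPIRY_MARKER in line]


# Example with lines shaped like the ones quoted above.
sample = [
    "Aug 12 10:40:10.792: vmx| This product has expired.\n",
    "Aug 12 10:40:10.792: vmx| Be sure that your host machine's date and time are set correctly.\n",
]
for hit in find_expiry_lines(sample):
    print(hit)
```

To scan a real file, pass an open file object, for example `find_expiry_lines(open(path))`, where `path` points at a copied hostd.log or a VM's log file.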

A call to tech support confirmed this as a known problem with a temporary workaround.

The work-around:

Turn off NTP (if you're using it), and then manually set the date of all ESX 3.5u2 hosts back to 10th of August. This can be done either through the VI Client (Host -> Configuration -> Time Configuration) or by typing date -s "08/10/2008" at the Service Console command line on the ESX hosts.

As soon as the date was reset to the 10th - problem solved.

Note that running VMs were operating fine; this only seems to affect initial VM power-on (including from a suspended state) and VMotion.

So, it sounds like a serious licensing bug has crept into 3.5u2. Further testing shows that the problem begins as soon as the date hits 12th August - 10th is fine, 11th is fine, 12th and the problem appears.
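The date behaviour described above can be written down as a tiny check. This is only a sketch of what was observed (10th fine, 11th fine, 12th broken), not VMware's actual licensing logic. Treating every later date as affected is an assumption, and the names `TRIGGER_DATE` and `date_triggers_bug` are invented for the example.

```python
from datetime import date

# First host date on which the failures were observed in testing.
TRIGGER_DATE = date(2008, 8, 12)


def date_triggers_bug(host_date):
    """True if the host's date is on or after the observed trigger date.

    Assumption: dates after the 12th behave like the 12th.
    """
    return host_date >= TRIGGER_DATE


for day in (10, 11, 12):
    d = date(2008, 8, day)
    print(d.isoformat(), "affected" if date_triggers_bug(d) else "fine")
```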

There wasn't any real reference to similar problems in the forums as far as I could see, but it's quite possible we're seeing this before most of the rest of the world as we're in Australia, and therefore the date here ticked over to the 12th "before" those in Europe, America, etc.

Hope this helps others... took us a couple of hours to get this far - at least we can power on VMs again though!

Cheers,

Matt Kilham

Stratton Car Finance

Message was edited by: JohnTroyer to add new thread links.

Cheers, Matt

704 Replies
mcowger
Immortal

Totally agree with this.

We use MIT Kerberos extensively, and only sync time with NTP for exactly this reason.

--m

--Matt VCDX #52 blog.cowger.us

Trystam
Enthusiast

Thanks for that update Ron.

Francisco Cardoso, Logica PT - VCP

williamarrata
Expert

I have just now called VMware, and as of now there is still no fix or patch for this bug. Their only stated fix so far is to uncheck the NTP box and change the date on the hosts to before the 10th of this month.

Hope that helped. 🙂

Trystam
Enthusiast

William,

The "available" and "doable" procedures to minimize this situation have already been stated earlier in the thread.

You don't need to do exactly that.

Francisco Cardoso, Logica PT - VCP

edawg
Enthusiast

Stupid question: I have ESX 3.5 installed, but I don't know what my Update version is. Can you advise on the easiest way to tell?

Thank you,

Erik

halemeister
Contributor

build number 103908 is Update 2

KlinikenLB
Contributor

@halemeister:

We were also thinking about how this express patch can be applied.

I think: back to the roots, directly on the ESX console, without any help of the update manager. And it must be a solution that does not need any reboot ...

So let's wait for what VMware will bring out.

Trystam
Enthusiast

There are no stupid questions:

In VirtualCenter, check the host's build number:

Build 3.5.0, 82663 Update 1 Vanilla

Build 3.5.0, 103908 Update 2 Vanilla

Francisco Cardoso, Logica PT - VCP

COS
Expert

Connect through the VI Client and you should see "ESX Server, 3.5.0, 103908" at the top. If you see that, you're on U2. Anything less than 103908 and you're not on U2.
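If you have a lot of hosts to check, the comparison can be scripted. The sketch below assumes you have already collected each host's banner string in the "ESX Server, 3.5.0, 103908" form shown in the VI Client. How you collect it is left out, and the names `parse_build` and `is_update_2` are made up for the example.

```python
# Build number reported in this thread for vanilla ESX 3.5 Update 2.
UPDATE_2_BUILD = 103908


def parse_build(banner):
    """Extract the build number from a banner like 'ESX Server, 3.5.0, 103908'."""
    return int(banner.rsplit(",", 1)[1].strip())


def is_update_2(banner):
    """True only when the banner shows exactly the vanilla Update 2 build."""
    return parse_build(banner) == UPDATE_2_BUILD


for banner in ("ESX Server, 3.5.0, 103908", "ESX Server, 3.5.0, 98103"):
    print(banner, "->", "Update 2" if is_update_2(banner) else "not vanilla Update 2")
```

The check uses an exact match rather than "greater than or equal" on purpose: this thread only establishes that 103908 is Update 2, so any other build number is simply reported as not being the vanilla Update 2 build.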

Trystam
Enthusiast

I don't think we are so fortunate.

It would depend greatly on where the "timebomb" feature was implemented, but by the looks of it I'd say in the vmkernel itself.

That being the case, even if the patch is applied from the Service Console, I really don't think that zero downtime is going to be an option.

Still, we can hope.

Francisco Cardoso, Logica PT - VCP

johnpearson
Contributor

Ok, now this really gets me... It's being reported on Network World that this bug was known beforehand:

"We were proactively notified by VMware Monday. They told us these are the symptoms and what would happen if you powered down your virtual machines," he explains. "They gave us the general prescription in terms of troubleshooting and things to avoid such as powering down."

So what gives here? If this is true, VMware really dropped the ball by not publishing this fact far and wide as soon as it was known. As a co-worker said, I smell a rat.

JP

Trystam
Enthusiast

I also find it doubtful that, with that information known beforehand, this bit of news was only issued now. So I really think that there may be a rat ... but it's not on VMware's side ... just someone capitalizing on this issue.

This is too much bad publicity for VMware. I'll be ignoring those sorts of comments and giving them the respect they deserve: very little to none.

Francisco Cardoso, Logica PT - VCP

myates
Contributor

Works on mine!

ESX 3.5 update 2 burnt with Imgburn 2.4.2.0 safedisc 2 profile at 1x speed.

3 x ESX servers and Fiber SAN storage array.

Trystam
Enthusiast

If you don't mind me saying, what the hell are you talking about ...

This is hardly a discussion about the best tool to burn an iso.

Francisco Cardoso, Logica PT - VCP

COS
Expert

Huh!?!?

I've never "Burnt" anything using "safedisc" at 1x speed on a Linux ESX Build Server.

timw18
Enthusiast

Lucky for some

COS
Expert

It's saying it burned its entire Fibre SAN Array using "safedisc" of ESX U2....

Don't you get it?

larstr
Champion

Ok, now this really gets me... It's being reported on Network World http://www.networkworld.com/news/2008/081208-vmware-bug.html?t51hb that this bug was known beforehand:

This is probably due to time zone differences. For example, .nz and .au were hit by the issue while it was still Monday in the US.
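The time zone arithmetic can be checked with a short sketch. It uses fixed UTC offsets instead of a time zone database: UTC+10 for eastern Australia (no daylight saving in August) and UTC-7 for US Pacific daylight time. Picking those two zones is just an illustration, and the function name `us_pacific_time_at` is invented.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets for August 2008 (assumed representative zones).
AEST = timezone(timedelta(hours=10), "AEST")  # eastern Australia
PDT = timezone(timedelta(hours=-7), "PDT")    # US Pacific, daylight time


def us_pacific_time_at(local_au):
    """Convert a time-zone-aware Australian timestamp to US Pacific time."""
    return local_au.astimezone(PDT)


# The moment the date rolled over to 12 August in eastern Australia.
rollover = datetime(2008, 8, 12, 0, 0, tzinfo=AEST)
print(us_pacific_time_at(rollover).isoformat())
```

At that moment it was 07:00 on Monday 11 August on the US west coast, so a customer contacted on Monday, US time, could already have been contacted after the first failures appeared in Australia.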

Lars

edawg
Enthusiast

Thanks Trystam-

Mine states ESX 3.5, 98103. Does that mean anything below 103908 is not Update 2? What I also find interesting is that yesterday I used Update Manager to apply critical updates to my host, and it stated that no critical updates were necessary.

erik

<http://www.recycleworks.org/>

Trystam
Enthusiast

Honest to god, I really don't see his point .... but hey, I'm Portuguese .. my English may be playing tricks on me ...

"ESX 3.5 update 2 burnt with Imgburn 2.4.2.0 safedisc 2 profile at 1x speed." To me, this means that he burned a CD with ImgBurn ..... I don't see the relevance of this information ... is it relevant?

The disk could just as well have been burnt with brasero or cdrdao ....

Francisco Cardoso, Logica PT - VCP