VMware Cloud Community
mattjk
Enthusiast

BIG bug in ESX 3.5 Update 2 - If you're using 3.5u2 read this now! - A general system error occurred: Internal Error

The express patches have been posted. This thread is long.

Please post technical experiences here and non-technical feedback here. --JohnTroyer

Hi all,

We've just encountered a serious bug with our ESX cluster - serious enough that I thought I should post about it here as a prior warning for others running ESX 3.5 Update 2.

The VMware tech support person we spoke to wouldn't 100% confirm whether this was / would be affecting all ESX 3.5u2 installs, but he strongly hinted that it was widespread. For others' sake I hope I'm wrong and it's limited.

The bug:

Starting this morning, we could neither power on nor VMotion any of our Virtual Machines. The VI Client threw the error "A general system error occurred: Internal Error".

Further digging led us to messages like this one in /var/log/vmware/hostd.log and in the log file of any virtual machine we tried to power on or VMotion:

Aug 12 10:40:10.792: vmx| This product has expired.

Aug 12 10:40:10.792: vmx| Be sure that your host machine's date and time are set correctly.

Aug 12 10:40:10.792: vmx| There is a more recent version available at the VMware Web site: "http://www.vmware.com/info?id=4".
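To check a host quickly for the same symptom, something along these lines could be run from the Service Console. It is only a sketch: find_expired_logs is a made-up helper name, not a VMware tool, and the only thing it relies on is the "This product has expired" text from the log lines above.

```shell
# Hypothetical helper (not a VMware tool): print the names of the given
# log files that contain the expiry message quoted above.
find_expired_logs() {
    # -l prints matching file names only; errors for missing or
    # unreadable files are discarded.
    grep -l "This product has expired" "$@" 2>/dev/null
}
```

For example, `find_expired_logs /var/log/vmware/hostd.log` prints the path if the message is present and nothing otherwise; a VM's own log file can be passed in the same way.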

A call to tech support confirmed this as a known problem with a temporary workaround.

The work-around:

Turn off NTP (if you're using it), and then manually set the date of all ESX 3.5u2 hosts back to 10th of August. This can be done either through the VI Client (Host -> Configuration -> Time Configuration) or by typing date -s "08/10/2008" at the Service Console command line on the ESX hosts.

As soon as the date was reset to the 10th - problem solved.
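If there are several hosts to fix, the two steps could be wrapped up roughly as below. This is a sketch under assumptions: it assumes NTP runs as the "ntpd" service on the Service Console, the names run and rollback_host_date are invented for illustration, and it defaults to a dry run that only prints the commands.

```shell
# Sketch of the workaround for the ESX Service Console.
# DRY_RUN defaults to 1, so commands are only printed; set DRY_RUN=0
# (as root) to actually execute them.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

rollback_host_date() {
    # Assumption: NTP is the "ntpd" service; stop it so it cannot
    # move the clock forward again.
    run service ntpd stop
    # The date command quoted in the post: 10 August 2008.
    run date -s "08/10/2008"
}
```

Calling `rollback_host_date` with the default settings just lists what would be done, which is a cheap way to double-check before touching a production host.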

Note that running VMs were operating fine; this only seems to affect initial VM power-on (including from suspended state) and VMotion.

So, it sounds like a serious licensing bug has crept into 3.5u2. Further testing shows that the problem begins as soon as the date hits 12th August - 10th is fine, 11th is fine, 12th and the problem appears.
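The cutoff described above can be expressed as a tiny check. This is purely illustrative: clock_past_expiry is a hypothetical name, and the 12 August 2008 threshold is taken from the testing described in this post, not from any VMware documentation.

```shell
# Hypothetical check: succeeds (exit status 0) when a date given as
# YYYYMMDD is on or after 12 August 2008, the day the problem was
# observed to start. With no argument it uses the host's current date.
clock_past_expiry() {
    day="${1:-$(date +%Y%m%d)}"
    [ "$day" -ge 20080812 ]
}
```

Something like `clock_past_expiry && echo "host date is in the affected range"` would then flag a host whose clock has already passed the cutoff.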

There wasn't any real reference to similar problems in the forums as far as I could see, but it's quite possible we're seeing this before most of the rest of the world as we're in Australia, and therefore the date here ticked over to the 12th "before" those in Europe, America, etc.

Hope this helps others... took us a couple of hours to get this far - at least we can power on VMs again though!

Cheers,

Matt Kilham

Stratton Car Finance

Message was edited by: JohnTroyer to add new thread links.

Cheers, Matt
704 Replies
abrjgl
Contributor

jamieorth and Kevin Gao - as I tried to explain: of course I configured Update Manager, but I specifically did not tell it to install ESX350-200806201-UG (part of Update 2). I did, however, tell Update Manager to install ESX350-200804404-BG and ESX350-200804405-BG. Both of these updates trigger the install of ESX350-200806201-UG, and they should not do that.

THP
Contributor

I'll second this - we were patching (intending to go to U1) on the day U2 was released.

Initially, before the scan for updates, we saw U2 mentioned; after the scan it had gone (at the time we thought it had perhaps been pulled). However, when we remediated, none of the listed patches were called 'Upgrade 2', so we thought we were safe to continue and did so.

On checking version we found we had done Upgrade 2 as well. As it is our dev area we didn't panic but were a little annoyed at the lack of clarity in the tool. Obviously yesterday we became a little more cheesed off about it.

And to all those saying M$ do worse - yes, they have, but at least when I am installing a service pack to an OS it clearly lists itself as a service pack so I can understand the risk I am taking on!!

I hope one lesson learned from this is that VMware make their patch labelling much clearer.

Sure the 'perfect & smug people' can heap scorn on those of us who were misled but it seems to have hit more than just the novices so perhaps worth addressing?

leeroyrichardso
Contributor

I cannot open my vcentre to run the auto patch, so I have downloaded the patch to a temp file on the affected ESX boxes. Do I need to take the servers offline before running the patch?

Lee Richardson

Trystam
Enthusiast

To use the esxupdate command you need to have the host in maintenance mode:

eg:

# esxupdate update
INFO: No repository URL specified, going with file:///var/updates/ESX350-200806812-BG
INFO: Configuring...
ERROR: This bundle requires the host to be in maintenance mode. Since the host is not in maintenance mode, esxupdate cannot proceed. The VMs need to be turned off or migrated to another host first.
#

Francisco Cardoso, Logica PT - VCP

APN_NZ
Contributor

vihostupdate is a remote CLI that you use to scan and update ESX Server 3i. You could try it with ESX 3.5, but if esxupdate is not on two of your systems, it sounds like you have bigger issues!

leeroyrichardso
Contributor

I am logged onto my esx server ui because my vcenter is down, but I cannot see any options to migrate my current vms???

If I changed the dates on my esx servers would my vcentre come back online???

Lee Richardson

abrjgl
Contributor

mjlin - have you upgraded to Virtual Center 2.5 Update 2? VC 2.5 Update 1 can have this problem if you have ESX 3.5 update 2 hosts.

Trystam
Enthusiast

Lee,

This issue doesn't affect Virtual Center, so you must be experiencing some other problem too.

You cannot VMotion when you connect your VI Client directly to the ESX box, since you are only seeing part of your infrastructure; VMotion is a Virtual Center based/controlled feature.

Francisco Cardoso, Logica PT - VCP

Trystam
Enthusiast

My Virtual Center is U1 and I have no record of that problem in my infrastructure, nor do I believe that behaviour has been reported.

Francisco Cardoso, Logica PT - VCP

abrjgl
Contributor

I reported this problem in this thread http://communities.vmware.com/message/1019825#1019825

Bastian_Haas
Contributor

To whom it concerns at VMware: Could you please change this useless error message "A general system error occurred: Internal Error" in a future VIC update? With this message no one knows what the problem is, regardless whether it's a real licensing error or a timebomb ;)

Thank you.

j-swift
Contributor

I think some people have been caught out because they have run their VC as a VM and thus, once powered off, it won't now power on if that system has update 2.

Bad idea to have VC/infrastructure only as a VM - you need to buy a physical box to run VC (& licensing). In fairness to VMware they did recommend against running it as a VM.

I'm expecting some Architects will be tweaking their design and recommendations based on what has happened. Bl**dy good job that it happened now, when virtualisation is growing, but still a relatively small part of most data centres (I work across two DCs with around 5500 devices, around 5000 of which are servers). Just imagine what 'could' have happened if most servers were virtualised - in the next 10-15 years this could be the case.

Most shareware and trial software has, for decades, included a small piece of code that warns about expiration by logging a message to VC, syslog, or whatever as you approach that date. Disabling the product instantly, without warning, is where VMware shot themselves in the foot. Luckily, we can all learn from this and make more informed decisions about the risks of licensing models (which sort to use), and also about business risks of allowing guests to timesync to a host, implications for moving dates backwards on a host, and making sure VC is on a separate physical machine.

Jonathan.

leeroyrichardso
Contributor

I am running both the VC and licensing on a physical box and no matter what I do I cannot get the VC to start. We will be unable to migrate the servers and complete the upgrade process without the VC running.

Lee Richardson

hughs
Contributor

I don't know if anyone can help but I'm still having problems. I downloaded the patch file (ESXe350-200807812-O-BG.zip) and used Infrastructure Update to apply it to a server running 3.5u2 ESXi. It said it had finished ok, so I reset the time to today and.... nothing. I'm still getting the same error. Now if I try to add the package I get an error saying 'Package does not contain applicable updates'

Help!

Edit: It works. I've rebooted the host PC (wow, it's just like using Windows) and I can now boot my VM's. Good luck to all those with many many computers that need rebooting.....

ICT-Freak
Enthusiast

Did you restart the ESX3i host?

hughs
Contributor

I did, but where did it tell me to do this? The VMware info is really poor. The email referred to the page with the patches, which is great. But that page then says look at the KB for 'deployment consideration and something'. Fine. But that then tells me to download and use three different PDFs. Yawn. How about "Download this huge file. Use the Infrastructure Update software to apply it. Reboot." (Yes I know it's all more complicated than this).

vmwaredimetroni
Contributor

excellent!!!!

I've updated 3 of my 5 ESX servers successfully through these steps!!

On the other two I have problems with metadata updates that I cannot resolve at the moment; also, these ESX servers, although the same version as the others, do not include the esxupdate command.

Thanks!!

cgb
Contributor

To whom it concerns at VMware: Could you please change this useless error message "A general system error occurred: Internal Error" in a future VIC update? With this message no one knows what the problem is, regardless whether it's a real licensing error or a timebomb ;)

Very good request - nothing annoys me more than non-intuitive error messages. Even when debugging in the logs for this problem, the message logged was 'Product expired. Be sure that your host machine's date and time are set correctly.' I mean, it was accurate given the cause, but totally unexpected when I'm building up a validly licensed new server for a customer.

Even more annoying was the suggestion in hostd.log:

There is a more recent version available at the VMware Web site: "http://www.vmware.com/info?id=4".

And then to go to that URL, when all the while VMware knew about the problem and didn't update that URL's content with a useful link to the knowledge base article (while it was up) and/or to the forum post where it all began. Even now, the content is still not changed to point people to the issue.

cgb
Contributor

I did, but where did it tell me to do this? The VMware info is really poor. The email referred to the page with the patches, which is great. But that page then says look at the KB for 'deployment consideration and something'. Fine. But that then tells me to download and use three different PDFs. Yawn. How about "Download this huge file. Use the Infrastructure Update software to apply it. Reboot." (Yes I know it's all more complicated than this).

I felt the same way - even though I'm familiar with using esxupdate to pull down updates from a locally built repository, I think given the impact this problem has had, VMware could have provided a very clear step-by-step procedure for performing the update so those who needed it resolved urgently could get it applied asap, rather than have to go read VMware Update Manager documentation or Patch Management Guide.

Trystam
Enthusiast

Basically:

You download the ESX350-200806812-BG.zip file and upload it via SFTP to a directory on your ESX box, in my case /var/updates.

Put the ESX system in maintenance mode.

then:

# cd /var/updates/
# ls -ltr
total 106720
-rw-r--r-- 1 root root 109165839 Aug 13 14:48 ESX350-200806812-BG.zip
# unzip ESX350-200806812-BG.zip
Archive: ESX350-200806812-BG.zip
creating: ESX350-200806812-BG/
inflating: ESX350-200806812-BG/VMware-esx-vmx-3.5.0-110181.i386.rpm
inflating: ESX350-200806812-BG/VMware-hostd-esx-3.5.0-110181.i386.rpm
inflating: ESX350-200806812-BG/descriptor.xml
creating: ESX350-200806812-BG/headers/
extracting: ESX350-200806812-BG/headers/VMware-esx-vmx-0-3.5.0-110181.i386.hdr
extracting: ESX350-200806812-BG/headers/VMware-hostd-esx-0-3.5.0-110181.i386.hdr
inflating: ESX350-200806812-BG/headers/header.info
inflating: ESX350-200806812-BG/headers/contents.xml
inflating: ESX350-200806812-BG/headers/contents.xml.sig
inflating: ESX350-200806812-BG/contents.xml
inflating: ESX350-200806812-BG/contents.xml.sig
# cd ESX350-200806812-BG
# esxupdate update

The patch will install, and you will lose the connection from your Virtual Center to the box for about 2 to 5 minutes; after that it will show up again in maintenance mode.

Remove the machine from maintenance mode and you're good to power up everything again.
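For anyone repeating this on many hosts, the Service Console part of the procedure can be written down as a small helper that only prints the commands for a given bundle. It is a rough sketch: patch_plan is a hypothetical name, it executes nothing itself, and putting the host into and out of maintenance mode is deliberately left out and still has to be done separately.

```shell
# Hypothetical helper: print the Service Console commands from the steps
# above for a given bundle zip, without running anything.
patch_plan() {
    zip_path="$1"
    dir=$(dirname "$zip_path")           # e.g. /var/updates
    bundle=$(basename "$zip_path" .zip)  # e.g. ESX350-200806812-BG
    printf 'cd %s\n' "$dir"
    printf 'unzip %s.zip\n' "$bundle"
    printf 'cd %s\n' "$bundle"
    printf 'esxupdate update\n'
}
```

Running `patch_plan /var/updates/ESX350-200806812-BG.zip` lists the same four commands as in the transcript, which can then be reviewed or pasted host by host.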

Francisco Cardoso, Logica PT - VCP