VMware Cloud Community
mattjk
Enthusiast

BIG bug in ESX 3.5 Update 2 - If you're using 3.5u2 read this now! - A general system error occurred: Internal Error

The express patches have been posted. This thread is long.

Please post technical experiences here and non-technical feedback here. --JohnTroyer

Hi all,

We've just encountered a serious bug with our ESX cluster - serious enough that I thought I should post about it here as a prior warning for others running ESX 3.5 Update 2.

The VMware tech support person we spoke to wouldn't 100% confirm whether this was / would be affecting all ESX 3.5u2 installs, but he strongly implied that it was widespread. For others' sake I hope I'm wrong and it's limited.

The bug:

Starting this morning, we could neither power on nor VMotion any of our Virtual Machines. The VI Client threw the error "A general system error occurred: Internal Error".

Further digging led us to messages like this one in /var/log/vmware/hostd.log, and in the log file for any virtual machine we tried to power on or VMotion:

Aug 12 10:40:10.792: vmx| This product has expired.

Aug 12 10:40:10.792: vmx| Be sure that your host machine's date and time are set correctly.

Aug 12 10:40:10.792: vmx| There is a more recent version available at the VMware Web site: "http://www.vmware.com/info?id=4".
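A quick way to check a host for the same symptom is to search the log for the expiry message quoted above. This is only a sketch: `has_expiry_error` is a name made up for this example, and it assumes the message text appears in the log exactly as shown.

```shell
#!/bin/bash
# Succeeds if the given log file contains the expiry message quoted above.
# Defaults to the hostd log named in this post; pass a VM's log file
# instead to check a single virtual machine.
has_expiry_error() {
    local logfile="${1:-/var/log/vmware/hostd.log}"
    grep -q "This product has expired" "$logfile"
}

# Example: has_expiry_error && echo "this host is hitting the expiry bug"
```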

A call to tech support confirmed this as a known problem with a temporary workaround.

The work-around:

Turn off NTP (if you're using it), and then manually set the date of all ESX 3.5u2 hosts back to 10th of August. This can be done either through the VI Client (Host -> Configuration -> Time Configuration) or by typing date -s "08/10/2008" at the Service Console command line on the ESX hosts.
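The command-line version of the work-around could be wrapped up roughly like this. It is a sketch, not a tested procedure: `run`, `apply_date_workaround` and `DRY_RUN` are names invented for the example, and it assumes NTP is controlled by the `/etc/init.d/ntpd` init script on the Service Console. By default it only prints what it would do.

```shell
#!/bin/bash
# Dry-run by default: prints the commands instead of running them.
# Set DRY_RUN=0 and run as root on the Service Console to apply them.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

apply_date_workaround() {
    # MM/DD/YYYY; any date before 12 August 2008 should do.
    local target="${1:-08/10/2008}"
    run /etc/init.d/ntpd stop   # assumed path of the NTP init script
    run date -s "$target"
}

# Example: apply_date_workaround "08/10/2008"
```

Remember that the host clock stays wrong until you set it back and restart NTP yourself.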

As soon as the date was reset to the 10th - problem solved.

Note that running VMs were operating fine; this only seems to affect initial VM power-on (including from suspended state) and VMotion.

So, it sounds like a serious licensing bug has crept into 3.5u2. Further testing shows that the problem begins as soon as the date hits 12th August - 10th is fine, 11th is fine, 12th and the problem appears.
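To make the cut-off concrete, a host's date can be compared against 12 August 2008. This is a minimal sketch based only on the dates we tested; `past_expiry_date` is a made-up name.

```shell
#!/bin/bash
# Succeeds if the given date (YYYYMMDD, default: the host's current date)
# is on or after 12 August 2008, the day the problem was seen to start.
past_expiry_date() {
    local today="${1:-$(date +%Y%m%d)}"
    [ "$today" -ge 20080812 ]
}

# Example: past_expiry_date && echo "power-on and VMotion will likely fail"
```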

There wasn't any real reference to similar problems in the forums as far as I could see, but it's quite possible we're seeing this before most of the rest of the world as we're in Australia, and therefore the date here ticked over to the 12th "before" those in Europe, America, etc.

Hope this helps others... took us a couple of hours to get this far - at least we can power on VMs again though!

Cheers,

Matt Kilham

Stratton Car Finance

Message was edited by: JohnTroyer to add new thread links.

Cheers, Matt
704 Replies
Erik_Zandboer
Expert

@daniel_uk: "who on earth is going to power up their VMs in the morning": How about HA? Remember, both DRS and HA are, as of now, effectively disabled on Update 2 platforms. So, major? I would think so.

Visit my blog at http://www.vmdamentals.com
Anders_Gregerse
Hot Shot

In the 4-5 years I've been using VMware this is the first major bug I've encountered. Finding a vendor that is free of major bugs will be a long journey. Microsoft has done it too, several times at that. Don't judge the product on this problem, but see the big picture. Would it be any different on another virtualization platform? At least it's a software problem that can be solved easily, compared to hardware-based problems like the ones AMD and Intel have had.

Bakafish_com
Contributor

I'm sorry, but if what you are saying is true and you don't have the ability to conceive of an acceptable workaround based on twiddling the ESX host's clock during the VM's launch window, well that's just sad. I think I'd have them up and running by now, but I guess it's hard to find the time to fix things when you are being so helpful and constructive here.

Adding:

I hope your clients will be understanding when they find this thread and realize that you could have had them back in operation hours ago with a little gumption.

GMellor
Contributor

We only have two 3.5u2 hosts, both not yet live, so we've managed to avoid any production issue here.

Given that putting the date back then affects VM timestamps when they boot up, would the following work:

1) unconfigure NTP

2) set time back to 1 Aug

3) Turn on all VMs, but press F12 or Del to go into the guest's BIOS config or Choose Boot Device screen - ie prevent bootup into OS.

4) correct time back to 12 Aug

5) Press Ctrl-Alt-Ins in each VM so they then reboot, pick up correct time and start OS bootup

This is presuming that the license check is only done at VM startup, and not at VM reboot (ie ctrl-alt-ins in VM, not a VM reset).

Gary.

LB1
Contributor

togate:

Look, you can adjust the date for the 15 seconds that it takes to allow the power on cycle to occur.

Write a script that executes the power on command for each host manually if you must, but changing the time for these few seconds will NOT affect the time on the VM or even VirtualCenter.

These logs will be accurate.
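The power-on part of such a script could look roughly like the loop below. Treat it as a sketch under assumptions: `power_on_vms` and `VMWARE_CMD` are names made up here, and it assumes `vmware-cmd <config> getstate` prints a line ending in `= on` for a running VM. It only helps while the host clock is set back.

```shell
#!/bin/bash
# Power on every VM config path given as an argument, skipping any VM
# that already reports "on". VMWARE_CMD can be overridden so the loop
# can be tried against a stub before touching a real host.
VMWARE_CMD="${VMWARE_CMD:-/usr/bin/vmware-cmd}"

power_on_vms() {
    local cfg state
    for cfg in "$@"; do
        # Assumed output format: "getstate() = on" / "getstate() = off"
        state=$("$VMWARE_CMD" "$cfg" getstate)
        case "$state" in
            *"= on") echo "skipping $cfg (already on)" ;;
            *)       echo "starting $cfg"
                     "$VMWARE_CMD" "$cfg" start ;;
        esac
    done
}

# Example: power_on_vms /vmfs/volumes/storage1/vm1/vm1.vmx
```

The path in the example line is a placeholder; `vmware-cmd -l` lists the real config paths on a host.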

I certainly hope that you're not allowing this to be worse than it HAS to be, because it focuses attention on what you do.

I know that sounds mean, but we ALL know engineers that LOVE to alert everyone to critical situations.

Just get the CNC machines online, those make the $$ man, the time stamps in the logs that would be affected are NOT an issue for these few seconds!

Be the guy that made this a non-issue.

...from an engineer that administers a SOX compliance NIGHTMARE VMware farm - we control 60% of all lottery back end servers in the WORLD!

sradnidge
Enthusiast

so who on earth, if you're running VMs in production, is going to be powering on your VMs in the morning

...

TODAY IS MSFT PATCH TUESDAY!!! And besides, VDI users can reboot whenever the hell they feel like it!

LudoS
Contributor

BakaFish,

You are right, swapping vmware-cmd doesn't provide anything good. Would have been too easy ...

BR

ezhosting
Contributor

As far as I have tested, rebooting from inside an already powered on machine works!

AntonVZhbankov
Immortal

vmware-cmd is just a perl script.

EMCCAe, HPE ASE, MCITP: SA+VA, VCP 3/4/5, VMware vExpert XO (14 stars)
VMUG Russia Leader
http://t.me/beerpanda
THP
Contributor

A large part of my own concern is that this patch installed without much warning. We would never install a patch before it had been road-tested for a few weeks by others. When we patched last week we did not intend to install it, and the list of patches to remediate did not list U2 (in fact we wondered if it had been pulled, as it suddenly didn't show up). We only realised it had been installed when looking at the version number.

Obviously we would not have minded so much if it had worked!!

Along with the frequent crashes of the VI client this has me very glad we are still on 2.5.5 for our production areas and only our dev work is being affected by this.

I can't push for increased VMware virtualisation when stories like this hit the headlines and our directors start muttering darkly about Hyper-V and Citrix/Xen as alternatives...

s1m0nb
Enthusiast

yes..why do directors do that muttering stuff??

jlegan
Contributor

I have co-developed a large virtualization solution for my company that manages multiple labs of ours and supports Virtual Server 2005 and Hyper-V, and I am now adding support for ESX. I was wrapping up all of my interfaces today when my power-on code started to fail. I looked at the log and was dumbfounded by the licensing error... LOL, good week to start implementing all of our web service API for ESX :).

I am sure they will have this resolved shortly.

AntonVZhbankov
Immortal

You can easily install u1.

Unfortunately too many customers went to u2 on main production servers.

jlegan
Contributor

This is an ESXi box. Didn't it come out as U2?

J-D
Enthusiast

KB article:

or if link doesn't work: KB 1006716

An issue with ESX/ESXi 3.5 Update 2 causes the product license to expire on August 12, 2008. VMware engineering has isolated the root cause of this issue and will reissue the various upgrade media including the ESX 3.5 Update 2 ISO, ESXi 3.5 Update 2 ISO, ESX 3.5 Update 2 upgrade tar and zip files by noon, PST on August 13. These will be available from the page: . Until then, VMware advises against upgrading to ESX/ESXi 3.5 Update 2.

The Update patch bundles will be released separately later in the week. This KB article will be updated as soon as more information is available, check back frequently for updates and additions.

Patch should be available tomorrow.

gdragats
Contributor

Some quick instructions (after testing and confirming)

To check if this is affecting you (which it probably is):

Confirming the bug

• Try powering on a vm (hopefully not production)

• If you cannot, log into the ESX host (you should see an Internal error)

• cat /var/log/vmware/hostd.log (you should see the below; this is from my test server)

Failed to do Power Op: Error: Internal error

Failed operation

Event 16 : Failed to power on w2k3_st_sp2_1 on esx1.xx.yy.com in ha-datacenter: A general system error occurred:

State Transition (VM_STATE_POWERING_ON -> VM_STATE_OFF)

Task Completed : haTask-16-vim.VirtualMachine.powerOn-42
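If the log is long, something like the following could pull out just the names of the VMs that failed to power on. It is a sketch that assumes the event lines look like the excerpt above and that VM names contain no spaces; `failed_power_ons` is a made-up name.

```shell
#!/bin/bash
# Print the name of each VM that hostd reported as failing to power on.
# Assumes lines shaped like "... Failed to power on <vm> on <host> ..."
# and VM names without spaces.
failed_power_ons() {
    local logfile="${1:-/var/log/vmware/hostd.log}"
    sed -n 's/.*Failed to power on \([^ ]*\) on .*/\1/p' "$logfile" | sort -u
}

# Example: failed_power_ons /var/log/vmware/hostd.log
```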

FIX: (same workaround- 2 methods)

1) Via the gui configuration/ Time….

2)

• log into ESX console

• confirm current date:

$ date

Tue Aug 12 21:56:15 EST 2008

• change date to a few days ago: (anything prior to the 12th)

# date -s 08/08/08

Fri Aug 8 00:00:00 EST 2008

VMs power on immediately

NOTES:

• You might see discrepancies if you are relying on graphs in virtual centre

• This only affects ESX hosts. VMs that sync with the domain time source (PDC Emulator) should be OK, unless your VMs are syncing with the ESX hosts.

• If DCs are being affected, make sure (hope) you have proper system state backups (as you should).

• Until VMware comes out with a fix you might need to create a cron job to reset the time (hopefully not for too long).

• These things happen?

davidjerwood
Enthusiast

Sorry if this has already been answered. What build number upwards does this affect ??

APN_NZ
Contributor

It affects Build: 103908 VMware ESX 3.5 Update 2 and Build: 103909 VMware installable ESXi 3.5 Update 2
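To check a host against those two build numbers, something along these lines might do. It is a sketch: `is_affected_build` is a made-up name, and it assumes `vmware -v` on the host prints a version string ending in `build-NNNNNN`.

```shell
#!/bin/bash
# Succeeds if the version string names one of the two builds listed above.
# Pass it the output of `vmware -v`; the "build-NNNNNN" suffix is assumed.
is_affected_build() {
    case "$1" in
        *build-103908|*build-103909) return 0 ;;
        *) return 1 ;;
    esac
}

# Example: is_affected_build "$(vmware -v)" && echo "affected build"
```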

shane_presley
Enthusiast

Hi All,

We have not upgraded any of our products to Update 2 yet (we are still running U1). I obviously will not touch my ESX servers until this issue is resolved.

But my question is, can I upgrade my Virtual Center server to 2.5 U2? It seems this bug only affects ESX itself. If I upgrade Virtual Center to Update 2 but leave my hosts at Update 1, I should be ok? They left the Virtual Center Update 2 files on the site for download, so I imagine they are unaffected.

Thanks

dalo
Hot Shot

I made a script to start a VM. The brief clock change shouldn't affect the guest OS. It works on my ESX servers.

Use it at your own risk: "scriptname vmname"

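#!/bin/bash
# Usage: scriptname vmname
# Briefly sets the host clock back one year so the VM can power on,
# then resyncs the clock via NTP.

# Refuse to run without a VM name (an empty pattern would match every VM).
[ -n "$1" ] || { echo "usage: $0 vmname" >&2; exit 1; }

# Config path of the first registered VM matching the name.
VM=$(/usr/bin/vmware-cmd -l | grep -- "$1" | head -n 1)

# Stop ntpd so it cannot correct the clock while the VM is starting.
/etc/init.d/ntpd stop

date -s "$(date --date='1 year ago')"

echo "starting VM $VM"

/usr/bin/vmware-cmd "$VM" start

# Put the clock right again and restart ntpd.
ntpdate pool.ntp.org

/etc/init.d/ntpd start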

Daniel
