The express patches have been posted. This thread is long.
We've just encountered a serious bug with our ESX cluster - serious enough that I thought I should post about it here as an advance warning for others running ESX 3.5 Update 2.
The VMware tech support person we spoke to wouldn't confirm 100% whether this was (or would be) affecting all ESX 3.5u2 installs, but he strongly implied that it was widespread. For others' sake I hope I'm wrong and it's limited.
Starting this morning, we could neither power on nor VMotion any of our virtual machines. The VI Client threw the error "A general system error occurred: Internal Error".
Further digging led us to messages like these in /var/log/vmware/hostd.log, and in the log file of any virtual machine we tried to power on or VMotion:
Aug 12 10:40:10.792: vmx| Be sure that your host machine's date and time are set correctly.
Aug 12 10:40:10.792: vmx| There is a more recent version available at the VMware Web site: "http://www.vmware.com/info?id=4".
A call to tech support confirmed this as a known problem with a temporary workaround.
Turn off NTP (if you're using it), and then manually set the date of all ESX 3.5u2 hosts back to 10th of August. This can be done either through the VI Client (Host -> Configuration -> Time Configuration) or by typing date -s "08/10/2008" at the Service Console command line on the ESX hosts.
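For anyone who prefers the Service Console route, here is a minimal sketch of the same workaround. It assumes the Red Hat-style "service" and "chkconfig" tools on the ESX 3.5 Service Console and that the NTP init script is named ntpd. The run helper and DRY_RUN switch are hypothetical additions so you can preview the commands before changing anything.

```shell
#!/bin/bash
# Sketch of the date workaround (run as root on the Service Console).
# Assumptions: ntpd is the NTP init script name; DRY_RUN=1 is a hypothetical
# switch that only prints the commands instead of running them.

run() {
  # Print each command; execute it only when DRY_RUN is not 1
  echo "+ $*"
  if [ "${DRY_RUN:-0}" != "1" ]; then
    "$@"
  fi
}

apply_date_workaround() {
  local target="${1:-08/10/2008}"   # MM/DD/YYYY, the date used above
  run service ntpd stop             # stop NTP so it doesn't undo the change
  run chkconfig ntpd off            # keep NTP off across reboots
  run date -s "$target"             # set the clock back before 12 Aug
}
```

Preview with DRY_RUN=1 apply_date_workaround, then run apply_date_workaround as root (optionally passing another pre-12th date such as 08/11/2008).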
As soon as the date was reset to the 10th - problem solved.
Note that running VMs were operating fine; this only seems to affect initial VM power-on (including resuming from a suspended state) and VMotion.
So, it sounds like a serious licensing bug has crept into 3.5u2. Further testing shows that the problem begins as soon as the date hits 12th August - 10th is fine, 11th is fine, 12th and the problem appears.
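If you want a script to tell whether a host's clock is already in the affected range, a small check could look like this. It is a hypothetical helper that assumes GNU date and simply encodes the boundary reported above (12 August 2008 in the host's local time, which would also explain why Australian hosts hit it first).

```shell
#!/bin/bash
# Hypothetical helper: succeeds if the given date (default: today) is on or
# after 12 Aug 2008, the first day the problem was observed.
# Assumes GNU date (-d option).

CUTOFF=20080812

is_affected_date() {
  local day
  # Normalise the input to YYYYMMDD so it can be compared as an integer
  day=$(date -d "${1:-today}" +%Y%m%d) || return 2
  [ "$day" -ge "$CUTOFF" ]
}
```

For example, is_affected_date && echo "clock is in the affected range" checks the host's current date.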
There wasn't any real reference to similar problems in the forums as far as I could see, but it's quite possible we're seeing this before most of the rest of the world as we're in Australia, and therefore the date here ticked over to the 12th "before" those in Europe, America, etc.
Hope this helps others... took us a couple of hours to get this far - at least we can power on VMs again though!
Message was edited by: JohnTroyer to add new thread links.
@daniel_uk: "who on earth is going to power up their VMs in the morning": How about HA? Remember, both DRS and HA are effectively disabled on Update 2 hosts as of now, since HA restarts VMs by powering them on and DRS moves them with VMotion. So, major? I would think so.
In the 4-5 years I've been using VMware this is the first major bug I've encountered. Finding a vendor that is free of major bugs will be a long search; Microsoft have done it too, several times at that. Don't judge the products on this problem, but see the big picture. Would it be any different on another virtualization platform? At least it's a software problem that can be solved easily, compared to hardware problems like the ones AMD and Intel have had.
I'm sorry, but if what you are saying is true and you don't have the ability to conceive of an acceptable workaround based on twiddling the ESX host's clock during the VM's launch window, well that's just sad. I think I'd have them up and running by now, but I guess it's hard to find the time to fix things when you are being so helpful and constructive here.
I hope your clients will be understanding when they find this thread and realize that you could have had them back in operation hours ago with a little gumption.
We only have two 3.5u2 hosts, both not yet live, so we've managed to avoid any production issue here.
Given that setting the date back would then affect VM timestamps when they boot up, would the following work:
1) unconfigure NTP
2) set time back to 1 Aug
3) Turn on all VMs, but press F12 or Del to go into the guest's BIOS config or Choose Boot Device screen - ie prevent bootup into OS.
4) correct time back to 12 Aug
5) Press Ctrl-Alt-Ins in each VM so they then reboot, pick up correct time and start OS bootup
This is presuming that the license check is only done at VM startup, and not at VM reboot (ie ctrl-alt-ins in VM, not a VM reset).
Look, you can adjust the date for the 15 seconds that it takes to allow the power on cycle to occur.
Write a script that executes the power-on command for each host manually if you must, but changing the time for these few seconds will NOT affect the time on the VM or even VirtualCenter.
These logs will be accurate.
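A batch version of that idea might look like the sketch below. It assumes the ESX 3.x vmware-cmd CLI, GNU date and that NTP is already stopped; power_on_with_rollback, VMWARE_CMD and DRY_RUN are hypothetical names. The clock is stepped back one year only around each power-on and then stepped forward again, and timestamps are passed without a zone abbreviation so date -s reads them as local time.

```shell
#!/bin/bash
# Sketch: power on several VMs with the clock stepped back only briefly.
# Assumptions: ESX 3.x vmware-cmd, GNU date, NTP stopped, and a
# hypothetical DRY_RUN=1 mode that prints commands instead of running them.

VMWARE_CMD=${VMWARE_CMD:-/usr/bin/vmware-cmd}
FMT='+%Y-%m-%d %H:%M:%S'   # no zone name, so date -s treats it as local time

run() {
  # Print each command; execute it only when DRY_RUN is not 1
  echo "+ $*"
  if [ "${DRY_RUN:-0}" != "1" ]; then
    "$@"
  fi
}

power_on_with_rollback() {
  local vmx
  for vmx in "$@"; do                               # each argument: a .vmx path
    run date -s "$(date -d '1 year ago' "$FMT")"    # step the clock back one year
    run "$VMWARE_CMD" "$vmx" start                  # power on while the date is early
    run date -s "$(date -d '1 year' "$FMT")"        # step the clock forward again
  done
}
```

Pass the .vmx paths reported by vmware-cmd -l, and try it with DRY_RUN=1 first. Each VM costs a few seconds of clock drift, so resync the clock once the VMs are up.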
I certainly hope that you're not allowing this to be worse than it HAS to be, because it focuses attention on what you do.
I know that sounds mean, but we ALL know engineers that LOVE to alert everyone to critical situations.
Just get the CNC machines online, those make the $$ man; the log timestamps that would be affected for these few seconds are NOT an issue!
Be the guy that made this a non-issue.
...from an engineer that administers a SOX compliance NIGHTMARE VMware farm - we control 60% of all lottery back end servers in the WORLD!
So who on earth, if you're running VMs in production, is going to be powering on VMs in the morning?
TODAY IS MSFT PATCH TUESDAY!!! And besides, VDI users can reboot whenever the hell they feel like it!
A large part of my own concern is that this patch installed without much warning. We would never install a patch before it had been road-tested by others for a few weeks. When we patched last week we did not intend to install it, and the list of patches to remediate did not list U2 (in fact we wondered if it had been pulled, as it suddenly didn't show up). We only realised it had been installed when we looked at the version number.
Obviously we would not have minded so much if it had worked!!
Along with the frequent crashes of the VI client this has me very glad we are still on 2.5.5 for our production areas and only our dev work is being affected by this.
I can't push for increased VMware virtualisation when stories like this hit the headlines and our directors start muttering darkly about Hyper-V and Citrix/Xen as alternatives...
I have co-developed a large virtualization solution for my company that manages several of our labs and supports Virtual Server 2005 and Hyper-V, and I am now adding support for ESX. I was wrapping up all of my interfaces today when my power-on code started to fail. I looked at the log and was dumbfounded by the licensing error... LOL, a good week to start implementing all of our web service APIs for ESX :).
I am sure they will have this resolved shortly.
You can easily install U1.
Unfortunately, too many customers have already moved to U2 on their main production servers.
or if link doesn't work: KB 1006716
An issue with ESX/ESXi 3.5 Update 2 causes the product license to expire on August 12, 2008. VMware engineering has isolated the root cause of this issue and will reissue the various upgrade media including the ESX 3.5 Update 2 ISO, ESXi 3.5 Update 2 ISO, ESX 3.5 Update 2 upgrade tar and zip files by noon, PST on August 13. These will be available from the page: . Until then, VMware advises against upgrading to ESX/ESXi 3.5 Update 2.
The Update patch bundles will be released separately later in the week. This KB article will be updated as soon as more information is available, check back frequently for updates and additions.
Patch should be available tomorrow.
Some quick instructions (after testing and confirming)
To check whether this is affecting you (which it probably is):
Confirming the bug
• Try powering on a vm (hopefully not production)
• If you cannot (the VI Client shows an "Internal Error"), log into the ESX host
• cat /var/log/vmware/hostd.log (you should see messages like the ones quoted at the top of this thread; I checked this on my test server)
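The log check can also be scripted. This is a hypothetical helper that searches for one of the lines quoted at the top of the thread; the thread only shows part of the log, so treat the search string as an assumption and adjust it to whatever your hostd.log actually contains.

```shell
#!/bin/bash
# Hypothetical helper: look for a line quoted at the top of the thread.
# The default path is the hostd.log location named in the thread.

HOSTD_LOG=${HOSTD_LOG:-/var/log/vmware/hostd.log}
PATTERN="Be sure that your host machine's date and time are set correctly"

check_for_license_bug() {
  local log="${1:-$HOSTD_LOG}"
  if [ ! -r "$log" ]; then
    echo "Cannot read $log" >&2
    return 2
  fi
  # -F: treat the pattern as a fixed string, not a regular expression
  if grep -F "$PATTERN" "$log"; then
    echo "Symptom found in $log"
    return 0
  fi
  echo "No matching messages in $log"
  return 1
}
```

Run it with no argument to check hostd.log, or pass the path of a VM's own vmware.log instead.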
FIX (same workaround, 2 methods):
1) Via the GUI: Configuration / Time…
2) Via the Service Console:
• log into the ESX console
• confirm the current date:
# date
Tue Aug 12 21:56:15 EST 2008
• change the date to a few days ago (anything before the 12th):
# date -s "08/08/2008"
Fri Aug 8 00:00:00 EST 2008
VMs now power on immediately.
• You might see discrepancies if you rely on graphs in VirtualCenter.
• This only affects the ESX hosts. VMs that sync with the domain time source (the PDC Emulator) should be OK, unless they are syncing with the ESX hosts instead.
• If domain controllers are affected, make sure you have proper system state backups (as you should anyway).
• Until VMware comes out with a fix, you might need to create a cron job to keep resetting the time (hopefully not for long).
• These things happen?
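A cron-driven reset could look roughly like the sketch below. The function names are hypothetical; it assumes GNU date, that NTP is already stopped, and that pinning the date to 11 August 2008 (the last day reported as unaffected) while keeping the time of day is acceptable for your hosts.

```shell
#!/bin/bash
# Sketch of a clock-reset script for cron (hypothetical names, GNU date).
# Once the clock reaches 12 Aug 2008 or later, it keeps the time of day
# but pins the date to 11 Aug 2008.

SAFE_DAY=2008-08-11

# Print the timestamp to reset to, or nothing if no reset is needed.
# $1 is any date string GNU date understands (default: now).
rollback_target() {
  local now="${1:-now}"
  local day
  day=$(date -d "$now" +%Y%m%d) || return 2
  if [ "$day" -gt "$(date -d "$SAFE_DAY" +%Y%m%d)" ]; then
    echo "$SAFE_DAY $(date -d "$now" +%H:%M:%S)"
  fi
}

# Step the clock back only when rollback_target says it is needed
reset_clock_if_needed() {
  local target
  target=$(rollback_target) || return 2
  if [ -n "$target" ]; then
    date -s "$target"
  fi
}
```

To use it, save it as a script ending with a call to reset_clock_if_needed and add a root crontab entry such as */5 * * * * /root/reset_clock.sh (the path is hypothetical). Remove the entry and re-enable NTP once the fixed build is installed.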
We have not upgraded any of our products to Update 2 yet (we are still running U1). I obviously will not touch my ESX servers until this issue is resolved.
But my question is: can I upgrade my VirtualCenter server to 2.5 U2? It seems this bug only affects ESX itself. If I upgrade VirtualCenter to Update 2 but leave my hosts at Update 1, should I be OK? They left the VirtualCenter Update 2 files on the site for download, so I imagine they are unaffected.
I made a script to start a VM. The brief time switch shouldn't affect the guest OS in most cases. It works on my ESX servers.
Use it at your own risk. Usage: "scriptname vmname"
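#!/bin/bash
# Usage: scriptname vmname
VM=$(/usr/bin/vmware-cmd -l | grep "$1")       # path of the matching .vmx file
FMT='+%Y-%m-%d %H:%M:%S'                        # no zone name, avoids EST/AEST mix-ups
date -s "$(date --date='1 year ago' "$FMT")"   # step the clock back a year
echo "starting VM $VM"
/usr/bin/vmware-cmd "$VM" start
date -s "$(date --date='1 year' "$FMT")"       # step the clock forward again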