The express patches have been posted. This thread is long.
Please post technical experiences here and non-technical feedback here. --JohnTroyer
Hi all,
We've just encountered a serious bug with our ESX cluster, serious enough that I thought I should post about it here as an advance warning for others running ESX 3.5 Update 2.
The VMware tech support person we spoke to wouldn't 100% confirm whether this was (or would be) affecting all ESX 3.5u2 installs, but he strongly implied that it was widespread. For others' sake I hope I'm wrong and it's limited.
The bug:
Starting this morning, we could not power on or VMotion any of our virtual machines. The VI Client threw the error "A general system error occurred: Internal Error".
Further digging led us to messages like this one in /var/log/vmware/hostd.log, and in the log file for any virtual machine we tried to power on or VMotion:
Aug 12 10:40:10.792: vmx| This product has expired.
Aug 12 10:40:10.792: vmx| Be sure that your host machine's date and time are set correctly.
Aug 12 10:40:10.792: vmx| There is a more recent version available at the VMware Web site: "http://www.vmware.com/info?id=4".
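To check quickly whether a host is hitting this, you only need to search hostd.log and the per-VM log files for the "This product has expired" line shown above. Here is a minimal Python sketch, assuming you have copied the logs somewhere you can run Python. The function name `find_expiry_messages` is hypothetical, and the match string is simply copied from the excerpt in this post.

```python
from pathlib import Path
from typing import Iterable, List, Tuple

# Message text copied from the vmx log excerpt quoted in this thread.
EXPIRY_MARKER = "This product has expired"


def find_expiry_messages(log_paths: Iterable[str]) -> List[Tuple[str, int, str]]:
    """Return (path, line_number, line) for every line containing the expiry marker.

    Hypothetical helper: point it at hostd.log and any per-VM log files copied
    off the ESX host. Paths that are not regular files are skipped.
    """
    hits = []
    for path in log_paths:
        p = Path(path)
        if not p.is_file():
            continue  # skip logs that are not present
        with p.open(encoding="utf-8", errors="replace") as fh:
            for lineno, line in enumerate(fh, start=1):
                if EXPIRY_MARKER in line:
                    hits.append((str(p), lineno, line.rstrip("\n")))
    return hits
```

For example, `find_expiry_messages(["hostd.log", "vmware.log"])` lists every matching line with its file and line number. An empty result means none of the supplied logs contain the message.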
A call to tech support confirmed this as a known problem with a temporary workaround.
The work-around:
Turn off NTP (if you're using it), and then manually set the date of all ESX 3.5u2 hosts back to 10th of August. This can be done either through the VI Client (Host -> Configuration -> Time Configuration) or by typing date -s "08/10/2008" at the Service Console command line on the ESX hosts.
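The command-line route of this workaround comes down to two Service Console steps. The sketch below only builds and prints the commands and does not run them. It also spells out one pitfall: `date -s "08/10/2008"` is read as month/day/year, so it means 10 August, not 8 October. The `service ntpd stop` line assumes NTP runs as the standard `ntpd` service on the Service Console. `workaround_commands` is a hypothetical name.

```python
from datetime import date
from typing import List


def workaround_commands(target: date) -> List[str]:
    """Build (but do not execute) the Service Console commands for the workaround.

    Assumptions: NTP runs as the 'ntpd' service, and `date -s` reads a
    slash-separated date as MM/DD/YYYY.
    """
    # The failures reportedly start on 12 August 2008, so roll back to an earlier date.
    if target >= date(2008, 8, 12):
        raise ValueError("target date must be before 2008-08-12")
    mmddyyyy = target.strftime("%m/%d/%Y")  # 10 Aug 2008 -> "08/10/2008"
    return [
        "service ntpd stop",        # stop NTP so it does not pull the clock forward again
        f'date -s "{mmddyyyy}"',    # roll the host clock back
    ]


for cmd in workaround_commands(date(2008, 8, 10)):
    print(cmd)
```

Building the MM/DD/YYYY string from a `date` object avoids accidentally typing a day/month order, which is easy to do outside the US.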
As soon as the date was reset to the 10th - problem solved.
Note that running VMs were operating fine; this only seems to affect initial VM power-on (including from a suspended state) and VMotion.
So, it sounds like a serious licensing bug has crept into 3.5u2. Further testing shows that the problem begins as soon as the date hits 12th August - 10th is fine, 11th is fine, 12th and the problem appears.
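The symptoms reported here are easy to summarise as a small predicate. Running VMs are unaffected. Power-on, resume from suspend and VMotion fail once the host date reaches 12 August 2008. This is only a model of what this thread describes, not VMware's licensing code. The operation names and the `operation_fails` function are hypothetical.

```python
from datetime import date

# Date on which the failures are reported to begin.
CUTOFF = date(2008, 8, 12)

# Operations reported as failing; an already-running VM keeps working.
AFFECTED_OPERATIONS = {"power_on", "resume", "vmotion"}


def operation_fails(operation: str, host_date: date) -> bool:
    """Model of the reported 3.5u2 symptom: affected operations fail on or after the cutoff."""
    return operation in AFFECTED_OPERATIONS and host_date >= CUTOFF


for d in (date(2008, 8, 10), date(2008, 8, 11), date(2008, 8, 12)):
    print(d, "power_on fails:", operation_fails("power_on", d),
          "running VM affected:", operation_fails("keep_running", d))
```

This also explains why rolling the host clock back to the 10th works. The check only ever sees the host's current date.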
There wasn't any real reference to similar problems in the forums as far as I could see, but it's quite possible we're seeing this before most of the rest of the world as we're in Australia, and therefore the date here ticked over to the 12th "before" those in Europe, America, etc.
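The time-zone explanation is easy to check. Assume an east-coast Australian host on AEST (UTC+10, with no daylight saving in August) and a US host on EDT (UTC-4). On those assumptions, midnight on 12 August in Sydney arrives 14 hours before midnight on 12 August in New York. The sketch uses fixed UTC offsets instead of a time-zone database so that it stays self-contained.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets for August 2008 (assumption: Sydney on standard time, New York on daylight time).
AEST = timezone(timedelta(hours=10), "AEST")
EDT = timezone(timedelta(hours=-4), "EDT")


def cutoff_in_utc(tz: timezone) -> datetime:
    """UTC instant at which the local date becomes 12 August 2008 in zone tz."""
    local_midnight = datetime(2008, 8, 12, 0, 0, tzinfo=tz)
    return local_midnight.astimezone(timezone.utc)


sydney = cutoff_in_utc(AEST)
new_york = cutoff_in_utc(EDT)
print("Sydney reaches the 12th at  ", sydney.isoformat())
print("New York reaches the 12th at", new_york.isoformat())
print("Head start:", new_york - sydney)
```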
Hope this helps others... took us a couple of hours to get this far - at least we can power on VMs again though!
Cheers,
Matt Kilham
Message was edited by: JohnTroyer to add new thread links.
"We put our faith in a company like vmware to release stable updates that can be used in multi million dollar bussinesses however today we found out we were wrong"
You make it sound like Princess Diana died.
Sense of proportion??
I thought this would be common sense, but it seems as though it's worth pointing out. Those of us who PAID for VI3 or ESX3.5 Enterprise installs probably have a SLIGHTLY better reason to be upset about this issue than those who have obtained ESXi 3.5 for free. You should get what you pay for. Just saying.
Both ESX and ESXi are enterprise class hypervisors. Don't let the price tag of ESXi fool you. Expect no less out of ESXi than you would of ESX.
Jason Boche
VMware Communities User Moderator (http://communities.vmware.com/docs/DOC-2444)
Indeed. Expect no less of ESXi, but certainly expect more from VI3. Although it seems that in this particular instance, the multiple thousands of dollars spent on the right to use the software without having it treat you like a criminal are for naught.
Matt,
Just to let you know that the date change fixed my issue too, but it took all afternoon to discover your fix, or should I say workaround, until they have a proper fix for the problem.
We have:
10 ESX servers with VI3 Enterprise (HA/VMotion) and 144 live VMs.
Aside from not catching that the guests were synchronizing their time from the hosts (ugh, Microsoft no likey that :-), we have had no issues in our production environment.
Just checked out the agenda for VMworld next month and found a nice breakout session. I guess the speaker will be confronted with some uncomfortable questions... ;)
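If you roll the host clock back, it is worth knowing which guests take their time from the host through VMware Tools. VMware Tools periodic time sync is controlled by the `tools.syncTime` option in each VM's .vmx file. The minimal Python sketch below reads .vmx text and reports whether that option is explicitly set to TRUE. Treating a missing key as "off" is an assumption. This check may also not cover every case in which a guest picks up host time, such as the cold-boot behaviour mentioned later in this thread. `parse_vmx` and `tools_time_sync_enabled` are hypothetical helpers.

```python
import re
from typing import Dict

# Matches .vmx lines such as: tools.syncTime = "TRUE"
_VMX_LINE = re.compile(r'^\s*([A-Za-z0-9_.:]+)\s*=\s*"(.*)"\s*$')


def parse_vmx(text: str) -> Dict[str, str]:
    """Parse key = "value" lines from .vmx file contents; keys are lower-cased for lookup."""
    settings = {}
    for line in text.splitlines():
        m = _VMX_LINE.match(line)
        if m:
            settings[m.group(1).lower()] = m.group(2)
    return settings


def tools_time_sync_enabled(text: str) -> bool:
    """True if the .vmx text explicitly sets tools.syncTime to TRUE."""
    # Assumption: a missing key is treated as "not enabled".
    return parse_vmx(text).get("tools.synctime", "").strip().upper() == "TRUE"
```

Run it over the .vmx files of the VMs on an affected host to see which guests will follow the host clock back to 10 August.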
PO3008 Timekeeping and Time-Sensitive Applications in VMware Virtual Machines: Best Practices
I just find it disturbing that this incident is causing so much drama. The QA process is still a human process. Besides, how many times have we had to install a patch to fix a previous patch from other software vendors?
If changing the host time for a few minutes is a big deal, then take the downgrade option. My faith in VMware is not faltering at all; if anything, it's reinforced. The latest release from them in this forum states they will attempt to have an express patch out by 6pm PST today. That's damned quick for a fix of this magnitude. I am very impressed.
Sincerely,
PC & Network Systems Support
Monsanto Company
2500 Wiggens Road
Muscatine, IA 52761
(563) 288-6279
(563) 299-6370 - Cell
Jody.L.Whitlock@Monsanto.com
Curiosity: The hairball of life...
I'm sorry, but I have to say the "time keeping" post was kind of funny.
You make it sound like Princess Diana died.
She did. But I doubt there would have been 400+ messages on the VMware forums about that, had this forum been around back then. This actually impacted some people. I think they do have a sense of proportion about how this bug has impacted them.
The question is, will the buzz still linger through VMworld next month? I think there are a lot of questions the industry will be looking for answers to next month. Somehow I don't think we'll all be as giddy as we were going into VMworld 2005. The landscape has changed. Things like this don't help.
Cheers, J
1. 7 hosts, 120 VMs, DRS and VMotion
2. 0 - We do not have this update installed
3. Yes
4. We will wait a while before we install this one...
Maish
Systems Administrator & Virtualization Architect
Likewise. The Outlook client on my phone keeps timing out because it has to download 50 messages every time it connects. It seems to have slowed down in the last hour, though.
From the looks of my impromptu poll, most people had 0 minutes of downtime. I chalk that up to advance planning, and perhaps a bit of quick thinking on one's feet.
Philip Arnason
Amen brother Philip.
This bug would have been VERY easy to detect if they were running Time Machine as part of their regression testing. Time Machine may even help those who are hitting this error by allowing the virtual servers to run in real time while the VMware license process sees a date in the past.
Disclaimer: Yes, I work for Solution-Soft. No, we have not tried it against this issue; however, we do have many users running Time Machine in a VMware environment without issue. We offer free one-week trials if you want one. It may get you past this issue until the fix from VMware comes out.
...From the official VMware NTP PowerPoint... Also, what has been said about VMs retrieving their time from the ESX server on cold boot is absolutely true. We had a few servers with the wrong time last week; server admins were seeing very strange timestamps in their logs before the NTP sync at logon occurred.
Drama? My friend Tibmeister calls this issue drama... allow me to laugh. Obviously you don't have your manager asking you every hour for an update on this issue. This is a major setback and a punch in the eye for VMware, now that Citrix and Microsoft are pushing into this market like never before. Changing the time on the servers is SIMPLY UNACCEPTABLE; this is not a workaround solution.
Systems Integration Engineer
Rogers Wireless Inc
Montreal, Canada
And kudos all round to most of us, then, for proper planning... (or sheer luck, if you like)
Maish
Systems Administrator & Virtualization Architect
All I have to say is:
Jase