I thought it was time to start a blog to describe what we have been doing here at VMware with regard to upgrading from Jive Forums to Jive Clearspace. I suspect everyone out on the VMTN forums might find regular posts interesting, or at least a little calming.
The VMware web team has been working pretty much round the clock since we put VMTN in read-only mode last Thursday evening. The goal was to convert the existing forum records (well over 500,000) to the new Clearspace format, and in the process not lose points, any significant data, or real usability. It has been a tall order, but we all respect and really enjoy the activity and relationships with the community members. So it was important to us to preserve as much as we could, while moving to a platform that would allow everyone to create, upload, and share documents.
We may do some additional posts to give you a sense of the timeline over the long weekend, but I thought I would start with where we are now, and later post a few other entries on where we have been.
Where we are:
o We debugged some serious issues in the points calculations, which were bringing the system to its knees on Sunday. As the high-point users logged in, we discovered that the points calculation system could not handle so many of them online at once. We eventually got a fix in and solved this issue.
o We next ran into a database connection issue: over time, database connections were not being dropped. This too was fixed.
o We next found a point system bug that was still causing excessive database lookups on points. We got this bug fix in on Sunday evening, and performance improved dramatically: page load times went from 7-20 seconds down to 2 seconds. However, after a little over 2 hours in production, things gradually got slower until the system finally crashed.
o Today, we have been working all day on a memory leak that we believed was causing the problem. We put a performance script in place that would restart Clearspace when page loads exceeded 30 seconds. This resulted in an Apache restart approximately every 2 hours.
o Tonight (Monday night) at 8:00pm, we deployed the memory leak fix, which seemed to improve performance, and memory use remained constant. During this deployment we were encouraged: the average page load hovered around 1 second, with some as fast as 0.5 seconds. But after 58 minutes in production, with an average of 50 users online and the team pounding on the system, it crashed. Crash dumps indicate a garbage collection issue, possibly due to an improper collection model.
o We are done for tonight. We have put an Apache restart script in place that runs on the hour. This should keep the site pretty fast (consistently under 4 second page loads), and it also gives people an idea of when to save before we restart. At 5 minutes to the hour, save your work and be ready for the site to reset on the hour. The reset takes 1-2 minutes.
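For the curious, the idea behind the earlier performance script (restart when page loads exceed 30 seconds) can be sketched roughly as below. This is only an illustration, not the script we actually run: the function names are made up for this post, and the `apachectl restart` command is an assumption about how the web server would be restarted. A fetch that fails outright is treated as a slow page.

```python
import http.client
import subprocess
import time
import urllib.request

# From the post: restart when a page load exceeds 30 seconds.
THRESHOLD_SECONDS = 30.0


def measure_page_load(url, timeout=60.0):
    """Return the seconds taken to fetch `url`.

    If the fetch fails or times out, return `timeout` so the caller
    treats the page as slow (an illustrative choice, not a requirement).
    """
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            response.read()
    except (OSError, http.client.HTTPException):
        return timeout
    return time.monotonic() - start


def should_restart(load_seconds, threshold=THRESHOLD_SECONDS):
    """True only when the measured load time is above the threshold."""
    return load_seconds > threshold


def restart_apache():
    """Hypothetical restart step; the real command and path are assumptions."""
    subprocess.run(["apachectl", "restart"], check=True)


def check_once(measure, restart, threshold=THRESHOLD_SECONDS):
    """Measure one page load and restart if it was too slow.

    `measure` is a zero-argument callable returning seconds; `restart`
    is a zero-argument callable that performs the restart. Returns True
    if a restart was triggered.
    """
    elapsed = measure()
    if should_restart(elapsed, threshold):
        restart()
        return True
    return False
```

A scheduler such as cron could call `check_once` every minute or so, passing a function that calls `measure_page_load` on a representative page and `restart_apache` as the restart step. Keeping the measurement and the restart as plain callables makes it possible to try the logic without touching a live server.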
Tomorrow morning we will be back looking at stack traces. The vendor's top engineers and the founder of the company (Jive Software) are working 14 hours a day to solve these issues. Each day we make progress: tonight the site was very fast for a while. We will keep working on it and hopefully will solve the last of our problems this week. If we fall back to the old forums, we will have to restart the entire process (including the 4 days of read-only mode) in a few months when the next deployment window opens. We are hoping to solve these problems this week, so we can all upload documents and collaborate together.
Thanks for all the feedback and encouragement. We have gotten some good layout ideas, and have a nice list of upgrades to work on once we solve our pressing issues of system stability and performance. More status tomorrow.
The VMware communities team.
Eric & crew,
Best of luck in resolving the outstanding issues. Here's to hoping you are not forced to revert to the previous setup. After using the new system, albeit sporadically, it would be nice to keep moving forward.
Regards,
Jason