Today we increased the application server's heap size and then enlarged the application's caches to make them more effective. We also rolled out a patch to the AuthFactory that fixes a locking problem in the status level calculator, removing the root cause of one class of system slowdowns.
Together with the switch to JDK 1.5 last Friday, these fixes have improved stability and lowered overall CPU usage. However, we are still seeing periodic performance slowdowns, which we continue to monitor closely.
We are actively working to implement a clustered solution for the application server, which will allow for larger caches and therefore fewer hits against the backend database. Meanwhile, we have decided to hold off for a few days on the feature enhancements we discussed last week, so that we can focus fully on the app server cluster.
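For readers curious why larger caches translate into fewer database hits, here is a minimal sketch. It is not our production code: the class names (CacheSizeDemo, CountingLruCache), the simulated backend, and the access pattern are all made up for illustration. It replays the same requests against a small and a large LRU cache and counts how often each one has to fall through to the backend.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CacheSizeDemo {

    /** Hypothetical LRU cache that counts how often it falls through to the backend. */
    static class CountingLruCache {
        private final Map<Integer, String> entries;
        private int backendLoads = 0;

        CountingLruCache(final int capacity) {
            // accessOrder=true keeps entries ordered from least to most recently used,
            // so the "eldest" entry is the least recently used one.
            this.entries = new LinkedHashMap<Integer, String>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<Integer, String> eldest) {
                    // Evict once the cache grows past its configured capacity.
                    return size() > capacity;
                }
            };
        }

        String get(int key) {
            String value = entries.get(key);
            if (value == null) {
                // Cache miss: this stands in for a query against the backend database.
                value = loadFromBackend(key);
                entries.put(key, value);
            }
            return value;
        }

        private String loadFromBackend(int key) {
            backendLoads++;
            return "row-" + key;
        }

        int backendLoads() {
            return backendLoads;
        }
    }

    /** Replays a fixed access pattern (10 passes over 100 keys) and returns the backend load count. */
    static int simulate(int capacity) {
        CountingLruCache cache = new CountingLruCache(capacity);
        for (int round = 0; round < 10; round++) {
            for (int key = 0; key < 100; key++) {
                cache.get(key);
            }
        }
        return cache.backendLoads();
    }

    public static void main(String[] args) {
        System.out.println("capacity 50:  " + simulate(50) + " backend loads");
        System.out.println("capacity 100: " + simulate(100) + " backend loads");
    }
}
```

With this deliberately artificial pattern, the 50-entry cache evicts every key before it is requested again and goes to the backend on all 1000 requests, while the 100-entry cache holds the whole working set and only needs the initial 100 loads. Real traffic is far less regular, so the actual gain will differ, but the principle is the same: once the cache can hold the working set, most requests never reach the database.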