After running for 27 hours with reasonable performance, we saw a slowdown again last night. We rebooted the application and still saw some issues. After a bit more investigation, we realized that the issues were related to the kernel timer, so we moved the application to different hardware, and voilà, no more timing issues. As I write, the system has been up with reasonable performance for another 14 hours.
The timing issues would explain the viewed-threads problem and the other, similar cache-related problems people have reported.
Given that performance is significantly improved, we are taking a more measured approach to implementing the cluster, so that we can be sure the configuration is bulletproof before it is deployed. We are currently targeting the end of November for the clustered configuration. We are also continuing to work on feature fixes, and once we have seen a few more days of reasonable performance without kernel timing issues, we will start focusing on getting those fixes rolled out to the site.