We are continuing to improve VMware Communities performance so that page load times are consistently acceptable. We are tuning the system in the following key areas:
1. Load Balancer Configuration: Today, we updated the F5 load balancer configuration to send requests directly to the two clustered Tomcat instances running the communities application, bypassing Apache. This removes the cause of the Gateway Timeout messages many of you have reported. If one of the nodes in the application cluster is down, the load balancer now redirects all traffic to the functioning node.
2. Redirecting Google traffic: The performance slowdowns of the last several weeks correlate directly with increased load from automated services, such as Google's crawler and RSS readers, indexing or accessing the VMware Communities site. We have been aware of this issue since December, when we updated the robots.txt file on the servers to disallow crawling by Google and reconfigured Apache to redirect Google crawls to a mirror of the site. This fix appeared to work at first; however, over the last couple of weeks we have seen that Google traffic still affects the application when the system is under load. The load balancer change described in #1 above should help resolve this: instead of Google being redirected by Apache, which runs on the same servers as the application, it is now redirected at the load balancer. This should remove the cause of the recent slowdowns while ensuring that Google continues to index the VMware Communities content. As I write this, we are still investigating whether other changes are required to isolate Google and other automated traffic, so I will post an update later this week once I have confirmed this fix.
3. Adding a third node: This will add 50% more processing capacity and should allow the application to handle traffic peaks better. We are taking advantage of this change to review all system configuration settings across the cluster. We will bring the third node online within the next two weeks.
In addition, we have made two application setting changes to improve performance temporarily:
a. Query caching: Many of you have noticed the "Your message was posted successfully, but there will be a short delay before it is viewable in the thread" message when you post. We turned on query caching, which reduces system load by about 20% because the application no longer has to rebuild the thread every time a new message is posted. The cache duration was originally 10 seconds; we have reduced it to 5 seconds, which should shorten the delay and reduce how often community participants see the message. Our current plan is to turn query caching off once performance has stabilized.
b. Status Level Calculator: The status level calculator now refreshes every 12 hours, which reduces system load. We will shorten the refresh interval again once we have determined that doing so won't hurt system performance.
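For readers wondering why a cache causes the posting delay in item a, here is a minimal sketch assuming a simple time-to-live (TTL) model: a rebuilt thread is reused until its TTL expires, so a message posted in the meantime stays invisible for up to 5 seconds. This is an illustration only, not the communities application's actual cache; `TTLCache`, `FakeClock`, and the thread-rebuild lambda are hypothetical names.

```python
import time
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class TTLCache(Generic[T]):
    """Hypothetical sketch: cache one computed value, recompute after ttl_seconds."""

    def __init__(self, compute: Callable[[], T], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._compute = compute          # the expensive work, e.g. rebuilding a thread
        self._ttl = ttl_seconds
        self._clock = clock
        self._value: Optional[T] = None
        self._expires_at = float("-inf")  # forces a compute on first access
        self.recomputations = 0

    def get(self) -> T:
        now = self._clock()
        if now >= self._expires_at:
            # TTL expired: do the expensive rebuild and start a new TTL window.
            self._value = self._compute()
            self._expires_at = now + self._ttl
            self.recomputations += 1
        return self._value  # type: ignore[return-value]


class FakeClock:
    """Manually advanced clock so the example is deterministic."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now


clock = FakeClock()
thread_messages = ["first post"]
# Hypothetical stand-in for the "rebuild the thread" query, cached for 5 seconds.
cache = TTLCache(lambda: list(thread_messages), ttl_seconds=5, clock=clock)

print(cache.get())               # t=0: thread built -> ['first post']
thread_messages.append("reply")  # a new message is posted at t=0
clock.now = 3.0
print(cache.get())               # t=3: still cached -> ['first post']
clock.now = 5.0
print(cache.get())               # t=5: TTL expired -> ['first post', 'reply']
print(cache.recomputations)      # 2
```

Under the same assumption, the status level calculator in item b behaves like this cache with `ttl_seconds=12 * 60 * 60`: fewer recomputations means less load, at the cost of staler values.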