Did you find any more information on the possible JVM-network issue that you were investigating? We have many Tomcat-based applications running in JVMs, and a slowdown has recently been reported. They do not push the CPU at all, but they do talk to back-end Oracle and MS-SQL databases over JDBC. If this is an issue, I want to be able to escalate it internally and with VMware.
We do not have the luxury of being able to move the apps to a physical environment. We are using JVM jdk1.5.0_14-b03 (64-bit) on RHEL 4.8 (64-bit) guests with 2 vCPUs.
Any input would greatly help the additional troubleshooting we plan to do to validate this.
I have now completed my migration to ESX 4.0, and while the performance of some Java applications does seem to have improved (notably Lucene), at least one of my users is still reporting poor performance from a Tomcat application. I am trying to arrange a meeting with the user at the moment to narrow down where the performance problem is. The difficulty is that we're dealing with a vast number of layers of code here, any one of which might be having trouble in the virtualised environment:
The user's own Java code
Tomcat
JDBC
The JVM
The Oracle client libraries
The OS itself
The virtual network card in the guest
The virtual switch
to name a few. We're all using different distributions, so unless it's a fundamental problem with Linux itself, I think we can discount the OS. I've also tried several Java versions with no change, so I don't think it's the JVM either, unless there's a fundamental problem with Java on ESX guests, which I don't believe.
I've asked the user to give me:
a) The SQL query that's slow (so I can run it directly with sqlplus and see whether the problem is in the Oracle client stack or below)
b) A tiny CLI Java app which uses JDBC to make that same query (which should tell us whether it's JDBC-related or something in the higher-level user code)
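For (b), something along these lines might serve as a starting point. It is only a sketch, not the user's code: the class name JdbcQueryTimer, the argument order and the optional driver-class argument are all made up for illustration. It times the connect, execute and fetch phases separately, and it avoids newer language features so that it should also build on an older JDK.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Hypothetical stand-alone harness: runs one query over JDBC and prints phase timings.
class JdbcQueryTimer {

    // Converts two System.nanoTime() readings into elapsed milliseconds.
    static double elapsedMillis(long startNanos, long endNanos) {
        return (endNanos - startNanos) / 1000000.0;
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 4) {
            System.err.println(
                "usage: java JdbcQueryTimer <jdbc-url> <user> <password> <sql> [driver-class]");
            return;
        }
        if (args.length > 4) {
            // Older drivers may need their class loaded explicitly before use.
            Class.forName(args[4]);
        }

        long t0 = System.nanoTime();
        Connection conn = DriverManager.getConnection(args[0], args[1], args[2]);
        try {
            long t1 = System.nanoTime();
            Statement stmt = conn.createStatement();
            try {
                ResultSet rs = stmt.executeQuery(args[3]);
                long t2 = System.nanoTime();
                int rows = 0;
                while (rs.next()) {
                    rows++; // walk every row so fetch round trips are included
                }
                long t3 = System.nanoTime();
                rs.close();

                System.out.println("connect: " + elapsedMillis(t0, t1) + " ms");
                System.out.println("execute: " + elapsedMillis(t1, t2) + " ms");
                System.out.println("fetch:   " + elapsedMillis(t2, t3) + " ms (" + rows + " rows)");
            } finally {
                stmt.close();
            }
        } finally {
            conn.close();
        }
    }
}
```

Run it with the JDBC driver jar on the classpath and compare the numbers against the same query in sqlplus. If the fetch phase dominates here but not in sqlplus, that would point at JDBC or the layers beneath it rather than at the application code.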
I've also asked the user to put some instrumentation into their code to time various sections, so they can really tell me where the slowdown is. The application is quite large and complex, and the reports are really just nebulous "it's too slow" reports.
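The instrumentation does not need to be elaborate. A minimal sketch of the sort of thing I mean, with a made-up helper class SectionTimer and made-up section names, could look like this. The methods are synchronized on the assumption that several Tomcat request threads would share one instance.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: accumulates elapsed time per named section of code.
class SectionTimer {
    private final Map<String, Long> totals = new LinkedHashMap<String, Long>();
    private final Map<String, Integer> counts = new LinkedHashMap<String, Integer>();

    // Records one timed pass through the named section.
    synchronized void record(String section, long nanos) {
        Long total = totals.get(section);
        Integer count = counts.get(section);
        totals.put(section, (total == null ? 0L : total.longValue()) + nanos);
        counts.put(section, (count == null ? 0 : count.intValue()) + 1);
    }

    // Total nanoseconds recorded for a section, or 0 if it was never recorded.
    synchronized long totalNanos(String section) {
        Long total = totals.get(section);
        return total == null ? 0L : total.longValue();
    }

    // Number of passes recorded for a section.
    synchronized int count(String section) {
        Integer count = counts.get(section);
        return count == null ? 0 : count.intValue();
    }

    // One line per section, in first-seen order: name, calls, total milliseconds.
    synchronized String report() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Long> e : totals.entrySet()) {
            sb.append(e.getKey())
              .append(": calls=").append(counts.get(e.getKey()))
              .append(" total_ms=").append(e.getValue().longValue() / 1000000.0)
              .append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        SectionTimer timer = new SectionTimer();
        for (int i = 0; i < 3; i++) {
            long start = System.nanoTime();
            StringBuilder work = new StringBuilder(); // stand-in for real work
            for (int j = 0; j < 10000; j++) {
                work.append(j);
            }
            timer.record("build string", System.nanoTime() - start);
        }
        System.out.print(timer.report());
    }
}
```

Wrapping the database call, the result processing and the page rendering in separate sections, then logging the report periodically, should turn "it's too slow" into a figure for each layer.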
I'll let you know what results I get from the above tests, if the user supplies me with some code.
If any of you have done anything like the above analysis, I'd like to hear your results.