Benefit of Large Pages for Enterprise Java Applications

The VMware best practices guide for deploying Java applications on ESX recommends enabling large pages, and the VMware whitepaper on large page performance gives a clear demonstration of the performance benefits of using large pages for Java on ESX. However, when I talk to customers I often find they are not using large pages for their Java applications.

The large page performance whitepaper used SPECjbb2005 as the benchmark workload, and showed a performance improvement of 8 to 10 percent when large pages were enabled. While this is a useful result, SPECjbb2005 is a single-node benchmark that doesn't include any communication with external systems. With the Olio Java application discussed in my recent whitepaper, I was able to investigate the impact of large pages on a multi-tier enterprise Java application. Olio relies on an external database and file server, and the workload includes significant network traffic. The results give a fresh demonstration of the benefit of large pages. The chart below shows the benefit of enabling large pages for 1-, 2-, and 4-vCPU VMs, as well as for a fully loaded system with eight 2-vCPU VMs, with the Olio workload. The heap sizes used are discussed in my previous blog post.

The chart below shows that enabling large pages gives a 6 to 8 percent increase in peak throughput for Olio. Obtaining this improvement required only a few simple configuration steps, and did not require allocating any additional memory to the VMs.

Enabling large pages is relatively simple, although the exact process varies depending on the operating system and JVM. The large page performance whitepaper includes instructions for enabling large pages on Red Hat Linux and Windows 2003 Server. This blog post also gives a good overview of the steps required to enable large pages for Linux and Java. You should also be able to find the instructions in the documentation for your specific OS. The key thing to remember is that large pages need to be enabled in both the OS and in the JVM. For the Sun JVM the necessary command-line parameter is -XX:+UseLargePages, for JRockit it is -XXlargePages, and for IBM's JVM it is -Xlp. Large-page support is enabled in VMware ESX by default. When using these options, you should check the JVM logs carefully for error messages indicating that the JVM was not able to allocate large pages.
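As a rough sketch of the Linux side (the exact values depend on your distribution, kernel, and heap size; the 2048 pages, the group id, and the 4 GB shared-memory limit below are only placeholders for a 4 GB heap), the OS configuration might look something like this:

echo 2048 > /proc/sys/vm/nr_hugepages         # reserve 2048 x 2 MB huge pages (4 GB)
echo 1001 > /proc/sys/vm/hugetlb_shm_group    # gid of the group allowed to use huge pages
echo 4294967296 > /proc/sys/kernel/shmmax     # shared memory limit >= heap size

java -server -Xms4g -Xmx4g -XX:+UseLargePages -jar myapp.jar

Here 1001 and myapp.jar are placeholder values. After starting the JVM you can verify that the pages were actually taken from the huge page pool with grep Huge /proc/meminfo.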

The bottom line is that enabling large pages is simple and can have a significant performance benefit. If you are interested in improving the performance of your Java applications, enabling large pages should be your first step.

Comments

Hi Hal.

It appears we are both interested in running J2EE servers on ESX. Your white papers make very interesting reading and back up many of my own findings, but I believe you have missed a key piece of info in all of them: you haven't listed your entire set of JVM options.

We have a J2EE app running on JBoss with an Oracle back end. Our standard deployment is on a big Solaris box or a Wintel box. Our performance tests on the JBoss element have been done with 2 Xeon cores and a heap size of 4 GB.

Typical customer deployments are done by running multiple JBoss instances with 4 GB heaps on Xeon servers with 16 cores and 48 GB RAM, or on SPARC T3s, and the application scales nicely.

As you can see, we use parallel GC on the Eden and tenured generations and guarantee the 3/8ths rule.

We then use HP LoadRunner to load up a front-end Apache 2.2 server, which load-balances to our JBoss instances.

Our target response time for all functions is sub 5 seconds. On physical tin, the response times are fairly regular, e.g. a test case may have a min time of 3 seconds and a max of 6 when running 100 users with 30-second think times. The difference I attribute to major GC giving occasional longer responses.

My set of JVM options on Windows is as follows, and I have set my Windows user to lock large pages in memory via group policy:

-server
-Xmx4096m
-Xms4096m
-XX:SurvivorRatio=6
-XX:MaxPermSize=256m
-Xss256k
-XX:+UseConcMarkSweepGC
-XX:+UseParNewGC
-XX:TargetSurvivorRatio=90
-XX:MaxTenuringThreshold=15
-XX:LargePageSizeInBytes=2m
-XX:+UseLargePages
-XX:NewSize=864m
-XX:MaxNewSize=864m
-Dsun.rmi.dgc.client.gcInterval=3600000
-Dsun.rmi.dgc.server.gcInterval=3600000

Now when I virtualise this, I have tried one VM with 8 GB and 2 vCPUs, and I have tried 2 VMs with 4 GB each and 2 vCPUs each (scaling the heap down to 2 GB per VM). Based on your work I expected the 2 VMs to outperform the single VM, and to be fairly close to physical hardware. However this was not the case. On both VM options our min transaction times were broadly similar, but the max were many tens of times greater, e.g. 137 seconds as opposed to 6 seconds!

I have put this down to the fact that our entire heap is very busy in our app. All that allocation and GC activity is going on whilst ESX is looking for pages in memory that can be shared by VMs. There is no ballooning going on and we have reserved the full memory and CPU commitment for the VMs. My next course of action is to try to disable all page sharing on the ESX host to remove that too. I get the feeling that when running JVMs on ESX you may not benefit from using highly parallel GC, so I may also go back to single-threaded GC on that too.
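(As far as I understand it, the way to do this is through the host's advanced memory settings, e.g. setting Mem.ShareScanGHz to 0 under Configuration > Advanced Settings > Mem to stop the page-sharing scans, although I haven't yet verified this on our ESX version.)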

Would you be at all interested in chatting? nick.bramwell@northgate-is.com

If you're using Windows 2003, by default it won't use the HWMMU or hardware virtualization. You should manually select this in the VM config (assuming you have modern hardware that supports the virtualization features). If you're using Windows 2008 or above, it will automatically select HWMMU and hardware virtualization and will use physical large pages to back the VM. You should also check the local security policy, as the group policy may not be applied correctly, and this may mean it's not locking the pages in memory. There are also various bugs with different versions of Windows that will impact the way memory is used for things like filesystem cache. You will want to limit OS paging as it will impact performance (I'm sure you know this already). vSphere won't be trying to share pages if you have reserved all the memory, and the OS won't try swapping out the JVM if the pages are locked in memory correctly. It will normally come down to what else is running on the system.

I met up with a colleague of yours, Steve (from Glasgow), for a drink at a bar in Wellington, New Zealand. He can get in touch with me offline to go through this in more detail. The guys he was working with and I have a lot of experience getting Java workloads producing excellent (and consistent and predictable) performance when virtualized.

The first thing I would do is simplify the JVM config and get it back to bare bones: max and min heap the same, UseParallelGC (not the newer collectors), and UseLargePages (a sketch of such a command line follows at the end of this comment). You should also run JConsole to see what the heap is doing during your runs. One thing to keep in mind is that ESX does not break the laws of physics and only produces minor miracles. Halving the amount of memory for a workload that actually needs the memory will not produce good performance. You need to find the sweet spot for your workload and then see whether scale-up or scale-out works best. In many cases you might find that 6 GB or 8 GB RAM with a 4 GB heap and 2 or 3 vCPUs produces good results. You also need to take into account the NUMA node size on the system you are working with. Make sure the VMs always fit easily into the NUMA node size and then you should be able to scale up within a host fairly linearly. What version of ESX are you running your tests against?
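To be concrete, the bare-bones starting point I have in mind would be something along these lines (the heap and permgen sizes are only placeholders to tune for your workload):

-server -Xms4096m -Xmx4096m -XX:MaxPermSize=256m -XX:+UseParallelGC -XX:+UseLargePages

Once that baseline behaves predictably, you can reintroduce the other options one at a time and measure the effect of each.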
