Best Practices for Oracle

While Oracle makes a lot of software for a bunch of purposes, the only best practice guidance available today is for Oracle databases. We'll build out pages dedicated to other Oracle products if and when other best practices are available.

Real Time Scheduling for Oracle DBs on Linux

The following nugget came from a database expert in performance engineering. Due to the possibility of this change screwing up your DBs it was only with great reluctance that he even committed this to ink. You're taking your own job in your hands if you try this with your production DBs. But, for those of you with a test/dev Oracle DB and an unrepentant need for speed, give this a try. --drummonds

Traditional Unix/Linux timeshare scheduler policies attempt to provide good interactive response times by favoring a process that has just had an I/O request completed over a running process. This can create havoc on a high transaction-rate database server, but can be avoided by manipulating the scheduling policies for the database processes.

The scheduler policies date back to the 1970s when we would be running an editor session simultaneously with nroff (a text formatting program that often ran for minutes to process a document) on a single-processor system. Allowing the long-running process to execute without preemption would mean unacceptably long response times for interactive applications. So, the scheduler decays the priority of a process as it runs and accumulates CPU time. Furthermore, when a timeshare priority process goes to sleep waiting for I/O completion, it sleeps at a stronger priority than any running process with a timeshare priority. This ensures, for example, that a text editor user will get immediate feedback for every character typed on the keyboard even if the system is busy running CPU hungry tasks.

The problem with this scheme comes to the forefront when we run an application such as Oracle on a modern computer system. In an online transaction processing environment, the database management system threads/processes typically run for a short period of time, in the order of milliseconds or tens of milliseconds, then issue a disk I/O or send a network packet. A typical large system may issue many thousands of disk accesses per second. Every time an I/O completes, the scheduler makes the issuing process (or thread) runable, and at a stronger priority than all running timeshare processes. So, we are guaranteed a preemption unless the system can find an idle CPU. And, the preempted process goes to the end of the run queue for its priority level.

Now, if a preemption occurs, and the running process was holding an important database resource, say, a latch, all other processes that may need that latch, possibly including the preemptor process itself, will go into a spin loop waiting for the latch, and will eventually put themselves to sleep until the process holding the latch runs again and releases the latch. This is very expensive in term of CPU usage. The spinning and the context switches will drive up the CPU utilization without an increase in the system throughput.

An even worse phenomenon can occur in a large system with a very high I/O rate, where the problem may actually exhibit itself as excess CPU idle due to processes putting themselves to sleep waiting for latches and not waking up when the latch is released. Another possible symptom is the thundering herd problem: once the process holding the latch runs and releases it, a large number of processes become runable, and chaos and inefficiency follows.

The simplest solution is to maintain a constant priority for the DBMS processes, even when sleeping. This way, we allow the running process to voluntarily give up the CPU, at which time it has supposedly released all latches. Because of the nature of OLTP workloads, processes will voluntarily give up the CPU after a few, or at most tens of, milliseconds. So avoiding involuntary preemptions will fix the problem.

One way of achieving this is using the real-time priority feature. We are not really interested in running the database processes at a strong priority. We only use real time priorities because of an additional feature that this scheduling policy offers: the priority of a real time process does not change when the process sleeps on I/O. As a result when an I/O completion occurs, a process becomes runable at the same priority as the currently running process, and is put at the end of the run queue. We avoid the involuntary preemption and the associated spinning and sleeping costs.

Here is a sample RHEL4.4 script that accomplishes this. This script will run the Oracle processes at a stronger priority than all timeshare priority processes and could result in starvation of other applications. It should be used only when you are certain that running these processes at a strong priority won't deny resources to other applications.

for DAEMON_PID in `ps -u oracle -f|grep -v grep|grep ora_|awk '{print $2}'`
do
sudo chrt --rr -p 82 --pid $
done
LGWR_PID=`ps -u oracle -f|grep -v grep|grep lgwr|awk '{print $2}'`
sudo chrt --rr -p 83 --pid $
LSNR_PID=`ps -u oracle -f|grep -v grep|grep tnslsnr|awk '{print $2}'`
sudo chrt --rr -p 81 --pid $
for SHADOW_PID in `ps -u oracle -f|grep -v grep|grep -v -E 'ora_|tnslsnr'|awk '{print $2}'`
do
sudo chrt --rr -p 81 --pid $
done

To avoid starving the Oracle daemons, we run them at a stronger priority than the shadow processes, with particular attention paid to the logwriter.