Process-Oriented Performance Monitoring

Too many performance monitoring tools on the market or in the freeware world focus on the system. But if you're asked "My xyz program ran very slow at 10pm last night. What was wrong?", system-wide data are rarely sufficient for you to make a conclusion. Perhaps one out of a hundred times it was due to something completely unrelated to a running process, such as bad memory chips. Most likely, you need to look at the processes at OS level and database sessions inside Oracle that existed at the moment to say "The xyz (or abc) process suddenly used a huge amount of memory", or "The SQL execution plan for your application changed last night". Without this process specific information, all you can say, based on sar output or the data collected by your expensive monitoring tool, is "Something caused severe memory shortage", or "There was a big CPU spike", but you don't know what it was. Fortunately, it only takes a small effort to remedy this. That's what I call process-oriented performance monitoring.

The perfmon.sh script shown below is just one of many ways you implement this philosophy on a Linux or UNIX system. It records OS process performance data with top and Oracle session performance data by querying v$session and v$process (if pre-10g, add v$session_wait). The beauty of combining top and Oralce wait event check into one report is that you can easily cross-check. This is important because it follows a performance analyst's natural troubleshooting procedure. If you're notified about a slow running application, after initial verification, the first thing you as a DBA will do on the database server is probably to run top. If there're indeed some oracle processes using lots of CPU as is often the case, you find the Oracle sessions attached to these processes. The provided perfmon.sh offers you the capability to do after-crisis analysis as if the problem was happening right now. It runs as a shell script probably scheduled as a cron job with at least once per minute interval. (The -s option gives you the flexibility to only run the SQL part if you really want.)

#!/bin/bash
#perfmon.sh: Oracle session and OS process performance monitoring script. Intended to be run at least once per min.
#Example assumes a 4 node RAC where hosts are named like host[1-4] and this script runs on host4.

export ORACLE_SID=ORCL
export ORACLE_HOME=/u01/app/oracle/product/10.2.0/db
export PATH=$ORACLE_HOME/bin:/usr/bin:/bin
TIMEOUT=10 #this many seconds allowed to connect to another host of RAC
cd /home/oracle/scripts/perfmon

date "+%Y%m%d%H%M%S" >> perfmon.log

if [[ $1 == "-s" ]]; then #Also do OS process logging
for i in 1 2 3; do
  echo -e "\nhost$i\n" >> perfmon.log
  ssh host$i "COLUMNS=132 top -cb | head -15 | sed 's/  *$//' | grep -v ' top -cb$'" >> perfmon.log &
  p=$!
  (sleep $TIMEOUT; kill -1 $p) &
  k=$!
  wait $p
  exit=$?
  #Normal exits are 0..127, signals are 128+signo
  case $exit in
    129) echo '(timed out)' >&2;;
    *) kill $k;;   #  Kill the killer.
  esac
done
echo -e "\nhost4\n"  >> perfmon.log #This script runs locally on host4.
COLUMNS=132 top -cb | head -15 | sed 's/  *$//' | grep -v ' top -cb$' >> perfmon.log
fi

#Feel free to exclude other unwanted events
sqlplus -S / as sysdba <<EOF >> perfmon.log
set pages 1000 lines 200 feedb 0
col i for 9
col evnt for a30
col usr for a15
col mach for a20
col prog for a20
select a.inst_id i,sid,substr(a.username,1,15) usr,substr(machine,1,20) mach,substr(a.program,1,20) prog,sql_id,prev_sql_id,substr(event,1,30) evnt,p1,p2,p3,spid,pga_alloc_mem pga
from gv\$session a,gv\$process b where a.paddr=b.addr and wait_class!='Idle' and event not like 'PX%' order by 1,3;
exit
EOF

There's really no magic here and many shops are already doing this. I wrote the script for a 4 node RAC whose hosts are named like host[number] and this script runs on host4, saving OS process and Oracle session data to a log file on host4. The sleep $TIMEOUT and kill logic is to prevent hanging ssh processes in case of node hang (code snippet from newsgroup). Because of the message about process killed (Terminated), the cron job is better set up like ... perfmon.sh -s > /dev/null 2>&1 or replace /dev/null with a file for debugging. If you don't run RAC, or don't want to stress the network by ssh (therefore you run each copy of the script on each host locally), remove the pink text. If you like, you can include not just top CPU processes, but all processes whose sessions are captured in the SQL section (i.e. non-idle wait sessions); alternatively, you can include not just non-idle wait sessions, but all sessions whose Oracle server processes show up in top.

top is just an example program to gather OS process performance data. You can use lighter-weight prstat on Solaris, or topas on AIX, or more powerful atop on Linux (e.g. to get process I/O statistics). top is run in batch mode (-b) so it can be logged, and only the top few processes are recorded. -c shows a more complete command, up to 132 characters on one line (COLUMNS=132). Since new version of top has no RPM as of March 2008 (Ref), the syntax top -cb number is not used at this time. Note that Windows would be very different so I don't give an example script here.note1

If your database is 9i or earlier, add v$session_wait to the SQL and join on sid to v$session. You also need to exclude individual idle waits because there's no wait_class column anywhere, e.g. (see "Current session waits" at SQLsFreqUsed.txt):
where not (event like '%message%' or event like 'SQL*Net%' or event like '%timer%' or event like 'jobq%')
There's no hard and fast rule about what event is idle and what is not. For instance, 10g v$event_name puts about half of 'PX%' events into 'Other' wait class. Should they be in 'Idle' class instead? And blindly ignoring all 'SQL*Net%' events may be bad because frequent 'SQL*Net % to client' may indicate a real network problem.

The output of the perfmon.sh script is like
20080327124601
...
host2

top - 12:46:04 up 131 days, 11:49,  1 user,  load average: 1.17, 1.13, 1.25
Tasks: 312 total,   4 running, 308 sleeping,   0 stopped,   0 zombie
Cpu(s): 13.7% us,  3.9% sy,  0.0% ni, 73.5% id,  8.5% wa,  0.1% hi,  0.3% si
Mem:   8293488k total,  7828272k used,   465216k free,   332448k buffers
Swap:  8388576k total,  2264096k used,  6124480k free,  3918592k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
28543 oracle    22   0 5233m 141m 129m R   83  1.7   0:00.54 oracletest (LOCAL=NO)
 8272 oracle    15   0 5449m 2.3g 2.1g D   22 29.6 943:24.05 ora_pz99_test
10098 oracle    16   0 5231m 2.2g 2.2g D    7 27.9  28:15.04 oracletest (LOCAL=NO)
28548 root      16   0  4976 2912 2320 S    3  0.0   0:00.01 /bin/sh /etc/init.d/init.cssd fatal
 4760 oracle    15   0 5228m 108m  99m S    3  1.3   0:00.89 oracletest (LOCAL=NO)
 8302 oracle    15   0 5231m 102m  91m S    2  1.3   0:01.75 oracletest (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
 8451 oracle    15   0 5231m 227m 216m S    1  2.8   0:07.85 oracletest (LOCAL=NO)

 I        SID USR             MACH                 PROG                 SQL_ID        PREV_SQL_ID   EVNT                                   P1         P2         P3 SPID                PGA
-- ---------- --------------- -------------------- -------------------- ------------- ------------- ------------------------------ ---------- ---------- ---------- ------------ ----------
 2        755 CB              EXAMPLE\APPSERVER001 wwwpx.exe            9ss1vsmq53ucs fqwpnzvugsgs6 gc cr request                         173      10806          1 16967            603049
 2        755 CB              EXAMPLE\APPSERVER001 wwwpx.exe            9ss1vsmq53ucs fqwpnzvugsgs6 gc cr request                         173      10806          1 28543           8188249
 2        626 ABC_USER        EXAMPLE\APPSERVER2   abc.exe              8jt2wr0d0r7tg cq7an5qmrb1bq gc cr request                          44     128581          1 10098           5132225
 2        554 SYS             host1.example.com    oracle@host2.exampl.                             db file scattered read                  3     333341         16 8272          217796545
 2        554 SYS             host1.example.com    oracle@host2.exampl.                             db file scattered read                  3     333341         16 10684            717145
 3        816 LOGISTICS_USER  ECP\CSNETTRANS       wwwpx.exe            2aj1gub45fjs5 f4gkzwsdd78gk db file sequential read                53       6110          1 18446           6074041
 3        816 LOGISTICS_USER  ECP\CSNETTRANS       wwwpx.exe            2aj1gub45fjs5 f4gkzwsdd78gk db file sequential read                53       6110          1 29580           6025561
...

Each snapshot begins with a timestamp. Below the name of the host on which this snapshot is taken are the top 15 rows of top output. I usually pay attention to load average, total tasks, user and sys CPU, and free memory. Now the process part, the main focus of this article. top process list offers two benefits: CPU usage sorting (which could be done inside sqlplus with expensive calculation, i.e. sorting on sampling deltas of 'CPU used by this session' or 'CPU used when call started'), and spotting occasional non-database CPU intensive processes.

For instance, here OS process 28543 has used 83% CPU in the last sampling period. In the SQL sectionnote2 below, you see that it corresponds to one of the two Oracle sessions with SID 755 on instance 2 (a multithreaded application could have more than one Oracle session for one shadow process). This session is waiting on global cache CR request for file 173 and block 10806. What exactly is the session doing? Given the sql_id, check v$sqlstats, or v$sql if 9i, for the SQL (v$sqlstats in 10g is highly recommended in preference to v$sql or v$sqlarea because the query requires no latching and the SQL text stays in it longer). For 10g, if the SQL is gone even from v$sqlstats, you can check to see if it was captured by AWR, specifically dba_hist_sqltext and a few other dba_hist_sql% views.note3

Process memory is harder. Suppose you see low free memory in top and don't see a non-database process using much memory, focus on Oracle session's PGA.note4 Since top runs in batch mode, you would have to launch top another time to show the top memory consumers (pipe top -b to sort on %MEM column), if it's absolutely needed. To better assess process memory usage, it may be better to look inside Oracle unless there's a non-database memory hogger. For instance, the Oracle session for OS process 8272 uses about 207 MB PGA private memory, i.e. 26% of 800 MB pga_aggregate_target on this database. Does the session use a huge PL/SQL table? Unfortunately, some sessions' current or latest sql_id's are not recorded in v$session for some reason. For this particular problem, you can narrow down to the segment or extent based on the wait event parameters, and keep an eye on this process next time. Since it runs as SYS and the process name is pz99, it could be a system management session from a DBA's script or emagent accessing a gv$ view. Since OS pid is known, check to see if Oracle bdump or udump has a trace file for this process. The special Linux process state D is not a good sign, perhaps indicating bad or stressed I/O.

For those enjoying 10g active session history (ASH) and automatic workload repository (AWR), enjoy them! ASH is a wonderful process- as well as system-oriented monitoring system. Once every second, it records performance data in v$session and other dynamic performance views, actually by directly attaching to their base table data structures in shared memory. In spite of the sophistication, however, there're a few reasons you may want to use your own monitoring tool instead of ASH. Additional license cost is a major one. Concern about its overhead may be another. Keep an eye on MMON, M000 and MMNL background processes' metrics (excessive CPU usage? UGA leak?...) Increase _ash_sampling_interval if needed. Some technology companies need absolutely top performance of the database server for their applications by shutting down anything fancy that comes with Oracle. An easy way to achieve this is to set statistics_level to basic. Do not do this unless (1) the application is already well written (all except DW jobs use bind variables, SQLs are reviewed by SQL tuning specialists, etc.), and (2) the whole DBA team is sufficiently technical. The reason is that this non-default, "light-weight", statistics_level not just turns off ASH and various advisories, which you know you want to turn off, but also turns off many other important statistics, and you need to explicitly turn them on. Look at v$statistics_level where activation_level='TYPICAL'. If you set statistics_level to basic, you even lose timed statistics which has to be turned on by timed_statistics. That's easy. How about "Bind Data Capture"? It's probably turned on by setting _cursor_bind_capture_area_size to a non-zero value. And then you notice you lose cache buffers chains and a few other latches' statistics. You need to turn on _ultrafast_latch_statistics. Then _longops_enabled for missing data in v$session_longops, and so on. In short, it's probably not worth it to turn off ASH this way. Instead, set _ash_enable to false. However, bear in mind that ASH and AWR are great technologies. If the DBA team relies on Enterprise Manager, and you have the license, there's even more reason to keep them on. Unless the database is genuinely short of power, you can additionally run my perfmon.sh. This offers lots of unique advantages. The ease of cross-checking top OS processes against Oracle sessions aside, since you can modify the script, you have maximum freedom to decide what you want or don't want; perhaps you want some officially idle waits to be included, or non-idle waits to be excluded, or less (or more) columns to care about in v$session than ASH does, and more detailed info about a particular application, less sampling at night, etc...

In summary, process-oriented performance monitoring is not new because some performance tuning analysts are already doing it. The idea is not specific to post-crisis Oracle performance analysis, but has relevance to any application that contains more or less independent worker threads that are like database sessions. Windows or UNIX system admins need to record top OS processes. Middleware app servers need to record resource usage of individual threads for individual functions, if possible. Without this info, it's difficult to track down what dragged down the overall performance.

__________________
Note1 Windows runs Oracle sessions as OS threads inside oracle.exe process. So the top or top-like program needs to sort on CPU sampling deltas of the threads of the process. But then in order to catch occasional or accidental non-Oracle top CPU processes, the top program also needs to watch at the process level. These two tasks are not easy to accomplish with one sorting output. In any case, if needed, I can write two separate programs to achieve the two goals; there's no program I know of that can write top Windows processes or top threads to the console or into a text file.

Note2 My example perfmon.sh script would not create the SQL section if the SQL didn't find even one single matching row that meets the where condition.

Note3 Sometimes the SQL is not recorded in v$sql* views because it's a SQL operated on a LOB column of a table. In that case, you can see the cryptic "SQL" text like table_4_200_4139_0_0 in v$open_cursor or v$db_object_cache, and of course the library cache fixed table x$kglob (this particular one is a SQL reading (4) LOB of a table whose object_id is 0x4139). I've never figured out why Oracle shows the SQL this way. You can find the "real" SQL by cross-checking other entries in v$open_cursor whose saddr and sid are the same for this one.

Note4 The example is from a RH Linux kernel 2.6.9. Since huge pages is not enabled on this server and lock_sga is default false, top SHR shows shared memory in resident working set only; otherwise it would show the Oracle SGA, which is 5008m in this particular instance, plus a little other shared memory.


To my Computer Page
To my OraNotes Page