SQL Tuning Cases

Basic ideas

* Use select dbms_sqltune.report_sql_monitor('SQL ID') from dual to find the lines in the SQL execution plan with the highest number(s) in the "Activity(%)" column. (If the SQL is no longer running, add type=>'html' or type=>'active' to the function, save the output to an HTML file and view it in a web browser.)
* Use select * from table(dbms_xplan.display_cursor('SQL ID','','')) to find the predicate information corresponding to those lines.
* Tune the operations on these lines.
* Check SQL stats in v$sql before and after tuning.

Case 1: Missing index

The per-execution runtime of the SQL 5z2vwxj7t4a3t is 467 or 1778 seconds depending on which node of RAC it is run on:

SQL> select inst_id, buffer_gets, executions, buffer_gets/executions, elapsed_time/executions, cpu_time/executions, last_active_time, plan_hash_value from gv$sql where sql_id='5z2vwxj7t4a3t' order by 4;

   INST_ID BUFFER_GETS EXECUTIONS BUFFER_GETS/EXECUTIONS ELAPSED_TIME/EXECUTIONS CPU_TIME/EXECUTIONS LAST_ACTIVE_TIME  PLAN_HASH_VALUE
---------- ----------- ---------- ---------------------- ----------------------- ------------------- ----------------- ---------------
         4   452001763        120             3766681.36               467657133           460314377 20220504 16:52:48      1972090739
         2  3404972002         12              283747667              1778348869          1748656389 20220504 16:52:49      1972090739

There's only one execution plan, which is:

SQL> select * from table(dbms_xplan.display_cursor('5z2vwxj7t4a3t','',''));

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------
SQL_ID  5z2vwxj7t4a3t, child number 0
-------------------------------------
Select count (*) from (select * from (select PK_DISPENSE as id,
...
DISP.FK_DISPENSE)) as DRUGNAME, (select
...

Plan hash value: 1972090739

---------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name              | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------------
...
|* 80 |       TABLE ACCESS FULL             | NOST_DISPUNIT     |     1 |    11 |  1391   (1)| 00:00:01 |
|  81 |      SORT GROUP BY                  |                   |     1 |    12 |            |          |
|* 82 |       TABLE ACCESS FULL             | NOST_DISPUNIT     |     1 |    12 |  1391   (1)| 00:00:01 |
...
Predicate Information (identified by operation id):
---------------------------------------------------
...
  80 - filter(("FK_DISPENSE"=:B1 AND "DELETEDFLAG"=:SYS_B_039))
  82 - filter(("DU"."FK_DISPENSE"=:B1 AND "DELETEDFLAG"=:SYS_B_040))
...

According to

SQL> select dbms_sqltune.report_sql_monitor('5z2vwxj7t4a3t') from dual;
...
SQL Plan Monitoring Details (Plan Hash Value=1972090739)
================================================================================================================================================================================
| Id  |               Operation               |       Name        |  Rows   | Cost |   Time    | Start  | Execs |   Rows   | Read | Read  |  Mem  | Activity | Activity Detail |
|     |                                       |                   | (Estim) |      | Active(s) | Active |       | (Actual) | Reqs | Bytes | (Max) |   (%)    |   (# samples)   |
================================================================================================================================================================================
...
|  80 |        TABLE ACCESS FULL              | NOST_DISPUNIT     |       1 | 1391 |        15 |     +1 |   210 |      309 |   11 | 90112 |     . |    40.00 | Cpu (6)         |
|  81 |       SORT GROUP BY                   |                   |       1 |      |        10 |     +6 |   210 |      210 |      |       |  2048 |          |                 |
|  82 |        TABLE ACCESS FULL              | NOST_DISPUNIT     |       1 | 1391 |        14 |     +2 |   210 |      309 |      |       |     . |    53.33 | Cpu (8)         |
...

the full table scan on table NOST_DISPUNIT consumes most of the time (40% and 53%). The where-conditions for lines 80 and 82 in the execution plan map to filter operations on columns FK_DISPENSE and DELETEDFLAG (see "Predicate Information" in the output of dbms_xplan.display_cursor).

Column FK_DISPENSE is highly distinct (but DELETEDFLAG is not):

SQL> select count(*), count(distinct fk_dispense), count(distinct deletedflag) from nost.NOST_DISPUNIT;

  COUNT(*) COUNT(DISTINCTFK_DISPENSE) COUNT(DISTINCTDELETEDFLAG)
---------- -------------------------- --------------------------
    551899                     439837                          1

But there's no index on column FK_DISPENSE:

SQL> select * from dba_ind_columns where column_name='FK_DISPENSE';

no rows selected

Solution: Create an index on FK_DISPENSE column.

Result:

SQL> select inst_id, buffer_gets, executions, buffer_gets/executions, elapsed_time/executions, cpu_time/executions, last_active_time, plan_hash_value from gv$sql where sql_id='5z2vwxj7t4a3t' order by 7;

   INST_ID BUFFER_GETS EXECUTIONS BUFFER_GETS/EXECUTIONS ELAPSED_TIME/EXECUTIONS CPU_TIME/EXECUTIONS LAST_ACTIVE_TIME  PLAN_HASH_VALUE
---------- ----------- ---------- ---------------------- ----------------------- ------------------- ----------------- ---------------
         3   567363199         68             8343576.46              53355816.3          52664979.5 20220623 11:24:24      1972090739
         4     1158065         17             68121.4706              1497751.59          609838.647 20220630 09:03:06      1078216129
         1     1282768         20                64138.4               886520.95            562954.5 20220630 09:13:53      1078216129
         3       62375          1                  62375                  799404              653536 20220630 09:13:53      1078216129
         2       62369          1                  62369                  844755              777046 20220630 09:13:53      1078216129

With the new execution plan 1078216129, the SQL per-execution buffer gets is now consistently 60K+, while the old plan 1972090739 incurs buffer gets ranging from 8M to 283M (see the query result at the beginning of this section). On average, it's a 1000 times less now than before.
Elapsed time depends on caching effect so it's a less reliable indicator when comparing SQL performance, but that's what the users care about. We can see the per-exec elapsed time is now 0.8 to 1.5 seconds (1.5 seconds is the first execution, with no data cached yet), vs. 53 to 1778 seconds when the plan was 1972090739.
If the SQL is significantly more complicated, it's not easy to determine if a new index will help. Consider using SQL Tuning Advisor (see Appendix 1).

Case 2: Untunable SQL

To get a general idea of the SQL: select * from v$sqlarea where sql_id='54sk8r7sf00f7';

To get some essential stats:

SQL> select inst_id, buffer_gets, executions, buffer_gets/executions, elapsed_time/executions, cpu_time/executions, last_active_time, plan_hash_value, child_number from gv$sql where sql_id='54sk8r7sf00f7' and executions>20000 order by 4;

   INST_ID BUFFER_GETS EXECUTIONS BUFFER_GETS/EXECUTIONS ELAPSED_TIME/EXECUTIONS CPU_TIME/EXECUTIONS LAST_ACTIVE_TIME  PLAN_HASH_VALUE CHILD_NUMBER
---------- ----------- ---------- ---------------------- ----------------------- ------------------- ----------------- --------------- ------------
         1    26142170      45199             578.379389              1942218.03          1890965.92 20220512 09:40:45      2843578395            5
         1  1048319052      71111             14742.0097              2010612.23             1951917 20220512 09:40:37      1580051917            3
         1  1430888716      44550             32118.7142              845991.329           833719.38 20220512 09:42:54      2924532390          101
         3  2192487379      22597             97025.5954              1952574.82          1909005.07 20220428 08:11:33      2049543360           18
         1  2320190490      20949             110754.236              2094110.59          2023285.01 20220512 09:43:13      2843578395            7
         3  3682548284      26195             140582.107              1913811.14          1878168.45 20220428 08:13:38      2049543360           12

Let's focus on the 2nd line, with the highest executions, 71111. So on instance 1:

SQL> set pages 500 long 200000000 longc 230
SQL> select dbms_sqltune.report_sql_monitor('54sk8r7sf00f7',sql_plan_hash_value=>1580051917) from dual; <-- in case of multiple plans, better not omit sql_plan_hash_value or there would be too much output
...
SQL Plan Monitoring Details (Plan Hash Value=1580051917)
==============================================================================================================================================================================================================================
| Id  |                         Operation                          |             Name              |  Rows   | Cost  |   Time    | Start  | Execs |   Rows   | Read | Read  |  Mem  | Activity |       Activity Detail       |
|     |                                                            |                               | (Estim) |       | Active(s) | Active |       | (Actual) | Reqs | Bytes | (Max) |   (%)    |         (# samples)         |
==============================================================================================================================================================================================================================
...
|  60 |                   TABLE ACCESS BY INDEX ROWID BATCHED      | DAILYPACSCAN                  |       4 |  2156 |         4 |     +1 |   195 |     1106 |      |       |     . |    40.00 | Cpu (2)         |
...
|  95 |                       TABLE ACCESS FULL                    | DAILYTPDATA                   |      2M | 10505 |         3 |     +3 |     1 |       3M | 1648 |  14MB |     . |    60.00 | Cpu (3)         |
...

The above shows 60% activity on plan line 95, which is a filter operation on column TIMEPOINTID of table DAILYTPDATA, and 40% on line 60, operation on ISPREASSIGNROOMOCCUPIED of DAILYPACSCAN, according to the following:

SQL> select * from table(dbms_xplan.display_cursor('54sk8r7sf00f7',3,'')); <-- 3 is child number 3
...
------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                                | Name                          | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------------------------------------------------
...
|* 60 |                  TABLE ACCESS BY INDEX ROWID BATCHED     | DAILYPACSCAN                  |     4 |    52 |  2156   (0)| 00:00:01 |
...
|* 95 |                      TABLE ACCESS FULL                   | DAILYTPDATA                   |  2276K|    36M| 10505   (1)| 00:00:01 |
...
  60 - filter("DPS"."ISPREASSIGNROOMOCCUPIED"=1)
...
  95 - filter(("TP"."TIMEPOINTID"=3 OR "TP"."TIMEPOINTID"=35))
...

Check column uniqueness:

SQL> select count(*), count(distinct TIMEPOINTID) from ATCORS.DAILYTPDATA;

  COUNT(*) COUNT(DISTINCTTIMEPOINTID)
---------- --------------------------
   7642114                         10 <-- too bad, column TIMEPOINTID of DAILYTPDATA is very unselective

SQL> select num_rows, last_analyzed from dba_tables where table_name='DAILYTPDATA';

  NUM_ROWS LAST_ANALYZED
---------- -----------------
   7597288 20220413 22:07:40 <-- num_rows or table stats is accurate enough, and the table is recently analyzed

SQL> desc ATCORS.DAILYTPDATA <-- see if there's LOB column
 Name                                Null?    Type
 ----------------------------------- -------- ----------
 DAILYTPDATAID                       NOT NULL NUMBER
 TREATMENTLOCATIONID                          NUMBER
 MRN                                          NUMBER
 TIMEPOINTID                                  NUMBER
 COLLECTIONTIME                               DATE
 DAILYQUEID                                   NUMBER
 PRECOLLECTIONTIME                            DATE
 PRESCHEDULEID                                NUMBER

SQL> select bytes/1048576 mb from dba_segments where segment_name='DAILYTPDATA';

        MB
----------
       309 <-- without LOB column, this is the size of the table

SQL> select count(*), count(distinct ISPREASSIGNROOMOCCUPIED) from ATCORS.DAILYPACSCAN;

  COUNT(*) COUNT(DISTINCTISPREASSIGNROOMOCCUPIED)
---------- --------------------------------------
   1005725                                      2 <-- too bad, this column is also not unique at all

Conclusion: There's no easy way to tune this SQL. The best option is to reconsider the business need and/or modify the logic. Or consider running the SQL with a certain parallel degree. But due to high execution rate and moderate buffer gets (and elapsed time), this is not a good option.
See also Appendix 1 for SQL tuning advisor.

Case 3: Wrong data type passed to where-clause

Problem: Why is this SQL slow in Dev but fast in Prod?

This is a fairly unusual case. The execution plan in Dev and that in Prod are quite similar, except that an index is used in Prod but not in Dev.

In Prod, the SQL 7pd5fbfm9f8tj i.e.

  SELECT  PEP_PATIENT_EPISODE.PEP_ID ,
...
WHERE   ( PCL_PATIENT_CLASS.PCL_CODE (+) = PEP_PATIENT_EPISODE.PEP_PCL_CODE) and ( STF_STAFF.STF_UID (+) = PEP_PATIENT_EPISODE.PEP_STF_UID_PHY_ADMIT) and
...
        ( PEP_PATIENT_EPISODE.PEP_PMI_ID = :vs_pmi_id) and
        ( PEO_PEP_EPISODE_OWNER.PEO_USR_UID (+) = :vl_usr_uid )
...

is passing a char(32) datatype value to the bind variable :vs_pmi_id, according to

SQL> select last_captured, value_string, datatype_string from v$sql_bind_capture where sql_id='7pd5fbfm9f8tj' and name=':VS_PMI_ID' order by 1;

LAST_CAPTURED     VALUE_STRING   DATATYPE_STRING
----------------- -------------- ---------------
20220308 14:03:50 2613305        CHAR(32)

But in Dev, the data type is nchar(32):

SQL> select last_captured, value_string, datatype_string from v$sql_bind_capture where sql_id='7pd5fbfm9f8tj' and name=':VS_PMI_ID' order by 1;

LAST_CAPTURED       VALUE_STRING  DATATYPE_STRING
------------------- ------------- ---------------
2022-03-08:09:16:04 2490168       NCHAR(32) <-- Run by the app
2022-03-08:09:17:42 2548666       NCHAR(32) <--   same
2022-03-08:13:42:56 2548666       NCHAR(32) <-- Run by me in sqlplus; nchar(32) was manually specified by me 
2022-03-08:13:48:04 2548666       NCHAR(32) <--   because I blindly thought the app was right
...

Column PEP_PMI_ID of table PEP_PATIENT_EPISODE is of data type VARCHAR2(14). So passing a char or varchar2 data value is correct, while passing nchar or nvarchar2 will disable the index created on this column, namely PEP_INDEX_01. That's why we see the execution plan in Prod use this index while we don't see it in Dev.

To prove the theory, I changed my sqlplus code to the following; the only difference is on the first line (earlier the line was "var VS_PMI_ID nchar(32)"):

var VS_PMI_ID char(32)
exec :VS_PMI_ID:=2613305
var VL_USR_UID number
exec :VL_USR_UID:=3029
alter session set current_schema=m2prod;
set timing on autot traceonly
SELECT  PEP_PATIENT_EPISODE.PEP_ID,
...
          (PEP_PATIENT_EPISODE.PEP_PMI_ID = :vs_pmi_id) and
          ( PEO_PEP_EPISODE_OWNER.PEO_USR_UID (+) = :vl_usr_uid )
...

and the SQL ran in subsecond in Dev just like in Prod, instead of 1 minute.

Solution: Check the code on the app server. Make sure the bind variable VS_PMI_ID is defined as char (or varchar2), not nchar. If you have difficulty checking it, make sure the Dev app server is using the same code as Prod. If this is a vendor app, engage the vendor.

Result: The app team found a difference in app server setup. After making the Dev config consistent with Prod, the performance dramatically improved.

Case 4: Excessive scan of an index

SQL> set long 200000000 longc 230 pages 1000
SQL> select dbms_sqltune.report_sql_monitor('fjmb2a9nrz2m9') from dual;
...
SQL Text
------------------------------
SELECT CATEGORY_ID FROM PS_REQ_ITM_CAT_VW WHERE SETID = :1 AND ITM_SETID = :2 AND INV_ITEM_ID= :3 AND REQUESTOR_ID = :4
...
Global Stats
====================================================
| Elapsed |   Cpu   | Cluster  |  Other   | Buffer |
| Time(s) | Time(s) | Waits(s) | Waits(s) |  Gets  |
====================================================
|    3354 |    3344 |     1.24 |     8.07 |   355M |
====================================================

SQL Plan Monitoring Details (Plan Hash Value=151754481)
=================================================================================================================================================================================
| Id    |                    Operation                    |        Name        |  Rows   | Cost |   Time    | Start  | Execs |   Rows   | Activity |      Activity Detail       |
|       |                                                 |                    | (Estim) |      | Active(s) | Active |       | (Actual) |   (%)    |        (# samples)         |
=================================================================================================================================================================================
...
| -> 19 |             INDEX RANGE SCAN                    | PS_PSTREENODE      |     847 |  546 |      3347 |     +8 |     1 |     6342 |     0.03 | Cpu (1)                    |
| -> 20 |              SORT AGGREGATE                     |                    |       1 |      |      3347 |     +8 |  141K |     141K |     0.06 | Cpu (2)                    |
| -> 21 |               INDEX RANGE SCAN                  | PS_PSTREENODE      |       1 |    5 |      3355 |     +0 |  141K |       5M |    99.91 | gc current block 2-way (2) |
|       |                                                 |                    |         |      |           |        |       |          |          | Cpu (3344)                 |
|    22 |           INDEX UNIQUE SCAN                     | PS_REQUESTOR_CAT   |       1 |    1 |           |        |  6342 |          |          |                            |
...
SQL> select * from table(dbms_xplan.display_cursor('fjmb2a9nrz2m9','',''));
...
Plan hash value: 151754481

------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                   | Name               | Rows  | Bytes | Cost (%CPU)| Time     |
...
|* 19 |            INDEX RANGE SCAN                 | PS_PSTREENODE      |     1 |    38 |     5   (0)| 00:00:01 |
|* 20 |        INDEX UNIQUE SCAN                    | PS_REQUESTOR_CAT   |     1 |    27 |     1   (0)| 00:00:01 |
|* 21 |       INDEX RANGE SCAN                      | PS_PSTREELEAF      |     1 |       |     1   (0)| 00:00:01 |
|* 22 |      TABLE ACCESS BY INDEX ROWID            | PSTREELEAF         |     1 |    57 |     1   (0)| 00:00:01 |
...
Predicate Information (identified by operation id):
---------------------------------------------------
...
  19 - access("A1"."SETID"=:B1 AND "A1"."SETCNTRLVALUE"=:B2 AND "A1"."TREE_NAME"=:B3 AND
              "A1"."TREE_NODE_NUM"=:B4 AND "A1"."EFFDT"<=TO_DATE(TO_CHAR(SYSDATE@!,'YYYY-MM-DD'),'YYYY-MM-DD'))
       filter("A1"."TREE_NODE_NUM"=:B1)
  20 - access("F"."SETID"=:1 AND "F"."REQUESTOR_ID"=:4 AND "F"."CATALOG_ID"="A"."TREE_NAME")
  21 - access("B"."SETID"=:1 AND "A"."SETCNTRLVALUE"="B"."SETCNTRLVALUE" AND
              "A"."TREE_NAME"="B"."TREE_NAME" AND "A"."EFFDT"="B"."EFFDT" AND "A"."TREE_NODE_NUM"="B"."TREE_NODE_NUM")
  22 - filter((("B"."DYNAMIC_RANGE"='Y' AND "E"."SETID"="A"."SETID" AND
              "E"."CATEGORY_CD"="A"."TREE_NODE") OR ("B"."SETID"="C"."SETID" AND "B"."RANGE_FROM"<="C"."INV_ITEM_ID"
              AND "C"."INV_ITEM_ID"<="B"."RANGE_TO")))
...

Note
-----
   - this is an adaptive plan

Analysis: Line 21 of the dbms_sqltune.report_sql_monitor output shows about 100% activity on index range scan of PS_PSTREENODE index. Apparently this index is scanned too many times. Analyzing the columns of the index (Predicate Information for line 19 of the dbms_xplan.display_cursor output) shows that the columns are not really unique and those used by the SQL are particularly non-unique.

Solution: Disable using this index. Either add /*+ no_index(a1 PS_PSTREENODE) */ to the definition of view PS_REQ_ITM_CAT_VW, or make the index invisible: alter index PS_PSTREENODE invisible. (Be aware that making an index invisible could potentially cause problems to other SQLs. Since Oracle has no session-specific invisible index, if you must go this route, you could make this index invisible, alter system set optimizer_use_invisible_indexes=true, and in this session only, alter session set optimizer_use_invisible_indexes=false.)

Result: The very long running SQL can return the result immediately.

Other notes: Why does dbms_sqltune.report_sql_monitor show a different plan than dbms_xplan.display_cursor does, even though they both have the same plan hash value? And note the line numbers differ between the output of dbms_sqltune.report_sql_monitor and that of dbms_xplan.display_cursor for the same steps in the plan. According to the analyst working on SR 3-30740236981, the SQL has adjusted its plan in real-time (note "this is an adaptive plan" and "STATISTICS COLLECTOR"). To quote him, "based on the number of rows returned from a row source it dynamically changes the plan. So it won't know until the middle of execution which one it is going to choose. In this case, it is one hash value but it has the mechanism to change NESTEDLOOPS vs HASH JOIN dynamically." To disable adaptive plan, parameter optimizer_adaptive_plans can be set to false.

Case 5: Unwanted predicate pushing into view and comparative approach

SQL> select dbms_sqltune.report_sql_monitor('bw7pwdp648q3t') from dual;
...
====================================================================================================================================================================================================================================
| Id    |                     Operation                     |        Name        |  Rows   | Cost  |   Time    | Start  | Execs |   Rows   | Read  | Read  | Write | Write |  Mem  | Temp | Activity |     Activity Detail       |
|       |                                                   |                    | (Estim) |       | Active(s) | Active |       | (Actual) | Reqs  | Bytes | Reqs  | Bytes |       |      |   (%)    |       (# samples)         |
====================================================================================================================================================================================================================================
...
|    19 |             VIEW PUSHED PREDICATE                 | PS_PYMNT_XREF_VW   |       1 |    11 |       959 |     +7 |    98 |       82 |       |       |       |       |     . |    . |     0.53 | Cpu (5)                   |

This SQL is extremely slow in prod DB, but fast in a non-prod DB. The most significant difference is that in non-prod, this line is "VIEW", not "VIEW PUSHED PREDICATE".

Solution: Try achieving the plan in non-prod, by disabling predicate pushing with either alter session set "_push_join_predicate"=false (alter system ... to change it database-wide would risk causing problems to other SQLs), or a hint /*+ no_push_pred(D) */ or /*+ no_push_pred(C) */ or a hint /*+ opt_param('_push_join_predicate','false') */ for the SQL. (This view appears twice in the SQL with aliases D and C, respectively. If you use the no_push_pred hint, you need to see which occurrence has the predicate pushed. Or just try each and see which one disables the pushed predicate.)

Note: If the plan showed UNION ALL PUSHED PREDICATE instead, the parameter to disable this feature would be _push_join_union_view.

General comment: This tuning is achieved by comparing two databases instead of working on this one alone. Sometimes this comparative approach saves time because it's easier to know whether a particular step is the cause of slowness. If in this case we didn't have the good database to compare with, we would not focus on this line, because this line has an Activity of only 0.53%.

Case 6: Temporary table has wrong stat

SQL> select dbms_sqltune.report_sql_monitor('gmycdxvxvntq2') from dual;
...
SQL Text
------------------------------
INSERT INTO PS_GM_PT_TRE_TA04 ... SELECT ...
...
=====================================================================================================================================================================================================================================
| Id    |                    Operation                    |       Name        |  Rows   | Cost |   Time    | Start  | Execs |   Rows   | Read  | Read  | Write | Write |  Mem  | Temp | Activity |         Activity Detail          |
|       |                                                 |                   | (Estim) |      | Active(s) | Active |       | (Actual) | Reqs  | Bytes | Reqs  | Bytes |       |      |   (%)    |           (# samples)            |
=====================================================================================================================================================================================================================================
...
|     3 |     HASH JOIN SEMI                              |                   |       1 |   28 |     20481 |     +2 |     1 |        0 |       |       |  797K |  91GB | 894MB | 97GB |    98.37 | resmgr:internal state change (1) |
|       |                                                 |                   |         |      |           |        |       |          |       |       |       |       |       |      |          | Cpu (19511)                      |
...
|  -> 8 |          TABLE ACCESS BY INDEX ROWID BATCHED    | PS_GM_PT_PRJ_TA04 |       1 |      |     20481 |     +6 |     1 |       6M |  117K | 915MB |       |       |     . |    . |     0.52 | Cpu (12)                         |
|       |                                                 |                   |         |      |           |        |       |          |       |       |       |       |       |      |          | db file sequential read (94)     |
|  -> 9 |           INDEX RANGE SCAN                      | PS_GM_PT_PRJ_TA04 |       1 |      |     20481 |     +6 |     1 |       6M | 43288 | 338MB |       |       |     . |    . |     0.12 | Cpu (6)                          |

There's not much we can do to the expensive HASH JOIN SEMI itself; we can increase PGA to reduce writing to temp but it won't solve the problem. On inspecting all the tables and indexes given in the NAME column, it's found that table PS_GM_PT_PRJ_TA04 has 29 million rows but dba_tables.num_rows reports 0. This is a classical example of wrong stats leading to a bad plan. On further research, it's found that in this PeopleSoft Financial database, all tables owned by SYSADM have table stats gathered, while Oracle documents such as
E-ORA Methods to Gather Statistics on PeopleSoft Oracle Databases ( Doc ID 1345124.1 )
pscbo_stats - Improving Statistics in Oracle RDBMS for PeopleSoft Enterprise (Doc ID 1322888.1)
https://docs.oracle.com/cd/E92519_02/pt856pbr3/eng/pt/tadm/task_ImplementingPSCBO_STATSForPeopleSoftOptimization.html
clearly state that such temporary tables (i.e. tables with volatile data, not RDBMS global temporary tables) should NOT have stats gathered so that dynamic sampling will be triggered at SQL parsing time.

Solution: Unless it's guaranteed every big data change to these tables will be immediately followed by gather_table_stats, do not gather stats on such tables. If there are already stats, delete them.

Result: After either gathering stats when data loading is done or after deleting stats, the query completes in a few minutes instead of hanging forever.

Case 7: Full table scan preferred on an over-indexed table

SQL> select dbms_sqltune.report_sql_monitor('azxz2rt0sn4bf') from dual;

DBMS_SQLTUNE.REPORT_SQL_MONITOR('AZXZ2RT0SN4BF')
---------------------------------------------------------------------------------------------------------
SQL Monitoring Report

SQL Text
------------------------------
SELECT SUBSTR ( A.CAP_NUM, INSTR ( A.CAP_NUM, '_') + 1, LENGTH ( A.CAP_NUM) ),
...FROM ((PS_CAP_DET A LEFT OUTER JOIN PS_PO_LINE_DISTRIB B ON A.BUSINESS_UNIT = B.BUSINESS_UNIT AND A.
CAP_NUM = B.CAP_NUM AND A.CAP_SEQUENCE = B.CAP_SEQUENCE ) LEFT OUTER JOIN PS_PO_HDR C ON...
...
Global Stats
============================================================================================
| Elapsed |   Cpu   |    IO    | Concurrency | Cluster  |  Other   | Buffer | Read | Read  |
| Time(s) | Time(s) | Waits(s) |  Waits(s)   | Waits(s) | Waits(s) |  Gets  | Reqs | Bytes |
============================================================================================
|    9519 |    9471 |     0.05 |        0.01 |       23 |       25 |     2G |   57 | 456KB |
============================================================================================

SQL Plan Monitoring Details (Plan Hash Value=2024649290)
=====================================================================================================================================================================================================
| Id    |                 Operation                  |        Name        |  Rows   | Cost  |   Time    | Start  | Execs |   Rows   | Read | Read  |  Mem  | Activity |       Activity Detail       |
|       |                                            |                    | (Estim) |       | Active(s) | Active |       | (Actual) | Reqs | Bytes |       |   (%)    |         (# samples)         |
=====================================================================================================================================================================================================
...
| -> 25 |         INDEX RANGE SCAN                   | PSQPO_LINE_DISTRIB |       5 |     2 |      9523 |     +2 |    2G |       4G |   35 | 280KB |     . |    31.98 | gc current block 2-way (19) |
|       |                                            |                    |         |       |           |        |       |          |      |       |       |          | Cpu (3020)                  |
| -> 26 |        TABLE ACCESS BY INDEX ROWID         | PS_PO_LINE_DISTRIB |       1 |     3 |      9522 |     +3 |    4G |     4545 |   22 | 176KB |     . |    60.26 | Cpu (5727)                  |
...
SQL> select * from table(dbms_xplan.display_cursor('azxz2rt0sn4bf','',''));
...
|* 23 |       INDEX RANGE SCAN                 | PSQPO_LINE_DISTRIB |     5 |       |     2   (0)| 00:00:01 |
|* 24 |      TABLE ACCESS BY INDEX ROWID       | PS_PO_LINE_DISTRIB |     1 |    52 |     3   (0)| 00:00:01 |
...
  23 - access("B"."PO_ID"="C"."PO_ID")
  24 - filter(("A"."CAP_NUM"="B"."CAP_NUM" AND "A"."CAP_SEQUENCE"="B"."CAP_SEQUENCE" AND
              "B"."BUSINESS_UNIT"="C"."BUSINESS_UNIT" AND "A"."BUSINESS_UNIT"="B"."BUSINESS_UNIT"))
...
   - this is an adaptive plan

According to dba_ind_columns, index PSQPO_LINE_DISTRIB is on 8 columns: PO_ID, LINE_NBR, SCHED_NBR, DISTRIB_LINE_NUM etc. Even though the index is moderately efficient on column PO_ID (2.1 million distinct values in this 9.7 million row table), there are too many Execs (2G) on scanning this index. If we add the /*+ no_index(B PSQPO_LINE_DISTRIB) */ hint, the plan changes to another index on this table, not very efficient either. Disabling that causes the plan to choose yet another index. In fact, there're 19 indexes on this table!

Solution: Tried adding /*+ full(B) */ hint, and the SQL returned the result in a few seconds instead of hanging forever.

Comment: Tried running SQL tuning advisor on the original SQL. Unfortunately, plan information was no longer in shared pool and error ORA-13780 was returned (see the end of Appendix 1 below). But I was able to reproduce the execution plan in sqlplus, and run the advisor against the sqlplus SQL. I was advised to create an index on SYSADM.PS_PO_LINE_DISTRIB("BUSINESS_UNIT","CAP_NUM","CAP_SEQUENCE"). Tried that in a test database and it indeed worked. Nevertheless, forcing a full table scan is a better solution than creating this index because it stops the "proliferation" of index creation on this table. I wish the tuning advisor were smart enough to make a FTS suggestion.

Case 8: Parallel to speed up a non-tunable SQL

A SQL took 3685 seconds or 61.4 min. The SQL text is

INSERT INTO PS_PROJ_RES_TA1 (PROCESS_INSTANCE,...
SELECT 15631038,...
FROM PS_PROJ_RESOURCE SRC
WHERE (SRC.CST_DISTRIB_STATUS = 'N' OR SRC.BI_DISTRIB_STATUS IN ('N', 'U')) AND SRC.SYSTEM_SOURCE NOT IN ('PRC', 'PRP', 'PRR')
AND SRC.BUSINESS_UNIT = 'MDGRN' AND SRC.ACCOUNTING_DT BETWEEN TO_DATE('2012-09-01','YYYY-MM-DD') AND TO_DATE('2023-08-31','YYYY-MM-DD')
AND SRC.PC_DISTRIB_STATUS <> 'L' AND 1=1

The execution plan is

SQL> select * from table(dbms_xplan.display_cursor('4t2vxphsf60na','',''));
...
Plan hash value: 2522063293
---------------------------------------------------------------------------------------------
| Id  | Operation                | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | INSERT STATEMENT         |                  |       |       |  2067K(100)|          |
|   1 |  LOAD TABLE CONVENTIONAL | PS_PROJ_RES_TA1  |       |       |            |          |
|*  2 |   TABLE ACCESS FULL      | PS_PROJ_RESOURCE |   795K|   402M|  2067K  (1)| 00:01:21 |
---------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(("SRC"."PC_DISTRIB_STATUS"<>'L' AND "SRC"."ACCOUNTING_DT"<=TO_DATE('
              2023-08-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
              (INTERNAL_FUNCTION("SRC"."BI_DISTRIB_STATUS") OR "SRC"."CST_DISTRIB_STATUS"='N') AND
              "SRC"."ACCOUNTING_DT">=TO_DATE(' 2012-09-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
              "SRC"."SYSTEM_SOURCE"<>'PRC' AND "SRC"."SYSTEM_SOURCE"<>'PRP' AND
              "SRC"."SYSTEM_SOURCE"<>'PRR' AND "SRC"."BUSINESS_UNIT"='MDGRN'))

The stats on the tables are:

SQL> select table_name, num_rows, last_analyzed from dba_tables where table_name in ('PS_PROJ_RES_TA1','PS_PROJ_RESOURCE');

TABLE_NAME          NUM_ROWS LAST_ANALYZED
----------------- ---------- -----------------
PS_PROJ_RESOURCE    97603367 20230122 09:53:58
PS_PROJ_RES_TA1     32411900 20230126 22:49:21

Although table PS_PROJ_RES_TA1 is temporary (has volatile data) and should not have stats (so to leave it to dynamic sampling), I don't think having stats on this one negatively affected this specific SQL.

Let's see if the columns referenced in the where-clause are unique. Since the tables are big, select count(distinct column) will take a while so just check data dictionary:

SQL> select column_name, data_type, num_distinct from dba_tab_columns where owner='SYSADM' and table_name='PS_PROJ_RESOURCE' and column_name in ('CST_DISTRIB_STATUS','BI_DISTRIB_STATUS','SYSTEM_SOURCE','BUSINESS_UNIT','ACCOUNTING_DT','PC_DISTRIB_STATUS');

COLUMN_NAME          DATA_TYPE  NUM_DISTINCT
-------------------- ---------- ------------
BUSINESS_UNIT        VARCHAR2              1
ACCOUNTING_DT        DATE               3953
PC_DISTRIB_STATUS    VARCHAR2
SYSTEM_SOURCE        VARCHAR2             15
BI_DISTRIB_STATUS    VARCHAR2              5
CST_DISTRIB_STATUS   VARCHAR2

Two columns are missing num_distinct values in data dictionary but judging by the names, the values should be low. Only ACCOUNTING_DT may be fairly unique. But since in the where-clause this date range is from 9/1/2012 to 8/31/2023, the range may be too wide for an index on this column (if exists) to be useful. In short, there's no good way to optimize this SQL. The only option is to run the select in parallel, i.e. change the SQL to

INSERT INTO... SELECT /*+ parallel(src,8) */ 15631038,... FROM PS_PROJ_RESOURCE SRC ...

Result: After adding the parallel hint, the SQL can finish in 747 seconds, compared to previously 3685 seconds.

Case 9: "access" vs. "filter" in predicate information and another case of ineffective index

SQL> select dbms_sqltune.report_sql_monitor('3ftsq0b0fg9zd') from dual;
...
UPDATE PS_GM_PRJ_AUDIT SET GM_REVERSE = 'D' WHERE GM_REVERSE <> 'D' AND EXISTS ( SELECT 'X' FROM PS_GM_RFA_TA54 T WHERE ...
...
SQL Plan Monitoring Details (Plan Hash Value=2771195138)
==============================================================================================================================================================================================
| Id   |                Operation                |      Name       |  Rows   | Cost  |   Time    | Start  | Execs |   Rows   | Read | Read  | Mem | Activity |        Activity Detail        |
|      |                                         |                 | (Estim) |       | Active(s) | Active |       | (Actual) | Reqs | Bytes |     |   (%)    |          (# samples)          |
==============================================================================================================================================================================================
| -> 0 | UPDATE STATEMENT                        |                 |         |       |      5618 |   +166 |     1 |        0 |      |       |   . |          |                               |
| -> 1 |   UPDATE                                | PS_GM_PRJ_AUDIT |         |       |      5618 |   +166 |     1 |        0 |      |       |   . |          |                               |
| -> 2 |    FILTER                               |                 |         |       |      5618 |   +166 |     1 |     3508 |      |       |   . |          |                               |
| -> 3 |     NESTED LOOPS                        |                 |      49 | 29200 |      5618 |   +166 |     1 |     3508 |      |       |   . |          |                               |
| -> 4 |      NESTED LOOPS                       |                 |    7104 | 29200 |      5780 |     +4 |     1 |     3508 |      |       |   . |          |                               |
| -> 5 |       SORT UNIQUE                       |                 |   20626 |   479 |      5780 |     +4 |     1 |     2104 |      |       | 1MB |          |                               |
|    6 |        TABLE ACCESS FULL                | PS_GM_RFA_TA54  |   20626 |   479 |         1 |     +4 |     1 |    22603 |      |       |   . |          |                               |
| -> 7 |       INDEX RANGE SCAN                  | PSAGM_PRJ_AUDIT |       1 |     3 |      5783 |     +1 |  2104 |     3508 | 209K |   2GB |   . |    99.93 | gc cr block lost (2)          |
|      |                                         |                 |         |       |           |        |       |          |      |       |     |          | gc cr grant 2-way (101)       |
|      |                                         |                 |         |       |           |        |       |          |      |       |     |          | gc cr grant busy (12)         |
|      |                                         |                 |         |       |           |        |       |          |      |       |     |          | Cpu (5518)                    |
|      |                                         |                 |         |       |           |        |       |          |      |       |     |          | db file sequential read (136) |
| -> 8 |      TABLE ACCESS BY INDEX ROWID        | PS_GM_PRJ_AUDIT |       1 |     4 |      5618 |   +166 |  3508 |     3508 |      |       |   . |          |                               |
|    9 |     TABLE ACCESS BY INDEX ROWID BATCHED | PS_GM_PRJ_ERR   |       1 |    44 |           |        |   702 |          |      |       |   . |          |                               |
|   10 |      INDEX SKIP SCAN                    | PS_GM_PRJ_ERR   |       1 |    43 |      2568 |  +1353 |   702 |        0 |      |       |   . |     0.07 | Cpu (4)                       |
==============================================================================================================================================================================================

The most time-consuming step 7 has this predicate info:

SQL> select * from table(dbms_xplan.display_cursor('3ftsq0b0fg9zd','',''));
...

   7 - access("T"."BUSINESS_UNIT_PC"="PS_GM_PRJ_AUDIT"."BUSINESS_UNIT_PC" AND
              "T"."PROJECT_ID"="PS_GM_PRJ_AUDIT"."PROJECT_ID" AND
              "T"."ACTIVITY_ID"="PS_GM_PRJ_AUDIT"."ACTIVITY_ID")
       filter("T"."RESOURCE_ID"=SUBSTR("PS_GM_PRJ_AUDIT"."RESOURCE_ID",1,39)||'R')

It is scanning this index

SQL> select column_name, column_position, table_name from dba_ind_columns where index_name='PSAGM_PRJ_AUDIT' order by 2;

COLUMN_NAME       COLUMN_POSITION TABLE_NAME
----------------- --------------- ----------------
BUSINESS_UNIT_PC                1 PS_GM_PRJ_AUDIT
PROJECT_ID                      2 PS_GM_PRJ_AUDIT
ACTIVITY_ID                     3 PS_GM_PRJ_AUDIT
RESOURCE_ID                     4 PS_GM_PRJ_AUDIT
ANALYSIS_TYPE                   5 PS_GM_PRJ_AUDIT

Column uniqueness is:

SQL> select column_name, num_distinct from dba_tab_columns where table_name='PS_GM_PRJ_AUDIT' and column_name in ('BUSINESS_UNIT_PC','PROJECT_ID','ACTIVITY_ID','RESOURCE_ID','ANALYSIS_TYPE');

COLUMN_NAME               NUM_DISTINCT
------------------------- ------------
ANALYSIS_TYPE                       13
BUSINESS_UNIT_PC                     2
PROJECT_ID                       18516
ACTIVITY_ID                         46
RESOURCE_ID                   52503114

Table row count is:

SQL> select num_rows, last_analyzed from dba_tables where table_name='PS_GM_PRJ_AUDIT';

  NUM_ROWS LAST_ANALYZED
---------- -----------------
  98427650 20230129 08:19:26

The most unique column is resource_id. But according to the above predicate info, the uniqueness of this column is not being used, because it appears in "filter", not "access", i.e. not before but after the table data are already fetched. Other columns are not at all unique.

Suggestion: Make the index invisible or drop it. (An index with relatively non-unique column as the leading column is hardly useful.) Or see if this code SUBSTR("PS_GM_PRJ_AUDIT"."RESOURCE_ID",1,39)||'R' can be rewritten to not prefix the column with a function such as substr.

Case 10: NOT IN or MINUS operation on many columns of a table without PK

SELECT A.protocol_no, A.acc_no, TRUNC(A.reg_date) reg_date, TRUNC(A.off_study_date) off_study_date
FROM ctds_owner.registration_view A, -- 6/23/22
     cris_owner.pt_id_system_assoc B
WHERE A.patient_no = B.mrn
     AND B.pt_id = 2104637
     AND B.system_id = 1
GROUP BY A.protocol_no,
     A.acc_no, TRUNC(A.reg_date), TRUNC(A.off_study_date);

in which the view registration_view is defined as

SELECT [50+ columns] FROM ctds_owner.REGISTRATION_TABLE
MINUS
SELECT [the same set of columns]
FROM
 (SELECT * FROM ctds_owner.REGISTRATION_TABLE WHERE DATA_SOURCE_ID='C') C
 INNER JOIN
 (SELECT * FROM ctds_owner.REGISTRATION_TABLE WHERE DATA_SOURCE_ID='O') O
 ON C.PROTOCOL_NO=O.PROTOCOL_NO AND C.PATIENT_NO=O.PATIENT_NO

Its execution plan is

----------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name               | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     | Pstart| Pstop |
----------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |                    |     1 |   104 |       |   143K  (1)| 00:00:06 |       |       |
|   1 |  HASH GROUP BY                |                    |     1 |   104 |       |   143K  (1)| 00:00:06 |       |       |
|   2 |   NESTED LOOPS                |                    |     1 |   104 |       |   143K  (1)| 00:00:06 |       |       |
|   3 |    TABLE ACCESS BY INDEX ROWID| PT_ID_SYSTEM_ASSOC |     1 |    18 |       |     3   (0)| 00:00:01 |       |       |
|*  4 |     INDEX UNIQUE SCAN         | PK_PT_IDSYS_ASSOC  |     1 |       |       |     2   (0)| 00:00:01 |       |       |
|*  5 |    VIEW                       | REGISTRATION_VIEW  |     1 |    86 |       |   143K  (1)| 00:00:06 |       |       |
|   6 |     MINUS                     |                    |       |       |       |            |          |       |       |
|   7 |      SORT UNIQUE              |                    |   998K|   381M|   520M|            |          |       |       |
|   8 |       PARTITION LIST ALL      |                    |   998K|   381M|       | 15856   (1)| 00:00:01 |     1 |     4 |
|   9 |        TABLE ACCESS FULL      | REGISTRATION_TABLE |   998K|   381M|       | 15856   (1)| 00:00:01 |     1 |     4 |
|  10 |      SORT UNIQUE              |                    |  2395 |   984K|  1080K|            |          |       |       |
|* 11 |       HASH JOIN RIGHT SEMI    |                    |  2395 |   984K|    15M| 42123   (1)| 00:00:02 |       |       |
|  12 |        PARTITION LIST ALL     |                    |   504K|    10M|       | 15851   (1)| 00:00:01 |     1 |     4 |
|* 13 |         TABLE ACCESS FULL     | REGISTRATION_TABLE |   504K|    10M|       | 15851   (1)| 00:00:01 |     1 |     4 |
|  14 |        PARTITION LIST ALL     |                    |   494K|   188M|       | 15854   (1)| 00:00:01 |     1 |     4 |
|* 15 |         TABLE ACCESS FULL     | REGISTRATION_TABLE |   494K|   188M|       | 15854   (1)| 00:00:01 |     1 |     4 |
----------------------------------------------------------------------------------------------------------------------------
...
Statistics
----------------------------------------------------------
        458  recursive calls
         35  db block gets
     173134  consistent gets
     115457  physical reads
          0  redo size
        664  bytes sent via SQL*Net to client
        794  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          1  sorts (memory)
          1  sorts (disk)
          4  rows processed

According to dbms_sqltune.report_sql_monitor (not shown here), line 7 consumes most of the time, due to 520 MB temp segment I/O. This I/O is the major reason for the 115457 physical reads, or the only reason after the table's data is cached in buffer cache; temp segment I/O for sorting or hashing will not be cached.

Solution: Add a surrogate primary key to table registration_table and redefine the view to do substraction on this key instead of the 50+ columns (reg_id is the PK column):

create view reg_view as
select * from reg where reg_id not in
(select c.reg_id from (
(SELECT * FROM reg WHERE DATA_SOURCE_ID='C') C
INNER JOIN
(SELECT * FROM reg WHERE DATA_SOURCE_ID='O') O
ON C.PROTOCOL_NO=O.PROTOCOL_NO AND C.PATIENT_NO=O.PATIENT_NO));

Result: The two sortings are completely eliminated. So are physical reads (after table data are cached). Consistent gets also comes down, from 173134 to 115558. The SQL finishes almost instantly instead of after 20+ seconds.

Case 11: Old inefficient SQL plan baseline should not be used

SQL> select dbms_sqltune.report_sql_monitor('fn3kbyn8kjvjd') from dual;
...
SQL Text
------------------------------
SELECT DISTINCT B.BUSINESS_UNIT ... FROM PS_JRNL_LN B, PS_JRNL_HEADER A, PS_DEPT_TBL C, PS_FUND_TBL D, PS_OPER_UNIT_TBL E, PS_GL_ACCOUNT_TBL F,
PS_SET_CNTRL_TBL G, PS_CHARTFIELD1_TBL H WHERE ( A.BUSINESS_UNIT = B.BUSINESS_UNIT AND ...
...
Binds
========================================================================================================================
| Name | Position |      Type      |                                       Value                                       |
========================================================================================================================
| :1   |        1 | VARCHAR2(2000) | MDACC                                                                             |
| :2   |        2 | VARCHAR2(2000) | 2023                                                                              |
| :3   |        3 | VARCHAR2(2000) | 5                                                                                 |
| :4   |        4 | VARCHAR2(2000) | 6                                                                                 |
| :5   |        5 | VARCHAR2(2000) | 11303                                                                             |
| :6   |        6 | VARCHAR2(2000) | 11303                                                                             |
========================================================================================================================
...
SQL Plan Monitoring Details (Plan Hash Value=2903209619)
============================================================================================================================================================================================
| Id    |                     Operation                      |        Name        |  Rows   | Cost |   Time    | Start  | Execs |   Rows   |  Mem  | Activity |      Activity Detail       |
|       |                                                    |                    | (Estim) |      | Active(s) | Active |       | (Actual) |       |   (%)    |        (# samples)         |
============================================================================================================================================================================================
...
| -> 29 |            INDEX RANGE SCAN                        | PSHJRNL_LN         |       1 |    1 |      3903 |     +5 |    3M |       4G |     . |    12.95 | Cpu (505)                  |
| -> 30 |           TABLE ACCESS BY INDEX ROWID              | PS_JRNL_LN         |       1 |    1 |      3907 |     +1 |    4G |     115K |     . |    81.79 | gc current block 2-way (4) |
|       |                                                    |                    |         |      |           |        |       |          |       |          | Cpu (3185)                 |
...

SQL> select * from table(dbms_xplan.display_cursor('fn3kbyn8kjvjd','',''));
...
Note
-----
   - SQL plan baseline SQL_PLAN_bhxmvdfgrn0fbb8a50f23 used for this statement
   - statistics feedback used for this statement

The index PSHJRNL_LN is indeed not efficient judging by value uniqueness in the columns of the index (values in dba_ind_columns and dba_tab_columns not shown here). But there's a more glaring anomaly: it uses a very old SQL plan baseline according to dba_sql_plan_baselines. Let's try disabling it:

SQL> var ret number
SQL> exec :ret := dbms_spm.alter_sql_plan_baseline(plan_name=>'SQL_PLAN_bhxmvdfgrn0fbb8a50f23', attribute_name=>'ENABLED', attribute_value=>'NO')

PL/SQL procedure successfully completed.

Result: The user submitted the query again and it finished in a few minutes.

Comments:
(1) If disabling the plan baseline had not improved the performance, it would be worthwhile to make the inefficient index PSHJRNL_LN invisible (or drop it). This is an index created in-house, not delivered by Oracle PeopleSoft.
(2) One other improvement we can have is to match the runtime values passed to the bind variables. In the BINDS section, we see all values are passed in as VARCHAR2. But several columns are defined as DATE and NUMBER. Inadvertent type conversion may cause CBO to choose a bad plan although in this case we're probably lucky.
(One other peculiar thing about the plan baseline is that its origin is 'AUTO-CAPTURE'. We're not sure if optimizer_capture_sql_plan_baselines was ever enabled.)

Case 12: Short of CPU resources

User found that the same SQL ran faster on non-prod than on prod. The execution plan is the same. The stats on prod are:

=================================================================================
| Elapsed |   Cpu   |    IO    | Concurrency | Cluster  | Buffer | Read | Read  |
| Time(s) | Time(s) | Waits(s) |  Waits(s)   | Waits(s) |  Gets  | Reqs | Bytes |
=================================================================================
|    2210 |     862 |     0.57 |        0.20 |     1347 |    68M |  273 | 220MB |
=================================================================================

The big discrepancy between elapsed time and cpu time could be due to non-CPU consuming waits, such as very slow I/O (N/A, as IO Waits is low), waiting for locks (N/A, Concurrency Waits is low; checked perfmon.log and found no waits for locks such as TX).

On non-prod, the elapsed time is only slightly greater than CPU time:

===================================================================
| Elapsed |   Cpu   |    IO    |  Other   | Buffer | Read | Read  |
| Time(s) | Time(s) | Waits(s) | Waits(s) |  Gets  | Reqs | Bytes |
===================================================================
|     436 |     431 |     3.29 |     1.71 |    68M | 5726 |  49MB |
===================================================================

It turned out that this SQL was run when the prod server was doing a parallel degree 6 data pump dump job, which consumed a large amount of CPU. The job was restricted to run on only one node (for unknown reasons) and this SQL happened to be running on this node, too. On non-prod, the session made full use of the server CPU.

Suggested solution: Run the SQL at a different time than the data pump job or move the data pump job to a different time; run the SQL on the node data pump is not running on; remove "cluster=no" from expdp command or reduce its degrees. Why a job only ran on one node of the RAC database should also be checked and corrected.

Case 13: Cartesian join

One SQL has been running for hours:

SELECT ...
FROM PSAUDIT A, PS_VNDR_BANK_ACCT B, PS_VENDOR C, PS_ROLEXLATOPR D
WHERE (
  B.EFFDT BETWEEN TO_DATE('2022-09-01','YYYY-MM-DD') AND TO_DATE('2099-12-31','YYYY-MM-DD')
  AND B.SETID = C.SETID AND B.VENDOR_ID = C.VENDOR_ID AND D.ROLEUSER = A.AUDIT_OPRID)
ORDER BY 9, 14

The execution plan shows MERGE JOIN CARTESIAN. A cartesian join is usually caused by not joining all tables. In this case, although there are joins between tables B and C, D and A, there is no join between these two sets of tables. A cartesian join due to insufficient joins between the tables in the from-list should be avoided unless the joined table definitely only has 1 row (or is empty), or the business truly requires a display of data pairing up all the rows of a cartesian joined table with no restriction, which is extremely rare.

A less obvious case is a branched-off or-clause, e.g. (from Oracle Doc 1429323.1)

select Distinct t1.*
from t1, t2, t3, T4
where t3.siteId = 123
and t3.t3id = 123
and t3.t3Id = t2.t3id
and t2.t2id = t1.t2Id
and ( T4.prdinstid = t1.prdinstid or t1.prdinstid is null);

Although the last line shows that T4 and t1 join, this line also has an or-clause which branches off. That is, the orange part combined with the blue is good. But the orange and the green also need to be combined (and its result will be union'ed to the orange-blue combination), and this orange-green combined part *will* create a cartesian join because there're not enough joins compared to the from-list, which lists 4 tables.

Appendix 1: Manually run SQL Tuning Advisor on a given SQL

Either in conjunction with the above tuning exercise or after failing them all, you can run SQL Tuning Advisor to see if you can get any helpful recommendation. This tool is particularly helpful in case the SQL is complicated and a SQL profile or a new index is needed to improve its performance. You can use Oracle Enterprise Manager web interface to run the advisor, or follow the steps below to run it manually. The example below assumes the bad SQL has an ID of 54v10rhv66pgz. Text in bold is essential commands.

SQL> select dbms_sqltune.create_tuning_task(sql_id=>'54v10rhv66pgz',task_name=>'Tune_54v10rhv66pgz') from dual; <--better to specify a task name meaningful to you

DBMS_SQLTUNE.CREATE_TUNING_TASK(SQL_ID=>'54V10RHV66PGZ',TASK_NAME=>'TUNE_54V10RHV66PGZ')
----------------------------------------------------------------------------------------
Tune_54v10rhv66pgz

SQL> select * from dba_advisor_log where task_name='Tune_54v10rhv66pgz'; <-- status should be 'INITIAL'

SQL> exec dbms_sqltune.execute_tuning_task('Tune_54v10rhv66pgz') <-- this will take some time, up to 30 min (default timeout duration)

If you want it to run longer, say up to 1 hour, precede the above with these two SQLs:

SQL> exec dbms_sqltune.set_tuning_task_parameter('Tune_54v10rhv66pgz','local_time_limit',3600)
SQL> exec dbms_sqltune.set_tuning_task_parameter('Tune_54v10rhv66pgz','time_limit',3600)

If you watch from another session, select status from dba_advisor_log where task_name='Tune_54v10rhv66pgz' should show 'EXECUTING'.

Once done:

SQL> set long 1000000 longc 200 pages 1000
SQL> select dbms_sqltune.report_tuning_task('Tune_54v10rhv66pgz') from dual;

Even if creating the tuning task times out, the advisor usually can still give you good suggestions, such as a SQL profile or a few indexes, with estimated benefits if implemented. In practice, even if the benefit of a SQL profile is not high, it may still dramatically improve the SQL performance. When accepting the SQL profile, it's a good idea to add force_match=>true so that a future SQL with text that differs only in literal values and other trivial text features (spaces etc.) can still match; note: if table aliases or comments in the SQL change, the profile won't be used.

SQL> execute dbms_sqltune.accept_sql_profile(task_name=>'Tune_54v10rhv66pgz', task_owner=>user, replace=>true, force_match=>true)

If running dbms_sqltune.create_tuning_task throws error ORA-13780: SQL statement does not exist, some info about the SQL may have been aged out of shared pool; v$sqlarea may still have the SQL but v$sql_monitor or v$sqlarea_plan_hash does not. Try finding it in history:

SQL> select instance_number, snap_id from dba_hist_sqlstat where sql_id='54v10rhv66pgz' order by 1,2;

If it's found, use this syntax to create a tuning task

SQL> select dbms_sqltune.create_tuning_task(begin_snap=>1234,end_snap=>1235,sql_id=>'54v10rhv66pgz',task_name=>'Tune_54v10rhv66pgz') from dual;

If there is only one snapshot that captured the SQL, set end_snap to the snap_id but lower begin_snap by 1. If you still get ORA-13780, another possibility is that the SQL is using an adaptive plan and is still running. Either wait till its execution is completed or in SQL*Plus or SQL Developer run alter session optimizer_adaptive_plans=false and then the SQL of interest. (Ref: 2932844.1)

Appendix 2: Wrap long SQL text lines

When a SQL statement is long, the sql_text or sql_fulltext column of v$sql* views may not wrap the line or lines after a full keyword. The output of dbms_sqltune.report_sql_monitor or dbms_xplan.display_cursor does not, either. You have to manually cut and join the lines of the text in order to run explain plan on it or directly run the SQL. But this is a tedious process for a very long SQL. The following describes a quick way to achieve this.

Copy and save the SQL text into vi. If the text spans multiple lines, join them by typing command :%j!. Save the file to, say, sql.txt, and exit. On command line, type fold -s -w100 sql.txt, where 100 means you want to have the lines at most 100 characters long. Adjust that number as you like. The result is a SQL you can directly run now (apart from bind variables, if any, which you have to replace with actual runtime values or prepare tool-specific bind variables and values first). Linux comes with fold, which is part of coreutils. If you're on Windows, you can install GNU utilities for Win32.

Ideally, the SQL text should meaningfully wrap where certain words occur, e.g. select, from, where, and some parentheses, and indent appropriately on other new lines for easy reading. The above method does not do that. Suppose you want to prepend a carriage return to each of those keywords. Using selec as an example, type vi command :%s/select/^Mselect/g, where ^M means literally pressing Control and M on the keyboard if you're on Linux, but Control-Q and Control-M if on Windows.

If you're on Linux or UNIX, you can use my fmt_sql bash script to automate the above steps.

2022-03 to 2023-03