Oracle on Windows, strace.exe and Process Explorer

Serious Warning Running strace could cause stop error (BSOD) if Process Explorer is also running; you can avoid this by pausing the refresh of Process Explorer.

Prerequisites

Oracle database, strace.exe, Process Explorer (procexp.exe).

If you get this error when running strace,
For Windows XP, the registry value: HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management\EnforceWriteProtection must be set to 0 (type REG_DWORD) for strace to work.
please add this registry value and set its data to 1. No reboot is needed. When you're done with the research, set it to 0 or delete it.

Documentation for Windows native system services or functions can be found at Tomasz Nowak's NTInternals.Net web site and Gary Nebbett et al.'s Windows NT/2000 Native API Reference. If there's discrepancy, Gary's book is followed.

Run strace -p <Oracle pid>. The output is like

1625 1868 2076 NtQueryInformationThread (260, 0, 49541372, 28, 0, ... ) == 0x0
Here's the explanation of each element.
1625
1868
2076
NtQueryInformationThread
(260, 0, 49541372, 28, 0, ... )
==
0x0
strace sequence#Oracle pidthread ID inside oracle.exe process (or v$process.spid in Oracle)Windows system service call (undocumented)argument list return value
You can filter on a specific service with -e switch like this: strace -e NtWriteFile -p 1868, or less efficiently by piping to grep: strace -p 1868 | grep NtWriteFile. If you want to filter on a specific thread, pipe this way: strace -p 1868 | grep " 1868 2076 ". (If you don't have grep, use find instead, which comes with modern Windows versions; remember find requires double quotes around its argument while grep requires them only if there're spaces in the search string.)


NtQueryInformationThread

As the name suggests, this system service fetches thread object information. Inside oracle.exe, it's usually PMON thread that does this (every 3 seconds). Here's an example.

1625 1868 2076 NtQueryInformationThread (260, 0, 49541372, 28, 0, ... ) == 0x0
1626 1868 2076 NtQueryInformationThread (260, 1, 49541356, 32, 0, ... ) == 0x0
1627 1868 2076 NtQueryInformationThread (440, 0, 49541372, 28, 0, ... ) == 0x0
1628 1868 2076 NtQueryInformationThread (440, 1, 49541356, 32, 0, ... ) == 0x0
1629 1868 2076 NtQueryInformationThread (476, 0, 49541372, 28, 0, ... ) == 0x0
1630 1868 2076 NtQueryInformationThread (476, 1, 49541356, 32, 0, ... ) == 0x0
1631 1868 2076 NtQueryInformationThread (492, 0, 49541372, 28, 0, ... ) == 0x0
1632 1868 2076 NtQueryInformationThread (492, 1, 49541356, 32, 0, ... ) == 0x0
1633 1868 2076 NtQueryInformationThread (512, 0, 49541372, 28, 0, ... ) == 0x0
1634 1868 2076 NtQueryInformationThread (512, 1, 49541356, 32, 0, ... ) == 0x0
1635 1868 2076 NtQueryInformationThread (532, 0, 49541372, 28, 0, ... ) == 0x0
1636 1868 2076 NtQueryInformationThread (532, 1, 49541356, 32, 0, ... ) == 0x0
2076 is PMON:
SQL> select name from v$bgprocess where paddr = (select addr from v$process where spid = 2076);

NAME
-----
PMON

For NtQueryInformationThread, the first argument is handle to the object being queried. In our case, it's file handle. Launch ProcExp and look at the lower pane when you highlight oracle.exe in the upper pane. You see handle 0x104 (260 in decimal) is a thread object with thread ID 2076; 0x214 (532 decimal) is thread 2044. They are Oracle PMON and RECO background processes, respectively. You can also save ProcExp result to a file named procexp.txt (make sure you highlight oracle.exe in the upper pane). To replace the hex file handles with decimal handles for easy reading, assuming you installed Perl, run the command below and look at oracle.txt.

perl -wnle "if(/^Process: oracle.exe/..eof()) {@a=split; if(defined $a[3]) {print hex($a[2]), \" $a[1]\"}}" procexp.txt > oracle.txt

NtQueryPerformanceCounter

1624 1868 2076 NtQueryPerformanceCounter (49541292, 49541268, ... ) == 0x0

I wish this system service was equivalnet to times() on SVR4 UNIX or gettimeofday() on BSD flavored UNIX, as a result of Oracle timed_statistics being set to true. But alter system set timed_statistics = false doesn't turn off this system service call according to the strace output; messing with timed_os_statistics doesn't change anything either. The arguments to this service are "number of processor ticks after last reset" and possibly "number of processor ticks per one second"(?).


NtWriteFile and NtReadFile

As on UNIX, tracing Oracle processes on OS mostly yields information on I/O activities, particularly those on filesystems. Here's NtWriteFile.

40 1868 2084 NtWriteFile (1576, 0, 2086263196, 1613765712, 57504932, 47948800, 1024, 56815472, 0, ...
40 1868 2084 NtWriteFile ) == 0x103
555 1868 2084 NtWriteFile (1576, 0, 2086263196, 1613765712, 57504184, 47949824, 73728, 56815472, 0, ...
555 1868 2084 NtWriteFile ) == 0x103
580 1868 2084 NtWriteFile (1576, 0, 2086263196, 1613765712, 57503436, 48023552, 2560, 56815472, 0, ... ) == 0x103
590 1868 2096 NtWriteFile (744, 0, 2086263196, 1613765712, 55509600, 2126274560, 8192, 54587548, 0, ... ) == 0x103
591 1868 2096 NtWriteFile (816, 0, 2086263196, 1613765712, 55512020, 2122457088, 8192, 54587548, 0, ... ) == 0x103
592 1868 2096 NtWriteFile (816, 0, 2086263196, 1613765712, 55507180, 2124521472, 8192, 54587548, 0, ... ) == 0x103
593 1868 2096 NtWriteFile (736, 0, 2086263196, 1613765712, 55508720, 2130018304, 8192, 54587548, 0, ... ) == 0x103
594 1868 2096 NtWriteFile (736, 0, 2086263196, 1613765712, 55511140, 2124587008, 8192, 54587548, 0, ... ) == 0x103
595 1868 2096 NtWriteFile (744, 0, 2086263196, 1613765712, 55509160, 2130206720, 8192, 54587548, 0, ... ) == 0x103
596 1868 2096 NtWriteFile (744, 0, 2086263196, 1613765712, 55510040, 2130190336, 8192, 54587548, 0, ... ) == 0x103
597 1868 2096 NtWriteFile (744, 0, 2086263196, 1613765712, 55506740, 2130198528, 8192, 54587548, 0, ... ) == 0x103
598 1868 2096 NtWriteFile (744, 0, 2086263196, 1613765712, 55511360, 2126143488, 8192, 54587548, 0, ... ) == 0x103
599 1868 2096 NtWriteFile (744, 0, 2086263196, 1613765712, 55505200, 2124398592, 8192, 54587548, 0, ... ) == 0x103
600 1868 2096 NtWriteFile (744, 0, 2086263196, 1613765712, 55510700, 2128347136, 8192, 54587548, 0, ... ) == 0x103
601 1868 2096 NtWriteFile (744, 0, 2086263196, 1613765712, 55504980, 2125783040, 8192, 54587548, 0, ... ) == 0x103
653 1868 2088 NtWriteFile (624, 0, 2086263196, 1613765712, 59276484, 59268096, 8192, 58650388, 0, ... ) == 0x103
654 1868 2088 NtWriteFile (636, 0, 2086263196, 1613765712, 59276668, 59268096, 8192, 58650388, 0, ... ) == 0x103
655 1868 2088 NtWriteFile (648, 0, 2086263196, 1613765712, 59276852, 59268096, 8192, 58650388, 0, ... ) == 0x103

Some lines are broken into two lines by strace (see sequences 40 and 555 e.g.). Join them based on the sequence number (first column). The first eight NtWriteFile arguments are file handle, event handle, two "asynchronous procedure call"-related numbers, I/O result, buffer address, buffer length, byte offset from file beginning. Again, we can find the files from procexp, which in this instance happen to be

1576, 0x628, REDO03.LOG
744,  0x2E8, UNDOTBS01.DBF
816,  0x330, USERS01.DBF
736,  0x2E0, SYSTEM01.DBF
624,  0x270, CONTROL01.CTL
636,  0x27C, CONTROL02.CTL
648,  0x288, CONTROL03.CTL
Note that the buffer size is always db_block_size (8192) for datafiles but could be 1024, 73728 or 2560 for redo logs. Controlfile write may be in db_block_size chunks if Oracle port to Windows does not differ from UNIX (the UNIX $ORACLE_HOME/bin/dbfsize command tells us the controlfile block size matches db_block_size).

These are NtReadFile calls and the three file handles.
10 1868 2088 NtReadFile (628, 0, 0, 58650088, 58650088, 66388480, 8192, 58649996, 0, ... ) == 0x0
13 1868 2088 NtReadFile (640, 0, 0, 58650088, 58650088, 66388480, 8192, 58649996, 0, ... ) == 0x0
16 1868 2088 NtReadFile (652, 0, 0, 58650088, 58650088, 66388480, 8192, 58649996, 0, ... ) == 0x0

628, 0x274, CONTROL01.CTL
640, 0x280, CONTROL02.CTL
652, 0x28C, CONTROL03.CTL

It looks like each controlfile has two handles for the process, one for reading, the other for writing.

2982 1868 2004 NtReadFile (1240, 0, 0, 0, 143842904, 144516342, 2064, 0, 0, ... ) == 0x0
3152 1868 2004 NtReadFile (2288, 0, 0, 143822604, 143822604, 2130362368, 8192, 143822512, 0, ... ) == 0x0
3208 1868 2004 NtReadFile (2288, 0, 0, 143821740, 143821740, 2120597504, 8192, 143821648, 0, ... ) == 0x0
3216 1868 2004 NtReadFile (2288, 0, 0, 143824696, 143824696, 2130509824, 8192, 143824604, 0, ... ) == 0x0
3478 1868 2004 NtReadFile (2288, 0, 0, 143836364, 143836364, 2120630272, 8192, 143836272, 0, ... ) == 0x0
3486 1868 2004 NtReadFile (1240, 0, 0, 0, 143842904, 144523966, 2064, 0, 0, ... ) == 0x0
3505 1868 2004 NtReadFile (1240, 0, 0, 0, 143842904, 144516342, 2064, 0, 0, ... ) == 0x103

That was generated when thread or Oracle spid 2004 is full scanning a table (in USERS tablespace). Handle 1240 is \Device\NamedPipe\ORANTP74C.7D4. Interestingly, procexp still considers this of type File. Note the buffer size is 2K, not 8K. 2288 is SYSTEM01.DBF.

If we full scan the table again, the thread does not read datafiles any more; it must be reading Oracle buffer cache. strace shows that it's reading many 2k chunks out of the named pipe:

55 1868 2004 NtReadFile (1240, 0, 0, 0, 143842904, 144523966, 2064, 0, 0, ... ) == 0x0
72 1868 2004 NtReadFile (1240, 0, 0, 0, 143842904, 144516342, 2064, 0, 0, ... ) == 0x0
219 1868 2004 NtReadFile (1240, 0, 0, 0, 143842904, 144523966, 2064, 0, 0, ... ) == 0x0
246 1868 2004 NtReadFile (1240, 0, 0, 0, 143842904, 144516342, 2064, 0, 0, ... ) == 0x0
260 1868 2004 NtReadFile (1240, 0, 0, 0, 143842904, 144523966, 2064, 0, 0, ... ) == 0x0
277 1868 2004 NtReadFile (1240, 0, 0, 0, 143842904, 144516342, 2064, 0, 0, ... ) == 0x0
287 1868 2004 NtReadFile (1240, 0, 0, 0, 143842904, 144523966, 2064, 0, 0, ... ) == 0x0
304 1868 2004 NtReadFile (1240, 0, 0, 0, 143842904, 144516342, 2064, 0, 0, ... ) == 0x103

LGWR

The following is LGWR (2084). According to Nowak's documentation, 56814572 is how long in microsecond the thread delays and 56813886 is just a pointer to the thread context structure. But that delay of 57 seconds seems awfully long. Nebbett's book says the unit is 100-nanosecond. So the delay here is 5.7 seconds. In fact, those lines are shown almost at the same time in strace output, apparently not with a delay of 5 or 6 seconds. But the first argument value 1 indicates the delay is interruptible. So the delay may have simply been interrupted by something else almost immediately.

904 1868 2084 NtDelayExecution (1, 56814572, ... ) == 0xc0
905 1868 2084 NtContinue (56813836, 1, ...
906 1868 2084 NtContinue (56813836, 1, ...
907 1868 2084 NtDelayExecution (1, 56814572, ... ) == 0xc0
908 1868 2084 NtContinue (56813836, 1, ...

strace missing -c

On UNIX or Linux, truss or strace has an option -c for you to just get a summary of system calls and how many times each call is made. Windows strace doesn't have this option. But if you have Perl installed, you can pipe the strace output to a perl one-liner and press Control-C when you're ready to see the result:

c:\>strace -p 1996 | perl -wnle "$SIG{INT}=sub{}; chomp; /^(?:\d+\*?) (?:\d+) (?:\d+) (\S+)/; ++$h{$1}; END {foreach (keys %h) {print \"$_\t$h{$_}\"}}"
NtWriteFile     9
NtAllocateVirtualMemory 1
NtContinue      6
NtWaitForSingleObject   10
NtQueryPerformanceCounter       30
N^CtQueryInformationThread      29
NtReadFile      2
NtFreeVirtualMemory     2
NtWaitForMultipleObjects        2
NtDelayExecution        6

^C is shown in the middle. Here the oracle.exe process has PID 1996 and most of the calls are querying about performance and general thread information. If you want to limit the tracing to a specific thread or Oracle spid say 884:

c:\>strace -p 1996 | perl -wnle "$SIG{INT}=sub{}; chomp; ++$h{$1} if /^(?:\d+\*?) (?:\d+) (?:884) (\S+)/; END {foreach (keys %h) {print \"$_\t$h{$_}\"}}"

NtSetEvent      3
NtWriteFile     2
NtFsControlFile 4
NtAllocateVirtualMemory 2
NtWaitForSingleObject   2
^CNtCreateFile  2
NtQueryPerformanceCounter       75
NtQueryInformationThread        247
NtReadFile      40
NtFreeVirtualMemory     2

That was gathered for an Oracle session simply doing select count(*) from a small table in a few second interval. Lastly, if you need to sort the system service calls, change (keys %h) to (sort keys %h).


LOCK_SGA

Oracle parameter LOCK_SGA "locks the entire SGA into physical memory... is ignored on platforms that do not support it". Does Windows support it? Due to a problem described in Note:274092.1, it's fair to say LOCK_SGA was not correctly supported on Windows in the past. But this doesn't stop us from exploring the effect of setting this parameter on the oracle.exe process. My earlier test with 10g shows that this parameter increased oracle.exe process working set size on database startup and this test was later reproduced with 9i. In the test below, we'll use strace to see exactly what this parameter has done.

This is Oracle 9.2.0.1.0 on XP. If LOCK_SGA is set to true and PRE_PAGE_SGA is not set to true (see Heini Nolsoe's message for the latter requirement), running strace on oracle.exe while the database is starting up returns the following lines:

c:\>strace -p 3120 | grep -i lock
[snipped]
2588 3120 2260 NtLockVirtualMemory (-1, (0x3070000), 454656, 1, ... (0x3070000), 454656, ) == 0x0
2590 3120 2260 NtLockVirtualMemory (-1, (0x78400000), 92274688, 1, ... (0x78400000), 92274688, ) == 0x0
2592 3120 2260 NtLockVirtualMemory (-1, (0x30e0000), 667648, 1, ... (0x30e0000), 667648, ) == 0x0
[snipped]

This says that the Windows system service NtLockVirtualMemory, which implements the documented VirtualLock Win32 function, locks virtual memory starting at the above three hex addresses of size 454656, 92274688 and 667648 bytes, respectively. The LockType 1 indicates that it tries to lock in working set list, not physical memory (which would be 2). Return code is success.

When the database is starting, sqlplus shows:

Total System Global Area   93395628 bytes
Fixed Size                   453292 bytes
Variable Size              75497472 bytes
Database Buffers           16777216 bytes
Redo Buffers                 667648 bytes
suggesting that Oracle locks fixed area of SGA plus some overhead (454656-453292=1364), SGA minus fixed area and redo log buffer (93395628-453292-667648=92274688), and then log buffer (667648), in that order.

When I set LOCK_SGA to false or don't set it, database startup does not call NtLockVirtualMemory.

Interpretation of NtLockVirtualMemory is based on Gary Nebbett's Windows NT/2000 Native API Reference, pp.87-88 for ZwLockVirtualMemory. LockType 1 for lock in working set list vs. 2 for lock in physical memory is not explained. I believe VirtualLock always calls NtLockVirtualMemory with LockType 1, consistent with the usual caveat that "locked" memory pages can still be paged out if there's no thread of the process running on the CPU. Nevertheless, we see clear difference between setting and not setting LOCK_SGA in Oracle on Windows. The problem of this parameter in older versions of Oracle may have been corrected, but it's better to get confirmation from Oracle support.

To my Computer Page