I originally wrote this program for Solaris when I realized top (or prstat) couldn't sort on I/O (history). I now have versions for other operating systems as well. The Solaris section comes first, followed by HP-UX, Windows and Linux. If you're here for HP-UX, I recommend you glance through the Solaris section first.
For instance, the following command shows that your current shell has read and written 10994 characters so far (we'll talk about other columns later).
$ pio -p $$
  PID InpBlk OutpBlk  RWChar MjPgFlt Comm
  392      0      33   10994       0 -ksh
$ pio -p 554
  PID InpBlk OutpBlk  RWChar MjPgFlt Comm
  554    991       0  116099     922 find /
$ while true; do
> pio -Hp 554
> sleep 1
> done
  554   2266       0  243623    2095 find /
  554   2339       0  251878    2166 find /
  554   2408       0  258030    2229 find /
^C$
$ pio -A    #columns are pid, InpBlk, OutpBlk, RWChar, MjPgFlt, Comm
    0    280       1       0      93 sched
    0    280       1       0      93 sched
    0    280       1       0      93 sched
    1    125       1   68121      92 /etc/init -
    2      0       0       0       0 pageout
    3      0     210       0       0 fsflush
  342     12       4    8691      11 /usr/lib/saf/sac -t 300
...
The next useful thing to do is write a program to sort the RWChar column. I wrote a Perl script specifically for this purpose, appropriately named topio.
$ topio
** WARNING: Running topio without -d may not be **
** what you want. Type topio -h for help.       **
  PID InpBlk OutpBlk  RWChar MjPgFlt Command
  338    530       4 2394164     408 /usr/openwin/bin/Xsun :0 -nobanner -defdepth 24 -auth /var/dt/A:0-oHaGLa
  368     96       5 2027102      86 /usr/lib/ssh/sshd
  363    188       0  834232     163 dtgreet -display :0
  348    196       0  679939     147 /usr/sfw/sbin/snmpd
  388      0       4  291163       0 /usr/lib/ssh/sshd
  214      7       0  128703       7 /usr/sbin/inetd -s
  350     50       1   84051      40 /usr/dt/bin/dtlogin -daemon
    1    125       1   68121      92 /etc/init -
  247      0       3   57536       0 /usr/lib/utmpd
  314      8       0   34965       4 /usr/lib/snmp/snmpdx -y -c /etc/snmp/conf
^C$
Probably the most useful of pio and topio is the -d option of topio, which sorts based on the delta or difference of the process Read/Write Characters between two consecutive runs. (While the examples above are run on my laptop, the screen shot below is captured on a server so the numbers differ.)
$ topio -d -s2 -n5    #display 5 top Delta-I/O processes every 2 seconds
--PID-------RWChar-----DltRWC-----MjPgFlt-DltMPF Command------------------------
 8872     64025286    1835008           7      0 ora_dbw0_ORATEST
 5945    289675626      56832          62      0 ora_lgwr_ORATRN
 5947    497918441      49152         324      0 ora_ckpt_ORATRN
 5773   3917910706      28672          47      0 ora_lgwr_INTTST
 5943   3609392512      16384           1      0 ora_dbw0_ORATRN
--PID-------RWChar-----DltRWC-----MjPgFlt-DltMPF Command------------------------
 8874   3130856681    2108416         112      0 ora_lgwr_ORATEST
18528     11064724     831589           0      0 oracleORATEST
 5945    289729898      54272          62      0 ora_lgwr_ORATRN
 5775   1537267122      49152         223      0 ora_ckpt_INTTST
 8876    302708422      16384         241      0 ora_ckpt_ORATEST
--PID-------RWChar-----DltRWC-----MjPgFlt-DltMPF Command------------------------
 8872     68752070    4726784           7      0 ora_dbw0_ORATEST
 8874   3132257001    1400320         112      0 ora_lgwr_ORATEST
18528     11361185     296461           0      0 oracleORATEST
18526     48811015     178640         113      0 oracleORATEST
 5775   1537381810     114688         223      0 ora_ckpt_INTTST
^C$
Process 5773 has the highest absolute I/O under the RWChar column according to topio output (without -d, not shown here). But its delta I/O, the difference in absolute I/O between two consecutive runs, only occasionally shows up near the top. This process happens to be an Oracle background process, LGWR, which writes to the redo logfiles of the INTTST database. At the time of the capture, this LGWR process wrote 28672 bytes to the logfiles in a 2-second period. (LGWR does not read, unless the database is being recovered from a crash.) If you're only checking the I/O of Oracle processes, you may want to supplement this information with what Oracle's own tools offer, such as the statistics collected in the v$sess_io view. (Unfortunately v$sess_io doesn't record physical writes.)
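The delta sorting described above can be sketched in C as follows. This is only an illustration of the idea, not the code of topio (which is a Perl script); the struct and function names are hypothetical, and charging a newly appeared process its whole cumulative value as its delta is an assumption of this sketch.

```c
#include <assert.h>
#include <stdlib.h>

/* One process's cumulative I/O counter at a sampling instant. */
struct io_sample {
    long pid;
    unsigned long long rwchar;   /* cumulative, like the RWChar column */
};

/* A process together with the growth of its counter between two samples. */
struct io_delta {
    long pid;
    unsigned long long rwchar;   /* latest cumulative value */
    unsigned long long delta;    /* growth since the previous sample */
};

/* qsort comparator: largest delta first. */
static int by_delta_desc(const void *a, const void *b)
{
    const struct io_delta *x = a, *y = b;
    if (x->delta < y->delta) return 1;
    if (x->delta > y->delta) return -1;
    return 0;
}

/*
 * Fill out[] with one entry per process in cur[], sorted by delta descending.
 * out[] must have room for ncur entries. Returns ncur.
 */
size_t top_deltas(const struct io_sample *prev, size_t nprev,
                  const struct io_sample *cur, size_t ncur,
                  struct io_delta *out)
{
    assert(cur != NULL && out != NULL);
    for (size_t i = 0; i < ncur; i++) {
        unsigned long long before = 0;   /* assumption: new process starts at 0 */
        for (size_t j = 0; j < nprev; j++) {
            if (prev[j].pid == cur[i].pid) {
                before = prev[j].rwchar;
                break;
            }
        }
        out[i].pid = cur[i].pid;
        out[i].rwchar = cur[i].rwchar;
        /* If the counter appears to go backwards (e.g. a reused PID),
           fall back to the cumulative value instead of underflowing. */
        out[i].delta = cur[i].rwchar >= before ? cur[i].rwchar - before
                                               : cur[i].rwchar;
    }
    qsort(out, ncur, sizeof out[0], by_delta_desc);
    return ncur;
}
```

Printing the first n entries of out[] after each sampling interval gives the same kind of listing as topio -d -n. The linear search over prev[] is fine for a few hundred processes; a larger process table would call for a hash or a sorted array.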
Download source code pio.c and type gcc -o pio pio.c. Also download topio and read the line below #!. Put pio and topio in the same directory and chmod them to make them executable. If you wish to run topio from a directory other than the one they are in, change $PIO in topio to the absolute path. The current version additionally probes the process's major page faults in the hope that true disk I/O, excluding page cache I/O, can be deduced. Note for x86 Solaris users: gcc 3 has problems with some headers. Use gcc 2.95 instead, unless you want to fix the header files.
How does it work?
Before Solaris 10 (note1), there were two ways to get the I/O count of a process on Solaris. Brendan Gregg's psio Perl program uses the prex utility to probe the kernel and filter on a specific process. My pio, originally written by looking at Jim Mauro and Richard McDougall's msacct (note2), published in Appendix C of Solaris Internals, fetches the I/O count from the /proc filesystem. (I'm not using microstate accounting as in Jim's program, which is essential in CPU costing but would pose some performance overhead.) Basically, pio gets process I/O statistics from /proc/pid/usage, specifically the fields pr_inblock, pr_oublock and pr_ioch of struct prusage, as explained on pp.314-5 of Solaris Internals and in the proc(4) man page. You may wonder how much precious information is collected by our UNIX box without ever being used! That's right. If you don't write programs like this to fetch the data, it's collected and simply thrown away.
What the numbers mean
pr_inblock and pr_oublock are generally not very useful. According to Adrian Cockcroft, "inblock and outblock [sic] counters are uninteresting as they only refer to filesystem metadata for the old-style buffer cache". Indeed, beginning with Solaris 2, the old buffer cache is largely replaced by the page cache and is only used to store metadata. So if you see an occasional jump in InpBlk and OutpBlk, it is, for instance, because the allocated file blocks need to be extended or shrunk to accommodate more or less data, so the inode is updated. What I observed is that when a process continuously does I/O, RWChar keeps increasing, while InpBlk and OutpBlk remain the same for some time and suddenly jump, then remain the same for a while and jump again. But the ratio of this jump in blocks to the number of characters added to RWChar is not consistent from file to file. That's why the 2nd and 3rd columns of pio output don't look important to me.
The statistic pr_ioch or Read/Write Characters lumps reads and writes together and there's no way to separate them. The only workaround I can think of is something like
# Trace read/write syscalls and redirect stderr (where truss writes its output) to a Perl filter
# that prints each syscall's return value, i.e. the number of characters read or written.
truss -t read,pread -p pid 2>&1 | perl -nle 'print $1 if /= (\d+)$/'
truss -t write,pwrite -p pid 2>&1 | perl -nle 'print $1 if /= (\d+)$/'
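If you want a running total rather than one number per syscall, the filter stage can add the return values up. Below is a minimal sketch in C, assuming (as the Perl filter above does) that each successful syscall line in the trace ends in "= <number>"; the function names are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Extract the return value from one trace line ending in "= <digits>".
 * Returns 1 on success; returns 0 if the line does not end that way
 * (for example a failed syscall), in which case *value is not meaningful.
 */
int trace_return_value(const char *line, unsigned long long *value)
{
    assert(line != NULL && value != NULL);
    const char *eq = strrchr(line, '=');
    if (eq == NULL || eq[1] != ' ' || !isdigit((unsigned char)eq[2]))
        return 0;
    char *end;
    *value = strtoull(eq + 2, &end, 10);
    /* Only trailing whitespace (such as the newline) may follow the number. */
    while (*end != '\0' && isspace((unsigned char)*end))
        end++;
    return *end == '\0';
}

/* Sum the return values of all matching lines read from fp. */
unsigned long long sum_trace_bytes(FILE *fp)
{
    char line[4096];
    unsigned long long total = 0, n;
    while (fgets(line, sizeof line, fp) != NULL) {
        if (trace_return_value(line, &n))
            total += n;
    }
    return total;
}
```

A main() that prints sum_trace_bytes(stdin) could then take the place of the Perl filter in each pipeline, giving one total for reads and one for writes.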
Another problem with pio is that RWChar includes all kinds of I/O, i.e. disk I/O as well as terminal and network I/O. If somebody has left the top program running for a long time (because he doesn't know the lighter-weight prstat!), topio may show this top process has accumulated a lot of RWChar, and possibly a lot of delta I/O in topio -d output, particularly if top was launched with a short interval (like top -s1). You can test this problem of pio and topio with a tight loop of echo "some characters" without a sleep in the loop. The current version of my program incorporates major page fault statistic in order to hopefully uniquely identify real disk I/O. Fortunately people often use topio to monitor daemon processes including Oracle server processes. So terminal I/O is completely off. But disk I/O and network I/O (if any) are still mixed.
____________________
note1 Solaris 10 has the powerful DTrace facility which can be used to provide process I/O statistics.
note2 Jim Mauro's msacct uses printf("%ld".. for process usage. I changed it to printf("%lu".. in pio.c. Otherwise numbers greater than 2 billion would show as negative. They're defined as unsigned long anyway.
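The effect described in note2 can be reproduced on any platform with fixed-width types. The sketch below is not taken from pio.c; it assumes a 32-bit counter (as unsigned long is in a 32-bit build) and uses hypothetical helper names. The signed conversion of an out-of-range value is implementation-defined in C, but common compilers wrap it around, so a counter such as 3917910706 comes out as -377056590.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format a 32-bit counter as if it were signed, mimicking "%ld" where
 * long is 32 bits. Values above 2147483647 come out negative on
 * compilers that wrap the conversion.
 */
int format_as_signed(char *buf, size_t len, uint32_t counter)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "%" PRId32, (int32_t)counter);
}

/* Format the same counter as unsigned, mimicking "%lu". */
int format_as_unsigned(char *buf, size_t len, uint32_t counter)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "%" PRIu32, counter);
}
```

Calling both helpers on the same value and printing the two buffers side by side shows the difference directly.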
Assuming you have quickly read the Solaris section, I only highlight a few points here. pio on HP-UX tells you how many read and write operations a process has performed. topio sorts all processes by either reads or writes.
$ pio -p $$
  PID InpOps OutpOps MjPgFlt Comm
25240      8      16       0 sh
$ topio -n3 -s2 -kW    #display 3 top Delta-Write processes every 2 seconds
--PID ProcName--------- -----Reads ---DltR -----Writs ---DltW -----PFlts ---DltF
   52 vxfsd                    225       0     322383       6          0       0
 1700 midaemon                   0       0          0       0          0       0
13730 ia64_corehw                0       0          0       0          0       0
--PID ProcName--------- -----Reads ---DltR -----Writs ---DltW -----PFlts ---DltF
   52 vxfsd                    225       0     322388       5          0       0
 1429 java                     191       0       8080       1        794       0
 1700 midaemon                   0       0          0       0          0       0
While the Solaris version lumps read and write characters together, the HP-UX version separately counts input and output, and it counts read and write operations, not number of characters. In addition, the HP-UX version no longer needs -d to sort on deltas.
Download source code pio.c and type cc -D_PSTAT64 -o pio pio.c. Also download topio and read the line below #!. Put pio and topio in the same directory and chmod to make them executable. If you wish to run topio from directories other than where they are, change $PIO in topio to the absolute path. [Dec 2008, Alexander Beyn comments "on HPUX 11.00, I had to #define _RUSAGE_EXTENDED before sys/pstat.h was included, otherwise pst_inblock and pst_oublock were not part of the pst_status structure...It looks like HP-UX 11.11 (released in 2000) and newer expose those fields without _RUSAGE_EXTENDED."]
How does it work?
pio fetches I/O statistics from pstat, specifically the pst_inblock and pst_oublock fields of struct pst_status. You can see these fields in /usr/include/sys/pstat/pm_pstat_body.h (thanks to Don Morris and Christof Meerwald on the newsgroup). Judging by the names, you would think they represent the number of input and output blocks, just like pr_inblock and pr_oublock on Solaris. But the header file comment says they are block input and output operations.
Windows
Windows Task Manager allows you to view process statistics. On Windows 2000, XP and later, if you go to View | Select Columns, you can add I/O-related counters. There are, however, two limitations. First, Task Manager can't display processes on a remote computer. Second, the I/O counters are absolute values accumulated since process startup. The absolute values answer questions such as "What process has done the most reading, in bytes or in number of reads?" But in reality, one would more often ask another question, "What process is doing the most reading right now?" My topio program answers the second question. Here's a screen shot showing the top 5 processes on server 123.45.67.89 every 2 seconds, sorted by delta write bytes (DltWBts column). I launched Winzip to compress some files right after I started topio.
D:\>perl d:\systools\topio.pl -m123.45.67.89 -n5 -s2 -kw
--PID ProcName---- -----RBytes DltRBts --Reads DltR -----WBytes DltWBts --Writs DltW -----CBytes DltCBts ---Cntls DltC --PFlts DltF
 1864 WinMgmt         24645682   16410    4973   90     5778555   48320   20850   87   448413706       0  7765109   14 1367503  257
  304 SERVICES       377067071    2208 6365120   48   618084696   32980 5756074   51    49798992     510  5599270   56   82296    3
    8 System             34644       0      83    0   504278961    8258  708219   12    71108078       0  4601016   11   41190    5
  316 LSASS            6986726    5024  102785   10    10817344    1756   84879    9     7439645       8   185753    8   48880   25
  552 svchost           562472    1157     999    4      265419    1286     605    2      140400       0     3367    6    3141    1
--PID ProcName---- -----RBytes DltRBts --Reads DltR -----WBytes DltWBts --Writs DltW -----CBytes DltCBts ---Cntls DltC --PFlts DltF
 4256 WINZIP32         1396315 1396168      60   55      216465  216463      20   19       22268    9218      796  417    1506  196
 1864 WinMgmt         24662620   16938    5067   94     5828011   49456   20942   92   448413706       0  7765123   14 1367530  262
  304 SERVICES       377067959     888 6365136   16   618119520   34824 5756100   26    49807164    8172  5599355   85   82296    0
  316 LSASS            6993022    6296  102827   42    10821296    3952   84910   31     7439733      88   185814   61   48880    0
  552 svchost           563629    1157    1003    4      266705    1286     607    2      140400       0     3373    6    3142    1
^C
You might think that if a process is doing a lot of I/O, it must be burning a lot of CPU. That's not always true or obvious. For instance, when the Oracle database is running on my laptop but with all database sessions idle, I notice hard disk activity about once every three seconds. Task Manager doesn't show oracle.exe as a top CPU process. My topio does. A trivial example can also be set up where a process is doing nothing but busy loop on null operation, while another process genuinely reads a big file, and the first process is higher on CPU usage. These are the cases where topio can be of some use. It actually can sort on any I/O counters, including page faults, which hopefully can be used to deduce real disk I/O instead of I/O against system cache. (Task Manager has PF Delta column, equivalent to my DltF, but I include it here for your convenience.) One caveat, though, is that all these I/O counters lump disk, terminal and network I/O's together. There's no way to separate them out. You have to use other information to know which of the three types it really is. But generally, a Windows service process such as oracle.exe has no terminal I/O so you can eliminate that.
Download pio.vbs and topio.pl to the same folder. (Rename the files to pio.vbs and topio.pl after you download.) If you wish to run topio.pl from folders other than where they are, change $PIO in topio.pl to the absolute path. Unless you already have Perl installed (such as the one that comes with Oracle client), download and install ActivePerl. Then first, change to the folder and type perl topio.pl -h to verify Help works. Type perl topio.pl to run the program with all default values. (If you have associated .pl with perl.exe, you can try just topio.pl. But I find that it may mess up command line options. Always prepending perl solves the problem.) Please read help first, or find the help in the Usage part of topio.pl source code. Make sure your console window is 132 characters wide to avoid line wrapping.
How does it work?
topio is a Perl script that sorts on values supplied by pio.vbs, a VBScript that fetches I/O-related statistics for all processes running on the system. The statistics are collected by WMI (Windows Management Instrumentation), so make sure that service is not stopped on the target machine. pio.vbs uses the Microsoft WMI class Win32_PerfRawData_PerfProc_Process to obtain this data.
The functionality of topio may eventually be merged into pstats, another freeware tool for Windows. The reason I'm developing topio separately is that the Microsoft WMI class Win32_PerfRawData_PerfProc_Process is totally flawed: all those "... per second" counters are not per-second values at all; instead they're cumulative since process startup (see my message posted to the official Microsoft newsgroup, which went unanswered). Until that problem is addressed by Microsoft, we have to sort on delta values for I/O as well as CPU usage counters. Only memory counters can stay as absolute values, because the question "What process is using the most memory now?" is more practical than "What process has gained the most memory in the past few seconds?" If you do need an answer to the second question, pmon has a "Mem Diff" column in its output.
Historically, Linux didn't have process I/O usage recorded either in the proc filesystem or by the getrusage call (getrusage has the fields for the current process but they're never populated). On those older Linux boxes, if we want process I/O count, we can have a kernel module to catch read(2) and write(2) syscalls or their variants; I think this is exactly what AT Consultancy's atop has been doing all these years. Alternatively, we can enable block I/O debugging as done by iodump. Also see this thread. Finally, SystemTap's uid-iotop could be used as well. However, in my opinion, all those have been outdated by kernel 2.6.18-164's introduction of /proc/pid/io.
Assume you have /proc/pid/io:
Download pio and topio into the same directory, type chmod to make them executable. If you wish to run topio from a directory other than where they are now, change $PIO in topio to the absolute path. Run ./topio -h first. A screen shot of actual use is shown below. (You may have to run as root now because from some version of Linux on, /proc/*/io is no longer other-readable.)
$ ./topio -s3 -krb -n4    #display top 4 delta-read-bytes (DltRBts) processes every 3 seconds
--PID ProcName----- -----RdChars DltRChs ------WtChrs DltWChs -------Rds -DltRds ------Wts DltW ------RdBytes DltRBts -----WtByts DtWB ---CWBts DltCWB
22537 ora_lmon_orac    732113332     560       206773       0    8914128       7      7935    0  144191709184  114688      413696    0        0      0
22555 ora_ckpt_orac   1317082923    1482        95358       0    8779307      10       123    0   36745311232   32768       32768    0        0      0
11629 /u01/crs/orac  16103068023   17851   8661523718    8092   50064836      59  30159419   34   14550792704   10752  7267489280 4608 52371046   8192
12599 /u01/crs/orac  79179130109  743582  57139477524  561498   56850567     404  16021147  154    3476266496    4096 10401308672 4915 11809955   8192
See Linux documentation for detailed description of these counters I use, exposed in /proc/pid/io. Guillaume Chazarain already implemented iotop based on these IO counters. My age-old topio serves as an alternative tool.
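For readers who want to fetch these counters themselves, here is a minimal sketch in C of reading /proc/pid/io. It is not the code of pio; the struct and function names are hypothetical. The seven field names (rchar, wchar, syscr, syscw, read_bytes, write_bytes, cancelled_write_bytes) are the ones the kernel exposes in that file.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Counters exposed by /proc/<pid>/io. */
struct proc_io {
    unsigned long long rchar, wchar;            /* chars read / written */
    unsigned long long syscr, syscw;            /* read / write syscalls */
    unsigned long long read_bytes, write_bytes; /* bytes to / from storage */
    unsigned long long cancelled_write_bytes;
};

/* Parse the text of /proc/<pid>/io. Returns the number of known fields found. */
int parse_proc_io(const char *text, struct proc_io *io)
{
    assert(text != NULL && io != NULL);
    memset(io, 0, sizeof *io);

    int found = 0;
    for (const char *line = text; line != NULL && *line != '\0'; ) {
        char key[32];
        unsigned long long val;
        if (sscanf(line, "%31[^:\n]: %llu", key, &val) == 2) {
            found++;
            if      (strcmp(key, "rchar") == 0)       io->rchar = val;
            else if (strcmp(key, "wchar") == 0)       io->wchar = val;
            else if (strcmp(key, "syscr") == 0)       io->syscr = val;
            else if (strcmp(key, "syscw") == 0)       io->syscw = val;
            else if (strcmp(key, "read_bytes") == 0)  io->read_bytes = val;
            else if (strcmp(key, "write_bytes") == 0) io->write_bytes = val;
            else if (strcmp(key, "cancelled_write_bytes") == 0)
                io->cancelled_write_bytes = val;
            else
                found--;   /* unknown key: ignore it */
        }
        line = strchr(line, '\n');   /* advance to the next line, if any */
        if (line != NULL)
            line++;
    }
    return found;
}

/* Read and parse /proc/<pid>/io. Returns -1 if the file cannot be opened
   (no such process, or no permission), else the number of fields found. */
int read_proc_io(long pid, struct proc_io *io)
{
    char path[64], buf[512];
    snprintf(path, sizeof path, "/proc/%ld/io", pid);
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return -1;
    size_t n = fread(buf, 1, sizeof buf - 1, fp);
    fclose(fp);
    buf[n] = '\0';
    return parse_proc_io(buf, io);
}
```

Keeping the parser separate from the file access makes it easy to check against a captured sample, and the -1 return covers the case where /proc/*/io is not readable by a non-root user.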
The only other major OS I care about is AIX. According to this discussion, AIX has nmon, which can sort on process I/O. Indeed it works as expected (press t then 5 to sort on I/O). There is also an nmon for Linux. But on Linux, it actually sorts on delta page faults, not really on I/O.