Oracle Regular Expressions
2009-04 Update This article was written in late 2003. In 2009, 10g is probably the most widely deployed version. See Update for 10g if you have access to a 10g database, even one on your laptop. |
Developers with UNIX background or Perl programming experience have long wished for the support of regular expressions in Oracle SQL. They either use an external Java program based on software such as OROMatcher, or similarly extproc_perlNote, or just patiently wait for Oracle 10g to be installed. But the truth is, Oracle has since long ago supported it already, by the package owa_pattern.
The example below shows a basic usage of the match function. The regular expression string .* \d\d\d[!.,] matches zero or all characters (signified by the dot and *) in a given string up to and including a space, followed by 3 digits (three \d) and any one of !, . or ,. The characters in the brackets [] are a character class, meaning any one of the enclosed characters. Now, set serveroutput on because we call dbms_output.put_line and watch
SQL> select * from regexptest; STR -------------------- some text 12! text numbers 123! qwerty098. SQL> begin 2 for i in (select str from regexptest) loop 3 if (owa_pattern.match(i.str, '.* \d\d\d[!.,]')) then 4 dbms_output.put_line(i.str); 5 end if; 6 end loop; 7 end; 8 / text numbers 123! PL/SQL procedure successfully completed. |
So only the second row is selected, since the first row only has 2 digits while the third row is missing a space. Ideally, we would like to call owa_pattern.match this way: select * from regexptest where owa_pattern.match(str, '.* \d\d\d[!,.]') or select * from regexptest where owa_pattern.match(str, '.* \d\d\d[!,.]') = true. But you'll get ORA-920 (invalid relational operator), the same error you get when you select * from dual where true. That's because the function match returns boolean and you can't use an expression that returns boolean anywhere other than in PL/SQL. One workaround is to write our own match function as a wrapper to change boolean to integer:
SQL> create function match (exp varchar2, regexp varchar2) return number as 2 ret number; 3 begin 4 if (owa_pattern.match(exp, regexp)) then 5 return 1; 6 else 7 return 0; 8 end if; 9 end; 10 / Function created. SQL> select * from regexptest where match(str, '.* \d\d\d[!,.]') = 1; STR --------------------------------------------------------------------------- text numbers 123! |
Or use the second function in owa_pattern, amatch. This function has an IN parameter that dictates where you want to start to match. For our purpose, we don't care so set it to 0. What we want from it is the type of the return value, integer.
SQL> select * from regexptest where owa_pattern.amatch(str, 0, '.* \d\d\d[!.,]') > 0; STR ------------------------------------------------------------------------------------- text numbers 123! |
What's returned is actually the position of the first character after the end of the match. Let's verify
SQL> begin 2 for i in (select str from regexptest) loop 3 dbms_output.put_line(owa_pattern.amatch(i.str, 0, '.* \d\d\d[!.,]')); 4 end loop; 5 end; 6 / 0 18 0 PL/SQL procedure successfully completed. |
The string text numbers 123! has 17 characters, so amatch returns 18. Since the other two lines don't have a match at all, the function returns 0.
The third function in owa_pattern is change. The author says "For example: change('Oracle 7.1.3', '\d\.\d\.\d', 'Version &'); yields: Oracle Version 7.1.3". This & holds the part of the string that just matched the second argument. Note that this change is in-place change of the string in the first argument; the function does not return the changed string (it returns number of matches). So you can't use it in SQL to replace the SQL command replace as in: select owa_pattern.change('Oracle 7.1.3', '\d\.\d\.\d', 'Version &') from dual. Here's the correct usage.
SQL> set define off serverout on SQL> declare s varchar2(20) := 'This is a test.'; 2 begin 3 owa_pattern.change(s, 'is', '& not'); 4 dbms_output.put_line(s); 5 end; 6 / This not is a test. PL/SQL procedure successfully completed. |
As usual, unless the & is at the end of the string or is followed by only spaces or tabs, you need to set define or scan off, otherwise you'll be prompted for variable substitution. Note that the replacement shown above occurred in the first "is". So be careful.
owa_pattern also supports multi-line search, and even back references. \1, \2 and so on can be referenced in the change function itself; with an owa_text.vc_arr type array, you can reference the matched strings later in your code.
Compared to other languages such as Perl, owa_pattern has its limitations. For instance, a Perl-like pattern (is)+ for is, isis and so on does not work; *, ? and + can only be used after a single character and the parenthese don't do grouping. Matching or substitution (amatch and change respectively) only support i and g modifiers, for case-insensitive and global, respectively. It also doesn't support non-greedy match. owa_pattern.change('I am very very cold', 'ver.*y ', '') changes the string to 'I am cold'; you can't use the syntax as in Perl owa_pattern.change('I am very very cold', 'ver.*?y ', '') to get 'I am very cold'.
Jeffrey Friedl has a table comparing regexp capabilities of some common tools in his Mastering Regular Expressions (see p.63, 1997 ed). The following table is an expansion of his work, where Y means supported and N means unsupported (reprint of most data shown below with written permission from O'Reilly)
Features | Modern grep | Modern egrep | awk | Perl | vi | owa_pattern |
*, ^, $, [...] | Y | Y | Y | Y | Y | Y |
? + | | \? \+ \| | ? + | | ? + | | ? + | | \? \+ | ? + |
grouping | \(...\) | (...) | (...) | (...) | \(...\) | N |
(non-)word boundary | only thru -w option | \< \> | N | \b, \B | \< \> | \b |
\w, \W | N | Y | N | Y | N | Y |
backreferences | Y | N | N | Y | Y | Y |
To learn more about owa_pattern, read the excellent comments and some practical examples written by the original author Matt Bookman in $ORACLE_HOME/rdbms/admin/pubpat.sql. If you really want to know the implementation details, read privpat.sql in the same directory. Interestingly, this along with other OWA packages is documented for the first time in 10g Oracle Documentation, and it is very well written.
Update for 10g
Oracle 10g provides a sophisticated regular expression engine. Please read Jonathan Gennick's article and book on the subject. Note that it's not completely hopeless if you can't upgrade your database to 10g and still need to use regular expressions. You can install a 10g database anywhere, such as your laptop, and work from there. Suppose your remote pre-10g database is named orcl and you create a link called orcl in your 10g database pointing to orcl. You can view the data with a regular expression applied to it.
SQL> select * from t@orcl; S -------------------- a b c d SQL> select regexp_replace(s, ' +', ' ') from t@orcl; REGEXP_REPLACE(S,'+','') ------------------------------------------------------ a b c d |
A more common requirement is to update the table. You can "borrow" this 10g database for this task.
SQL> begin 2 for r in (select rowid rid, regexp_replace(s, ' +', ' ') new_s from t@orcl) loop 3 update t@orcl set s = r.new_s where rowid = r.rid; 4 end loop; 5 end; 6 / PL/SQL procedure successfully completed. SQL> select * from t@orcl; S -------------------- a b c d |
Note that the above PL/SQL code will pull data from the remote database to your local database to process. So if you have a choice, find a 10g database that has a faster network connection to the remote database unless your laptop has equally fast connection.
Oracle continues to provide more capability for regular expressions in SQL and PL/SQL. For example, if you need to extract a specific one out of multiple matches of the same pattern, your laptop needs Oracle 11g instead of 10g. See 11g New Features Guide for details.
Just for a record. Neither of the following works, unfortunately. The query against the remote table cannot have for update clause so you can't use where current of in the update.
SQL> update t@orcl set s = regexp_replace(s, ' +', ' '); update t@orcl set s = regexp_replace(s, ' +', ' ') * ERROR at line 1: ORA-02070: database ORCL does not support REGEXP_REPLACE in this context SQL> declare 2 cursor c is select regexp_replace(s, ' +', ' ') new_s from t@orcl for update of s; 3 begin 4 for r in c loop 5 update t@orcl set s = r.new_s where current of c; 6 end loop; 7 end; 8 / declare * ERROR at line 1: ORA-02070: database ORCL does not support REGEXP_REPLACE in this context ORA-06512: at line 2 ORA-06512: at line 4 |
To my Computer Page