From ea318d1431c89e647598c510c4245c6571aa5f46 Mon Sep 17 00:00:00 2001
From: Timothy Pearson
Date: Thu, 26 Jan 2012 23:32:43 -0600
Subject: Update to latest tqt3 automated conversion

---
 doc/html/qregexp.html | 1037 ------------------------------------------------
 1 file changed, 1037 deletions(-)
 delete mode 100644 doc/html/qregexp.html

diff --git a/doc/html/qregexp.html b/doc/html/qregexp.html
deleted file mode 100644
index 089a42f83..000000000
--- a/doc/html/qregexp.html
+++ /dev/null
@@ -1,1037 +0,0 @@

TQRegExp Class Reference


The TQRegExp class provides pattern matching using regular expressions.

All the functions in this class are reentrant when TQt is built with thread support.

#include <qregexp.h> -

List of all member functions. -

Public Members

enum CaretMode { CaretAtZero, CaretAtOffset, CaretWontMatch }
TQRegExp ()
TQRegExp ( const TQString & pattern, bool caseSensitive = TRUE, bool wildcard = FALSE )
TQRegExp ( const TQRegExp & rx )
~TQRegExp ()
TQRegExp & operator= ( const TQRegExp & rx )
bool operator== ( const TQRegExp & rx ) const
bool operator!= ( const TQRegExp & rx ) const
bool isEmpty () const
bool isValid () const
TQString pattern () const
void setPattern ( const TQString & pattern )
bool caseSensitive () const
void setCaseSensitive ( bool sensitive )
bool wildcard () const
void setWildcard ( bool wildcard )
bool minimal () const
void setMinimal ( bool minimal )
bool exactMatch ( const TQString & str ) const
int match ( const TQString & str, int index = 0, int * len = 0, bool indexIsStart = TRUE ) const (obsolete)
int search ( const TQString & str, int offset = 0, CaretMode caretMode = CaretAtZero ) const
int searchRev ( const TQString & str, int offset = -1, CaretMode caretMode = CaretAtZero ) const
int matchedLength () const
int numCaptures () const
TQStringList capturedTexts ()
TQString cap ( int nth = 0 )
int pos ( int nth = 0 )
TQString errorString ()

Static Public Members

TQString escape ( const TQString & str )

Detailed Description

- - - -The TQRegExp class provides pattern matching using regular expressions. -


Regular expressions, or "regexps", provide a way to find patterns -within text. This is useful in many contexts, for example: -

Validation -	A regexp can be used to check whether a piece of text -meets some criteria, e.g. is an integer or contains no -whitespace. -
Searching	Regexps provide a much more powerful means of searching text than simple string matching does. For example we can create a regexp which says "find one of the words 'mail', 'letter' or 'correspondence' but not any of the words 'email', 'mailman', 'mailer', 'letterbox' etc."
Search and Replace	A regexp can be used to replace a pattern with a piece of text, for example replace all occurrences of '&' with '&amp;' except where the '&' is already followed by 'amp;'.
String Splitting -	A regexp can be used to identify where a string should be -split into its component fields, e.g. splitting tab-delimited -strings. -

We present a very brief introduction to regexps, a description of -TQt's regexp language, some code examples, and finally the function -documentation itself. TQRegExp is modeled on Perl's regexp -language, and also fully supports Unicode. TQRegExp can also be -used in the weaker 'wildcard' (globbing) mode which works in a -similar way to command shells. A good text on regexps is Mastering Regular Expressions: Powerful Techniques for Perl and Other Tools by Jeffrey E. Friedl, ISBN 1565922573. -

Experienced regexp users may prefer to skip the introduction and -go directly to the relevant information. -

In case of multi-threaded programming, note that TQRegExp depends on -TQThreadStorage internally. For that reason, TQRegExp should only be -used with threads started with TQThread, i.e. not with threads -started with platform-specific APIs. -

Introduction
Characters and Abbreviations for Sets of Characters
Sets of Characters
Quantifiers
Capturing Text
Assertions
Wildcard Matching (globbing)
Notes for Perl Users
Code Examples


Introduction -

Regexps are built up from expressions, quantifiers, and assertions. The simplest form of expression is simply a character, e.g. x or 5. An expression can also be a set of characters. For example, [ABCD] will match an A or a B or a C or a D. As a shorthand we could write this as [A-D]. If we want to match any of the capital letters in the English alphabet we can write [A-Z]. A quantifier tells the regexp engine how many occurrences of the expression we want, e.g. x{1,1} means match an x which occurs at least once and at most once. We'll look at assertions and more complex expressions later.

Note that in general regexps cannot be used to check for balanced brackets or tags. For example if you want to match an opening html <b> and its closing </b> you can only use a regexp if you know that these tags are not nested; the html fragment <b>bold <b>bolder</b></b> will not match as expected. If you know the maximum level of nesting it is possible to create a regexp that will match correctly, but for an unknown level of nesting, regexps will fail.

We'll start by writing a regexp to match integers in the range 0 -to 99. We will require at least one digit so we will start with -[0-9]{1,1} which means match a digit exactly once. This -regexp alone will match integers in the range 0 to 9. To match one -or two digits we can increase the maximum number of occurrences so -the regexp becomes [0-9]{1,2} meaning match a digit at -least once and at most twice. However, this regexp as it stands -will not match correctly. This regexp will match one or two digits -within a string. To ensure that we match against the whole -string we must use the anchor assertions. We need ^ (caret) -which when it is the first character in the regexp means that the -regexp must match from the beginning of the string. And we also -need $ (dollar) which when it is the last character in the -regexp means that the regexp must match until the end of the -string. So now our regexp is ^[0-9]{1,2}$. Note that -assertions, such as ^ and $, do not match any -characters. -

If you've seen regexps elsewhere they may have looked different from -the ones above. This is because some sets of characters and some -quantifiers are so common that they have special symbols to -represent them. [0-9] can be replaced with the symbol -\d. The quantifier to match exactly one occurrence, -{1,1}, can be replaced with the expression itself. This means -that x{1,1} is exactly the same as x alone. So our 0 -to 99 matcher could be written ^\d{1,2}$. Another way of -writing it would be ^\d\d{0,1}$, i.e. from the start of the -string match a digit followed by zero or one digits. In practice -most people would write it ^\d\d?$. The ? is a -shorthand for the quantifier {0,1}, i.e. a minimum of no -occurrences a maximum of one occurrence. This is used to make an -expression optional. The regexp ^\d\d?$ means "from the -beginning of the string match one digit followed by zero or one -digits and then the end of the string". -
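The 0 to 99 matcher can also be tried outside TQt. The following is a minimal sketch that uses the C++ standard library's std::regex (ECMAScript syntax) in place of TQRegExp, on the assumption that the two behave the same for this simple pattern. The helper name isZeroTo99 is made up for illustration.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: true if text is an integer in the range 0 to 99.
// std::regex stands in for TQRegExp here.
bool isZeroTo99(const std::string &text)
{
    // ^ and $ anchor the match to the whole string;
    // \d\d? is one digit followed by an optional second digit.
    static const std::regex rx("^\\d\\d?$");
    return std::regex_search(text, rx);
}
```

Without the two anchors the same pattern would also succeed on "123", because it would find one or two digits somewhere inside the string.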

Our second example is matching the words 'mail', 'letter' or 'correspondence' but without matching 'email', 'mailman', 'mailer', 'letterbox' etc. We'll start by just matching 'mail'. In full the regexp is m{1,1}a{1,1}i{1,1}l{1,1}, but since each expression itself is automatically quantified by {1,1} we can simply write this as mail; an 'm' followed by an 'a' followed by an 'i' followed by an 'l'. The symbol '|' (bar) is used for alternation, so our regexp now becomes mail|letter|correspondence which means match 'mail' or 'letter' or 'correspondence'. Whilst this regexp will find the words we want it will also find words we don't want such as 'email'. We will start by putting our regexp in parentheses, (mail|letter|correspondence). Parentheses have two effects: firstly they group expressions together and secondly they identify parts of the regexp that we wish to capture. Our regexp still matches any of the three words but now they are grouped together as a unit. This is useful for building up more complex regexps. It is also useful because it allows us to examine which of the words actually matched. We need to use another assertion, this time \b "word boundary": \b(mail|letter|correspondence)\b. This regexp means "match a word boundary followed by the expression in parentheses followed by another word boundary". The \b assertion matches at a position in the string, not a character in the string. A word boundary is the position between a word character and a non-word character (such as a space or a newline), or the beginning or end of the string.
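The effect of the word boundaries and of the capturing parentheses can be sketched as follows. This uses std::regex (ECMAScript syntax) as a stand-in for TQRegExp, assuming \b behaves the same way in both; findPostWord is a hypothetical helper name.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: returns whichever of "mail", "letter" or
// "correspondence" occurs as a whole word in text, or an empty
// string if none does. std::regex stands in for TQRegExp.
std::string findPostWord(const std::string &text)
{
    static const std::regex rx("\\b(mail|letter|correspondence)\\b");
    std::smatch m;
    if (std::regex_search(text, m, rx))
        return m[1].str();   // text captured by the first parentheses
    return std::string();
}
```

Here 'email' and 'mailman' are rejected because one side of 'mail' is another word character, so no word boundary exists at that position.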

For our third example we want to replace ampersands with the HTML entity '&amp;'. The regexp to match is simple: &, i.e. match one ampersand. Unfortunately this will mess up our text if some of the ampersands have already been turned into HTML entities. So what we really want to say is replace an ampersand providing it is not followed by 'amp;'. For this we need the negative lookahead assertion and our regexp becomes: &(?!amp;). The negative lookahead assertion is introduced with '(?!' and finishes at the ')'. It means that the text it contains, 'amp;' in our example, must not follow the expression that precedes it.
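A small sketch of this replacement, written with std::regex and std::regex_replace from the C++ standard library instead of TQRegExp and TQString::replace(). It assumes the lookahead syntax is interpreted identically, and escapeAmpersands is a name invented for this example.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: turns every bare '&' into "&amp;" but leaves
// ampersands that already start "&amp;" alone.
// std::regex stands in for TQRegExp.
std::string escapeAmpersands(const std::string &text)
{
    // (?!amp;) is a negative lookahead: it consumes no characters and
    // only succeeds when "amp;" does NOT follow the ampersand.
    static const std::regex rx("&(?!amp;)");
    return std::regex_replace(text, rx, "&amp;");
}
```

Because the lookahead consumes nothing, only the single '&' character is replaced each time.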

Regexps provide a rich language that can be used in a variety of -ways. For example suppose we want to count all the occurrences of -'Eric' and 'Eirik' in a string. Two valid regexps to match these -are \b(Eric|Eirik)\b and \bEi?ri[ck]\b. We need -the word boundary '\b' so we don't get 'Ericsson' etc. The second -regexp actually matches more than we want, 'Eric', 'Erik', 'Eiric' -and 'Eirik'. -
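The difference between the two regexps can be checked with a short sketch. std::regex (ECMAScript syntax) is used here in place of TQRegExp, and the function names matchesStrict and matchesLoose are made up for illustration.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helpers comparing the two patterns from the text.
// std::regex stands in for TQRegExp.

// Accepts only the whole words 'Eric' and 'Eirik'.
bool matchesStrict(const std::string &text)
{
    static const std::regex rx("\\b(Eric|Eirik)\\b");
    return std::regex_search(text, rx);
}

// Accepts 'Eric', 'Erik', 'Eiric' and 'Eirik' as whole words.
bool matchesLoose(const std::string &text)
{
    static const std::regex rx("\\bEi?ri[ck]\\b");
    return std::regex_search(text, rx);
}
```

The shorter pattern is more compact but accepts two spellings that the alternation rejects, which may or may not be what we want.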

We will implement some of the examples above in the code examples section.

Characters and Abbreviations for Sets of Characters -

Element	Meaning -
c -	Any character represents itself unless it has a special -regexp meaning. Thus c matches the character c. -
\c -	A character that follows a backslash matches the character -itself except where mentioned below. For example if you -wished to match a literal caret at the beginning of a string -you would write \^. -
\a -	This matches the ASCII bell character (BEL, 0x07). -
\f -	This matches the ASCII form feed character (FF, 0x0C). -
\n -	This matches the ASCII line feed character (LF, 0x0A, Unix newline). -
\r -	This matches the ASCII carriage return character (CR, 0x0D). -
\t -	This matches the ASCII horizontal tab character (HT, 0x09). -
\v -	This matches the ASCII vertical tab character (VT, 0x0B). -
\xhhhh -	This matches the Unicode character corresponding to the -hexadecimal number hhhh (between 0x0000 and 0xFFFF). \0ooo -(i.e., \zero ooo) matches the ASCII/Latin-1 character -corresponding to the octal number ooo (between 0 and 0377). -
. (dot) -	This matches any character (including newline). -
\d -	This matches a digit (TQChar::isDigit()). -
\D -	This matches a non-digit. -
\s -	This matches a whitespace (TQChar::isSpace()). -
\S -	This matches a non-whitespace. -
\w -	This matches a word character (TQChar::isLetterOrNumber() or '_'). -
\W -	This matches a non-word character. -
\n	The n-th backreference, where n is a number, e.g. \1, \2, etc.

Note that the C++ compiler transforms backslashes in strings so to include a \ in a regexp you will need to enter it twice, i.e. \\. -

Sets of Characters -

Square brackets are used to match any character in the set of -characters contained within the square brackets. All the character -set abbreviations described above can be used within square -brackets. Apart from the character set abbreviations and the -following two exceptions no characters have special meanings in -square brackets. -

^ -	The caret negates the character set if it occurs as the -first character, i.e. immediately after the opening square -bracket. For example, [abc] matches 'a' or 'b' or 'c', -but [^abc] matches anything except 'a' or 'b' or -'c'. -
- -	The dash is used to indicate a range of characters, for -example [W-Z] matches 'W' or 'X' or 'Y' or 'Z'. -

Using the predefined character set abbreviations is more portable -than using character ranges across platforms and languages. For -example, [0-9] matches a digit in Western alphabets but -\d matches a digit in any alphabet. -

Note that in most regexp literature sets of characters are called -"character classes". -

Quantifiers -

By default an expression is automatically quantified by -{1,1}, i.e. it should occur exactly once. In the following -list E stands for any expression. An expression is a -character or an abbreviation for a set of characters or a set of -characters in square brackets or any parenthesised expression. -

E? -	Matches zero or one occurrence of E. This quantifier -means "the previous expression is optional" since it will -match whether or not the expression occurs in the string. It -is the same as E{0,1}. For example dents? -will match 'dent' and 'dents'. -
E+ -	Matches one or more occurrences of E. This is the same -as E{1,MAXINT}. For example, 0+ will match -'0', '00', '000', etc. -
E*	Matches zero or more occurrences of E. This is the same as E{0,MAXINT}. The * quantifier is often used by mistake. Since it matches zero or more occurrences it will match no occurrences at all. For example if we want to match strings that end in whitespace and use the regexp \s*$ we would get a match on every string. This is because we have said find zero or more whitespace followed by the end of string, so even strings that don't end in whitespace will match. The regexp we want in this case is \s+$ to match strings that have at least one whitespace at the end.
E{n} -	Matches exactly n occurrences of the expression. This -is the same as repeating the expression n times. For -example, x{5} is the same as xxxxx. It is also -the same as E{n,n}, e.g. x{5,5}. -
E{n,} -	Matches at least n occurrences of the expression. This -is the same as E{n,MAXINT}. -
E{,m} -	Matches at most m occurrences of the expression. This -is the same as E{0,m}. -
E{n,m} -	Matches at least n occurrences of the expression and at -most m occurrences of the expression. -

(MAXINT is implementation dependent but will not be smaller than -1024.) -
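The pitfall with the * quantifier described above can be reproduced in a few lines. This is a sketch using std::regex (ECMAScript syntax) rather than TQRegExp, on the assumption that \s, * and + mean the same in both; the two function names are invented for the example.

```cpp
#include <cassert>
#include <regex>
#include <string>

// std::regex stands in for TQRegExp; both helper names are hypothetical.

// Mistaken version: \s* may match zero characters, so every
// string "ends in" zero or more whitespace.
bool endsInWhitespaceWrong(const std::string &text)
{
    static const std::regex rx("\\s*$");
    return std::regex_search(text, rx);
}

// Intended version: \s+ requires at least one whitespace
// character before the end of the string.
bool endsInWhitespace(const std::string &text)
{
    static const std::regex rx("\\s+$");
    return std::regex_search(text, rx);
}
```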

If we wish to apply a quantifier to more than just the preceding -character we can use parentheses to group characters together in -an expression. For example, tag+ matches a 't' followed by -an 'a' followed by at least one 'g', whereas (tag)+ matches -at least one occurrence of 'tag'. -

Note that quantifiers are "greedy". They will match as much text as they can. For example, 0+ will match as many zeros as it can from the first zero it finds, e.g. the '000' in '2.0005'. Quantifiers can be made non-greedy, see setMinimal().
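The difference between greedy and non-greedy matching can be sketched with std::regex from the C++ standard library. Note the assumption: std::regex marks a single quantifier as non-greedy with a trailing ? (as Perl does), whereas TQRegExp switches the whole pattern with setMinimal(). The helper firstMatch is hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: returns the text of the first match of pattern
// in text, or an empty string if there is no match.
// std::regex stands in for TQRegExp.
std::string firstMatch(const std::string &text, const std::string &pattern)
{
    const std::regex rx(pattern);
    std::smatch m;
    if (std::regex_search(text, m, rx))
        return m.str();      // the whole match, like cap(0)
    return std::string();
}
```

With "2.0005" as input, the greedy pattern 0+ yields "000", while the non-greedy 0+? stops after the first zero and yields "0".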

Capturing Text -

Parentheses allow us to group elements together so that we can -quantify and capture them. For example if we have the expression -mail|letter|correspondence that matches a string we know -that one of the words matched but not which one. Using -parentheses allows us to "capture" whatever is matched within -their bounds, so if we used (mail|letter|correspondence) -and matched this regexp against the string "I sent you some email" -we can use the cap() or capturedTexts() functions to extract the -matched characters, in this case 'mail'. -

We can use captured text within the regexp itself. To refer to the -captured text we use backreferences which are indexed from 1, -the same as for cap(). For example we could search for duplicate -words in a string using \b(\w+)\W+\1\b which means match a -word boundary followed by one or more word characters followed by -one or more non-word characters followed by the same text as the -first parenthesised expression followed by a word boundary. -
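As an illustration, the duplicate-word regexp can be exercised like this. The sketch relies on std::regex (ECMAScript syntax) instead of TQRegExp, assuming backreferences behave alike in both, and firstRepeatedWord is a name made up for the example.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: returns the first word that is immediately
// repeated in text, or an empty string if no word is repeated.
// std::regex stands in for TQRegExp.
std::string firstRepeatedWord(const std::string &text)
{
    // (\w+) captures a word, \W+ skips the separator, and \1 demands
    // the same text again; the \b at each end keeps it to whole words.
    static const std::regex rx("\\b(\\w+)\\W+\\1\\b");
    std::smatch m;
    if (std::regex_search(text, m, rx))
        return m[1].str();
    return std::string();
}
```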

If we want to use parentheses purely for grouping and not for -capturing we can use the non-capturing syntax, e.g. -(?:green|blue). Non-capturing parentheses begin '(?:' and -end ')'. In this example we match either 'green' or 'blue' but we -do not capture the match so we only know whether or not we matched -but not which color we actually found. Using non-capturing -parentheses is more efficient than using capturing parentheses -since the regexp engine has to do less book-keeping. -

Both capturing and non-capturing parentheses may be nested. -

Assertions -

Assertions make some statement about the text at the point where -they occur in the regexp but they do not match any characters. In -the following list E stands for any expression. -

^ -	The caret signifies the beginning of the string. If you -wish to match a literal `^` you must escape it by -writing \^. For example, ^#include will only -match strings which begin with the characters '#include'. -(When the caret is the first character of a character set it -has a special meaning, see Sets of - Characters.) -
$	The dollar signifies the end of the string. For example \d\s*$ will match strings which end with a digit optionally followed by whitespace. If you wish to match a literal `$` you must escape it by writing \$.
\b -	A word boundary. For example the regexp -\bOK\b means match immediately after a word -boundary (e.g. start of string or whitespace) the letter 'O' -then the letter 'K' immediately before another word boundary -(e.g. end of string or whitespace). But note that the -assertion does not actually match any whitespace so if we -write (\bOK\b) and we have a match it will only -contain 'OK' even if the string is "Its OK now". -
\B -	A non-word boundary. This assertion is true wherever -\b is false. For example if we searched for -\Bon\B in "Left on" the match would fail (space -and end of string aren't non-word boundaries), but it would -match in "tonne". -
(?=E)	Positive lookahead. This assertion is true if the expression matches at this point in the regexp. For example, const(?=\s+char) matches 'const' whenever it is followed by 'char', so in 'static const char ' only 'const' is matched. (Compare with const\s+char, which matches 'const char' in the same string.)
(?!E) -	Negative lookahead. This assertion is true if the -expression does not match at this point in the regexp. For -example, const(?!\s+char) matches 'const' except -when it is followed by 'char'. -
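The two lookahead assertions can be compared side by side. The following sketch uses std::regex (ECMAScript syntax), which supports the same (?=E) and (?!E) notation, as a stand-in for TQRegExp; matchedText is a hypothetical helper.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: returns the matched text for pattern in text,
// or an empty string if the pattern does not match.
// std::regex stands in for TQRegExp.
std::string matchedText(const std::string &text, const std::string &pattern)
{
    const std::regex rx(pattern);
    std::smatch m;
    if (std::regex_search(text, m, rx))
        return m.str();
    return std::string();
}
```

For the input "static const char *", const(?=\s+char) gives "const" because the lookahead is not part of the match, const\s+char gives "const char", and const(?!\s+char) gives no match at all.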

Wildcard Matching (globbing) -

Most command shells such as bash or cmd.exe support "file -globbing", the ability to identify a group of files by using -wildcards. The setWildcard() function is used to switch between -regexp and wildcard mode. Wildcard matching is much simpler than -full regexps and has only four features: -

c -	Any character represents itself apart from those mentioned -below. Thus c matches the character c. -
? -	This matches any single character. It is the same as -. in full regexps. -
* -	This matches zero or more of any characters. It is the -same as .* in full regexps. -
[...] -	Sets of characters can be represented in square brackets, -similar to full regexps. Within the character class, like -outside, backslash has no special meaning. -

For example if we are in wildcard mode and have strings which -contain filenames we could identify HTML files with *.html. -This will match zero or more characters followed by a dot followed -by 'h', 't', 'm' and 'l'. -

Notes for Perl Users -

Most of the character class abbreviations supported by Perl are -supported by TQRegExp, see characters - and abbreviations for sets of characters. -

In TQRegExp, apart from within character classes, ^ always -signifies the start of the string, so carets must always be -escaped unless used for that purpose. In Perl the meaning of caret -varies automagically depending on where it occurs so escaping it -is rarely necessary. The same applies to $ which in -TQRegExp always signifies the end of the string. -

TQRegExp's quantifiers are the same as Perl's greedy quantifiers. -Non-greedy matching cannot be applied to individual quantifiers, -but can be applied to all the quantifiers in the pattern. For -example, to match the Perl regexp ro+?m requires: -

-    TQRegExp rx( "ro+m" );
-    rx.setMinimal( TRUE );
-

- -

The equivalent of Perl's /i option is -setCaseSensitive(FALSE). -

Perl's /g option can be emulated using a loop. -

In TQRegExp . matches any character, therefore all TQRegExp -regexps have the equivalent of Perl's /s option. TQRegExp -does not have an equivalent to Perl's /m option, but this -can be emulated in various ways for example by splitting the input -into lines or by looping with a regexp that searches for newlines. -

Because TQRegExp is string oriented there are no \A, \Z or \z -assertions. The \G assertion is not supported but can be emulated -in a loop. -

Perl's $& is cap(0) or capturedTexts()[0]. There are no TQRegExp -equivalents for $`, $' or $+. Perl's capturing variables, $1, $2, -... correspond to cap(1) or capturedTexts()[1], cap(2) or -capturedTexts()[2], etc. -

To substitute a pattern use TQString::replace(). -

Perl's extended /x syntax is not supported, nor are -directives, e.g. (?i), or regexp comments, e.g. (?#comment). On -the other hand, C++'s rules for literal strings can be used to -achieve the same: -

-    TQRegExp mark( "\\b" // word boundary
-                  "[Mm]ark" // the word we want to match
-                );
-

- -

Both zero-width positive and zero-width negative lookahead -assertions (?=pattern) and (?!pattern) are supported with the same -syntax as Perl. Perl's lookbehind assertions, "independent" -subexpressions and conditional expressions are not supported. -

Non-capturing parentheses are also supported, with the same -(?:pattern) syntax. -

See TQStringList::split() and TQStringList::join() for equivalents -to Perl's split and join functions. -

Note: because C++ transforms \'s they must be written twice in -code, e.g. \b must be written \\b. -

Code Examples -

-    TQRegExp rx( "^\\d\\d?$" );  // match integers 0 to 99
-    rx.search( "123" );         // returns -1 (no match)
-    rx.search( "-6" );          // returns -1 (no match)
-    rx.search( "6" );           // returns 0 (matched at position 0)
-

- -

The third string matches '6'. This is a simple validation -regexp for integers in the range 0 to 99. -

-    TQRegExp rx( "^\\S+$" );     // match strings without whitespace
-    rx.search( "Hello world" ); // returns -1 (no match)
-    rx.search( "This_is-OK" );  // returns 0 (matched at position 0)
-

- -

The second string matches 'This_is-OK'. We've used the -character set abbreviation '\S' (non-whitespace) and the anchors -to match strings which contain no whitespace. -

In the following example we match strings containing 'mail' or 'letter' or 'correspondence' but only match whole words, i.e. not 'email'.

-    TQRegExp rx( "\\b(mail|letter|correspondence)\\b" );
-    rx.search( "I sent you an email" );     // returns -1 (no match)
-    rx.search( "Please write the letter" ); // returns 17
-

- -

In the second string, "Please write the letter", the match is the word 'letter' at position 17. The word 'letter' is also captured (because of the parentheses). We can see what text we've captured like this:

-    TQString captured = rx.cap( 1 ); // captured == "letter"
-

- -

This will capture the text from the first set of capturing -parentheses (counting capturing left parentheses from left to -right). The parentheses are counted from 1 since cap( 0 ) is the -whole matched regexp (equivalent to '&' in most regexp engines). -

-    TQRegExp rx( "&(?!amp;)" );      // match ampersands but not &amp;
-    TQString line1 = "This & that";
-    line1.replace( rx, "&amp;" );
-    // line1 == "This &amp; that"
-    TQString line2 = "His &amp; hers & theirs";
-    line2.replace( rx, "&amp;" );
-    // line2 == "His &amp; hers &amp; theirs"
-

- -

Here we've passed the TQRegExp to TQString's replace() function to -replace the matched text with new text. -

-    TQString str = "One Eric another Eirik, and an Ericsson."
-                    " How many Eiriks, Eric?";
-    TQRegExp rx( "\\b(Eric|Eirik)\\b" ); // match Eric or Eirik
-    int pos = 0;    // where we are in the string
-    int count = 0;  // how many Eric and Eirik's we've counted
-    while ( pos >= 0 ) {
-        pos = rx.search( str, pos );
-        if ( pos >= 0 ) {
-            pos++;      // move along in str
-            count++;    // count our Eric or Eirik
-        }
-    }
-

- -

We've used the search() function to repeatedly match the regexp in the string. Note that instead of moving forward by one character at a time with pos++ we could have written pos += rx.matchedLength() to skip over the already matched string. The count will equal 3, matching the first 'Eric', the 'Eirik' and the final 'Eric' in 'One Eric another Eirik, and an Ericsson. How many Eiriks, Eric?'; it doesn't match 'Ericsson' or 'Eiriks' because in those words the name is not followed by a word boundary.

One common use of regexps is to split lines of delimited data into -their component fields. -

-    str = "Trolltech AS\twww.trolltech.com\tNorway";
-    TQString company, web, country;
-    rx.setPattern( "^([^\t]+)\t([^\t]+)\t([^\t]+)$" );
-    if ( rx.search( str ) != -1 ) {
-        company = rx.cap( 1 );
-        web = rx.cap( 2 );
-        country = rx.cap( 3 );
-    }
-

- -

In this example our input lines have the format company name, web -address and country. Unfortunately the regexp is rather long and -not very versatile -- the code will break if we add any more -fields. A simpler and better solution is to look for the -separator, '\t' in this case, and take the surrounding text. The -TQStringList split() function can take a separator string or regexp -as an argument and split a string accordingly. -

-    TQStringList field = TQStringList::split( "\t", str );
-

- -

Here field[0] is the company, field[1] the web address and so on. -

To imitate the matching of a shell we can use wildcard mode. -

-    TQRegExp rx( "*.html" );         // invalid regexp: * doesn't quantify anything
-    rx.setWildcard( TRUE );         // now it's a valid wildcard regexp
-    rx.exactMatch( "index.html" );  // returns TRUE
-    rx.exactMatch( "default.htm" ); // returns FALSE
-    rx.exactMatch( "readme.txt" );  // returns FALSE
-

- -

Wildcard matching can be convenient because of its simplicity, but -any wildcard regexp can be defined using full regexps, e.g. -.*\.html$. Notice that we can't match both .html and .htm files with a wildcard unless we use *.htm* which will -also match 'test.html.bak'. A full regexp gives us the precision -we need, .*\.html?$. -
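The full regexp from the paragraph above can be tried with a short sketch. It uses std::regex (ECMAScript syntax) and std::regex_match, which tests the whole string much like exactMatch(), in place of TQRegExp. One assumption to keep in mind is that in std::regex the dot does not match a newline, which does not matter for filenames. The name isHtmlFile is invented for the example.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: true for names ending in ".html" or ".htm".
// std::regex stands in for TQRegExp.
bool isHtmlFile(const std::string &name)
{
    // .* is any prefix, \. a literal dot, and l? makes the final
    // 'l' optional so that both extensions are accepted.
    static const std::regex rx(".*\\.html?$");
    return std::regex_match(name, rx);
}
```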

TQRegExp can match case insensitively using setCaseSensitive(), and -can use non-greedy matching, see setMinimal(). By default TQRegExp -uses full regexps but this can be changed with setWildcard(). -Searching can be forward with search() or backward with -searchRev(). Captured text can be accessed using capturedTexts() -which returns a string list of all captured strings, or using -cap() which returns the captured string for the given index. The -pos() function takes a match index and returns the position in the -string where the match was made (or -1 if there was no match). -

- -

Member Type Documentation

TQRegExp::CaretMode

- -

The CaretMode enum defines the different meanings of the caret -(^) in a regular expression. The possible values are: -

TQRegExp::CaretAtZero - -The caret corresponds to index 0 in the searched string. -
TQRegExp::CaretAtOffset - -The caret corresponds to the start offset of the search. -
TQRegExp::CaretWontMatch - -The caret never matches. -

Member Function Documentation

TQRegExp::TQRegExp () -

-Constructs an empty regexp. -

See also isValid() and errorString(). - -

TQRegExp::TQRegExp ( const TQString & pattern, bool caseSensitive = TRUE, bool wildcard = FALSE ) -

-Constructs a regular expression object for the given pattern -string. The pattern must be given using wildcard notation if wildcard is TRUE (default is FALSE). The pattern is case -sensitive, unless caseSensitive is FALSE. Matching is greedy -(maximal), but can be changed by calling setMinimal(). -

See also setPattern(), setCaseSensitive(), setWildcard(), and setMinimal(). - -

TQRegExp::TQRegExp ( const TQRegExp & rx ) -

-Constructs a regular expression as a copy of rx. -

TQRegExp::~TQRegExp () -

-Destroys the regular expression and cleans up its internal data. - -

TQString TQRegExp::cap ( int nth = 0 ) -

-Returns the text captured by the nth subexpression. The entire -match has index 0 and the parenthesized subexpressions have -indices starting from 1 (excluding non-capturing parentheses). -

-    TQRegExp rxlen( "(\\d+)(?:\\s*)(cm|inch)" );
-    int pos = rxlen.search( "Length: 189cm" );
-    if ( pos > -1 ) {
-        TQString value = rxlen.cap( 1 ); // "189"
-        TQString unit = rxlen.cap( 2 );  // "cm"
-        // ...
-    }
-

- -

The order of elements matched by cap() is as follows. The first -element, cap(0), is the entire matching string. Each subsequent -element corresponds to the next capturing open left parentheses. -Thus cap(1) is the text of the first capturing parentheses, cap(2) -is the text of the second, and so on. -

-Some patterns may lead to a number of matches which cannot be -determined in advance, for example: -

-    TQRegExp rx( "(\\d+)" );
-    str = "Offsets: 12 14 99 231 7";
-    TQStringList list;
-    pos = 0;
-    while ( pos >= 0 ) {
-        pos = rx.search( str, pos );
-        if ( pos > -1 ) {
-            list += rx.cap( 1 );
-            pos  += rx.matchedLength();
-        }
-    }
-    // list contains "12", "14", "99", "231", "7"
-

- -

See also capturedTexts(), pos(), exactMatch(), search(), and searchRev(). - -

Examples: network/archivesearch/archivedialog.ui.h and regexptester/regexptester.cpp. -

TQStringList TQRegExp::capturedTexts () -

-Returns a list of the captured text strings. -

The first string in the list is the entire matched string. Each -subsequent list element contains a string that matched a -(capturing) subexpression of the regexp. -

For example: -

-        TQRegExp rx( "(\\d+)(\\s*)(cm|inch(es)?)" );
-        int pos = rx.search( "Length: 36 inches" );
-        TQStringList list = rx.capturedTexts();
-        // list is now ( "36 inches", "36", " ", "inches", "es" )
-

- -

The above example also captures elements that may be present but -which we have no interest in. This problem can be solved by using -non-capturing parentheses: -

-        TQRegExp rx( "(\\d+)(?:\\s*)(cm|inch(?:es)?)" );
-        int pos = rx.search( "Length: 36 inches" );
-        TQStringList list = rx.capturedTexts();
-        // list is now ( "36 inches", "36", "inches" )
-

- -

Note that if you want to iterate over the list, you should iterate -over a copy, e.g. -

-        TQStringList list = rx.capturedTexts();
-        TQStringList::Iterator it = list.begin();
-        while( it != list.end() ) {
-            myProcessing( *it );
-            ++it;
-        }
-

- -

Some regexps can match an indeterminate number of times. For -example if the input string is "Offsets: 12 14 99 231 7" and the -regexp, rx, is (\d+)+, we would hope to get a list of -all the numbers matched. However, after calling -rx.search(str), capturedTexts() will return the list ( "12", -"12" ), i.e. the entire match was "12" and the first subexpression -matched was "12". The correct approach is to use cap() in a loop. -

The order of elements in the string list is as follows. The first -element is the entire matching string. Each subsequent element -corresponds to the next capturing open left parentheses. Thus -capturedTexts()[1] is the text of the first capturing parentheses, -capturedTexts()[2] is the text of the second and so on -(corresponding to $1, $2, etc., in some other regexp languages). -

See also cap(), pos(), exactMatch(), search(), and searchRev(). - -

bool TQRegExp::caseSensitive () const -

-Returns TRUE if case sensitivity is enabled; otherwise returns -FALSE. The default is TRUE. -

TQString TQRegExp::errorString () -

Returns a text string that explains why a regexp pattern is invalid, if that is the case; otherwise returns "no error occurred".

TQString TQRegExp::escape ( const TQString & str ) `[static]` -

-Returns the string str with every regexp special character -escaped with a backslash. The special characters are $, (, ), *, +, -., ?, [, \, ], ^, {, | and }. -

Example: -

-     s1 = TQRegExp::escape( "bingo" );   // s1 == "bingo"
-     s2 = TQRegExp::escape( "f(x)" );    // s2 == "f\\(x\\)"
-

- -

This function is useful to construct regexp patterns dynamically: -

-    TQRegExp rx( "(" + TQRegExp::escape(name) +
-                "|" + TQRegExp::escape(alias) + ")" );
-
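To make the escaping rule concrete, here is a minimal sketch of a function that backslash-escapes exactly the special characters listed above. It works on `std::string` rather than TQString, and the name `escapeRegexp` is hypothetical. It mirrors the documented behaviour and is not the TQt implementation.

```cpp
#include <string>

// Returns `str` with every character from the documented special set
// prefixed by a backslash. Assumption: plain std::string input, with no
// Unicode handling beyond single bytes.
std::string escapeRegexp(const std::string &str)
{
    static const std::string special = "$()*+.?[\\]^{|}";
    std::string out;
    out.reserve(str.size() * 2);           // worst case: every char escaped
    for (char c : str) {
        if (special.find(c) != std::string::npos)
            out += '\\';
        out += c;
    }
    return out;
}
```

With this sketch, `escapeRegexp("f(x)")` yields the characters `f\(x\)`, matching the second example above.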

- - -

bool TQRegExp::exactMatch ( const TQString & str ) const -

-Returns TRUE if str is matched exactly by this regular expression; otherwise returns FALSE. You can determine how much of -the string was matched by calling matchedLength(). -

For a given regexp string, R, exactMatch("R") is the equivalent of -search("^R$") since exactMatch() effectively encloses the regexp -in the start of string and end of string anchors, except that it -sets matchedLength() differently. -

For example, if the regular expression is blue, then -exactMatch() returns TRUE only for input blue. For inputs bluebell, blutak and lightblue, exactMatch() returns FALSE -and matchedLength() will return 4, 3 and 0 respectively. -
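The difference between matching the whole string and finding a match somewhere in it can be sketched with the standard library, where `std::regex_match` plays the role of exactMatch() and `std::regex_search` the role of search(). The helper names are hypothetical, and `std::regex` has no counterpart to the partial matchedLength() values described above.

```cpp
#include <regex>
#include <string>

// True only if `pattern` matches all of `input` (the exactMatch() role).
bool matchesWhole(const std::string &pattern, const std::string &input)
{
    return std::regex_match(input, std::regex(pattern));
}

// True if `pattern` matches anywhere inside `input` (the search() role).
bool matchesSomewhere(const std::string &pattern, const std::string &input)
{
    return std::regex_search(input, std::regex(pattern));
}
```

For the pattern blue, `matchesWhole` accepts only the input blue, while `matchesSomewhere` also accepts bluebell and lightblue.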

Although const, this function sets matchedLength(), -capturedTexts() and pos(). -

See also search(), searchRev(), and TQRegExpValidator. - -

bool TQRegExp::isEmpty () const -

-Returns TRUE if the pattern string is empty; otherwise returns -FALSE. -

If you call exactMatch() with an empty pattern on an empty string -it will return TRUE; otherwise it returns FALSE since it operates -over the whole string. If you call search() with an empty pattern -on any string it will return the start offset (0 by default) -because the empty pattern matches the 'emptiness' at the start of -the string. In this case the length of the match returned by -matchedLength() will be 0. -

See TQString::isEmpty(). - -

bool TQRegExp::isValid () const -

-Returns TRUE if the regular expression is valid; otherwise returns -FALSE. An invalid regular expression never matches. -

The pattern [a-z is an example of an invalid pattern, since -it lacks a closing square bracket. -

Note that the validity of a regexp may also depend on the setting -of the wildcard flag, for example *.html is a valid -wildcard regexp but an invalid full regexp. -
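As a rough analogue of isValid(), the sketch below checks a pattern with `std::regex`, which reports a malformed pattern by throwing `std::regex_error` instead of setting a flag. The function name `isValidPattern` is hypothetical. The standard library uses a different grammar from TQRegExp, so the two will not agree on every pattern.

```cpp
#include <regex>
#include <string>

// Returns true if `pattern` compiles as a std::regex (ECMAScript grammar).
// Assumption: a failed compilation is the analogue of isValid() == FALSE.
bool isValidPattern(const std::string &pattern)
{
    try {
        std::regex compiled(pattern);
        (void)compiled;                    // only the compilation matters
        return true;
    } catch (const std::regex_error &) {
        return false;                      // e.g. an unclosed '[' bracket
    }
}
```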

int TQRegExp::match ( const TQString & str, int index = 0, int * len = 0, bool indexIsStart = TRUE ) const -

This function is obsolete. It is provided to keep old source working. We strongly advise against using it in new code. -

Attempts to match in str, starting from position index. -Returns the position of the match, or -1 if there was no match. -

The length of the match is stored in *len, unless len is a -null pointer. -

If indexIsStart is TRUE (the default), the position index in -the string will match the start of string anchor, ^, in the -regexp, if present. Otherwise, position 0 in str will match. -

Use search() and matchedLength() instead of this function. -

See also TQString::mid() and TQConstString. - -

Example: qmag/qmag.cpp. -

int TQRegExp::matchedLength () const -

-Returns the length of the last matched string, or -1 if there was -no match. -

See also exactMatch(), search(), and searchRev(). - -

Examples: network/archivesearch/archivedialog.ui.h and regexptester/regexptester.cpp. -

bool TQRegExp::minimal () const -

-Returns TRUE if minimal (non-greedy) matching is enabled; -otherwise returns FALSE. -

int TQRegExp::numCaptures () const -

-Returns the number of captures contained in the regular expression. - -

Example: regexptester/regexptester.cpp. -

bool TQRegExp::operator!= ( const TQRegExp & rx ) const -

- -

Returns TRUE if this regular expression is not equal to rx; -otherwise returns FALSE. -

TQRegExp & TQRegExp::operator= ( const TQRegExp & rx ) -

-Copies the regular expression rx and returns a reference to the -copy. The case sensitivity, wildcard and minimal matching options -are also copied. - -

bool TQRegExp::operator== ( const TQRegExp & rx ) const -

-Returns TRUE if this regular expression is equal to rx; -otherwise returns FALSE. -

Two TQRegExp objects are equal if they have the same pattern -strings and the same settings for case sensitivity, wildcard and -minimal matching. - -

TQString TQRegExp::pattern () const -

-Returns the pattern string of the regular expression. The pattern -has either regular expression syntax or wildcard syntax, depending -on wildcard(). -

int TQRegExp::pos ( int nth = 0 ) -

-Returns the position of the nth captured text in the searched -string. If nth is 0 (the default), pos() returns the position -of the whole match. -

Example: -

-    TQRegExp rx( "/([a-z]+)/([a-z]+)" );
-    rx.search( "Output /dev/null" );    // returns 7 (position of /dev/null)
-    rx.pos( 0 );                        // returns 7 (position of /dev/null)
-    rx.pos( 1 );                        // returns 8 (position of dev)
-    rx.pos( 2 );                        // returns 12 (position of null)
-

- -

For zero-length matches, pos() always returns -1. (For example, if -cap(4) would return an empty string, pos(4) returns -1.) This is -due to an implementation tradeoff. -

See also capturedTexts(), exactMatch(), search(), and searchRev(). - -

int TQRegExp::search ( const TQString & str, int offset = 0, CaretMode caretMode = CaretAtZero ) const -

-Attempts to find a match in str from position offset (0 by -default). If offset is -1, the search starts at the last -character; if -2, at the next to last character; etc. -

Returns the position of the first match, or -1 if there was no -match. -

The caretMode parameter specifies whether ^ -should match at index 0 or at offset. -

You might prefer to use TQString::find(), TQString::contains() or -even TQStringList::grep(). To replace matches use -TQString::replace(). -

Example: -

-        TQString str = "offsets: 1.23 .50 71.00 6.00";
-        TQRegExp rx( "\\d*\\.\\d+" );    // primitive floating point matching
-        int count = 0;
-        int pos = 0;
-        while ( (pos = rx.search(str, pos)) != -1 ) {
-            count++;
-            pos += rx.matchedLength();
-        }
-        // pos will be 9, 14, 18 and finally 24; count will end up as 4
-

- -

Although const, this function sets matchedLength(), -capturedTexts() and pos(). -

See also searchRev() and exactMatch(). - -

Examples: network/archivesearch/archivedialog.ui.h and regexptester/regexptester.cpp. -

int TQRegExp::searchRev ( const TQString & str, int offset = -1, CaretMode caretMode = CaretAtZero ) const -

-Attempts to find a match backwards in str from position offset. If offset is -1 (the default), the search starts at the -last character; if -2, at the next to last character; etc. -

Returns the position of the first match, or -1 if there was no -match. -

The caretMode parameter specifies whether ^ -should match at index 0 or at offset. -

Although const, this function sets matchedLength(), -capturedTexts() and pos(). -

Warning: Searching backwards is much slower than searching -forwards. -

void TQRegExp::setCaseSensitive ( bool sensitive ) -

-Sets case sensitive matching to sensitive. -

If sensitive is TRUE, \.txt$ matches readme.txt but -not README.TXT. -

void TQRegExp::setMinimal ( bool minimal ) -

-Enables or disables minimal matching. If minimal is FALSE, -matching is greedy (maximal) which is the default. -

For example, suppose we have the input string "We must be -<b>bold</b>, very <b>bold</b>!" and the pattern -<b>.*</b>. With the default greedy (maximal) matching, -the match is "<b>bold</b>, very <b>bold</b>", which spans both pairs of tags. -But with minimal (non-greedy) matching the -first match is the first "<b>bold</b>" -and the second match is the second "<b>bold</b>". -In practice we might use the pattern -<b>[^<]+</b> instead, although this will still fail for -nested tags. -
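Greedy versus minimal matching can be tried out with the standard library. This sketch assumes an input containing HTML-style bold tags and uses `std::regex`, which expresses minimal matching through the lazy quantifier `.*?` rather than through a per-object switch like setMinimal(). The helper name `firstMatch` is hypothetical.

```cpp
#include <regex>
#include <string>

// Returns the text of the first match of `pattern` in `input`,
// or an empty string if there is no match.
std::string firstMatch(const std::string &pattern, const std::string &input)
{
    const std::regex re(pattern);
    std::smatch m;
    if (std::regex_search(input, m, re))
        return m.str(0);                   // whole-match text
    return std::string();
}
```

Given the input "We must be <b>bold</b>, very <b>bold</b>!", the greedy pattern `<b>.*</b>` returns the text from the first opening tag to the last closing tag, while the lazy pattern `<b>.*?</b>` stops at the first closing tag.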

void TQRegExp::setPattern ( const TQString & pattern ) -

-Sets the pattern string to pattern. The case sensitivity, -wildcard and minimal matching options are not changed. -

void TQRegExp::setWildcard ( bool wildcard ) -

-Sets the wildcard mode for the regular expression. The default is -FALSE. -

Setting wildcard to TRUE enables simple shell-like wildcard -matching. (See wildcard matching - (globbing).) -

For example, r*.txt matches the string readme.txt in -wildcard mode, but does not match readme. -
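One way to picture wildcard mode is as a translation from a glob into an ordinary regular expression. The sketch below handles only `*` and `?`, treats every other character literally, and requires the whole name to match. Those simplifications and the names `globToRegex` and `globMatches` are assumptions made for illustration; this is not how TQRegExp implements wildcards.

```cpp
#include <regex>
#include <string>

// Translates a simple glob into std::regex (ECMAScript) syntax:
// '*' becomes ".*", '?' becomes ".", anything else is taken literally.
// Assumption: character sets such as [abc] are not supported here.
std::string globToRegex(const std::string &glob)
{
    static const std::string special = "$()+.[\\]^{|}";
    std::string out;
    for (char c : glob) {
        if (c == '*') {
            out += ".*";
        } else if (c == '?') {
            out += '.';
        } else {
            if (special.find(c) != std::string::npos)
                out += '\\';               // escape regexp metacharacters
            out += c;
        }
    }
    return out;
}

// True if the whole of `name` matches the glob.
bool globMatches(const std::string &glob, const std::string &name)
{
    return std::regex_match(name, std::regex(globToRegex(glob)));
}
```

Under these assumptions `globMatches("r*.txt", "readme.txt")` holds and `globMatches("r*.txt", "readme")` does not, in line with the example above.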

bool TQRegExp::wildcard () const -

-Returns TRUE if wildcard mode is enabled; otherwise returns FALSE. -The default is FALSE. -
