Jan 3, 2007

Intermediate #2, Regular Expressions Part 1

Regular expressions are applied to a string with the binding operator =~

e.g.

$string = "the cat is looking for food";
print "yes" if($string =~ m/cat/);

m// is used for matching; in scalar context it returns true or false
s/// is used for substituting

e.g.

$string = "this is a cat";
$string =~ s/cat/dog/;

This replaces the first cat with dog.

* the delimiter / can be changed to almost any other punctuation character, e.g. s#cat#dog#
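
A minimal sketch of why changing the delimiter is useful, assuming a made-up path string that is full of slashes (the variable names are illustrative only):

```perl
use strict;
use warnings;

# Hypothetical path, chosen only because it contains many slashes.
my $path = "/usr/local/bin/perl";

# With the default delimiter, every / in the pattern must be escaped.
(my $escaped = $path) =~ s/\/usr\/local/\/opt/;

# With braces as delimiters, the slashes need no escaping.
(my $braced = $path) =~ s{/usr/local}{/opt};

print "$escaped\n";   # /opt/bin/perl
print "$braced\n";    # /opt/bin/perl
```

Both substitutions do the same thing; the second is simply easier to read.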

-------------------------------------------------------------------------

Metacharacter   Description

\               escape character

^               matches the beginning of the string (or of each line with the /m modifier)

$               matches the end of the string (or of each line with the /m modifier)

.               matches any character except \n

|               separates alternatives, e.g. cat|dog (usually grouped with ())

()              groups an expression; each group's match is captured in $1, $2, $3, etc.

[]              matches any one character from a set of characters
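
The metacharacters above can be tried out on the sample string used earlier. This is only a sketch; the variable names are made up for illustration:

```perl
use strict;
use warnings;

my $string = "the cat is looking for food";

# ^ anchors the match at the start of the string.
my $starts_with_the = ($string =~ /^the/) ? 1 : 0;

# | inside () tries each alternative; the one that matched lands in $1.
my $animal = "";
if ($string =~ /(cat|dog)/) {
    $animal = $1;
}

# [] matches a single character from the set; here a vowel between c and t.
my $vowel_between = ($string =~ /c[aeiou]t/) ? 1 : 0;

# Two groups fill $1 and $2 in left-to-right order.
my ($first, $last) = ("", "");
if ($string =~ /^(\w+).* (\w+)$/) {
    ($first, $last) = ($1, $2);
}

print "$starts_with_the $animal $vowel_between $first $last\n";
# prints: 1 cat 1 the food
```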

---------------------------------------------------------------------------------

Sequence   Purpose

\w         word character (alphanumeric, including _)
\W         non-word character (anything \w does not match)
\s         white space character
\S         non white space character
\d         digit
\D         non-digit
\b         word boundary
\B         non word boundary
\A         matches only at the beginning of the string
\Z         matches only at the end of the string, or before a final newline
\G         matches where the previous m//g operation left off
\t         tab
\n         newline
\r         carriage return
\f         form feed
\a         alarm (bell)
\e         escape
\b         backspace (only inside a character class [])
\033       octal character
\x1B       hex character
\c[        control character
\l         makes the next character lowercase
\u         makes the next character uppercase
\L         lowercases everything until \E
\U         uppercases everything until \E
\E         ends case modification
\Q         quotes (disables) regexp metacharacters until \E
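
A short sketch of a few of these sequences in use, with made-up sample strings. It shows \b, \d, \Q...\E and \u; the others work along the same lines:

```perl
use strict;
use warnings;

my $text = "concatenate the cat, price 3.50";

# \b keeps "cat" from matching inside "concatenate".
my @whole_words = $text =~ /\b(cat)\b/g;    # one match: ("cat")

# \d+ picks out runs of digits.
my @numbers = $text =~ /(\d+)/g;            # ("3", "50")

# \Q...\E disables metacharacters, so the . in $literal is a literal dot.
my $literal      = "3.50";
my $quoted_hit   = ("price 3.50" =~ /\Q$literal\E/) ? 1 : 0;   # 1
my $unquoted_hit = ("price 3x50" =~ /$literal/)     ? 1 : 0;   # 1, . matches x
my $quoted_miss  = ("price 3x50" =~ /\Q$literal\E/) ? 1 : 0;   # 0

# \u uppercases the next character of the replacement.
(my $titled = "cat") =~ s/(\w+)/\u$1/;      # "Cat"

print scalar(@whole_words), " @numbers $quoted_hit $unquoted_hit $quoted_miss $titled\n";
# prints: 1 3 50 1 1 0 Cat
```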

----------------------------------------------------------------------------

Maximal   Minimal   Purpose

*         *?        matches 0 or more items
+         +?        matches 1 or more items
?         ??        matches 0 or 1 item
{n}       {n}?      matches exactly n items
{n,}      {n,}?     matches at least n items
{n,m}     {n,m}?    matches at least n, but not more than m items

Maximal (greedy) : matches as many items as possible while still letting the whole pattern match
Minimal (non-greedy) : matches as few items as possible while still letting the whole pattern match
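
The difference between maximal and minimal matching is easiest to see side by side. A small sketch, using a made-up HTML-like string:

```perl
use strict;
use warnings;

my $html = "<b>bold</b> and <i>italic</i>";

# Maximal: .+ runs to the last > in the string.
my ($greedy) = $html =~ /<(.+)>/;

# Minimal: .+? stops at the first > it can.
my ($minimal) = $html =~ /<(.+?)>/;

# The same applies to counted quantifiers.
my ($max_run) = "aaaa" =~ /(a{2,3})/;    # takes three
my ($min_run) = "aaaa" =~ /(a{2,3}?)/;   # settles for two

print "$greedy\n";    # b>bold</b> and <i>italic</i
print "$minimal\n";   # b
print "$max_run $min_run\n";   # aaa aa
```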

----------------------------------------------------------------------------------

Modifier   Description

g          matches all occurrences within a string, not just the first
i          case insensitive
m          for multi-line strings; ^ and $ match at the start and end of each line instead of only the start and end of the string
o          compiles the pattern only once, even if it contains interpolated variables
s          allows . to match the newline character
x          allows whitespace and comments in the expression for clarity
e          evaluates the replacement string as an expression (substitution only)

* modifiers are put at the end of m// or s///, e.g. m//g
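
A sketch of the most common modifiers in action, using made-up sample text (the o modifier is left out because its effect is not visible in a short example):

```perl
use strict;
use warnings;

my $text = "Cat one\ncat two\nCAT three";

# g + i: count every match, ignoring case.
my $count = () = $text =~ /cat/gi;                 # 3

# Without m, ^ matches only at the start of the string.
my @starts_plain = $text =~ /^(\w+)/g;             # ("Cat")

# With m, ^ also matches at the start of each line.
my @starts_multi = $text =~ /^(\w+)/gm;            # ("Cat", "cat", "CAT")

# s lets . match the newline between "one" and "cat".
my $dot_plain = ($text =~ /one.cat/)  ? 1 : 0;     # 0
my $dot_s     = ($text =~ /one.cat/s) ? 1 : 0;     # 1

# x ignores whitespace inside the pattern.
my ($word, $num) = "item 42" =~ / (\w+) \s+ (\d+) /x;

# e evaluates the replacement as Perl code; g applies it to every number.
(my $doubled = "3 apples and 5 pears") =~ s/(\d+)/$1 * 2/ge;

print "$count @starts_plain | @starts_multi | $dot_plain $dot_s | $word $num | $doubled\n";
# prints: 3 Cat | Cat cat CAT | 0 1 | item 42 | 6 apples and 10 pears
```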


-------------------------------------------------------------------------------------
