Jan 4, 2007

Basic #13, Subroutines

Subroutines in Perl are declared like this:

sub testSub{
print "Hello World\n";
}

A call to the subroutine can be made with an & before the subroutine name (in Perl 5 the & is optional when parentheses are used, e.g. testSub(5)), as in:

&testSub;
&testSub($_);
&testSub(5, $_);

Parameters passed to the subroutine can be referenced through the array @_, which has nothing to do with the scalar $_

e.g.

sub printArguments{
print "@_";
}

Each individual parameter can also be referenced with $_[0], $_[1], etc.
** Again, $_[0] and $_[1] are elements of @_ and have nothing to do with $_
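
A minimal sketch of reading parameters inside a subroutine. The names addPair and countArgs are made up for illustration and are not part of the notes above.

```perl
use strict;
use warnings;

# Hypothetical helper: copy the parameters out of @_ into named variables.
sub addPair {
    my ($first, $second) = @_;
    return $first + $second;
}

# Hypothetical helper: in scalar context @_ gives the number of parameters.
sub countArgs {
    return scalar(@_);
}

$_ = "unrelated";                        # $_ is a separate variable from @_ and $_[0]
my $sum   = addPair(2, 3);               # 5
my $count = countArgs("a", "b", "c");    # 3
print "sum=$sum count=$count topic=$_\n";
```

Copying @_ into my variables first is a common habit, since $_[0] and friends are aliases to the caller's values.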

Jan 3, 2007

Intermediate #2, Regular Expressions Part 1

Regular expressions are applied to a string with the binding operator =~

e.g.

$string = "the cat is looking for food";
print "yes" if($string =~ m/cat/);

m// is used for matching and returns true or false in scalar context
s/// is used for substituting

e.g.

$string = "this is a cat";
$string =~ s/cat/dog/;

replaces cat with dog, so $string becomes "this is a dog"

* the separator / can be changed to almost any other non-alphanumeric, non-whitespace character, e.g. s#cat#dog#
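
A small sketch of both operators on the example string from above, assuming nothing beyond core Perl. The variable names $found and $changed are only for illustration.

```perl
use strict;
use warnings;

my $string = "the cat is looking for food";

# m// in scalar context is true when the pattern matches.
my $found = ($string =~ m/cat/) ? 1 : 0;

# s/// changes the string in place and returns the number of substitutions.
# Here # is used as the separator instead of /.
my $changed = ($string =~ s#cat#dog#);

print "found=$found changed=$changed string=$string\n";
```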

-------------------------------------------------------------------------

Metacharacter Description

\ escape character

^ match beginning of string (or line if /m modifier)

$ match end of string (or line if /m modifier)

. match any character except \n

| separates alternative matches, usually inside ()

() groups expressions together; the text matched by each group becomes $1, $2, $3, etc.

[] matches any one character from a set of characters
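
The metacharacters above can be tried out with a short sketch like this one. The sample strings and variable names are invented for the example.

```perl
use strict;
use warnings;

my $line = "cat: 12 dogs";

# ^ anchors at the start, | separates the alternatives inside the group,
# and each () group is captured into $1, $2, ...
my ($animal, $number);
if ($line =~ /^(cat|dog): ([0-9]+) /) {
    ($animal, $number) = ($1, $2);
}

# [] matches any one character from the set.
my $hasVowel = ("sky" =~ /[aeiou]/) ? 1 : 0;

# \ escapes a metacharacter, so \. is a literal dot rather than "any character".
my $literalDot = ("a.b" =~ /a\.b/ && "axb" !~ /a\.b/) ? 1 : 0;

print "animal=$animal number=$number hasVowel=$hasVowel literalDot=$literalDot\n";
```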

---------------------------------------------------------------------------------

Sequence Purpose

\w word character (alphanumeric or _)
\W non-word character
\s white space character
\S non white space character
\d digit
\D non-digit
\b word boundary
\B non word boundary
\A Matches only the beginning of a string
\Z Matches only at end of string, or before a newline at the end of the string
\G matches where previous m//g operation left off
\t tab
\n newline
\r carriage return
\f form feed
\a alarm (bell)
\e escape
\b backspace (inside a character class []; elsewhere in a pattern \b is a word boundary)
\033 octal character
\x1B hex character
\c[ control character
\l makes next character lowercase
\u makes next character uppercase
\L specify lowercase until \E
\U specify uppercase until \E
\E Ends case modification
\Q Quotes (disables) regexp metacharacters till \E
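
A few of the sequences above in action, as a rough sketch. The sample text and variable names are made up for the example.

```perl
use strict;
use warnings;

my $text = "order 42 costs 3.50 today";

# \d+ between \b word boundaries: each run of digits.
my @numbers = ($text =~ /\b(\d+)\b/g);           # (42, 3, 50)

# \s+ : split on runs of white space.
my @words = split /\s+/, $text;                  # 5 words

# \Q...\E : treat the interpolated text literally, so . is not "any character".
my $needle  = "3.50";
my $loose   = ("3x50" =~ /$needle/)     ? 1 : 0; # 1
my $literal = ("3x50" =~ /\Q$needle\E/) ? 1 : 0; # 0

# \u and \L also work in double-quoted strings.
my $name  = "pERL";
my $fixed = "\u\L$name\E";                       # "Perl"

print "@numbers | ", scalar(@words), " | $loose $literal | $fixed\n";
```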

----------------------------------------------------------------------------

Maximal Minimal Purpose

* *? Matches 0 or more items
+ +? matches 1 or more items
? ?? matches 0 or 1 item
{n} {n}? matches exactly n items
{n,} {n,}? matches at least n items
{n,m} {n,m}? matches at least n, but not more than m items

Maximal (greedy) : matches as many times as possible
Minimal (non-greedy) : matches as few times as possible
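
The difference between the two columns shows up as soon as a pattern could stop in more than one place. A minimal sketch with an invented sample string:

```perl
use strict;
use warnings;

my $html = "<b>one</b> and <b>two</b>";

# Maximal: .* grabs as much as it can, up to the LAST </b>.
my ($greedy)  = ($html =~ /<b>(.*)<\/b>/);

# Minimal: .*? stops at the FIRST </b>.
my ($minimal) = ($html =~ /<b>(.*?)<\/b>/);

# {n}: exactly four digits at the start of the string.
my ($year) = ("2007-01-03" =~ /^(\d{4})/);

print "greedy=$greedy\nminimal=$minimal\nyear=$year\n";
```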

----------------------------------------------------------------------------------

Modifier Description

g matches all occurrences within a string, not just the first
i case insensitive
m for multi-line strings, ^ and $ match at the start and end of each line instead of only the start and end of the string
o compiles the pattern only once
s allows . to match the newline character
x allows whitespace and comments in the expression for clarity
e evaluates the replacement string as an expression (substitution only)

* modifiers are put at the end of m// or s///, e.g. m//g
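
A short sketch of what the modifiers change, using a made-up three-line string. All variable names are illustrative.

```perl
use strict;
use warnings;

my $text = "Cat one\ncat two\nDOG three";

my @cats      = ($text =~ /cat/gi);      # g + i: every match, any case
my @firstOnly = ($text =~ /^(\w+)/g);    # without m, ^ matches only at the start of the string
my @eachLine  = ($text =~ /^(\w+)/gm);   # with m, ^ matches at the start of every line

my $dotPlain = ($text =~ /one.cat/)  ? 1 : 0;   # . does not match "\n"
my $dotS     = ($text =~ /one.cat/s) ? 1 : 0;   # with s it does

# x: whitespace inside the pattern is ignored, so it can be spaced out.
my ($word, $num) = ("item 7" =~ / (\w+) \s+ (\d+) /x);

# e: the replacement is evaluated as Perl code.
(my $doubled = "3 apples") =~ s/(\d+)/$1 * 2/e;

print "@cats | @firstOnly | @eachLine | $dotPlain $dotS | $word $num | $doubled\n";
```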


-------------------------------------------------------------------------------------

Intermediate #1, File Handling

#!c:/www/perl/bin/perl

$file = "D:/My Perls/data.txt";
open(INFILE, $file) or die("error: ".$!);
@lines = <INFILE>;
close(INFILE);

foreach(@lines){
print $_;
}


INFILE is the filehandle; it is written without quotes and can be replaced with any arbitrary name.

<INFILE> reads in the entire file in one go when assigned to an array. If a scalar variable is used instead of the array, only the next line is read in.
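
A sketch of the difference between the array and the scalar read. To keep it self-contained it writes a scratch file with File::Temp first (a stand-in for the data file above), and it uses the three-argument form of open, which is a variation on the call shown in the notes.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Scratch file with three lines, removed automatically at exit.
my ($tmp, $file) = tempfile(UNLINK => 1);
print $tmp "first\nsecond\nthird\n";
close($tmp);

# Array (list context): every line is read at once.
open(INFILE, "<", $file) or die "Cannot open $file: $!";
my @lines = <INFILE>;
close(INFILE);

# Scalar context: one line per read.
open(INFILE, "<", $file) or die "Cannot open $file: $!";
my $firstLine  = <INFILE>;
my $secondLine = <INFILE>;
close(INFILE);

chomp($firstLine, $secondLine);
print scalar(@lines), " lines; first=$firstLine second=$secondLine\n";
```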

-----------------------------------------------------------------
open(INFO, $file); # Open for input
open(INFO, ">$file"); # Open for output
open(INFO, ">>$file"); # Open for appending
open(INFO, "<$file"); # Also open for input

* can add - or die("error: ".$!) after the open call for error handling, e.g. open(INFO, $file) or die("error: ".$!);
-----------------------------------------------------------------

If the file is already opened for output,
we can print to it by giving the print statement the filehandle as an additional first parameter:

print INFO "END OF FILE\n";
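
Putting the open modes together, here is a rough sketch that writes, appends and reads back. The directory comes from File::Temp and the file name out.txt is hypothetical.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);   # scratch directory, removed at exit
my $file = "$dir/out.txt";          # hypothetical file name

# ">" creates the file (or truncates an existing one).
open(INFO, ">$file") or die("error: ".$!);
print INFO "line one\n";
close(INFO);

# ">>" keeps the existing content and adds to the end.
open(INFO, ">>$file") or die("error: ".$!);
print INFO "END OF FILE\n";
close(INFO);

# "<" reads it back.
open(INFO, "<$file") or die("error: ".$!);
my @contents = <INFO>;
close(INFO);

print @contents;
```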

-----------------------------------------------------------------

Standard Input and Output:
open(INFO, '-'); # Open standard input
open(INFO, '>-'); # Open standard output

Basic #12, loop controls

next - skips the remainder of the code block, forcing the loop to proceed to the next value in the loop. The continue block, if any, is still executed.

last - ends the loop entirely, also skipping the continue block

redo - re-executes the code block without re-evaluating the conditional statement for the loop. It skips the remainder of the code block and also the continue block.
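
A small sketch of the three keywords. The arrays and counters are invented for the example.

```perl
use strict;
use warnings;

# next and last: collect the even numbers up to 6.
my @seen;
for my $n (1 .. 10) {
    next if $n % 2;     # odd: go straight to the next value
    last if $n > 6;     # first even number above 6: leave the loop
    push @seen, $n;
}

# redo: repair an item and run the block again for the SAME item.
my @queue  = ("bad", "good");
my $tries  = 0;
my $result = "";
foreach my $item (@queue) {
    $tries++;
    if ($item eq "bad") {
        $item = "fixed";   # $item is an alias, so this changes $queue[0]
        redo;
    }
    $result .= "$item ";
}

print "seen=@seen tries=$tries result=$result\n";
```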

Basic #11, for loops

LABEL for (EXPR; EXPR; EXPR) BLOCK

e.g.

for($i=0;$i<100;$i++){
.....
}

for($i=0, $j=0; $i<100;$i++,$j++){
....
}

LABEL foreach VAR (LIST) BLOCK
LABEL foreach VAR (LIST) BLOCK continue BLOCK

e.g.

for (@months){
print "$_\n";
}

foreach $key (keys %monthstonum){
print "Month $monthstonum{$key} is $key\n";
}

Basic #10, while statements

STATEMENT while (EXPR)
while (EXPR) BLOCK
while (EXPR) BLOCK continue BLOCK

The block after the optional continue keyword is executed after each iteration, just before the condition is evaluated again.

do BLOCK while (EXPR)
do BLOCK until (EXPR)
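
A sketch of the forms above, with made-up counters. Note that do BLOCK always runs the block at least once, because the condition is tested afterwards.

```perl
use strict;
use warnings;

# while with a continue block: the increment lives in continue,
# so it still runs when next skips the rest of the body.
my $k = 0;
my @body;
while ($k < 5) {
    next if $k == 2;
    push @body, $k;
} continue {
    $k++;
}

# do ... while: the block runs once even though the condition is false.
my $runs = 0;
do { $runs++ } while ($runs > 10);

# do ... until: repeats until the condition becomes true.
my $count = 0;
do { $count++ } until ($count >= 3);

print "body=@body runs=$runs count=$count\n";
```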


Basic #9, If statements

Various ways of executing the if statement

statement if(expr)
if(expr) {block}
if(expr) {block} else {block}
if(expr) {block} elsif (expr) {block} ...
if(expr) {block} elsif (expr) {block} ... else {block}

(expression) ? (statement if true) : (statement if false)

unless (expr) {block}
(works like if, but the block runs when expr is false)
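
The forms above side by side, as a minimal sketch. The subroutine classify and the @log array are hypothetical names used only here.

```perl
use strict;
use warnings;

# if / elsif / else
sub classify {
    my ($n) = @_;
    if ($n < 0) {
        return "negative";
    } elsif ($n == 0) {
        return "zero";
    } else {
        return "positive";
    }
}

# Conditional operator: picks one of two values.
my $parity = (7 % 2) ? "odd" : "even";

# Statement modifiers and the unless block.
my @log;
push @log, "if-modifier"     if 1 < 2;
push @log, "unless-modifier" unless 1 > 2;
unless (0) {
    push @log, "unless-block";
}

print classify(-5), " ", classify(0), " ", classify(9), " $parity @log\n";
```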

MISC #3, Reserved Hashes

%INC
contains the list of files included via do, require or use. The key is the file you specified, the value is the actual location of the imported file

%ENV
list of variables supplied by the current environment. The key is the name of the environment variable, the corresponding value is the variable's value. Setting a single variable in the hash changes the environment variable for child processes.

%SIG
The keys of %SIG are the signals available on the current machine. The values correspond to how each signal will be handled.
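
A rough sketch of the three hashes. NOTES_DEMO is a made-up environment variable, and the child process is started with the list form of a piped open, which is assumed here to be running on a Unix-like system.

```perl
use strict;
use warnings;
use File::Basename ();   # loaded only so that it shows up in %INC

# %INC: key is the file as requested, value is where it was found.
my $loaded = exists $INC{"File/Basename.pm"} ? 1 : 0;

# %ENV: a value set here is inherited by child processes.
$ENV{NOTES_DEMO} = "hello";   # hypothetical variable name
open(my $child, "-|", $^X, "-e", 'print $ENV{NOTES_DEMO}')
    or die "Cannot start child: $!";
my $childSees = <$child>;
close($child);

# %SIG: the value says how the signal is handled.
$SIG{INT} = 'IGNORE';         # ignore Ctrl-C from here on
my $intHandling = $SIG{INT};

print "loaded=$loaded child=$childSees INT=$intHandling\n";
```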

MISC #2, Reserved Arrays

@ARGV
list of command line arguments supplied to the script. The first index, 0, is the first argument (not the script name)

@INC
list of directories that Perl should examine when importing modules via the do, require, or use constructs

@_
list of parameters supplied to the function or subroutine
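
A quick sketch of the three arrays. @ARGV is filled in by hand to simulate a command line, and the directory /hypothetical/lib is made up.

```perl
use strict;
use warnings;

# Simulate: perl script.pl -v data.txt
@ARGV = ("-v", "data.txt");
my $firstArg = $ARGV[0];
my $argCount = scalar(@ARGV);

# Directories at the front of @INC are searched first.
unshift @INC, "/hypothetical/lib";
my $searchedFirst = $INC[0];

# @_ holds the parameters of the current subroutine call.
sub firstParam { return $_[0]; }
my $param = firstParam("x", "y");

print "first=$firstArg count=$argCount inc=$searchedFirst param=$param\n";
```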

MISC #1, Reserved Variables

$_, $ARG
default input and pattern searching space

$1, $2, ... ($<digit>)
contains the text matched by the corresponding set of parentheses in the last successful regular expression match

$&, $MATCH
String matched by last successful pattern match

$`, $PREMATCH
string preceding whatever was matched by the last successful pattern match

$', $POSTMATCH
string following information matched by the last pattern match

$+, $LAST_PAREN_MATCH
last bracket match by the last regular expression search pattern

$*, $MULTILINE_MATCHING
Set to 1 to do multiline pattern matching within a string. Default 0. Deprecated in favour of the /m and /s modifiers

$., $NR, $INPUT_LINE_NUMBER, input_line_number HANDLE EXPR
current input line number of the last filehandle read, which can be the keyboard, an external file or any other filehandle

$/, $RS, $INPUT_RECORD_SEPARATOR, input_record_separator HANDLE EXPR
current input record separator. newline by default, can be undefined to read in entire file

$|, $OUTPUT_AUTOFLUSH, autoflush HANDLE EXPR
All output is buffered by default and flushed periodically, so the value defaults to 0. If the value is non-zero, the currently selected filehandle is flushed after each write operation

$,, $OFS, $OUTPUT_FIELD_SEPARATOR, output_field_separator HANDLE EXPR
output field separator for the print series of functions, printed between their arguments. Default is no separator (undefined)

$\, $ORS, $OUTPUT_RECORD_SEPARATOR, output_record_separator HANDLE EXPR
default output record separator. Ordinarily, none

$", $LIST_SEPARATOR
defines the separator inserted between elements of array output within a double-quoted string. Default is single space

$;, $SUBSEP, $SUBSCRIPT_SEPARATOR
separator used when emulating multi-dimensional arrays. default "\034"

$#, $OFMT
default number format to use when printing numbers

$%, $FORMAT_PAGE_NUMBER
page number of current output channel

$=, $FORMAT_LINES_PER_PAGE
no. of printable lines of the current page

$-, $FORMAT_LINES_LEFT
no. of lines available to print on the current page

$~, $FORMAT_NAME
name of the current report format in use by the current output channel. Defaults to the name of the filehandle

$^, $FORMAT_TOP_NAME
name of current top-of-page output format for the current output channel. default is the filehandle with _TOP appended

$:, $FORMAT_LINE_BREAK_CHARACTERS
set of characters after which a string may be broken to fill continuation fields. Default is " \n-" (space, newline, hyphen)

$^L, $FORMAT_FORMFEED
character to be used to send a form feed to the output channel. Default is "\f"

$ARGV
name of current file when reading from the default filehandle <>

$^A, $ACCUMULATOR
when outputting formatted information via the reporting system, the formline function puts the formatted results into $^A, and the write function then outputs and empties the accumulator variable. This is the current value of the write accumulator for format lines.

$?, $CHILD_ERROR
status returned by the last external command or last pipe close. This is the value returned by wait, so the true return value is $? >> 8, and $? & 127 is the number of the signal received by the process, if any

$!, $ERRNO, $OS_ERROR
returns the error number or error string according to the context in which it is used

$^E, $EXTENDED_OS_ERROR
contains extended error info for OSs other than Unix. On Unix, $^E is the same as $!

$@, $EVAL_ERROR
error message returned by the Perl interpreter after executing the eval function. If empty, the last eval call was successful.

$$, $PID, $PROCESS_ID
process no. of Perl Interpreter executing the current script

$<, $UID, $REAL_USER_ID
real ID of the user currently executing the interpreter that is executing the script

$>, $EUID, $EFFECTIVE_USER_ID
effective user id of the current process

$(, $GID, $REAL_GROUP_ID
real group id of current process. If OS supports multi-simultaneous group membership, this returns a space-separated list of group ids

$), $EGID, $EFFECTIVE_GROUP_ID
effective group id of the current process

$0, $PROGRAM_NAME
name of the current file containing the script currently being executed

$[
index of 1st element in array or first character in a substring

$], $OLD_PERL_VERSION
version + patchlevel/1000 of the Perl interpreter (in the English module of Perl 5.6 and later, $PERL_VERSION refers to $^V instead)

$^D, $DEBUGGING
value of current debugging flag

$^F, $SYSTEM_FD_MAX
max system file descriptor no. - usually 2. System file descriptors are duplicated across exec'd processes, although higher descriptors are not

$^H
status of syntax checks enabled by compiler hints, such as use strict

$^I, $INPLACE_EDIT
value of inplace-edit extension (enabled via the -i switch on the cmd line)

$^M
size of emergency pool reserved for use by Perl and the die function when Perl runs out of memory. The only standard method available for trapping Perl memory overuse during execution

$^O, $OSNAME
operating system name

$^P, $PERLDB
internal variable that enables you to specify the debugging value.

$^R
value of last evaluation in a (?{ code }) block within a reg expression

$^S
current interpreter state. Value is undefined if parsing of the current module is not finished. It is true if inside an eval block, otherwise false.

$^T, $BASETIME
time at which the script starts running, defined as no of seconds since Epoch

$^W, $WARNING
current value of the warning switch (specified via the -w cmd line option)

$^X, $EXECUTABLE_NAME
name of the Perl binary being executed, determined via the value of C's argv[0].
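
A sketch that exercises a handful of the variables from the list above. It uses in-memory filehandles (available since Perl 5.8) so that nothing is written to disk; all sample data and variable names are invented.

```perl
use strict;
use warnings;

# $`, $&, $' and $1 after a successful match.
my ($pre, $match, $post, $group);
if ("2007-01-03" =~ /-(\d+)-/) {
    ($pre, $match, $post, $group) = ($`, $&, $', $1);
}

# $" separates array elements inside a double-quoted string.
my @list = (1, 2, 3);
my $joined;
{
    local $" = "-";
    $joined = "@list";
}

# $, goes between the arguments of print, $\ after the last one.
my $printed = "";
{
    open(my $out, ">", \$printed) or die "Cannot open in-memory handle: $!";
    local $, = ",";
    local $\ = "!\n";
    print $out "a", "b";
    close($out);
}

# $/ undefined: a single read returns the whole input.
my $data = "x\ny\n";
open(my $in, "<", \$data) or die "Cannot open in-memory handle: $!";
my $whole = do { local $/; <$in> };
close($in);

# $@ holds the message from the last failed eval.
eval { die "boom\n" };
my $err = $@;

print "$pre|$match|$post|$group $joined $printed";
```

Using local inside a block keeps the changed separator from leaking into the rest of the script.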