Here's the second of my two Perl tutorials (I'm working on part 3 currently). An HTML copy is available from http://binaryuniverse.net/tutorials/perl2.html .

====================
Perl Tutorial - Part 2
By Ch4r/Niels | ch4r@st0rage.org

www.binaryuniverse.net | www.st0rage.org | www.di-security.org

| Copy Info |

This tutorial may be redistributed under the conditions that it is not modified in any way and full credit is given to the original author, Ch4r.

| Introduction |

This tutorial seeks to build upon the basic knowledge introduced in my first Perl tutorial. Before beginning this tutorial you should know what the following are and how to use them:

- the print function
- variables
- arrays
- the if and if/else control structures
- the while and for/foreach loops
- the <STDIN> file handle used for receiving input from the user

If the meaning of any of those terms or how they are used is unclear, I'd recommend you take a moment to read through my first Perl tutorial as the information presented in this article is useless without a knowledge of what was covered in my previous tutorial.

As usual, please feel free to contact me if you have any feedback related to the tutorial or you spot errors. I hope you enjoy part two of my sequence of Perl tutorials!

| Context |

I've decided to kick off my second Perl tutorial with a brief discussion of context. 'Context' may sound intimidating, but it's actually a fairly easy concept to grasp and doesn't require learning any new functions, operators, or anything else directly implemented in your code (of course, context does affect your code, or it wouldn't be covered here; it just isn't any part of the code itself). Context is simply the idea that an expression acts differently depending on whether the code around it expects a single scalar value or a list of values (i.e., something like an array; this is called list data), so one function or operator may yield different results depending on where it is used. When a scalar value is expected, the expression is said to be in scalar context, and when a list is expected it is said to be in list context.

An example of an operator that behaves differently in scalar context than in list context is <STDIN>. <STDIN> reads input from the user, as we've seen. So far we've only used it in scalar context -- we used it to store one line of input entered by the user in a scalar variable last tutorial. What would it do in list context though, such as when the input is assigned to an array? The answer is that it reads all of the input from the user, and each line (lines being separated with the enter key) becomes a separate element of the array. The user terminates all input by entering the EOF (End Of File) character, which is usually Ctrl-D in *nix and Ctrl-Z on Windows.

For instance, suppose the user enters the three lines "line one", "line two" and "line three" while <STDIN> is being used to assign the input to the array @lines. The result is that $lines[0] will be "line one\n", $lines[1] will be "line two\n", and $lines[2] will be "line three\n"; each element keeps the newline produced by the enter key unless it is removed with chomp.
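
The difference between the two contexts can be sketched without typing anything. The example below is only an illustration: it reads from an in-memory filehandle ($in, a stand-in I've assumed so the script runs unattended) instead of STDIN, but the angle-bracket operator treats both the same way.

```perl
use strict;
use warnings;

# Stand-in for keyboard input: an in-memory filehandle (an assumption made
# so the example runs without anyone typing; <STDIN> behaves the same way).
my $typed = "line one\nline two\nline three\n";
open(my $in, '<', \$typed) or die "cannot open in-memory handle: $!";

my $first = <$in>;   # scalar context: reads exactly one line
my @rest  = <$in>;   # list context: reads every remaining line
close($in);

chomp($first);       # strip the trailing newline from the scalar
chomp(@rest);        # chomp also works on every element of an array

print "first: $first\n";
print "rest:  $_\n" foreach @rest;
```

Because the first read happened in scalar context, only "line one" was consumed; the list-context read then picked up the other two lines.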

Now suppose that, leaving @lines as it is, we were to add it to the number 3 and assign the result to $result. This expression is in scalar context as it uses addition. However, @lines is an array. The resulting value stored in $result is 6. Why? Because addition needs a single number on each side, the array is evaluated in scalar context, where it yields the number of elements it holds, and then the addition continues. In this case, @lines has 3 elements so the expression @lines + 3 is another way of saying 3 + 3. Thus, $result is assigned the value 6.
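
Here is a small sketch of that addition, with @lines filled in directly rather than from user input. The variable names $count and $first are my own additions for illustration.

```perl
use strict;
use warnings;

my @lines = ("line one", "line two", "line three");

my $result  = @lines + 3;       # @lines in scalar context yields its length, 3
my $count   = scalar(@lines);   # scalar() forces scalar context explicitly
my ($first) = @lines;           # parentheses on the left give list context

print "$result\n";   # prints 6
print "$count\n";    # prints 3
print "$first\n";    # prints line one
```

Note the last assignment: with parentheses around $first the array is treated as a list and its first element is copied, whereas without them $first would receive the count.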

| Hashes |

Hashes are like arrays in that they are used for storing multiple values in one variable. They differ, however, in the fact that they don't use index numbers for identifying separate values stored within them. Rather, each value is identified with a string. The following is a diagram of how hashes work compared to arrays:

@array:
0 -> "a value"
1 -> "another value"
2 -> "this \nis \nthe \nthird \nelement"
3 -> 78541
4 -> "final element!"

%hash:
"string" -> "this is a value of the hash\n"
"another string" -> "omg! Guess what?! Another value!"
"blah" -> 2498
"endz0r" -> "this is the end.\n\a"

There are a couple of things that should be noted from this diagram. The first is that while array names are prefixed with @, the name of a hash (also referred to as an associative array) is prefixed with % (eg. %hash, %thing, %stuff, %etc...). The second is that while arrays are indexed with the predictable pattern of 0, 1, 2, 3, etc, hashes are indexed with values supplied by the coder and are stored in no particular order. This limits the use of hashes in some ways, but also increases their flexibility in other areas.

One more thing to note about hashes is that while an element of an array is referred to as $array[0], a value of a hash is referred to as $hash{"key"}. Note that hashes use curly braces as opposed to brackets. Also note that what is the equivalent of an index number in an array is referred to as a key when used with a hash. For instance, in $hash{"thing"}, "thing" is the key.

There are two methods we can use to assign a new key/value pair to a hash. The first is the following:
-----
%hash = ("key 1", "value 1", "key 2", "value 2", "key 3", "value 3", "etc", "etc..." );
-----
This assigns %hash the keys "key 1", "key 2", "key 3", and "etc" with their values being "value 1", "value 2", "value 3", and "etc..." respectively. Thus, the first possible syntax for assigning keys and values to a hash is to list a key and its value separated by a comma, then another key and value, with each pair separated from the next by a comma. The syntax used above defines the whole hash at once (of course it can be modified later), overwriting any pre-existing data.

However, this syntax as it stands is not easily readable by humans. To make it clearer some of the commas could be replaced with "=>". This doesn't change anything other than whether the code is easily read by humans:
-----
%hash = ("key 1" => "value 1",
"key 2" => "value 2",
"key 3" => "value 3",
"etc" => "etc..." );
-----
In this case we've also added some extra whitespace (namely, newlines) but it produces the exact same hash as the previous example did.
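
If you'd like to convince yourself that the two notations really build the same hash, a quick check along these lines should do it. The hash names and the $same flag are made up for this sketch, and it uses the exists function (which reports whether a key is present) even though the tutorial hasn't covered it.

```perl
use strict;
use warnings;

my %with_commas = ("key 1", "value 1", "key 2", "value 2");
my %with_arrows = ("key 1" => "value 1",
                   "key 2" => "value 2");

# Assume the hashes match until a difference is found.
my $same = 1;
$same = 0 if keys(%with_commas) != keys(%with_arrows);   # compare key counts
foreach my $key (keys %with_commas) {
    # exists() checks for the key; eq compares the stored strings
    $same = 0 unless exists($with_arrows{$key})
                 && $with_arrows{$key} eq $with_commas{$key};
}

print $same ? "Both notations built the same hash\n"
            : "The hashes differ\n";
```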

The second way to assign a key/value pair to a hash is to assign one specific key/value pair at a time, as in the following:
-----
$hash{"key"} = "the value!\n";
-----
%hash now contains the string "the value!\n" with its key being simply "key".

This notation is an example of how values within hashes are referred to. Assigning a specific key a value is nowhere near the only thing that can be done while referring to a specific key/value pair. The value could be printed, could help comprise a mathematical expression, could be assigned to another variable, etc -- the possibilities are the same as with scalar variables or arrays.
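
To make that concrete, here's a short sketch in which hash values are used in arithmetic, copied to a scalar, and updated in place. The %price hash and its contents are invented purely for the example.

```perl
use strict;
use warnings;

my %price;                 # hypothetical hash mapping an item to its price
$price{"apple"}  = 3;
$price{"banana"} = 2;

my $total = $price{"apple"} + $price{"banana"} * 4;   # used in arithmetic
my $copy  = $price{"apple"};                          # assigned to a scalar
$price{"apple"} = $price{"apple"} + 1;                # updated in place

print "Total: $total\n";                   # prints Total: 11
print "Apple now costs $price{apple}\n";   # prints Apple now costs 4
```

The copy in $copy still holds 3 afterwards, since it was taken before the hash value was changed.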

There are several functions which will come in handy when working with hashes. If you're wondering what a function is, it is code written by other people (well, not always by other people, but that definition does the job until this tutorial introduces coding your own subroutines) that performs a specific set of instructions on its parameters. If you don't quite get that, it'll become clear as we work with Perl's built-in functions.

The first function I will illustrate in this paper is the delete function. It deletes a key/value pair from a hash. For instance, the following line deletes the key/value pair with the key 'the_key' from the hash %the_hash:
-----
delete $the_hash{the_key};
-----
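
A slightly fuller sketch of delete in action follows. The hash contents are made up, and the exists function used to check the result is an extra not covered in this tutorial; it simply reports whether a key is present.

```perl
use strict;
use warnings;

my %the_hash = (the_key => "doomed", other_key => "safe");

my $removed     = delete $the_hash{the_key};   # delete hands back the removed value
my $still_there = exists($the_hash{the_key}) ? "yes" : "no";
my $remaining   = keys %the_hash;              # keys in scalar context: a count

print "Removed value: $removed\n";        # prints Removed value: doomed
print "Key still there? $still_there\n";  # prints Key still there? no
print "Pairs left: $remaining\n";         # prints Pairs left: 1
```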

Two more often-used functions that are similar to each other are the keys and values functions. They return a list of the keys of a given hash and a list of the values of that hash, respectively. For instance, the following script prints every key and then every value contained in %hash:
-----
#!/usr/bin/env perl

%hash = (
    "key_thing" => "this seems to be a string that is a value in a hash",
    "pi" => 3.14,
    "last" => "omg, guess what?! This is the last element!"
);

# keys and values each return a list, stored here in an array
@keyz0rz = keys %hash;
@valuez0rz = values %hash;

print "Keys are:\n";
foreach (@keyz0rz) {
    print;          # with no argument, print outputs $_
    print "\n";
}

print "\nThe values are:\n";
foreach (@valuez0rz) {
    print;
    print "\n";
}
-----
The first line of this script, #!/usr/bin/env perl, has the equivalent effect to #!/usr/bin/perl. The difference is that no matter where the Perl interpreter is located on the user's system, it will be executed as long as it is in the user's path.

The next statement assigns keys and values to %hash. Following the assignment, the keys and values existing within %hash are each assigned to the appropriate array with use of the keys and values functions. Finally, both arrays are printed to standard output within foreach loops. Note that a hash has no fixed order, so the keys and values may not come out in the order they were written in the script.

Note that we take advantage of the default argument for the print function here for the purpose of shortening our code. If the print function is called without arguments passed to it (usually the argument is a string to be printed) then it simply prints the variable $_ regardless of its contents. In this example, that works out perfectly as $_ holds the element of the array used in the current iteration of the foreach loop.
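
The same $_ trick can be used to print each key next to its value. The sketch below is one possible way to do it, not the only one: it sorts the keys so the output order is predictable, and it collects the lines in an array called @report (my own name) with the push function, which appends to an array and hasn't been covered in this tutorial.

```perl
use strict;
use warnings;

my %hash = (
    "key_thing" => "a string value",
    "pi"        => 3.14,
    "last"      => "the last element",
);

my @report;
foreach (sort keys %hash) {
    # $_ holds the current key; $hash{$_} looks up its value
    push @report, "$_ => $hash{$_}";
}

print "$_\n" foreach @report;
```

With the keys sorted alphabetically, the lines come out as key_thing, last, then pi.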

| Subroutines |

Subroutines are an important concept to understand that you will use more and more as you write longer and more complex programs, and are an excellent method of organizing and reusing code. If you've ever worked with defining your own functions in another language, such as C, subroutines in Perl are the same concept with some syntax differences. Functions written by other coders have already been introduced and used frequently throughout this tutorial (as well as part 1). Examples of these are print, which prints text supplied by the programmer to standard output, and the delete function, which deletes an element from a hash.

Subroutines behave just like those functions, with the exception that they are defined by the programmer writing the script they're used in. So what exactly are subroutines? They are simply blocks of code with labels attached to them, and the code they consist of can be called simply by using the label attached to them. Although it may not seem like it at first, they are very useful.

For instance, suppose you wanted to code a simple script that prompted the user for two integers, added them, and displayed the output. It would consist of a few print()s, an addition operation, and some assignments to variables. Suppose, however, that you wanted to edit the script to repeat that procedure five times. Writing the same code over and over quickly becomes tedious and boring, and when it is a long segment of code it can be very time consuming. For this reason, Perl includes a feature that allows coders to define and customize their own subroutines.

Defining subroutines is actually quite simple. The process consists of typing the 'sub' keyword, followed by the name of the subroutine, and then the body of the subroutine (the code that is executed when the subroutine is called) enclosed in braces. For instance, the following is a subroutine named hello_there that simply prints 'Hello, world!' to standard output when it is called:
-----
sub hello_there {
print "Hello, world!\n";
}
-----
Calling the subroutine for the purpose of executing its body is equally simple. Subroutines are called simply by inserting the name of the subroutine into the script it is to be used in, prefixed with an ampersand (&). As an example, the following bit of code calls the subroutine defined earlier:
-----
print "The next line will call hello_there.\n";
&hello_there;
print "Subroutine 'hello_there' was called on the previous line\n";
-----
If the previous two examples are combined into one script, the output would look like this:

The next line will call hello_there.
Hello, world!
Subroutine 'hello_there' was called on the previous line

A very useful feature of subroutines is the ability to pass arguments, or parameters, to them when they are called. Arguments are simply extra pieces of data passed to a function when it is called. Take, for example, the print function. It usually has one argument of data passed to it -- one string which is to be printed to standard output. However, more parameters can be passed to it by separating them using commas, and this results in each of the strings passed to print being printed to standard output. Similarly, it is possible to pass arguments to user-defined subroutines by separating each parameter from the next with commas and enclosing the whole list in parentheses. With built-in functions such as print the parentheses are optional in Perl, unlike in some other languages (such as C), but when a subroutine is called with the ampersand prefix and arguments, the parentheses are required. The ampersand itself may be left out when parentheses are used.

The following segment of code lists two different ways of calling the subroutine 'afunc' with three parameters: "hello!!", 45, and "blah".
-----
&afunc("hello!!", 45, "blah");
afunc("hello!!", 45, "blah");
-----
-----

By this point you are probably wondering how to access and use arguments passed to a function. The answer is very simple: each argument passed to a subroutine is placed into the array @_. So, in the previous example, @_ contains three elements: $_[0], which is the string "hello!!"; $_[1], which is the numerical value 45; and $_[2], which is the string "blah". Parameters stored as elements of @_ may be assigned to variables, passed as parameters to other subroutines, and used in the same style that regular array elements may be used. Take care, however, when modifying elements of @_: this modifies the arguments passed to the current subroutine directly and the original data will be lost.
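
Both points, reading arguments out of @_ and the danger of modifying them, can be seen in a short sketch. The subroutines describe and shout are hypothetical names invented for this example; describe hands its result back with return, which is covered later in this tutorial, and shout uses the built-in uc function to upper-case a string.

```perl
use strict;
use warnings;

# Hypothetical subroutine: copies its first two arguments out of @_
# and builds a label from them.
sub describe {
    my $name  = $_[0];
    my $count = $_[1];
    return "$name x $count";
}

# Hypothetical subroutine: assigns to $_[0], which changes the
# caller's own variable rather than a copy.
sub shout {
    $_[0] = uc($_[0]);
}

my $label = &describe("blah", 45);
my $word  = "hello!!";
&shout($word);

print "$label\n";   # prints blah x 45
print "$word\n";    # prints HELLO!!
```

Copying the arguments into my variables first, as describe does, is the usual way to avoid changing the caller's data by accident.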

Another important concept to take note of is that of scope. In many languages, such as C, modifications made to variables in one function do not affect that variable when used in another function or the main() section of the program. For instance, if variable 'a_var' is assigned a value of 13 in the function a_func(), a_var is still undefined when used in main() unless it is declared in main() as well, and any modifications made to a_var in a_func will not affect the copy of a_var stored in main(). This concept is referred to as scope (if you don't know C or don't understand this example made in C, don't worry; it should become clear later on).

Perl does not have this restriction, however. If a subroutine assigns $a_var the string value "Hello, world!", $a_var could be included as a parameter to the print() function in the main part of your script and the text "Hello, world!" would be displayed. Eg:
-----
&assignvariable;
print $hellooovariable;

sub assignvariable {
$hellooovariable = "Hello, world!\n";
}
-----
This code will work fine, outputting the text "Hello, world!\n" (with \n being a newline, of course). However, for several different reasons, the programmer may desire to make the variable $hellooovariable used in the subroutine local to that subroutine only. This means that the remaining sections of the script that are not part of the subroutine cannot access or modify the copy of $hellooovariable stored in that particular subroutine. It is as if the variable does not exist as far as the rest of the script is concerned. This also means that a new variable called $hellooovariable could be assigned the numeric value 27 outside of the subroutine, and $hellooovariable would still hold the value "Hello, world!\n" when used in the subroutine it is local to, whilst it would hold the value 27 when used in the heart of the script. The concept of scope can simply be thought of as each subroutine having its own copy of a specific variable.

So how, you may be asking, can variables be made local to the function they are declared in? The answer is simply by declaring them with the 'my' keyword. Take a look at the following sample:
-----
my $teh_variable = 27;
&change_teh_variable("Let's change \$teh_variable to this string!\n");
print $teh_variable;

sub change_teh_variable {
my $teh_variable = $_[0];
}
-----

The first line of this script declares $teh_variable as a local variable using the my keyword, and assigns it the value of 27. The next line calls the subroutine change_teh_variable with one parameter: the string "Let's change \$teh_variable to this string!\n". Control is now handed over to the function change_teh_variable(). change_teh_variable() should now assign $teh_variable the string that was passed as the first parameter (remember, $_[0] holds the first argument passed to a subroutine). Then, once the body of the function is finished, execution jumps back to the print statement, which prints 27.

Huh? 27? But $teh_variable was assigned a multi-word string in the subroutine called immediately before the print statement. The reason that 27 was printed was that the function that was called assigns the string passed to it to its own local variable $teh_variable. If we were to include a print statement inside change_teh_variable() to print $teh_variable, we'd see that in that case it did indeed hold the string passed as the first argument (try it if you don't believe me.. ;-)), but as the variable is local to change_teh_variable(), the copy that the main segment of the script uses is not modified.
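
For the impatient, here is roughly what that experiment looks like. It is the same script with a print added inside the subroutine; the return line and the $inside variable are additions of mine so the inner value can also be inspected from outside.

```perl
use strict;
use warnings;

my $teh_variable = 27;

sub change_teh_variable {
    my $teh_variable = $_[0];    # a new variable, separate from the outer one
    print "Inside the subroutine: $teh_variable";
    return $teh_variable;        # hand the inner value back for inspection
}

my $inside = &change_teh_variable("Let's change \$teh_variable to this string!\n");
print "Outside the subroutine: $teh_variable\n";   # still 27
```

The first line of output shows the string, the second shows 27: two variables with the same name, each living in its own scope.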

The final basic concept having to do with subroutines that I introduce here is the idea that a function returns a value. As an example of what a return value is, take the keys function, which returns a list of the keys that the hash specified as a parameter consists of. The function call "keys(%hash)" by itself is completely useless, because its result is returned and then simply thrown away. This means that we have to use the function as part of a larger statement. The keys function returns a list of keys in the hash, and for the function to be of any use at all, something must be done with that return value - often it is assigned to an array.

As another illustration of what a return value is, take a subroutine that is used to add two numbers. It could handle the result in a few possible ways. It may print the result, assign it to a variable, or simply return it. In the case that it returns a value, it must be used as part of a larger expression to be made useful. Assuming the function addtwo() returns the sum of its two parameters, the following is an example of how the result would be assigned to $sum_var:
-----
$sum_var = &addtwo(3, 4); # $sum_var now holds 7
-----
If we were simply to use the subroutine addtwo() by itself, it would be a waste of computing power (not much though, mind you, as it's a pretty simple function). It would add the numbers, return the value, but as what would be done with the return value is not specified, it would simply move on. If the concept of return values still seems confusing, you can think of it this way: when a subroutine with a return value is executed, the body is executed as with any other subroutine, but the use of the subroutine itself inside your code evaluates to whatever the return value is, just like a variable evaluates to what its value is.
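
The tutorial never shows the body of addtwo(), so the version below is only a guess at what such a subroutine might look like. It relies on the return function, which is described a little further on in this tutorial.

```perl
use strict;
use warnings;

# Hypothetical implementation of addtwo(): returns the sum of its
# first two arguments.
sub addtwo {
    my $first  = $_[0];
    my $second = $_[1];
    return $first + $second;
}

my $sum_var = &addtwo(3, 4);        # the call evaluates to 7
my $bigger  = &addtwo(3, 4) * 2;    # a call can sit inside a larger expression
&addtwo(3, 4);                      # legal, but the result is thrown away

print "$sum_var\n";   # prints 7
print "$bigger\n";    # prints 14
```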

Returning values from a subroutine in Perl is not very complex at all, although the idea may seem confusing at first. It is simply done with the return function, which accepts one parameter: the value to return. For instance, the following subroutine returns the string "Less than fifty" if the number entered by the user via STDIN (standard input) is less than fifty, otherwise it returns "Not less than fifty":
-----
sub isltfifty {
    print "Enter a number por favor: ";
    chomp(my $num = <STDIN>);    # read one line and strip its newline
    if($num < 50) {
        return "Less than fifty";
    }
    else {
        return "Not less than fifty";
    }
}
-----

| Regular Expressions |

Regular expressions make Perl a very powerful, flexible, and suitable language for dealing with text manipulation. They give the programmer the ability to tell whether a string matches a certain pattern. Although this may not sound like much, it is. With regular expressions, you can tell precisely what pattern(s) a string matches, remember what parts of the patterns matched the string, etc.

As a simple example of how to use regular expressions, let's say that we want to see if the variable $string contains the text 'in'. The code to do this is as follows:
-----
$string =~ /in/;
-----
This statement by itself will do nothing. It is, however, useful as part of a boolean expression in conditional statements such as if. The binding operator, =~, is used to see whether the string on the left matches the regular expression on the right. So far, we know that whatever text is assigned to $string is being tested with the regular expression /in/. The expression returns a value of 1 if the match succeeds and a false value (the empty string, "") otherwise. A related special value in Perl is undef, short for undefined. When used as a string, it acts as an empty string (""), when used as a number it is 0, and when used in boolean expressions it is false. It is the value that a variable holds before it has been assigned a value.

The regular expression placed on the right side of the =~ operator will be the primary focus of this section of the tutorial, as this is the part that allows the programmer to specify what pattern is to be matched and is what provides the power and flexibility associated with regular expressions. Regular expressions in Perl consist of the pattern to be matched between two forward slashes // as delimiters. In the previous example, the regular expression is /in/, which matches all strings that contain the sequence of characters 'in' within them, regardless of the rest of the string. Thus, if $string had contained the text "Binary Universe" the pattern match would have returned true, but if it had been simply the string "bananas" it would not have done so.
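
Put into an if statement, the match looks something like the following sketch. The @matched array is just a convenience I've added to record which strings passed, using push to append to it.

```perl
use strict;
use warnings;

my @matched;
foreach my $string ("Binary Universe", "bananas") {
    if ($string =~ /in/) {
        print "'$string' contains 'in'\n";
        push @matched, $string;    # remember the strings that matched
    }
    else {
        print "'$string' does not contain 'in'\n";
    }
}
```

Only "Binary Universe" ends up in @matched, since "bananas" has no 'i' followed directly by an 'n'.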

Regular expressions also allow the use of wildcards. The . (yes, a single dot) wildcard matches any one character with the exception of a newline. Thus, the pattern /bl.h/ matches the strings 'blah', 'bl4h', 'bl@h', 'bl.h', 'bl2h', or any other string that has a b and an l next to each other, any one character, then an h, but does not match strings such as 'binary universe' that do not. To include a period literally within a pattern, if you wanted to match the string 'www.binaryuniverse.net' for instance, it should be prefixed with a backslash. Eg: /www\.binaryuniverse\.net/.
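
A quick sketch of the dot in action, including what happens if the backslash is forgotten. The test strings and variable names are ones I've picked for illustration.

```perl
use strict;
use warnings;

my @hits;
foreach my $candidate ("blah", "bl4h", "bl.h", "blh", "binary universe") {
    push @hits, $candidate if $candidate =~ /bl.h/;
}
print "Matched /bl.h/: @hits\n";   # prints Matched /bl.h/: blah bl4h bl.h

# An unescaped dot matches any character, so this succeeds by accident...
my $loose  = ("wwwXbinaryuniverseYnet" =~ /www.binaryuniverse.net/)   ? 1 : 0;
# ...while the escaped dots insist on real periods.
my $strict = ("wwwXbinaryuniverseYnet" =~ /www\.binaryuniverse\.net/) ? 1 : 0;
my $real   = ("www.binaryuniverse.net" =~ /www\.binaryuniverse\.net/) ? 1 : 0;

print "$loose $strict $real\n";    # prints 1 0 1
```

Note that 'blh' is rejected: the dot must match exactly one character, not zero.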

A useful feature that is similar to the wildcard character . is the ability to use character classes. Character classes are used in the same way as the dot character but they do not match any character -- they match any of the characters specified by the programmer that are included between [] brackets. As an example, the following regular expression matches either the string 'cat' or 'car':

/ca[tr]/

Ranges of characters are specified by including the first character of the range, a -, then the last character of the range. For instance, /bl[a-z]h/ matches bl, any lowercase alphabetic character, then the character h. There are even a few shortcuts for classes of characters that are commonly used. These shortcuts consist of a backslash and then the letter that represents them. For instance \d matches any digit. The following table lists the most common shortcuts and the character classes they represent:

\d - Any digit. [0-9]
\w - Any 'word' character, including any alphanumeric character or an underscore. [a-zA-Z0-9_]
\s - Any whitespace character (tab, space, etc).
\D - Any non-digit.
\W - Any non-'word' character.
\S - Any non-whitespace character.

The ^ character, when it is the first character inside the brackets, is used to indicate the opposite of a character class. That is to say, [^0-5] matches any character that is not a digit between 0 and 5. Thus, [^0-9] is the same as \D, which matches anything that is not a digit.
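
Here are a few of those character classes tried out on sample strings. The strings and variable names are my own; each variable ends up as "yes" or "no" depending on whether the match succeeded.

```perl
use strict;
use warnings;

my $cat      = ("cat"      =~ /ca[tr]/)    ? "yes" : "no";   # t is in [tr]
my $cab      = ("cab"      =~ /ca[tr]/)    ? "yes" : "no";   # b is not
my $lower    = ("blQh"     =~ /bl[a-z]h/)  ? "yes" : "no";   # Q is uppercase
my $digit    = ("room 101" =~ /\d/)        ? "yes" : "no";   # contains digits
my $low_only = ("123"      =~ /[^0-5]/)    ? "yes" : "no";   # only 0-5 digits
my $has_high = ("127"      =~ /[^0-5]/)    ? "yes" : "no";   # 7 is outside 0-5

print "cat: $cat, cab: $cab, blQh: $lower\n";
print "room 101: $digit, 123: $low_only, 127: $has_high\n";
```

This prints "cat: yes, cab: no, blQh: no" followed by "room 101: yes, 123: no, 127: yes".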

The pipe (|) character is used to represent 'or' in regular expressions, similar to the way in which two pipes (||) represent logical or in Perl. Thus, the following regular expression matches either 'w00t', 'woot', or 'wewt':

/w00t|woot|wewt/

It should be noted, however, that parentheses are used for grouping, which determines how much of the pattern each alternative covers. Thus, the previous regular expression could be rewritten to be shorter, as in the following example:

/w(00|oo|ew)t/

This matches a w, then either 00, oo, or ew, then a t.

Another important concept to grasp where regular expressions are concerned is that of quantifiers. As an example of how a quantifier works, take the commonly used * quantifier. When the * quantifier is used, it tells Perl that the previous part of the regular expression (the previous character unless grouped otherwise with parentheses) may appear any number of times (5, 9, 276, even 0) and the regular expression as a whole should still return true. For instance, the following regular expression matches the string 'perl', 'pppppppppppppppppppperl', 'erl', or any other string that contains any number of p characters followed immediately by the text 'erl':
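
To check that the long and short forms agree, both can be run over the same list of words, as in this sketch (the extra words 'wt' and 'w0ot' are ones I've thrown in as non-matches).

```perl
use strict;
use warnings;

my (@long_form, @short_form);
foreach my $word ("w00t", "woot", "wewt", "wt", "w0ot") {
    push @long_form,  $word if $word =~ /w00t|woot|wewt/;
    push @short_form, $word if $word =~ /w(00|oo|ew)t/;
}

print "long form:  @long_form\n";    # prints long form:  w00t woot wewt
print "short form: @short_form\n";   # prints short form: w00t woot wewt
```

Without the parentheses, /w00|oo|ewt/ would mean something quite different: 'w00' or 'oo' or 'ewt'.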
-----
/p*erl/
-----
And, as an example of how grouping regular expressions with parentheses works with the * quantifier, the following matches 'perperperl', 'perl', 'l', or any other string that contains the string 'per' any number of times (including, must I remind you, 0) followed by the character l:
-----
/(per)*l/
-----
With this knowledge, we can conclude that if we want a specific part of a string to match ANY character ANY number of times (including, of course, 0), we can simply use the dot metacharacter followed by the asterisk quantifier. Thus, the following regular expression matches ANY string:
-----
/.*/
-----
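
The three patterns above can be tried out as follows. The sample strings are my own choices; note in particular 'superlative', which matches /p*erl/ because the pattern is allowed to match anywhere inside the string, not just at the start.

```perl
use strict;
use warnings;

my $perl   = ("perl"        =~ /p*erl/)    ? 1 : 0;   # one p, then erl
my $erl    = ("erl"         =~ /p*erl/)    ? 1 : 0;   # zero p's is fine
my $inside = ("superlative" =~ /p*erl/)    ? 1 : 0;   # 'perl' sits mid-string
my $python = ("python"      =~ /p*erl/)    ? 1 : 0;   # no 'erl' anywhere
my $triple = ("perperperl"  =~ /(per)*l/)  ? 1 : 0;   # 'per' three times, then l
my $just_l = ("l"           =~ /(per)*l/)  ? 1 : 0;   # 'per' zero times, then l
my $empty  = (""            =~ /.*/)       ? 1 : 0;   # even the empty string

print "$perl $erl $inside $python $triple $just_l $empty\n";   # prints 1 1 1 0 1 1 1
```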

This segment of my Perl tutorial has introduced only the basics of what is a very wide and powerful feature of the Perl scripting language: regular expressions. Expect a more in-depth look at good ol' regexps in part 3 as well as an introduction to more topics that Perl has to offer.

| Wrapping It Up |

As I did in my last Perl tutorial, I will attempt to give an example of a script that implements most of the main topics covered in this tutorial. This time, I've chosen a very pointless example -- it's a script that maps IP addresses to the names of the owners of the computers that they correspond to (pointless for many reasons, including the fact that most people have dynamic IPs :P). It has three commands that can be entered - 'add', which is used to add an IP address to the database of IPs (the database is actually only temporary and is erased once execution of the script stops, which is another reason this is a very ineffective script; it could be done otherwise, but file I/O and Perl DBMs have not been covered yet), 'list', which lists the IPs entered in the database, and 'delete', which deletes an IP from the database. Here it is:
-----
#!/usr/bin/env perl
# An empty while condition counts as true, so this loop repeats forever.
while () {
    print "Please enter a command: ";
    chomp (my $cmd = <STDIN>);
    if ($cmd =~ /add/) {
        &add_ip_to_hash;
    }
    elsif ($cmd =~ /delete/) {
        &delete_ip_from_hash;
    }
    elsif ($cmd =~ /list/) {
        &list_ips;
    }
    elsif ($cmd =~ /exit/) {
        exit 0;
    }
    else {
        print "Invalid command!\n";
    }
}

# Prompt for an IP and its owner, then store the pair in %hash.
sub add_ip_to_hash {
    print "Enter ip address: ";
    chomp (my $ip = <STDIN>);
    print "Enter owner of computer that $ip corresponds to: ";
    chomp (my $o = <STDIN>);
    $hash{$ip} = $o;
}

# Prompt for an IP and remove its entry from %hash.
sub delete_ip_from_hash {
    print "What IP would you like to delete from the database? ";
    chomp (my $ip = <STDIN>);
    delete $hash{$ip};
}

# Print every IP alongside its owner.
sub list_ips {
    my @keyz0rz = keys (%hash);
    foreach (@keyz0rz) {
        print $_, " --> ", $hash{$_}, "\n";
    }
}
-----

Ok. I lied when I said the only commands were 'add', 'delete', and 'list'. There's one other command. As a challenge, I'm going to let you figure out a) what that command is, b) how it works the way it does, and c) how the script as a whole works the way it does.

If you have any questions regarding the content of this tutorial, please contact me via IRC or forums. I'm usually around BU (binaryuniverse.net), ASO (anomalous-security.org), st0rage (st0rage.org), DI-Sec (di-security.org), and some other places.
====================