Introduction to mod_perl (part 6): Even More Perl Basics
This article is the third in a series covering the essential Perl
basics that you should know before you start programming for
mod_perl. You will hear a lot about namespaces, symbol tables and lexical
scoping in Perl discussions, but little of it will make any sense
without a few key facts.
Symbols, Symbol Tables and Packages; Typeglobs
There are two important types of symbol: package global and lexical.
We will talk about lexical symbols later; for now we will talk only
about package global symbols, which we will refer to simply as
global symbols.
The names of pieces of your code (subroutine names) and the names of
your global variables are symbols. Global symbols reside in one
symbol table or another. The code itself and the data do not; the
symbols are the names of pointers which point (indirectly) to the
memory areas which contain the code and data. (Note for C/C++
programmers: we use the term `pointer' in the general sense of one piece
of data referring to another piece of data, not in the specific sense
used in C or C++.)
There is one symbol table for each package (which is why global
symbols are really package global symbols).
You are always working in one package or another.
Just as a C program starts executing in a function called main(),
the first statement of your first Perl script is in package main::,
which is the default package. Unless you say otherwise by using the
package statement, your symbols are all in package main::. You
should be aware straight away that files and packages are not
related. You can have any number of packages in a single file, and a
single package can be in one file or spread over many files. However,
it is very common to have a single package in a single file. To
declare a package you write:
package mypackagename;
From the following line you are in package mypackagename and any
symbols you declare reside in that package. When you create a symbol
(variable, subroutine etc.) Perl uses the name of the package in which
you are currently working as a prefix to create the fully qualified
name of the symbol.
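To make the prefixing concrete, here is a minimal sketch. The package name Inventory and the names $count and describe are made up for illustration, and the vars pragma is used only so that the short names are accepted under strict.

```perl
use strict;
use warnings;

package Inventory;                  # hypothetical package name

use vars qw($count);                # predeclare the package global
$count = 3;                         # really $Inventory::count
sub describe { return "package " . __PACKAGE__ }   # really &Inventory::describe

package main;                       # back in the default package

use vars qw($count);
$count = 10;                        # really $main::count, a different symbol

print "$Inventory::count\n";        # 3
print "$main::count\n";             # 10
print Inventory::describe(), "\n";  # package Inventory
```

The two $count variables never collide, because each short name is completed with the name of the package that was current when it was compiled.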
When you create a symbol, Perl creates a symbol table entry for that
symbol in the current package's symbol table (by default
main::). Each symbol table entry is called a typeglob. Each
typeglob can hold information on a scalar, an array, a hash, a
subroutine (code), a filehandle, a directory handle and a format, all
of which share the same name. So you see now that there are two
indirections for a global variable: the symbol (the thing's name)
points to its typeglob, and the typeglob's entry for the thing's type
(scalar, array, etc.) points to the data. If we had a scalar and an
array with the same name, their name would point to the same typeglob,
but for each type of data the typeglob points to somewhere different,
and so the scalar's data and the array's data are completely separate
and independent; they just happen to have the same name.
Most of the time, only one part of a typeglob is used (yes, it's a bit
wasteful). You will by now know that you distinguish between the parts
by using what the authors of the Camel book call a funny character. So
if we have a scalar called `line' we would refer to it in code as
$line, and if we had an array of the same name, that would be
written @line. Both would point to the same typeglob (which would
be called *line), but because of the funny character (also known
as decoration) perl won't confuse the two. Of course we might
confuse ourselves, so some programmers never use the same name
for more than one type of variable.
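As a small sketch of one typeglob serving two variables, the following uses the *name{THING} syntax to fetch references to the individual parts of *line. The variable names $scalar_slot and $array_slot are ours, chosen for illustration.

```perl
use strict;
use warnings;
use vars qw($line @line);           # one name, two kinds of variable

$line = "a scalar";
@line = ("an", "array");

# Both variables hang off the same typeglob, *main::line,
# but each has its own separate storage.
my $scalar_slot = *line{SCALAR};    # reference to $line
my $array_slot  = *line{ARRAY};     # reference to @line

print "$$scalar_slot\n";            # a scalar
print "@$array_slot\n";             # an array
print scalar(@line), "\n";          # 2
```

Changing $line has no effect on @line, even though both are reached through the same symbol table entry.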
Every global symbol is in some package's symbol table. To refer to a
global symbol we could write the fully qualified name,
e.g. $main::line. If we are in the same package as the symbol we
can omit the package name, e.g. $line (unless you use the strict
pragma, in which case you will have to predeclare the variable using the
vars pragma). We can also omit the package name if we have imported
the symbol into our current package's namespace. If we want to refer
to a symbol that is in another package and which we haven't imported,
we must use the fully qualified name, e.g. $otherpkg::box.
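The rules above can be sketched as follows. The package otherpkg and its $box variable follow the names used in the text; the values assigned are made up for illustration.

```perl
use strict;
use warnings;

package otherpkg;                   # the "other" package from the text
use vars qw($box);
$box = "crate";

package main;
use vars qw($line);                 # predeclare so strict accepts the short name

$line = "first";                    # short name: we are in package main
$main::line .= " line";             # fully qualified name, same variable

print "$line\n";                    # first line
print "$otherpkg::box\n";           # crate (not imported, so fully qualified)
```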
Most of the time you do not need to use the fully qualified symbol
name because most of the time you will refer to package variables from
within the package. This is much like class variables in C++. You can
work entirely within package main:: and never even know you are
using a package, nor that the symbols have package names. In a way,
this is a pity because you may fail to learn about packages and they
are extremely useful.
The exception is when you import the variable from another package.
This creates an alias for the variable in the current package, so
that you can access it without using the fully qualified name.
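One way to see what such an alias amounts to is to create it by hand with a typeglob assignment. This is only a sketch of the effect of an import, not of how any particular module implements it, and the package name Shelf is hypothetical.

```perl
use strict;
use warnings;

package Shelf;                      # hypothetical package that owns the data
use vars qw($box);
$box = "crate";

package main;
use vars qw($box);                  # lets strict accept the short name $box

# Roughly what an import does: make the scalar part of *main::box
# refer to the variable owned by Shelf.
*main::box = \$Shelf::box;

print "$box\n";                     # crate
$box = "carton";                    # writes through the alias
print "$Shelf::box\n";              # carton
```

After the assignment there is still only one piece of data; it simply has two names, one in each package.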
Whilst global variables are useful for sharing data and are necessary
in some contexts, it is usually wisest to minimise their use and use
lexical variables, discussed next, instead.
Note that when you create a variable, the low-level business of
allocating memory to store the information is handled automatically by
Perl. The interpreter keeps track of the chunks of memory to which the
pointers are pointing and takes care of undefining variables. When all
references to a variable have ceased to exist, the perl garbage
collector is free to take back the memory for
recycling. However, perl almost never returns memory it has
already used to the operating system during the lifetime of the
process.
Lexical Variables and Symbols
The symbols for lexical variables (i.e. those declared using the
keyword my) are the only symbols which do not live in a symbol
table. Because of this, they are not available from outside the block
in which they are declared. There is no typeglob associated with a
lexical variable and a lexical variable can refer only to a scalar, an
array or a hash.
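A short sketch of both points: a lexical is visible only inside its block, and it never appears in the package's symbol table. Here %main:: is the hash through which Perl exposes the symbol table of package main; the variable names are made up for illustration.

```perl
use strict;
use warnings;

my $label = "outer";
my $seen_inside;

{
    my $label = "inner";            # a new lexical, visible only in this block
    $seen_inside = $label;
}

print "$seen_inside\n";             # inner
print "$label\n";                   # outer

# A lexical never gets an entry in the symbol table.
print exists $main::{label} ? "in symbol table\n" : "not in symbol table\n";
```

The last line prints "not in symbol table", since neither $label was ever a package variable.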
If you need access to the data from outside the package then you can
return it from a subroutine, or you can create a global variable
(i.e. one which has a package prefix) which points or refers to it and
return that. The pointer or reference must be global so that you can
refer to it by a fully qualified name. But, just as in C, try to avoid
having global variables. Using OO methods generally solves this
problem by providing methods to get and set the desired value within
the object, which can be lexically scoped inside the package and passed
by reference.
The phrase ``lexical variable'' is a bit of a misnomer; we are really
talking about ``lexical symbols''. The data can be referenced by a
global symbol too, and in such cases when the lexical symbol goes out
of scope the data will still be accessible through the global symbol.
This is perfectly legitimate and cannot be compared to the terrible
mistake of taking a pointer to an automatic C variable and returning
it from a function--when the pointer is dereferenced the behaviour is
undefined and a segmentation fault is likely. (Note for C/C++
programmers: having a function return a pointer to an auto variable is
a disaster in C or C++; the perl equivalent, returning a reference to a
lexical variable created in a function, is normal and useful.)
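The normal and useful case can be sketched like this. The subroutine name make_list is hypothetical; the point is only that the data survives after the lexical name @items has gone out of scope.

```perl
use strict;
use warnings;

# Hypothetical helper: each call creates a fresh lexical array and
# returns a reference to it.
sub make_list {
    my @items = @_;
    return \@items;                 # safe: the data outlives the name
}

my $first  = make_list(1, 2, 3);
my $second = make_list("a", "b");

print "@$first\n";                  # 1 2 3
print "@$second\n";                 # a b
```

Each call yields a separate array, so $first and $second do not interfere with each other.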
my() vs. use vars:
With use vars, you are making an entry in the symbol table, and you
are telling the compiler that you are going to be referencing that
entry without an explicit package name.
With my(), NO ENTRY IS PUT IN THE SYMBOL TABLE. The compiler figures
out at compile time which my() variables (i.e. lexical variables)
are the same as each other, and once you hit execute time you cannot
go looking those variables up in the symbol table.
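The difference shows up as soon as we try to find a variable by name at run time. In this sketch the helper lookup is hypothetical, and it turns off strict refs only to demonstrate the symbol table lookup.

```perl
use strict;
use warnings;
use vars qw($shared);               # entry in the symbol table

$shared = "package global";
my $private = "lexical";            # no symbol table entry

# Hypothetical helper: fetch a scalar from package main by name.
sub lookup {
    my ($name) = @_;
    no strict 'refs';               # symbolic reference, for demonstration only
    return ${"main::$name"};
}

my $found_shared  = lookup("shared");
my $found_private = lookup("private");

print defined($found_shared)  ? "found\n" : "not found\n";   # found
print defined($found_private) ? "found\n" : "not found\n";   # not found
```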
my() vs. local():
local() creates a temporally limited, package-based scalar, array, hash,
or glob -- when the scope of definition is exited at runtime, the
previous value (if any) is restored. References to such a variable
are *also* global... only the value changes. (Aside: that is what
causes variable suicide. :)
my() creates a lexically limited, non-package-based scalar, array, or
hash -- when the scope of definition is exited at compile-time, the
variable ceases to be accessible. Any references to such a variable
at runtime turn into unique anonymous variables on each scope exit.
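A minimal sketch of the contrast, with made-up subroutine names: a local() value is seen by any code called while it is in effect, whereas a my() variable is invisible outside the block that declares it.

```perl
use strict;
use warnings;
use vars qw($mode);

$mode = "global";

sub show_mode { return $mode }      # always reads the package variable

sub with_local {
    local $mode = "localized";      # temporary value, visible to callees
    return show_mode();
}

sub with_my {
    my $mode = "lexical";           # new variable, invisible to callees
    return show_mode();
}

print with_local(), "\n";           # localized
print with_my(), "\n";              # global
print "$mode\n";                    # global (restored after with_local)
```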