Perl Basics: Writing and Debugging Programs

In this third part of a three-part article series on the basics of Perl programming, you’ll learn how to use escape sequences in your programming, and how to use the Perl debugger. This article is excerpted from chapter one of the book Beginning Perl, Second Edition by James Lee (Apress; ISBN: 159059391X).

Escape Sequences

The 16-bit range of Unicode gives us 65,536 characters, and ASCII with its 8-bit extensions gives us 256 characters, but on the average keyboard there are only a hundred or so keys. Even using the shift keys, there will still be some characters that you aren’t going to be able to type. There’ll also be some things that you don’t want to stick in the middle of your program, because they would make it messy or confusing. However, you’ll want to refer to some of these characters in strings that you output. Perl provides us with mechanisms called escape sequences as an alternative way of getting to them. We’ve already seen the use of \n to start a new line. Table 1-1 lists the more common escape sequences.

Table 1-1. Escape Sequences

Escape Sequence   Meaning
\t                Tab
\n                Start a new line (usually called newline)
\r                Carriage return
\b                Back up one character (backspace)
\a                Alarm (rings the system bell)
\x{1F18}          Unicode character

In the last example, 1F18 is a hexadecimal number (see the upcoming section “Number Systems”) referring to a character in the Unicode character set; the 16-bit range discussed here runs from 0000 to FFFF. As another example, \x{2620} is the Unicode character for a skull and crossbones!
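As a minimal sketch of these escape sequences in use, the short program below prints a tab-separated line and the skull-and-crossbones character. The variable names are made up for illustration, and the binmode call is an assumption about your setup: it tells Perl to write UTF-8, which most modern terminals expect.

```perl
use strict;
use warnings;

# Assumption: the terminal understands UTF-8. Without an encoding layer,
# printing a character above 255 produces a "Wide character" warning.
binmode(STDOUT, ':encoding(UTF-8)');

my $row   = "Name\tValue\n";   # \t is a tab, \n ends the line
my $skull = "\x{2620}";        # Unicode character number 2620 (hexadecimal)

print $row;
print "Beware: $skull\n";

# ord() returns the number of a character; %X formats it in hexadecimal.
printf "Character number: %X\n", ord($skull);
```

Each escape sequence counts as a single character inside the string, however many keystrokes it took to type.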

Whitespace

As mentioned previously, whitespace is the name we give to tabs, spaces, and newlines. Perl is very flexible about where you put whitespace in your program. We have already seen how we’re free to use indentation to help show the structure of blocks. You don’t need to use any whitespace at all, if you don’t want to. If you’d prefer, your programs can all look like this:

print"Top level\n";{print"2nd level\n";{print"3rd level\n";}print"Where are we?";}

This is considered a bad idea. Whitespace is another tool we have to make our programs more understandable; let’s use it as such.
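For comparison, here is one way the same statements might be laid out with whitespace doing its job. The statements are wrapped in a subroutine, print_levels, purely so the example is self-contained; that name is hypothetical and not part of the original one-liner.

```perl
use strict;
use warnings;

# Hypothetical wrapper: the same statements as the one-liner, spread out.
sub print_levels {
    print "Top level\n";
    {
        print "2nd level\n";
        {
            print "3rd level\n";
        }
        print "Where are we?";
    }
}

print_levels();
```

The indentation makes it plain that the final print sits in the second-level block, which is hard to see in the squashed version.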

Number Systems

If you thought the way computers see characters was weird, we have a surprise for you.

The way most humans count is using the decimal system, or what is called base 10; we write 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, and then when we get to 10, we carry 1 in the 10s column and start from 0 again. Then when the 10s column gets to 9 and the 1s column gets to 9, we carry 1 in the 100s column and start again. Why 10? We used to think it’s because we have 10 fingers, but then we found out that the Babylonians counted up to 60, which stopped that.

On the other hand, computers count by registering whether or not electricity flows in a certain part of the circuit. For simplicity’s sake, we’ll call a flow of electricity a 1, and no flow a 0. So, we start off with 0, no flow. Then we get a flow, which represents 1. That’s as much as we can do with that part of the circuit. 0 or 1, off or on. Instead of base 10, the decimal system, this is base 2, the binary system. In the binary system, one digit represents one unit of information: one binary digit, or bit.

When we join two parts of the circuit together, things get more interesting. Look at them both in a row: when they are both off, the counter reads 00. Then one comes on, so we get 01. Then what? Well, humans get to 9 and have to carry 1 to the next column, but computers only get to 1. The next number, number 2, is represented as 10. Then 11. And we need some more of our circuit. Number 4 is 100, 5 is 101, and so on ad infinitum. If we got used to it, and we used the binary system naturally, we could count up to 1023 on our fingers.
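If you would like to watch the binary counting happen, the sketch below uses two standard Perl features: the %b format of printf, which writes a number in binary, and the 0b prefix, which lets you type a number in binary. The variable names are chosen only for this example.

```perl
use strict;
use warnings;

# Count from 0 to 5, showing each number in decimal and in binary.
for my $number (0 .. 5) {
    printf "%d in binary is %b\n", $number, $number;
}

# Ten fingers, all raised: ten binary digits, every one of them a 1.
my $ten_fingers = 0b1111111111;
print "Ten raised fingers make $ten_fingers\n";   # prints 1023
```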

This may sound like an abnormal way to count, but even stranger counting mechanisms are all around us. As this is being written, it’s 7:59 p.m. In one minute, it’ll be 8:00 p.m., which seems unremarkable. But that’s a base 60 system. In fact, it’s worse than that—it doesn’t stay in base 60, because hours carry at 24 instead of 60. Anyone who’s used the Imperial measurement system, a Chinese abacus, or pounds, shillings, and pence knows the full horror of mixed base systems, which are far more complicated than what we’re dealing with here.

As well as binary, there are two more important number systems we need to know about when talking to computers. We don’t often get to deal with binary directly, but the following two systems have a logical relationship to base 2 counting. The first is octal, base 8.

Eight is an important number in computing; bits are organized in groups of eight to form bytes, giving you the range of 0–255 we saw earlier with ASCII. Each ASCII character can be represented by one byte. Octal is therefore a good way of counting bits, although it has fallen out of fashion these days. Octal numbers all start with 0 (that’s a zero, not an oh), so we know they’re octal, and proceed as you’d expect: 00, 01, 02, 03, 04, 05, 06, 07, carry one, 010, 011, 012 . . . 017, carry one, 020, and so on. Perl recognizes octal numbers if you’re certain to put that zero in front, like this:

print 06301;

which prints out the decimal number

3265
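The leading zero only has this effect on numbers typed directly into the program. As a hedged illustration (the variable names are invented for the example), the built-in oct function converts a string of octal digits, and the %o format of sprintf goes the other way:

```perl
use strict;
use warnings;

my $literal     = 06301;          # octal literal in the program text
my $from_string = oct("06301");   # oct() converts a string of octal digits
my $as_decimal  = "06301" + 0;    # a string used as a number is read as decimal
my $back_again  = sprintf("%o", 3265);   # decimal 3265 written in octal

print "Literal:     $literal\n";       # 3265
print "oct():       $from_string\n";   # 3265
print "String + 0:  $as_decimal\n";    # 6301
print "In octal:    $back_again\n";    # 6301
```

The third line is the one to watch: a string that happens to start with a zero is not treated as octal unless you ask for it with oct.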

The second is called the hexadecimal system, as mentioned previously. Of course, programmers are lazy, so they just call it hex. (They like the wizard image.)

Decimal is base 10, and hexagons have six sides, so this system is base 16. As you might have guessed from the number 1F18 shown previously, digits above 9 are represented by letters, so A is 10, B is 11, and so on, all the way through to F, which is 15. We then carry one and start with 10 (which, in decimal, is 16) all the way up through 19, 1A, 1B, 1C, 1D, 1E, 1F, and carry one again to get 20 (which in decimal is 32). The magic number 255, the maximum number we can store in one byte, is FF. Two bytes next to each other can get you up to FF FF, better known as 65535. We met 65535 as the highest number in the 16-bit Unicode range, and you guessed it, a character in that range can be stored as a pair of bytes.

To get Perl to recognize hex, place 0x in front of the digits so that

print 0xBEEF;

gives the answer

48879
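A few more conversions, sketched with Perl's built-in hex function and the %X format of sprintf; the variable names are only for illustration:

```perl
use strict;
use warnings;

my $literal     = 0xBEEF;               # hex literal in the program text
my $from_string = hex("BEEF");          # hex() converts a string of hex digits
my $one_byte    = sprintf("%X", 255);   # the largest value one byte can hold
my $two_bytes   = 0xFFFF;               # the largest value two bytes can hold

print "0xBEEF is $literal\n";            # 48879
print "hex(\"BEEF\") is $from_string\n"; # 48879
print "255 in hex is $one_byte\n";       # FF
print "0xFFFF is $two_bytes\n";          # 65535
```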

The Perl Debugger

One thing you’ll soon notice about programming is that you’ll make mistakes; mistakes in programs are called bugs. Bugs are almost entirely unavoidable, and creating bugs does not mean you’re a bad programmer. Windows 2000 allegedly shipped with 65,000 bugs, but then that’s a special case, and even the greatest programmers in the world have problems with bugs. Donald Knuth’s typesetting software TeX has been in use for more than 20 years, and Professor Knuth was still finding bugs until a couple of years ago. Who can tell when all the bugs are out anyway?

While we will be showing you ways to avoid getting bugs in your program, Perl provides you with a tool to help find and trace the causes of bugs. Naturally, any tool for getting rid of bugs in your program is called a debugger. Mundanely enough, the corresponding tool for putting bugs into your program is called a “programmer.”

To use the debugger, start your program with the -d option as in

$ perl -d myprogram.pl

See perldoc perldebug for information about the debugger.
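To have something to step through, you could save a small program such as the one below as myprogram.pl. The program itself is a made-up example, not code from the book; the debugger commands mentioned in its comments (n, s, p, c, and q) are standard ones described in perldebug.

```perl
use strict;
use warnings;

# Hypothetical example program for trying out the debugger.
# Once "perl -d myprogram.pl" has started, some commands to try:
#   n         run the next line, stepping over subroutine calls
#   s         run the next line, stepping into subroutine calls
#   p $total  print the current value of $total
#   c         continue running until the program ends or hits a breakpoint
#   q         quit the debugger

sub add_up {
    my @numbers = @_;
    my $total   = 0;
    foreach my $number (@numbers) {
        $total += $number;   # a good place to type: p $total
    }
    return $total;
}

my $sum = add_up(1, 2, 3, 4);
print "The sum is $sum\n";
```

Stepping with s takes you inside add_up, where you can watch $total grow one addition at a time.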

Summary

We’ve started on the road to programming in Perl, and programming in general. We’ve seen our first piece of Perl code, and hopefully, you were able to get it to run.

Programming is basically telling a computer what to do in a language it comprehends. It’s about breaking down problems or ideas into byte-sized chunks (as it were), and examining what needs to be done in order to communicate them clearly to the machine.

Thankfully, Perl is a language that allows us a certain degree of freedom in our expression, and, so long as we work within the bounds of the language, it won’t enforce any particular method of expression on us. Of course, it may judge what we’re saying to be wrong, because we’re not speaking the language correctly, and that’s how the majority of bugs are born. Generally though, if a program does what we want, that’s enough—TMTOWTDI.

We’ve also seen a few ways of making it easy for ourselves to spot potential problems, and we know there are tools that can help us if we need them. We have examined a little bit of what goes on inside a computer, how it sees numbers, and how it sees characters, as well as what it does to our programs when and as it executes them.

Exercises

  1. Create a program newline.pl containing print "Hi Mom.\nThis is my second program.\n". Run this and then replace \n with a space or a return and compare the results.
  2. Download the code for this book from the publisher’s website at www.apress.com.
  3. Have a look around the Perl homepage at www.perl.com.