Perl and DBI

Databases are a mission-critical part of any company’s resources. If you program in Perl, you’ll want to learn about the DBI, which can help you connect to many popular databases. This article, the first part of a series, is excerpted from chapter 15 of the book Beginning Perl (Apress; ISBN: 159059391X).

It is now time to talk about one of Perl’s best modules: the Database Independent Interface (DBI) module. DBI provides an easy-to-use application programming interface (API) that is portable across both operating systems and databases. It allows us to connect to a wide variety of databases, including Oracle, Sybase, Informix, MySQL, mSQL, PostgreSQL, and ODBC data sources, and even to files with comma-separated values (CSV). With this module we can access and administer databases from our Perl programs, combining the power and enjoyment of Perl with the usefulness of storing information in a database.

In this chapter we will introduce the concept of SQL and discuss the most common ways to use it. Then we will discuss DBI and the related DBD (Database Driver) modules. We will then write some Perl code to access and update a MySQL database. Finally, we will take our newfound knowledge and connect it with our topic from the last chapter and create a simple web interface to a database by combining Perl, DBI and CGI. This sounds like fun, so let’s get to it.

Structured Query Language (SQL, pronounced as “EssQueueEl” by most and “Sequel” by some) is a language allowing a programmer access to a relational database. It is relatively easy to use—compared to Perl, learning SQL is a snap. We will talk about some of the most common SQL queries, or commands that access a database, and in talking about them we will describe the language to the point that learning the remaining details will be simply a matter of referring to an SQL book or website.

But we are getting ahead of ourselves. Before we can talk about SQL we need to discuss relational databases.

{mospagebreak title=Introduction to Relational Databases}

In order to talk about SQL, we will need to start by talking about relational databases. There are two important facts about relational databases. First, the content in a relational database is persistent—the data continues to exist after the execution of the program that accesses or modifies it. This is much like writing the data to a file on disk that will stay on the disk after the file is created, read from, or modified. The second important fact is that relational databases, unlike files on disk, allow concurrent access and updates from multiple users and processes. This means that more than one user can access the database at the same time—the database server takes care of making sure the changes are made to the data in a safe way.

A relational database, simply put, is a database of tables that can relate to one another in some way. A table is a collection of rows of data. Every row of data has the same basic pieces of information, called fields. There are a lot of buzzwords here, so let’s describe each of these by an example.

Let’s say we want to keep some information about our favorite musicians. The information includes their names, phone numbers (since we often call them up and chat), and the instruments that they play. We might start by creating a list of the musicians like this:

Roger Waters          555-1212
Geddy Lee             555-2323
Marshall Mathers III  555-3434
Thom Yorke            555-4545
Lenny Kravitz         555-5656
Mike Diamond          555-6767

This list of musicians shows six lines of data. These lines are called rows in relational database–speak. We would take these six rows and place them together into one collection of data, called a table. Normally, when we place data within a table, we want to create a unique identifier for the row, called a key—just in case we had two different Marshall Mathers III in our table we could access the one we are interested in using this unique value. We will name the key player_id and name the other columns, or fields, as well:

player_id  name                  phone
1          Roger Waters          555-1212
2          Geddy Lee             555-2323
3          Marshall Mathers III  555-3434
4          Thom Yorke            555-4545
5          Lenny Kravitz         555-5656
6          Mike Diamond          555-6767

What we have created here is a table (let’s name it musicians) with three fields (player_id, name, and phone) and six rows of information. With this one example we have defined most of our relational database buzzwords, except relational.
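As a rough illustration of rows, fields, and keys, the musicians table can be modeled in plain Perl before any database is involved. This is only a sketch: the variable names (@fields, @musicians, %by_id) are ours and are not part of DBI or MySQL.

```perl
use strict;
use warnings;

# Field names of the musicians table, in column order.
my @fields = qw(player_id name phone);

# Each row is a hash reference holding one value per field.
my @musicians = (
    { player_id => 1, name => 'Roger Waters',         phone => '555-1212' },
    { player_id => 2, name => 'Geddy Lee',            phone => '555-2323' },
    { player_id => 3, name => 'Marshall Mathers III', phone => '555-3434' },
    { player_id => 4, name => 'Thom Yorke',           phone => '555-4545' },
    { player_id => 5, name => 'Lenny Kravitz',        phone => '555-5656' },
    { player_id => 6, name => 'Mike Diamond',         phone => '555-6767' },
);

# Index the rows by their key so that a single row can be fetched directly,
# even if two musicians happened to share a name.
my %by_id = map { $_->{player_id} => $_ } @musicians;

my $row = $by_id{3};
print join(' | ', map { $row->{$_} } @fields), "\n";
# prints: 3 | Marshall Mathers III | 555-3434
```

Because player_id is unique, the lookup in %by_id always identifies exactly one row, which is the job a key does in a real table.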

{mospagebreak title=The Relational of Relational Database}

Normally when we create a database of information, we spread the data out among several different tables. These tables will relate to one another in some way, usually by a key or other field in the table. 

As an example, let’s expand our information about musicians to describe what instruments each of our musicians plays and some important facts about those instruments. We could add each instrument to the row in the musicians table, but that would cause a lot of repeated information. For instance, three of our musicians play the guitar, so any information we provide for a guitar would have to be repeated for each of the three musicians. Also, several of our musicians play more than one instrument (for instance, Thom Yorke plays guitar, sings vocals, and also plays keyboard). If we listed every instrument that Thom plays, our table would become big and difficult to work with.

Instead, let’s create another table, named instruments, that will have this information:

inst_id  instrument  type           difficulty
1        bagpipes    reed           9
2        oboe        reed           9
3        violin      string         7
4        harp        string         8
5        trumpet     brass          5
6        bugle       brass          6
7        keyboards   keys           1
8        timpani     percussion     4
9        drums       percussion     0
10       piccolo     flute          5
11       guitar      string         4
12       bass        string         3
13       conductor   for-show-only  0
14       vocals      vocal          5

Now that we have defined some instruments and our opinions of their related difficulties, we somehow need to map the instrument information to the information stored in the musicians table. In other words, we need to indicate how the instruments table relates to the musicians table. We could simply add the inst_id value to the musicians table like this:  

player_id  name          phone     inst_id
1          Roger Waters  555-1212  12

and so on, but remember that many of our musicians play more than one instrument. We would then need two rows for Roger Waters (he sings, too) and three rows for Thom Yorke. Repeating their information wastes storage and makes the database needlessly complex. Instead, let’s create another table that will connect these two tables. We will call it what_they_play, and it will have two fields: player_id and inst_id.

player_id  inst_id
1          11
1          14
2          12
2          14
3          14
4          7
4          11
4          14
5          11
5          14
6          9

To read all this information and make sense of how it relates, we would first look in the musicians table and find the musician we want, for instance Geddy Lee. We find his player_id, 2, and use that value to look in the what_they_play table. We find two entries in that table for his player_id, and they map to two inst_id values: 12 and 14. Taking those two values, we use them as keys in the instruments table and find that Geddy Lee plays the bass and sings for his band.
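The three-step lookup just described can be sketched in plain Perl, with hashes and arrays standing in for the three tables. This is an illustration only: the helper instruments_for() is a hypothetical name of ours, and only the instruments that appear in what_they_play are included.

```perl
use strict;
use warnings;

# musicians: player_id => name (phone numbers omitted for brevity)
my %musicians = (
    1 => 'Roger Waters',
    2 => 'Geddy Lee',
    3 => 'Marshall Mathers III',
    4 => 'Thom Yorke',
    5 => 'Lenny Kravitz',
    6 => 'Mike Diamond',
);

# instruments: inst_id => instrument (subset of the full table)
my %instruments = (
    7  => 'keyboards',
    9  => 'drums',
    11 => 'guitar',
    12 => 'bass',
    14 => 'vocals',
);

# what_they_play: one [player_id, inst_id] pair per row
my @what_they_play = (
    [1, 11], [1, 14], [2, 12], [2, 14], [3, 14], [4, 7],
    [4, 11], [4, 14], [5, 11], [5, 14], [6, 9],
);

# Hypothetical helper: follow musicians -> what_they_play -> instruments.
sub instruments_for {
    my ($name) = @_;

    # Step 1: find the musician's player_id in the musicians table.
    my ($player_id) = grep { $musicians{$_} eq $name } keys %musicians;
    return () unless defined $player_id;

    # Step 2: collect the inst_id values paired with that player_id.
    my @inst_ids = map { $_->[1] } grep { $_->[0] == $player_id } @what_they_play;

    # Step 3: use each inst_id as a key into the instruments table.
    return map { $instruments{$_} } @inst_ids;
}

print join(', ', instruments_for('Geddy Lee')), "\n";   # prints: bass, vocals
```

A relational database performs this kind of matching for us when we ask it to combine tables, so in practice we would not write the loops by hand.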

This example illustrates that the musicians table relates to the instruments table through the what_they_play table. Breaking up the data in our database into separate tables allows us to list each piece of information only once, and it is often more logical than listing all the information in a single table. This process is called normalization.
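To see what normalization saves, the following sketch builds the single flat table we chose not to use and counts how often each instrument’s facts would be copied into it. The names @flat and %copies are ours, and only the instruments that appear in what_they_play are included.

```perl
use strict;
use warnings;

# instruments (subset): inst_id => [instrument, type, difficulty]
my %instruments = (
    7  => ['keyboards', 'keys',       1],
    9  => ['drums',     'percussion', 0],
    11 => ['guitar',    'string',     4],
    12 => ['bass',      'string',     3],
    14 => ['vocals',    'vocal',      5],
);

# what_they_play: one [player_id, inst_id] pair per row
my @what_they_play = (
    [1, 11], [1, 14], [2, 12], [2, 14], [3, 14], [4, 7],
    [4, 11], [4, 14], [5, 11], [5, 14], [6, 9],
);

# Flat alternative: one row per pairing, with the instrument's
# facts copied into every row that mentions it.
my @flat = map { [ $_->[0], @{ $instruments{ $_->[1] } } ] } @what_they_play;

# Count how many copies of each instrument's facts the flat table holds.
my %copies;
$copies{ $_->[1] }++ for @flat;

for my $instrument (sort keys %copies) {
    print "$instrument: $copies{$instrument} copies in the flat table, 1 in instruments\n";
}
```

With this data the guitar’s type and difficulty appear three times in the flat table and the vocals entry five times, whereas the normalized instruments table stores each exactly once.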

{mospagebreak title=We Need an SQL Server—MySQL}

Before we can show examples of SQL, we need an SQL server. There are many available to choose from, some that cost money, some that cost a lot of money, and some that are free. Given that we like free, we are going to choose one of the best, most powerful SQL servers available: MySQL.

MySQL (www.mysql.com) is open source and available for many different operating systems. It is relatively easy to install and administer. It is also well documented (http://dev.mysql.com/doc/mysql/en/) and there are many good books available including the excellent The Definitive Guide to MySQL, Second Edition by Michael Kofler (Apress, 2003). MySQL is an excellent choice for small, medium, and large databases. And did we mention it is free?

Installing MySQL

If you are a Linux user, the chances are MySQL is installed already. Do a quick check of your system to see. If not, it will have to be installed.

Installation instructions can be found at the MySQL website (http://dev.mysql.com/doc/mysql/en/Installing.html). Since it is so well documented there, we will not repeat that information here. You can also check out The Definitive Guide to MySQL, Second Edition.

Testing the MySQL Server

Just to be sure all is well, let’s enter a few MySQL commands at the shell prompt to see if everything is working. The following examples assume that the MySQL root user (not to be confused with the Unix root user) has been given a password. Giving the MySQL root user a password is a very good idea if your server will be available over the network: you don’t want a pesky cracker logging into the server and doing destructive things like modifying or deleting your data. Let’s say root’s password is “RootDown”.

First, this command will show all the databases set up on the server:

$ mysqlshow -u root -p
Enter password: RootDown
+------------------+
|    Databases     |
+------------------+
| mysql            |
| test             |
+------------------+

This command shows all the tables in the database named mysql:

$ mysqlshow -u root -p mysql
Enter password: RootDown
Database: mysql
+--------------+
|    Tables    |
+--------------+
| columns_priv |
| db           |
| func         |
| host         |
| tables_priv  |
| user         |
+--------------+

If these commands worked, then all is well with our MySQL server. We can now create a database to store our musician information.

Please check back next week for the continuation of this article.
