Databases are a mission-critical part of any company's resources. If you program in Perl, you'll want to learn about the DBI, which can help you connect to many popular databases. This article, the first part of a series, is excerpted from chapter 15 of the book Beginning Perl (Apress; ISBN: 159059391X).
Normally when we create a database of information, we spread the data out among several different tables. These tables will relate to one another in some way, usually by a key or other field in the table.
As an example, let’s expand our information about musicians to describe what instruments each of our musicians plays and some important facts about those instruments. We could add each instrument to the row in the musicians table, but that would cause a lot of repeated information. For instance, three of our musicians play the guitar, so any information we provide for a guitar would have to be repeated for each of the three musicians. Also, several of our musicians play more than one instrument (for instance, Thom Yorke plays guitar, sings vocals, and also plays keyboard). If we listed every instrument that Thom plays, our table would become big and difficult to work with.
Instead, let’s create another table, named instruments, that will have this information:
inst_id  instrument  type           difficulty
1        bagpipes    reed           9
2        oboe        reed           9
3        violin      string         7
4        harp        string         8
5        trumpet     brass          5
6        bugle       brass          6
7        keyboards   keys           1
8        timpani     percussion     4
9        drums       percussion     0
10       piccolo     flute          5
11       guitar      string         4
12       bass        string         3
13       conductor   for-show-only  0
14       vocals      vocal          5
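As a rough illustration, the instruments table above could be created and queried as follows. This is only a sketch: it uses Python's built-in sqlite3 module with an in-memory database so that it runs without a server, and the column types and the helper name build_instruments are assumptions rather than anything taken from the book.

```python
import sqlite3

# The rows from the instruments table above: (inst_id, instrument, type, difficulty).
INSTRUMENTS = [
    (1, "bagpipes", "reed", 9),
    (2, "oboe", "reed", 9),
    (3, "violin", "string", 7),
    (4, "harp", "string", 8),
    (5, "trumpet", "brass", 5),
    (6, "bugle", "brass", 6),
    (7, "keyboards", "keys", 1),
    (8, "timpani", "percussion", 4),
    (9, "drums", "percussion", 0),
    (10, "piccolo", "flute", 5),
    (11, "guitar", "string", 4),
    (12, "bass", "string", 3),
    (13, "conductor", "for-show-only", 0),
    (14, "vocals", "vocal", 5),
]


def build_instruments(conn):
    """Create and fill the instruments table (column types are assumptions)."""
    conn.execute(
        """CREATE TABLE instruments (
               inst_id    INTEGER PRIMARY KEY,
               instrument TEXT NOT NULL,
               type       TEXT NOT NULL,
               difficulty INTEGER NOT NULL
           )"""
    )
    conn.executemany("INSERT INTO instruments VALUES (?, ?, ?, ?)", INSTRUMENTS)


conn = sqlite3.connect(":memory:")
build_instruments(conn)

# Each instrument's facts are stored exactly once, keyed by inst_id.
string_instruments = [
    name
    for (name,) in conn.execute(
        "SELECT instrument FROM instruments WHERE type = ? ORDER BY inst_id",
        ("string",),
    )
]
print(string_instruments)  # ['violin', 'harp', 'guitar', 'bass']
```

Because the facts about the guitar live in a single row, changing its difficulty means updating one row, no matter how many musicians play it.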
Now that we have defined some instruments and our opinions of their related difficulties, we somehow need to map the instrument information to the information stored in the musicians table. In other words, we need to indicate how the instruments table relates to the musicians table. We could simply add the inst_id value to the musicians table like this:
player_id name phone inst_id
1 Roger Waters 555-1212 12
and so on, but remember that many of our musicians play more than one instrument. We would then need two rows for Roger Waters (he sings, too) and three rows for Thom Yorke. Repeating their information is a waste of memory and makes the database too complex. Instead, let’s create another table that will connect these two tables. We will call it what_they_play and it will have two fields: player_id and inst_id.
player_id  inst_id
1          11
1          14
2          12
2          14
3          14
4          7
4          11
4          14
5          11
5          14
6          9
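The connecting table can be sketched in the same way. The example below again relies on Python's sqlite3 module purely for convenience. The composite primary key is an assumption added here to rule out duplicate pairings; the text itself only says that the table has two fields.

```python
import sqlite3

# (player_id, inst_id) pairs from the what_they_play table above.
WHAT_THEY_PLAY = [
    (1, 11), (1, 14),
    (2, 12), (2, 14),
    (3, 14),
    (4, 7), (4, 11), (4, 14),
    (5, 11), (5, 14),
    (6, 9),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE what_they_play (
           player_id INTEGER NOT NULL,
           inst_id   INTEGER NOT NULL,
           PRIMARY KEY (player_id, inst_id)  -- assumption: one row per pairing
       )"""
)
conn.executemany("INSERT INTO what_they_play VALUES (?, ?)", WHAT_THEY_PLAY)

# A musician with three instruments costs three short rows here,
# rather than three full copies of the name and phone number.
instruments_per_player = dict(
    conn.execute(
        "SELECT player_id, COUNT(*) FROM what_they_play GROUP BY player_id"
    )
)
print(instruments_per_player)  # {1: 2, 2: 2, 3: 1, 4: 3, 5: 2, 6: 1}
```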
To read all this information and make sense of how it relates, we would first look in the musicians table and find the musician we want, for instance Geddy Lee. We find his player_id, 2, and use that value to look in the what_they_play table. We find two entries in that table for his player_id that map to two inst_id values: 12 and 14. Taking those two values, we use them as the keys in the instruments table and find that Geddy Lee plays the bass and sings for his band.
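The lookup just described can be followed in code, first one table at a time and then as a single join. This is a sketch under stated assumptions: it uses Python's sqlite3 module, loads only the rows needed for the example, leaves out the phone column, and the function names instruments_step_by_step and instruments_with_join are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Only the rows this example needs; the phone column is omitted.
    CREATE TABLE musicians (player_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE instruments (inst_id INTEGER PRIMARY KEY, instrument TEXT);
    CREATE TABLE what_they_play (player_id INTEGER, inst_id INTEGER);

    INSERT INTO musicians VALUES (1, 'Roger Waters'), (2, 'Geddy Lee');
    INSERT INTO instruments VALUES (11, 'guitar'), (12, 'bass'), (14, 'vocals');
    INSERT INTO what_they_play VALUES (1, 11), (1, 14), (2, 12), (2, 14);
    """
)


def instruments_step_by_step(conn, name):
    """Follow the three lookups described in the text, one table at a time."""
    row = conn.execute(
        "SELECT player_id FROM musicians WHERE name = ?", (name,)
    ).fetchone()
    if row is None:
        return []
    (player_id,) = row
    inst_ids = [
        inst_id
        for (inst_id,) in conn.execute(
            "SELECT inst_id FROM what_they_play WHERE player_id = ? ORDER BY inst_id",
            (player_id,),
        )
    ]
    names = []
    for inst_id in inst_ids:
        (instrument,) = conn.execute(
            "SELECT instrument FROM instruments WHERE inst_id = ?", (inst_id,)
        ).fetchone()
        names.append(instrument)
    return names


def instruments_with_join(conn, name):
    """Ask the database to do the same walk in a single query."""
    rows = conn.execute(
        """SELECT i.instrument
             FROM musicians m
             JOIN what_they_play w ON w.player_id = m.player_id
             JOIN instruments i    ON i.inst_id = w.inst_id
            WHERE m.name = ?
            ORDER BY i.inst_id""",
        (name,),
    )
    return [instrument for (instrument,) in rows]


print(instruments_step_by_step(conn, "Geddy Lee"))  # ['bass', 'vocals']
print(instruments_with_join(conn, "Geddy Lee"))     # ['bass', 'vocals']
```

Both functions return the same answer. The join simply hands the table-to-table walk to the database instead of performing it by hand.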
This example illustrates that the musicians table relates to the instruments table through the what_they_play table. Breaking up the data in our database into separate tables allows us to list each piece of information only once, and is often more logical than listing all the information in a single table. This process is called normalization.