An Introduction to Database Normalization - So Why Normalize? (
Page 3 of 5 )
Thus far the only thing
that I have really stated about database normalization is that it provides for
table optimization through the investigation of entity relationships. But why is
this necessary? In this section, I’ll elaborate a bit upon why normalization is
necessary when creating commercial database applications.
Essentially, table
optimization is accomplished through the elimination of all instances of data
redundancy and unforeseen scaleability issues.
Redundancy
Data redundancy is exactly what you think it is; the
repetition of data. One obvious drawback of data repetition is that it consumes
more space and resources than is necessary. Consider the following
table:
Table 1-1: Poorly defined table
|
student_id |
class_name |
time |
location |
professor_id |
|
999-40-9876 |
Math 148 |
MWF 11:30 |
Rm. 432 |
prof145 |
|
999-43-0987 |
Physics 113 |
TR 1:30 |
Rm. 12 |
prof143 |
|
999-42-9842 |
Botany 42 |
F 12:45 |
Rm. 9 |
prof167 |
|
999-41-9832 |
Matj 148 |
MWF 11:30 |
Rm. 432 |
prof145 |
Basically this
table is a mapping of various students to the classes found within their
schedule. Seems logical enough, right? Actually, there are some serious issues
with the choice to store data in this format. First of all, assuming that the
only intention of this table is to create student-class mappings, then there
really is no need to repeatedly store the class time and professor ID. Just
think that if there are 30 students to a class, then the class information would
be repeated 30 times over!
Moreover, redundancy introduces the
possibility for error. You might have noticed the name of the class found in the
final row in the table (Matj 148). Given the name of the class found in the
first row, chances are that Matj 148 should actually be Math 148! While this
error is easily identifiable when just four rows are present in the table,
imagine finding this error within the rows representing the 60,000 enrolled
students at my alma mater, The Ohio State University. Chances that you’ll find
these errors are unlikely, at best. And the cost of even attempting to find them
will always be high.
Unforeseen Scaleability Issues
Unforeseen scaleability issues generally
arise due to lack of forethought pertaining to just how large a database might
grow. Of course, as a database grows in size, initial design decisions will
continue to play a greater role in the speed of and resources allocated to this
database. For example, it is typically a very bad idea to limit the potential
for expansion of the information that is to be held within the db,
even
if there are currently no plans to expand. For example, structurally
limiting the database to allot space for only three classes per student could
prove deadly if next year the school board decides to permit all students to
schedule three classes. This also works in the opposite direction; What if the
school board subsequently decides to only allow students to schedule two
classes? Have you allowed for adequate flexibility in the design so as to easily
adapt to these new policies?
The remedy to these problems is through the
use of a process known as database normalization. A subject of continued
research and debate over the years, several general rules have been formulated
that layout the process one should follow in the quest to normalize a database.
I’ll discuss these rules in the next section, “The Three Normal
Forms”.