HomeMySQL Page 3 - An Introduction to Database Normalization
So Why Normalize? - MySQL
A database can be great fun, right? Yes, of course!There are though, a couple things that can ruin allthat hard work and effort you put into your efficient little database. Today we discuss how to keep that beloved bin of data from going bad on you: databasenormalization.
Thus far the only thing that I have really stated about database normalization is that it provides for table optimization through the investigation of entity relationships. But why is this necessary? In this section, Iíll elaborate a bit upon why normalization is necessary when creating commercial database applications. Essentially, table optimization is accomplished through the elimination of all instances of data redundancy and unforeseen scaleability issues.
Data redundancy is exactly what you think it is; the repetition of data. One obvious drawback of data repetition is that it consumes more space and resources than is necessary. Consider the following table:
Table 1-1: Poorly defined table
Basically this table is a mapping of various students to the classes found within their schedule. Seems logical enough, right? Actually, there are some serious issues with the choice to store data in this format. First of all, assuming that the only intention of this table is to create student-class mappings, then there really is no need to repeatedly store the class time and professor ID. Just think that if there are 30 students to a class, then the class information would be repeated 30 times over!
Moreover, redundancy introduces the possibility for error. You might have noticed the name of the class found in the final row in the table (Matj 148). Given the name of the class found in the first row, chances are that Matj 148 should actually be Math 148! While this error is easily identifiable when just four rows are present in the table, imagine finding this error within the rows representing the 60,000 enrolled students at my alma mater, The Ohio State University. Chances that youíll find these errors are unlikely, at best. And the cost of even attempting to find them will always be high.
Unforeseen Scaleability Issues
Unforeseen scaleability issues generally arise due to lack of forethought pertaining to just how large a database might grow. Of course, as a database grows in size, initial design decisions will continue to play a greater role in the speed of and resources allocated to this database. For example, it is typically a very bad idea to limit the potential for expansion of the information that is to be held within the db, even if there are currently no plans to expand. For example, structurally limiting the database to allot space for only three classes per student could prove deadly if next year the school board decides to permit all students to schedule three classes. This also works in the opposite direction; What if the school board subsequently decides to only allow students to schedule two classes? Have you allowed for adequate flexibility in the design so as to easily adapt to these new policies?
The remedy to these problems is through the use of a process known as database normalization. A subject of continued research and debate over the years, several general rules have been formulated that layout the process one should follow in the quest to normalize a database. Iíll discuss these rules in the next section, ďThe Three Normal FormsĒ.