Put simply, normalization is an attempt to make sure you do not destroy true data or create false data in your database. Errors are avoided by representing a fact in the database one way, one time, and in one place. Duplicate data is a problem as old as data processing. Efficient and accurate data processing relies on the minimizing redundant data and maximizing data integrity. Normalization and the Normal Forms (NF) are efforts to achieve these two core objectives of data processing. This article will examine the concept of normalization in-depth.
First Normal Form states that each attribute in a relation has to be atomic and scalar. It also states that a table must have a primary key to identify any particular row. You cannot store a list of values in a single column and you cannot store a list of columns that emulate a fixed size list - also known as a repeating group. For example, suppose you were to create a table to hold contact information for business contacts:
This table contains a repeating group - phone1-3. More often than not, several of those columns will be empty, which is a waste. It is also nearly impossible to enforce any uniqueness for this group because a constraint cannot be declared such that any phone1 is unique across phone1, phone2 and phone3.
Consider another set of problems - what if you want to add a company, not a person, to your Contacts table? Or just a phone number and a first name? You would have to leave one or more of the candidate key columns (first_name, last_name, company), which would most certainly violate the only possible primary key constraint. What if you want to delete a fax number from your table? If you simply DELETE WHERE fax=X, you may be deleting people as well.