Put simply, normalization is an attempt to ensure that you do not destroy true data or create false data in your database. Errors are avoided by representing each fact in the database one way, one time, and in one place. Duplicate data is a problem as old as data processing itself: efficient and accurate data processing relies on minimizing redundant data and maximizing data integrity. Normalization and the Normal Forms (NF) are efforts to achieve these two core objectives. This article examines the concept of normalization in depth.
Normalization is the process of reducing duplication in a database, with the ultimate goal of eliminating duplicate data entirely. While duplicated data can make a database greedy with disk space, the bigger issue is consistency. When a particular piece of information is stored in more than one place, every insert, update, or delete risks leaving the copies in disagreement and corrupting the data, as the sketch below illustrates.
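To make the risk concrete, here is a minimal sketch using Python's built-in sqlite3 module and a hypothetical, denormalized orders table in which the customer's address is repeated on every order row. The table, column names, and sample rows are purely illustrative.

```python
import sqlite3

# Hypothetical denormalized schema: the customer's address is copied
# onto every order row instead of being stored once.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("""
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer_name TEXT,
        customer_address TEXT,   -- duplicated for every order by this customer
        item TEXT
    )
""")
cur.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, "Ada Lovelace", "12 Oak St", "Keyboard"),
        (2, "Ada Lovelace", "12 Oak St", "Monitor"),
    ],
)

# The customer moves, but the UPDATE touches only one of the two rows,
# so the database now holds two different addresses for the same customer.
cur.execute(
    "UPDATE orders SET customer_address = ? WHERE order_id = ?",
    ("98 Elm Ave", 1),
)

for row in cur.execute(
    "SELECT order_id, customer_name, customer_address FROM orders"
):
    print(row)
# (1, 'Ada Lovelace', '98 Elm Ave')
# (2, 'Ada Lovelace', '12 Oak St')   <- stale copy: an update anomaly
```

A careless WHERE clause is all it takes: once the copies disagree, there is no way to tell from the data alone which address is the true one.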
Normalization as a formal process dates to the early 1970s, when Edgar Codd proposed the first normal form (1NF); Codd later defined 2NF and 3NF. Since then, several higher-level normal forms have been defined. 4NF and 5NF are often treated as academic because queries against "real world" data in a 4NF or 5NF schema require an excessive number of joins and extensive use of views. 6NF is relatively new and contains rules specific to temporal data. As one progresses from 1NF to 5NF and even 6NF, the data modeling requirements become stricter, and each new form permits a lower degree of duplication than the form before it.
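Continuing the hypothetical example above, a sketch of the normalized design shows the direction all of the normal forms push in: each fact lives in exactly one place, so a single-row change cannot leave the database contradicting itself. Again, the schema and sample data are illustrative assumptions, not taken from a real system.

```python
import sqlite3

# Normalized sketch: customers are split into their own table,
# so the address is stored exactly once.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        name TEXT,
        address TEXT              -- one fact, one place
    );
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(customer_id),
        item TEXT
    );
    INSERT INTO customers VALUES (1, 'Ada Lovelace', '12 Oak St');
    INSERT INTO orders VALUES (1, 1, 'Keyboard');
    INSERT INTO orders VALUES (2, 1, 'Monitor');
""")

# The customer's move now requires updating exactly one row.
cur.execute("UPDATE customers SET address = '98 Elm Ave' WHERE customer_id = 1")

for row in cur.execute("""
    SELECT o.order_id, c.name, c.address
    FROM orders o JOIN customers c ON c.customer_id = o.customer_id
"""):
    print(row)   # both orders now report the new address
```

The trade-off is visible in the query: reading the data back requires a join. The higher normal forms carry this idea further, trading ever more joins for ever less duplication.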