Database normalization is organizing data in such a way as to eliminate redundancy and get rid of duplicate data. It was first proposed by Edgar F. Codd as a part of his relational model.Redundancies can adversely affect analysis of data.We use data normalization for the following purposes
- To logically group together data
- To resolve any conflicting data
- Formatting the data
Finally data normalization consolidates data, combining it into much more organized structure.
Why is Data Normalization important?
Data normalization gets rid of a number of anomalies that can make analysis of the data complicated. The anomalies that can occur without normalization are
If we update the data partially but not completely. For example if we don’t update all rows in a certain table , inconsistencies appear.
Circumstances where certain facts cannot be recorded at all. For example facts we cannot record until we know information for the entire row in a table.
Deletion of data representing certain facts necessitates deletion of data. For example deletion of a row in a table causes removal of more than one set of fact.
Benefits of Data Normalization
- One of the most notable is that databases takes up less space i.e decrease disk space and as a result increases performance.
- It becomes easier to change and update data within the database especially when the data comes from multiple sources.
- Makes it easy to perform statistical modeling(process of applying statical analysis to a dataset)
- To reduce the need for restructuring as we introduce new data inside the table.
- To simplify writing queries.
Common Forms of Data Normalization
The three common forms of data normalization are First Normal(1NF), second Normal(2NF) and Third Normal(3NF). UNF is unnormalized form.
A relational database relation is often described as “normalized” if it meets 3NF. Most 3NF relations are free of insertion, update and deletion anomalies.
Putting data into tables which serves a single purpose.
Data is stored with no repeating groups of columns in the table
The table is in 1NF and all columns depend on the table’s primary key(no columns should depend on other columns)
The table is in 2NF and all of its columns are not transitively dependent on the primary key.
Primary Key:Uniquely identify record in a table. A primary key is generally the id in a table.
Foreign Key:A field in the table that is a primary key in another table.
Transitive Dependency:An indirect relationship between values in the same table that causes a functional dependency. For example Table 2 is functionally dependent on Table 1 and Table 3 is functionally dependent on Table 2 . Therefore Table 3 is transitively dependent on Table 1 via Table 2.