The goal of normalization is to reduce duplication of data.
Different levels of normalization are called normal forms.
A table is said to be in X normal form
if it satisfies all rules
defined by that normal form and all the normal forms before X.
- Single valued columns (each column should have atomic value, no multiple values)
- Column domain (for any column) should not change.
- Unique names for columns.
- Order (of rows/columns) does not matter.
- No duplicate records (every record has a primary key).
- no non-prime column that is not a part of primary key should be functionally dependent on any proper subset of a candidate key. In other words, there should be No partial dependency (no column should depend on the part of the primary key).
Functional dependency: Denoted with A => B.
A and B are columns of a table. An oversimplified explanation of Function dependency is
as follows.
If I know a value in column A then I for sure know the value in column B
but the reverse is not true.
E.g. A is student number and B is student name.
I can tell the name of the student from student number, but
I cannot tell the student number from the name because there could be multiple students with the same name.
If you feel adventurous, then read this Functional Dependency Wikipage
- No transitive dependency (i.e. no column should depend on non-key column).
- For any dependency A => B, A should be a super key. In other words, for a dependency A => B, A cannot be a non-prime column, if B is a prime column.
- No multi-value dependency.
Consider the following table
+-------------+------------+-----------------------+
| Employee Id | Name | Contact |
+-------------+------------+-----------------------+
| 101 | Amit | 0684927317 |
| 102 | Ben | 0634899234, ben@bu.nl |
| 103 | Cathy | 0647882102, cat@dog.us|
| 104 | Dua | 0622467559 |
+-------------+------------+-----------------------+
This table is not in 1NF because rule (1) of 1NF is violated because
row 2 and row 3 contain multiple values for the column Contact
.
Also the rule (2) of 1NF is violated because the column Contact
contains
numeric values (for phone numbers) and string value (for emails).
This table could be converted to 1NF as follows:
+-------------+------------+------------------------+
| Employee Id | Name | Phone | Email |
+-------------+------------+------------------------+
| 101 | Amit | 0684927317 | NULL |
| 102 | Ben | 0634899234 | ben@bu.nl |
| 103 | Cathy | 0647882102 | cat@dog.us|
| 104 | Dua | 0622467559 | NULL |
+-------------+------------+----------------------- +
In real life, you actually need to
- DROP column Contact.
- ADD column Phone with the type varchar(N), choose N wisely.
- ADD column Email with the type varchar(50).
Consider the following table (employee-project M-M relationship table).
+-------------+------------+-----------------------+
| Employee Id | Project ID | Project Budget |
+-------------+------------+-----------------------+
| 101 | 1001 | 317 |
| 102 | 1001 | 234 |
| 103 | 2001 | 102 |
| 104 | 2001 | 559 |
+-------------+------------+-----------------------+
2NF is violated here because
proj_budget (non prime column)
proj_no => proj_budget (functionally dependent on proj_no)
proj_no (It is part of the candidate key)
emp_no + proj_no (is a candidate key)
This table could be converted to 2NF by removing the Project Budget
column and
adding it to the project table.
Consider the following table (employees)
+-------------+------------+-----------------------+
| Employee Id | Dept Id | Dept Location |
+-------------+------------+-----------------------+
| 101 | 2221 | Amsterdam |
| 102 | 2221 | Amsterdam |
| 103 | 3335 | Rome |
| 104 | 3335 | Rome |
+-------------+------------+-----------------------+
This table violates the 3NF because there is a transitive dependency.
Employee Id => Dept Id
and Dept Id => Dept Location.
Dept Location
column depends on the Dept Id
which is not a primary key column.
Consider the following table (students opting for subjects)
+-------------+------------+-----------------------+
| Student Id | Subject | Professor |
+-------------+------------+-----------------------+
| 101 | Java | X |
| 102 | Java | X |
| 101 | C++ | Y |
| 103 | C++ | Y |
| 103 | Java | X |
| 104 | C++ | Y |
+-------------+------------+-----------------------+
This table violates the 3.5NF because there is a functional dependency
Professor => Subject
and Professor
is not a super key.
Student Id + Subject
is the primary key. Hence Subject
is a prime column.
This table could be converted to 3.5NF as follows:
+-------------+------------+
| Student Id | Prof Id |
+-------------+------------+
| 101 | P0001 |
| 102 | P0001 |
| 101 | P0002 |
| 103 | P0002 |
| 103 | P0001 |
| 104 | P0002 |
+-------------+------------+
and
+-------------+------------+----------+
| Prof Id | Professor | Subject |
+-------------+------------+----------+
| P0001 | X | C++ |
| P0002 | Y | Java |
+-------------+------------+----------+
Consider the following table (students opting for subjects)
+-------------+------------+-----------+
| Student | Subject | Hobby |
+-------------+------------+-----------+
| Benno | Excel | Violin |
| Benno | Python | Woodwork |
| Benno | Dutch | Paint |
| Lukas | Java | Running |
| Lukas | C++ | Reading |
+-------------+------------+-----------+
This table violates 4NF because Subject
and Hobby
are independent of each other.
Hence the hobby of the student must be repeated in the table with each subject
the student chooses.
+-------------+------------+-----------------------+
| Student | Subject | Hobby |
+-------------+------------+-----------------------+
| Benno | Excel | Violin |
| Benno | Excel | Woodwork |
| Benno | Excel | Paint |
| Benno | Python | Violin |
| Benno | Python | Woodwork |
| Benno | Python | Paint |
| Benno | Dutch | Violin |
| Benno | Dutch | Woodwork |
| Benno | Dutch | Paint |
+-------------+------------+-----------------------+
It leads to a lot of repetition. This table could be converted to 4NF by splitting it into two.
+-------------+------------+
| Student | Subject |
+-------------+------------+
| Benno | Excel |
| Benno | Python |
| Benno | Dutch |
| Lukas | Java |
| Lukas | C++ |
+-------------+------------+
and
+-------------+-----------+
| Student | Hobby |
+-------------+-----------+
| Benno | Violin |
| Benno | Woodwork |
| Benno | Paint |
| Lukas | Running |
| Lukas | Reading |
+-------------+-----------+
This is a lot of information so one of our mentors (Unmesh) has created a video to try to get the idea across as well:
{% hyf-youtube src="https://www.youtube.com/watch?v=G6nX_0lY75Y" %}