What is Normalization?

Normalization is the process of eliminating redundancy in a database by organizing the data into several smaller tables. The large tables are normalized (converted) into smaller tables by using normalization rules, so that a database administrator can easily handle them. It also eliminates insertion, deletion, and update anomalies.

Importance of normalization in DBMS

Data redundancy, update anomaly, insertion anomaly, and deletion anomaly occur when a database is not organized properly. Eventually, it gets difficult to manage the database. The database also consumes a lot of memory. To overcome this challenge, the database is normalized.

Normalization rules in DBMS

The following normal forms are applied to eliminate redundancy in the database tables:

  • First normal form (1NF)
  • Second normal form (2NF)
  • Third normal form (3NF)
  • Boyce & Codd normal form (BCNF)
Graphical representation of classification of Normal forms in DBMS

Studies regarding normalization methods in SQL are still going on, so further normal rules are being developed. However, in real-world applications, most of the results are achieved after applying 3NF.

DBMS normal forms with examples

Below is a detailed explanation for all the normalization forms with an example.

First normal form

The first normal form is considered the first step of the data normalization process. The goal is to achieve a table that can be extended easily. Retrieving data from such tables is also easier.

For a table to be in the first normal form, it should satisfy the following rules:

  • The table should have atomic (single) valued attributes (columns).
  • The domain of the data/ value stored in a column should not be changed.
  • The columns in the database table should have unique names.
  • The order of the data stored in the table does not matter.

Example: Consider the course details table below:

Course NameCourse Contents
ProgrammingJava, C++, Python
ManagementFinance, Production

Here, the course content column contains multiple values and therefore, it is not in 1NF. Below is the normalized version of the table that satisfies the 1NF rule and has no one-to-many relationship.

Course NameCourse Content
ProgrammingJava
ProgrammingC++
ProgrammingPython
ManagementFinance
ManagementProduction

Second normal form

For a table to be in 2NF, it has to satisfy the following rules:

  • The database table must be in the first normal form.
  • There should not be any partial dependency.

A table that is in 2NF has no partial dependency.

Partial dependency: When a non-prime attribute is functionally dependent on a part of the candidate key and not the entire key, it is called partial dependency.

Partial dependency can be removed by removing the attribute that causes partial dependency and transferring it to another table.

Example: Assume the database of a school that stores the data of teachers and their subjects. As you know, teachers can teach more than one subject at a school.

Teacher_IDSubject Teacher_Age
15Chemistry32
15Biology32
47English35
55Math33
55Computer33

Here, the non-prime attribute Teacher_Age is dependent on Teacher_ID. Further, Teacher_ID is the proper subset of a candidate key. This indicates that the table is not in 2NF. To convert it into 2NF, decompose it into two tables:

Table 1: Teacher_Detail table

Teacher_IDTeacher_Age
1532
4735
5533

Table 2: Teacher_Subject table

Teacher_IDSubject 
15Chemistry
15Biology
47English
55Math
55Computer

Third normal form

The third normal form reduces data duplication and also helps to achieve data integrity. A table is in 3NF if it satisfies the following conditions:

  • The table is in 2NF.
  • The table does not contain any transitive dependency.

For every non-trivial functional dependency A → B to be in 3NF, it should fulfill any of these conditions.

  • A is a super key.
  • B is a prime attribute.

If there are two functional dependencies A →B and B → C, then A →C is known as a transitive dependency. 

Example: Assume that a company wants to create a database to store the employees' addresses. So, they create a table called Employee_info, as shown below.

Emp_IDEmp_NameEmp_ZipcodeEmp_StateEmp_City
111John25025UPNew York
112George12542MPBarcelona
113Mary26555MPNorway
114Lisa26555APIllinois
115Adam58622JKAmpton

Here the super key is: {Emp_ID}, {Emp_ID, Emp_Name}, {Emp_ID, Emp_Name, Emp_Zipcode}, {Emp_ID, Emp_Name, Emp_Zipcode, Emp_State}, etc.

Candidate key- {Emp_ID)

All the attributes excluding {Emp_ID} are non-prime attributes.

Emp_State and Emp_City are dependent on Emp_Zipcode, which is dependent on Emp_ID. The non-prime attributes (Emp_State, Emp_City) are transitively dependent on Emp_ID (super key). Thus, it is clear that the table is not in 3NF.

To convert the table in 3NF, move the Emp_City and Emp_State to a new table and make Emp_Zipcode the primary key of the table. Now, the Employee_info table is normalized.

Table 1- Employee table

Emp_IDEmp_NameEmp_Zipcode
111John25025
112George12542
113Mary26555
114Lisa26555
115Adam58622

Table 2- Employee_Zipcode table

Emp_ZipcodeEmp_StateEmp_City
25025UPNew York
12542MPBarcelona
26555MPNorway
26555APIllinois
58622JKAmpton

Boyce-Codd normal form

Boyce-Codd normal form is an advanced version of the third normal form. For a table to be in BCNF, it should fulfill the following conditions:

  • The table should be in 3NF.
  • For every functional dependency A → B, A has to be a super key.

Example- Assume that a company wants to create a database to store the details of employees working in more than one department. The table to store the data of the employees will be as follows:

Emp_IDEmp_NationEmp_DeptDept_TypeEmp_Dept_No.
111ParisAccountR251452
111ParisDesignR234236
222RussiaSalesR438565
222RussiaManufacturingR721412

Here, functional dependencies are:
Emp_ID → Emp_Nation
Emp_Dept → {Dept_Type, Emp_Dept_No.}

Candidate Key- {Emp_ID, Emp_Dept}

Here, Emp_Dept, and EMP_ID are not the only keys. So, the table is not in BCNF. To convert the table into BCNF, decompose it as shown below.

Table 1- Employee_Nation table

Emp_IDEmp_Nation
111Paris
222Russia

Table 2- Employee_Dept table

Emp_DeptDept_TypeEmp_Dept_No.
AccountR251452
DesignR234236
SalesR438565
ManufacturingR721412

Table 3- Employee_Dept_Mapping table

Emp_IDEmp_Dept
111Account
111Design
222Sales
222Manufacturing

Functional Dependencies:
Emp_ID → Emp_Nation
Emp_Dept → {Dept_Type, Emp_Dept_No.}

Candidate Keys:
For table 1: Emp_ID
For table 2: Emp_Dept
For table 3: {Emp_ID, Emp_Dept}

Here, the left-hand side of both functional dependencies is a key. Hence, the table is in BCNF.

Context and Applications

Database administrators apply normalization methods in database management to organize complex tables and to make it easier to retrieve data from the database. Further, competitive exams ask questions related to all of the normalization forms.

  • Bachelor of Computer Science
  • Masters in Data Science
  • Master of Science in Data Analytics

Practice Problems

Q1. Select the correct answer from the following statements.

  1. Every relation in 3 normal form is also in BCNF.
  2. Every relation in BCNF is also in 3 normal form.
  3. A relation is in 3NF if each non-prime attribute of the relation is fully dependent on every key of the relation and has a z-score.
  4. A relation cannot be in BCNF and 3NF together.

Correct Option- b
Explanation: BCNF is a higher and stricter version of 3NF. This indicates that for a relationship to be in BCNF, it needs to be in 3NF.

Q2. The relational database of a Student Performance (Name, CourseNo, EnrollmentNo, Grade) has the functional dependencies mentioned below:

Name, Course No. → Grade
Enrolment No., Course No. → Grade
Name → Enrolment No.
Enrolment No. → Name

What will be the highest normalization form in this database?

  1. 2NF
  2. 3NF
  3. BCNF
  4. 1NF

Correct Option- b
Explanation: Assume that the attributes (Name, Course No., Enrolment No., Grade) are (A, B, C, D). According to the information, the functional dependencies in the relational model are:

AB → D
CB → D
A → C
C → A

So, we have two candidate keys AB and CB.

Both AB → D and CB → D satisfy BCNF, since the left side is a super key.

However, A → C and C → A are not in BCNF. Therefore, check if they satisfy 3NF.

Since C and A attributes, on RHS of both functional dependencies are prime attributes, they are in 3NF.

Thus, the highest normal form of the whole relation is 3NF.

Q3. When is a table said to be in 3NF?

  1. It is in 2NF and has differential expression.
  2. It is in 2NF and has no transitive dependencies.
  3. It is in BCNF and has differential expression technique.
  4. It is in 2 normalization forms and has a spike-in structure.

Correct option- b

Explanation: For a table to be in 3NF, it has to be in 2NF and must not be transitively dependent on a prime key attribute.

Q4. Which normal form converts composite attributes into individual attributes?

  1. First normal form
  2. Second normal form
  3. Third normal form
  4. BCNF

Correct Option- a
Explanation: In 1NF, the table is normalized to remove the duplicate information (redundant data).

Q5. Which is true for tables in 2NF?

  1. Removes the possibility of insertion anomalies.
  2. Eliminate all hidden dependencies.
  3. Contain a composite key and multi-valued attributes.
  4. Contain non-key fields dependent on primary keys.

Correct Option- b
Explanation: For a table to be in 2NF, it has to be in 1NF and should not have any partial dependency.

Common Mistakes

The Boyce-Codd normal form is often considered the fourth normal form. However, BCNF is neither the third normal form nor the fourth normal form. BCNF is stricter than 3NF. It can be considered as 3.5NF.

  • Functional dependency
  • Relational decomposition
  • Anomalies in DBMS
  • Types of database key

Want more help with your computer science homework?

We've got you covered with step-by-step solutions to millions of textbook problems, subject matter experts on standby 24/7 when you're stumped, and more.
Check out a sample computer science Q&A solution here!

*Response times may vary by subject and question complexity. Median response time is 34 minutes for paid subscribers and may be longer for promotional offers.

Search. Solve. Succeed!

Study smarter access to millions of step-by step textbook solutions, our Q&A library, and AI powered Math Solver. Plus, you get 30 questions to ask an expert each month.

Tagged in
EngineeringComputer Science

Database

Anomalies in database

Normalization

Search. Solve. Succeed!

Study smarter access to millions of step-by step textbook solutions, our Q&A library, and AI powered Math Solver. Plus, you get 30 questions to ask an expert each month.

Tagged in
EngineeringComputer Science

Database

Anomalies in database

Normalization