Tuesday, August 26, 2014

What is Normalization in SQL?

Definition: Normalization is the process of efficiently organizing data in a database. The normalization process has two goals: eliminating redundant data (for example, storing the same data in more than one table) and ensuring that data dependencies make sense (storing only related data in a table). Both are worthy goals, as they reduce the amount of space a database consumes and ensure that data is logically stored. Normalizing a database has several benefits.
Benefits:
  1. Eliminate data redundancy
  2. Improve performance
  3. Query optimization
  4. Faster updates, because each table has fewer columns
  5. Index improvement

There are several normal forms available for a database. Let's look at them one by one.
1. First Normal Form (1NF)
First normal form (1NF) sets the most basic rules for an organized database:
  • Eliminate duplicate columns from the same table.
  • Create separate tables for each group of related data and identify each row with a unique column or set of columns (the primary key).
    • Remove repeating groups
    • Create a primary key
2. Second Normal Form (2NF)
Second normal form (2NF) further addresses the concept of removing duplicate data:
  • Meet all the requirements of the first normal form.
  • Remove subsets of data that apply to multiple rows of a table and place them in separate tables.
  • Create relationships between these new tables and their predecessors through the use of foreign keys.
In other words, remove the columns that create duplicate data in a table and relate them to a new table through a primary key and foreign key relationship.

3. Third Normal Form (3NF)
Third normal form (3NF) goes one large step further:
  • Meet all the requirements of the second normal form.
  • Remove columns that are not dependent upon the primary key.
For example, if a table stores both State and Country, Country can be derived from State, so Country is moved out of the table into a separate table keyed by State.

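4. Fourth Normal Form (4NF)
Finally, fourth normal form (4NF) has one additional requirement:
  • Meet all the requirements of the third normal form.
  • A relation is in 4NF if it has no non-trivial multi-valued dependencies other than those on a candidate key.

Note that the rule about composite keys belongs to 2NF rather than 4NF: if the primary key is composed of multiple columns, all non-key attributes should depend on the full primary key. If a non-key attribute can be derived from only part of the primary key, move it to a separate table.

4NF is not the same as BCNF. Boyce-Codd normal form (BCNF, sometimes called 3.5NF) sits between 3NF and 4NF and requires every determinant to be a candidate key.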
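
The rules above give no worked example of a multi-valued dependency, so here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (EmpSkillLanguage, EmpSkill, EmpLanguage) and the sample rows are assumptions made up for illustration. They are not taken from any table in this post. The idea is that an employee's skills and languages are independent of each other, so storing them in one table forces every skill to be repeated for every language.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical table: skills and languages are independent facts about an
    -- employee, so every skill has to be paired with every language.
    CREATE TABLE EmpSkillLanguage (
        Eid INTEGER, Skill TEXT, Language TEXT,
        PRIMARY KEY (Eid, Skill, Language)
    );
    INSERT INTO EmpSkillLanguage VALUES
        (101, 'SQL',  'English'),
        (101, 'SQL',  'Hindi'),
        (101, 'Java', 'English'),
        (101, 'Java', 'Hindi');

    -- 4NF decomposition: one table per independent multi-valued fact.
    CREATE TABLE EmpSkill    (Eid INTEGER, Skill TEXT,    PRIMARY KEY (Eid, Skill));
    CREATE TABLE EmpLanguage (Eid INTEGER, Language TEXT, PRIMARY KEY (Eid, Language));
    INSERT INTO EmpSkill    SELECT DISTINCT Eid, Skill    FROM EmpSkillLanguage;
    INSERT INTO EmpLanguage SELECT DISTINCT Eid, Language FROM EmpSkillLanguage;
""")

original = set(conn.execute("SELECT Eid, Skill, Language FROM EmpSkillLanguage"))

# Joining the two smaller tables on Eid gives back exactly the original rows.
rejoined = set(conn.execute("""
    SELECT s.Eid, s.Skill, l.Language
    FROM EmpSkill s
    JOIN EmpLanguage l ON l.Eid = s.Eid
"""))

skill_rows = conn.execute("SELECT COUNT(*) FROM EmpSkill").fetchone()[0]
language_rows = conn.execute("SELECT COUNT(*) FROM EmpLanguage").fetchone()[0]
print(rejoined == original, skill_rows, language_rows)
```

In this sketch the four combined rows shrink to two skill rows and two language rows, and the join shows that no information is lost by the split.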
================================================================================
Normalization: Normalization is a mechanism for decomposing tables to remove data redundancy and to avoid other undesirable characteristics such as insertion, deletion and update anomalies.
To understand insertion, deletion and update anomalies, let’s take the example of an “EmpDetail” table.

Eid   Name      Visa Type   Address
---   -------   ---------   ---------
101   Vikas     H1          Faridabad
101   Vikas     B1          Faridabad
102   Ramesh    Null        Amritsar
103   Prakash   L1          Raipur

In the above table you can see that we have employee details along with their visa details. Employee Ramesh has no visa details, so the Visa Type column holds a null value for him.

Insertion Anomalies: If we want to insert the details of an employee who has no visa, we are forced to store a null in Visa Type, because employee details and visa details share the same row. This is an insertion anomaly.

Deletion Anomalies: If employee Prakash’s visa expires and we delete that row to remove his visa details, all of Prakash’s other data (name and address) is deleted with it. This is a deletion anomaly.

Update Anomalies: If we want to update data for Eid = 101, we need to make sure we update both rows, otherwise the data becomes inconsistent.
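
The three anomalies can be reproduced with a small sketch using Python's built-in sqlite3 module. The data is the EmpDetail table above. The column name VisaType (written without a space) and the new address 'Gurgaon' used in the update are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE EmpDetail (Eid INTEGER, Name TEXT, VisaType TEXT, Address TEXT)"
)
conn.executemany(
    "INSERT INTO EmpDetail VALUES (?, ?, ?, ?)",
    [
        (101, "Vikas", "H1", "Faridabad"),
        (101, "Vikas", "B1", "Faridabad"),
        (102, "Ramesh", None, "Amritsar"),
        (103, "Prakash", "L1", "Raipur"),
    ],
)

# Insertion anomaly: an employee without a visa can only be stored with a NULL visa type.
null_visa_rows = conn.execute(
    "SELECT COUNT(*) FROM EmpDetail WHERE VisaType IS NULL"
).fetchone()[0]

# Update anomaly: changing the address in only one of Vikas's two rows
# leaves two different addresses for the same employee.
conn.execute(
    "UPDATE EmpDetail SET Address = 'Gurgaon' WHERE Eid = 101 AND VisaType = 'H1'"
)
addresses_101 = sorted(
    row[0]
    for row in conn.execute("SELECT DISTINCT Address FROM EmpDetail WHERE Eid = 101")
)

# Deletion anomaly: deleting Prakash's only visa row removes Prakash entirely.
conn.execute("DELETE FROM EmpDetail WHERE Eid = 103 AND VisaType = 'L1'")
prakash_rows = conn.execute(
    "SELECT COUNT(*) FROM EmpDetail WHERE Eid = 103"
).fetchone()[0]

print(null_visa_rows, addresses_101, prakash_rows)
```

After these statements employee 101 has two conflicting addresses and employee 103 no longer exists in the table at all, which is exactly what normalization is meant to prevent.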

1st Normal Form: First normal form says that there should not be any repeating group in a row. It also requires each row in a table to be unique, i.e. every table should have a primary key.

Let’s take an example to understand 1st normal form.
Eid   Name      Address1    Address2
---   -------   ---------   --------
101   Vikas     Faridabad   Gurgaon
102   Ramesh    Jaipur      Null
103   Prakash   Noida       Null
In the above Employee table we have a repeating group, i.e. Address1 and Address2, which violates first normal form. To be in first normal form we can split the Employee table into two tables as shown below.
  
Employee
Eid   Name
---   -------
101   Vikas
102   Ramesh
103   Prakash

AddressDetail
Eid   Address
---   ---------
101   Faridabad
101   Gurgaon
102   Jaipur
103   Noida

We have decomposed the Employee table into two tables, Employee and AddressDetail. In the Employee table Eid is the primary key, and in the AddressDetail table the combination of Eid and Address forms the primary key. If we want to get the employee details together with the address details, we have to join the two tables.
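
A minimal sketch of this decomposition and the join, using Python's built-in sqlite3 module. The column types and constraints are assumptions, and the (103, Noida) row is taken from the original Employee table. Note that SQLite only enforces the declared foreign key when `PRAGMA foreign_keys = ON` is set, which this sketch does not rely on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee (
        Eid  INTEGER PRIMARY KEY,
        Name TEXT NOT NULL
    );
    -- One row per employee address; (Eid, Address) together form the key.
    CREATE TABLE AddressDetail (
        Eid     INTEGER NOT NULL REFERENCES Employee(Eid),
        Address TEXT    NOT NULL,
        PRIMARY KEY (Eid, Address)
    );
    INSERT INTO Employee VALUES (101, 'Vikas'), (102, 'Ramesh'), (103, 'Prakash');
    INSERT INTO AddressDetail VALUES
        (101, 'Faridabad'), (101, 'Gurgaon'), (102, 'Jaipur'), (103, 'Noida');
""")

# Join the two tables to get each employee together with every address.
rows = conn.execute("""
    SELECT e.Eid, e.Name, a.Address
    FROM Employee e
    JOIN AddressDetail a ON a.Eid = e.Eid
    ORDER BY e.Eid, a.Address
""").fetchall()

for row in rows:
    print(row)
```

With this layout an employee can have any number of addresses without adding Address3, Address4 and so on as new columns.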

2nd Normal Form: For a table to be in 2nd normal form it should follow 1st normal form, and each non-key column should be fully dependent on the primary key. In other words, there should not be any partial dependency. To understand this, let’s take an example.

Eid   ProjectID   ProjectName   EmpName
---   ---------   -----------   -------
101   P01         Syntal        Vikas
102   P02         Cris          Priya
101   P03         Vintex        Vikas
103   P04         Alex          Rahul
104   P05         Dwan          Shivam

The above table maps projects to employees, and the combination of ProjectID and Eid is the primary key of the table. ProjectName depends on ProjectID alone, not on the complete primary key. Similarly, EmpName depends on Eid alone. These partial dependencies violate 2nd normal form. To be in 2nd normal form we need to split this table into separate tables as shown below.
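EmpDetail
Eid   EmpName
---   -------
101   Vikas
102   Priya
103   Rahul
104   Shivam

ProjectDetail
ProjectID   ProjectName
---------   -----------
P01         Syntal
P02         Cris
P03         Vintex
P04         Alex
P05         Dwan

I have split the table into two tables, EmpDetail and ProjectDetail, where Eid is the primary key of EmpDetail and ProjectID is the primary key of ProjectDetail. Because Eid is the primary key, employee 101 (Vikas) now appears only once in EmpDetail. The non-key attributes are fully dependent on the key in each table, so we can say that the tables are in 2nd normal form. Note that the (Eid, ProjectID) pairs from the original table still have to be kept, in a mapping table of their own, so that we know which employee works on which project.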
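
A minimal sketch of the 2nd normal form split, using Python's built-in sqlite3 module. EmpDetail and ProjectDetail come from the example above. The third table, EmpProject, is a hypothetical name introduced here to hold the original (Eid, ProjectID) key pairs, on the assumption that the employee-to-project assignment must be preserved.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EmpDetail     (Eid INTEGER PRIMARY KEY, EmpName TEXT NOT NULL);
    CREATE TABLE ProjectDetail (ProjectID TEXT PRIMARY KEY, ProjectName TEXT NOT NULL);

    -- EmpProject is a hypothetical name for the table that keeps the original key pairs.
    CREATE TABLE EmpProject (
        Eid       INTEGER NOT NULL REFERENCES EmpDetail(Eid),
        ProjectID TEXT    NOT NULL REFERENCES ProjectDetail(ProjectID),
        PRIMARY KEY (Eid, ProjectID)
    );

    INSERT INTO EmpDetail VALUES
        (101, 'Vikas'), (102, 'Priya'), (103, 'Rahul'), (104, 'Shivam');
    INSERT INTO ProjectDetail VALUES
        ('P01', 'Syntal'), ('P02', 'Cris'), ('P03', 'Vintex'),
        ('P04', 'Alex'), ('P05', 'Dwan');
    INSERT INTO EmpProject VALUES
        (101, 'P01'), (102, 'P02'), (101, 'P03'), (103, 'P04'), (104, 'P05');
""")

# Joining the three tables rebuilds the rows of the original unnormalized table.
rows = conn.execute("""
    SELECT ep.Eid, ep.ProjectID, p.ProjectName, e.EmpName
    FROM EmpProject ep
    JOIN EmpDetail e     ON e.Eid = ep.Eid
    JOIN ProjectDetail p ON p.ProjectID = ep.ProjectID
    ORDER BY ep.ProjectID
""").fetchall()

for row in rows:
    print(row)
```

Each employee name and each project name is now stored exactly once, while the join still returns the five rows of the original table.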

3rd Normal Form: To be in 3rd normal form, a table should follow the first two normal forms and, apart from that, there should not be any transitive dependency between columns. In other words, each non-key column should depend directly on the primary key. For example, suppose a table has three columns (A, B, C), A is the primary key, B is dependent on A, and C is dependent on B. Then C depends on A only through B, which is a transitive dependency:
A → B → C

Let’s take an example to understand this.
EmpDetail
Eid   Basic_Sal   HRA (50% of Basic)
---   ---------   ------------------
101   4000        2000
102   6000        3000
103   5000        2500
104   9000        4500
105   4600        2300
In the above table Eid is the primary key, and Basic_Sal is a non-key column that depends on the primary key. HRA is also a non-key column, but it does not depend directly on the primary key (Eid). Instead it depends on another non-key column, Basic_Sal, because the HRA value is 50% of Basic_Sal. This table therefore violates 3rd normal form. To be in 3rd normal form we will split the above table into two tables as shown below.

BasicSalaryDetail
Eid   Basic_Sal
---   ---------
101   4000
102   6000
103   5000
104   9000
105   4600

HRADetail
Eid   HRA
---   ----
101   2000
102   3000
103   2500
104   4500
105   2300

We have split the above table into two tables, BasicSalaryDetail and HRADetail. These two tables now follow all three normal forms. Let’s see how.
  • Neither table contains a repeating group, and each row in both tables can be uniquely identified by the primary key (Eid), so the condition of 1st normal form is satisfied.
  • In both tables the non-key column (Basic_Sal or HRA) is fully dependent on the primary key, so the condition of 2nd normal form is satisfied.
  • The transitive relationship has been removed, so both tables are in 3rd normal form.
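
Because HRA is always 50% of Basic_Sal in this example, one common alternative to storing it in a table of its own is to store only Basic_Sal and derive HRA when it is needed. The sketch below, using Python's built-in sqlite3 module, shows that idea. It is an assumption offered for comparison, not the split described above, and the view name HRAView is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE BasicSalaryDetail (
        Eid       INTEGER PRIMARY KEY,
        Basic_Sal INTEGER NOT NULL
    );
    INSERT INTO BasicSalaryDetail VALUES
        (101, 4000), (102, 6000), (103, 5000), (104, 9000), (105, 4600);

    -- Hypothetical view: HRA is computed from Basic_Sal instead of being stored.
    -- The sample salaries are all even, so integer division loses nothing here.
    CREATE VIEW HRAView AS
        SELECT Eid, Basic_Sal / 2 AS HRA FROM BasicSalaryDetail;
""")

hra = dict(conn.execute("SELECT Eid, HRA FROM HRAView"))

# Changing the basic salary changes the derived HRA automatically,
# so the two values cannot drift apart.
conn.execute("UPDATE BasicSalaryDetail SET Basic_Sal = 5000 WHERE Eid = 101")
hra_after_raise = conn.execute(
    "SELECT HRA FROM HRAView WHERE Eid = 101"
).fetchone()[0]

print(hra, hra_after_raise)
```

The trade-off is that a derived value is recomputed on every read, whereas a stored HRA table has to be kept in step with Basic_Sal on every write.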