Normalization is a method for organizing data elements in a database into tables.
Normalization Avoids
-
Duplication of Data – The same data is listed in multiple lines of the database
-
Insert Anomaly – A record about an entity cannot be inserted into the table without first inserting information about another entity – Cannot enter a customer without a sales order
-
Delete Anomaly – A record cannot be deleted without deleting a record about a related entity. Cannot delete a sales order without deleting all of the customer’s information.
-
Update Anomaly – Cannot update information without changing information in many places. To update customer information, it must be updated for each sales order the customer has placed
Normalization is a three stage process – After the first stage, the data is said to be in first normal form, after the second, it is in second normal form, after the third, it is in third normal form
Before Normalization
1. Begin with a list of all of the fields that must appear in the database. Think of this as one big table.
2. Do not include computed fields
3. One place to begin getting this information is from a printed document used by the system.
4. Additional attributes besides those for the entities described on the document can be added to the database.
Before Normalization – Example
See Sales Order from below:
Fields in the original data table will be as follows:
SalesOrderNo, Date, CustomerNo, CustomerName, CustomerAdd, ClerkNo, ClerkName, ItemNo, Description, Qty, UnitPrice
Think of this as the baseline – one large table
First Normal Form (1NF)
First normal form (1NF) sets the very basic rules for an organized database:
· Eliminate duplicative columns from the same table.
· Create separate tables for each group of related data and identify each row with a unique column or set of columns (the primary key).
- Separate Repeating Groups into New Tables.
- Repeating Groups Fields that may be repeated several times for one document/entity
- Create a new table containing the repeating data
- The primary key of the new table (repeating group) is always a composite key; Usually document number and a field uniquely describing the repeating line, like an item number.
First Normal Form Example
The new table is as follows:
SalesOrderNo, ItemNo, Description, Qty, UnitPrice
The repeating fields will be removed from the original data table, leaving the following.
SalesOrderNo, Date, CustomerNo, CustomerName, CustomerAdd, ClerkNo, ClerkName
These two tables are a database in first normal form
What if we did not Normalize the Database to First Normal Form?
Repetition of Data – SO Header data repeated for every line in sales order.
Second Normal Form (2NF)
Second normal form (2NF) further addresses the concept of removing duplicative data:
· Meet all the requirements of the first normal form.
· Remove subsets of data that apply to multiple rows of a table and place them in separate tables.
· Create relationships between these new tables and their predecessors through the use of foreign keys.
- Remove Partial Dependencies.
- Functional Dependency The value of one attribute in a table is determined entirely by the value of another.
- Partial Dependency A type of functional dependency where an attribute is functionally dependent on only part of the primary key (primary key must be a composite key).
- Create separate table with the functionally dependent data and the part of the key on which it depends. Tables created at this step will usually contain descriptions of resources.
Second Normal Form Example
The new table will contain the following fields:
ItemNo, Description
All of these fields except the primary key will be removed from the original table. The primary key will be left in the original table to allow linking of data:
SalesOrderNo, ItemNo, Qty, UnitPrice
Never treat price as dependent on item. Price may be different for different sales orders (discounts, special customers, etc.)
Along with the unchanged table below, these tables make up a database in second normal form:
SalesOrderNo, Date, CustomerNo, CustomerName, CustomerAdd, ClerkNo, ClerkName
What if we did not Normalize the Database to Second Normal Form?
- Repetition of Data – Description would appear every time we had an order for the item
- Delete Anomalies – All information about inventory items is stored in the SalesOrderDetail table. Delete a sales order, delete the item.
- Insert Anomalies – To insert an inventory item, must insert sales order.
- Update Anomalies – To change the description, must change it on every SO.
Third Normal Form (3NF)
Third normal form (3NF) goes one large step further:
· Meet all the requirements of the second normal form.
· Remove columns that are not dependent upon the primary key.
- Remove transitive dependencies.
- Transitive Dependency A type of functional dependency where an attribute is functionally dependent on an attribute other than the primary key. Thus its value is only indirectly determined by the primary key.
- Create a separate table containing the attribute and the fields that are functionally dependent on it. Tables created at this step will usually contain descriptions of either resources or agents. Keep a copy of the key attribute in the original file.
Third Normal Form Example
The new tables would be:
CustomerNo, CustomerName, CustomerAdd
ClerkNo, ClerkName
All of these fields except the primary key will be removed from the original table. The primary key will be left in the original table to allow linking of data as follows:
SalesOrderNo, Date, CustomerNo, ClerkNo
Together with the unchanged tables below, these tables make up the database in third normal form.
ItemNo, Description
SalesOrderNo, ItemNo, Qty, UnitPrice
What if we did not Normalize the Database to Third Normal Form?
- Repetition of Data – Detail for Cust/Clerk would appear on every SO
- Delete Anomalies – Delete a sales order, delete the customer/clerk
- Insert Anomalies – To insert a customer/clerk, must insert sales order.
- Update Anomalies – To change the name/address, etc, must change it on every SO.
Completed Tables in Third Normal Form
Customers: CustomerNo, CustomerName, CustomerAdd
Clerks: ClerkNo, ClerkName
Inventory Items: ItemNo, Description
Sales Orders: SalesOrderNo, Date, CustomerNo, ClerkNo
SalesOrderDetail: SalesOrderNo, ItemNo, Qty, UnitPrice
Boyce-Codd Normal Form (BCNF or 3.5NF)
The Boyce-Codd Normal Form, also referred to as the "third and half (3.5) normal form", adds one more requirement:
If there are non-trivial dependencies between candidate key attributes, then separate them out into distinct tables
· Meet all the requirements of the third normal form.
· Every determinant must be a candidate key.
Fourth Normal Form (4NF)
Isolate Independent Multiple Relationships
No table may contain two or more 1:n or n:m relationships that are not directly related.
Finally, fourth normal form (4NF) has one additional requirement:
· Meet all the requirements of the third normal form.
· A relation is in 4NF if it has no multi-valued dependencies.
Remember, these normalization guidelines are cumulative. For a database to be in 2NF, it must first fulfill all the criteria of a 1NF database.
Definition: Multivalued dependencies occur when the presence of one or more rows in a table implies the presence of one or more other rows in that same table.
For more detail, read What is a Database Dependency?
Examples:
For example, imagine a car company that manufactures many models of car, but always makes both red and blue colors of each model. If you have a table that contains the model name, color and year of each car the company manufactures, there is a multivalued dependency in that table. If there is a row for a certain model name and year in blue, there must also be a similar row corresponding to the red version of that same car.
Example
Consider this example of a database of teaching courses, the books recommended for the course, and the lecturers who will be teaching the course:
Course | Book | Lecturer |
AHA | Silberschatz | John D |
AHA | Silberschatz | William M |
AHA | Silberschatz | Christian G |
AHA | Nederpelt | William M |
AHA | Nederpelt | John D |
AHA | Nederpelt | Christian G |
| | |
AHA | MYBOOK | William M |
| | |
OSO | Silberschatz | John D |
OSO | Silberschatz | William M |
| | |
Because the lecturers attached to the course and the books attached to the course are independent of each other, this database design has a multivalued dependency; if we were to add a new book to the AHA course, we would have to add one record for each of the lecturers on that course, and vice versa.
5th Normal Form
Fifth normal form (5NF), also known as Project-join normal form (PJ/NF) is a level of database normalization designed to reduce redundancy in relational databases recording multi-valued facts by isolating semantically related multiple relationships.
Isolate Semantically Related Multiple Relationships
There may be practical constrains on information that justify separating logically related many-to-many relationships.
Example
Consider the following example:
Travelling Salesman Product Availability By Brand |
|
Travelling Salesman | Brand | Product Type |
Jack Schneider | Acme | Vacuum Cleaner |
Jack Schneider | Acme | Breadbox |
Willy Loman | Robusto | Pruning Shears |
Willy Loman | Robusto | Vacuum Cleaner |
Willy Loman | Robusto | Breadbox |
Willy Loman | Robusto | Umbrella Stand |
Louis Ferguson | Robusto | Vacuum Cleaner |
Louis Ferguson | Robusto | Telescope |
Louis Ferguson | Acme | Vacuum Cleaner |
Louis Ferguson | Acme | Lava Lamp |
Louis Ferguson | Nimbus | Tie Rack |
The table's predicate is: Products of the type designated by Product Type, made by the brand designated by Brand, are available from the travelling salesman designated by Travelling Salesman.
In the absence of any rules restricting the valid possible combinations of Travelling Salesman, Brand, and Product Type, the three-attribute table above is necessary in order to model the situation correctly.
Suppose, however, that the following rule applies: A Travelling Salesman has certain Brands and certain Product Types in his repertoire. If Brand B is in his repertoire, and Product Type P is in his repertoire, then (assuming Brand B makes Product Type P), the Travelling Salesman must offer products of Product Type P made by Brand B.
In that case, it is possible to split the table into three: Product Types By Travelling Salesman |
Travelling Salesman Product Type | |
Jack Schneider | Vacuum Cleaner |
Jack Schneider | Breadbox |
Willy Loman | Pruning Shears |
Willy Loman | Vacuum Cleaner |
Willy Loman | Breadbox |
Willy Loman | Umbrella Stand |
Louis Ferguson | Telescope |
Louis Ferguson | Vacuum Cleaner |
Louis Ferguson | Lava Lamp |
Louis Ferguson | Tie Rack |
Brands By Travelling Salesman |
Travelling Salesman | Brand |
Jack Schneider | Acme |
Willy Loman | Robusto |
Louis Ferguson | Robusto |
Louis Ferguson | Acme |
Louis Ferguson | Nimbus |
Product Types By Brand |
Brand | Product Type |
Acme | Vacuum Cleaner |
Acme | Breadbox |
Acme | Lava Lamp |
Robusto | Pruning Shears |
Robusto | Vacuum Cleaner |
Robusto | Breadbox |
Robusto | Umbrella Stand |
Robusto | Telescope |
Nimbus | Tie Rack |
Blogger Labels:
Database,
Normalization,
method,
data,
Avoids,
Duplication,
Insert,
information,
customer,
sales,
Delete,
Update,
Begin,
Think,
system,
Additional,
entities,
Example,
Order,
Fields,
SalesOrderNo,
Date,
CustomerNo,
CustomerName,
CustomerAdd,
ClerkNo,
ClerkName,
ItemNo,
Description,
UnitPrice,
Normal,
Form,
Eliminate,
columns,
Create,
column,
Separate,
Groups,
Tables,
item,
Normalize,
Repetition,
Header,
Second,
concept,
Meet,
requirements,
Remove,
subsets,
relationships,
predecessors,
Partial,
Dependencies,
Functional,
Dependency,
descriptions,
resources,
Price,
customers,
Along,
Anomalies,
inventory,
items,
SalesOrderDetail,
Third,
Transitive,
Thus,
agents,
Keep,
Together,
Detail,
Cust,
Clerk,
Clerks,
Orders,
Boyce,
Codd,
BCNF,
requirement,
candidate,
determinant,
Fourth,
Isolate,
Independent,
Multiple,
relation,
Remember,
guidelines,
criteria,
Definition,
presence,
Examples,
version,
Consider,
lecturers,
Course,
Book,
Lecturer,
Silberschatz,
John,
William,
Christian,
Nederpelt,
MYBOOK,
vice,
Fifth,
Project,
redundancy,
databases,
Salesman,
Product,
Brand,
Type,
Jack,
Schneider,
Acme,
Vacuum,
Cleaner,
Breadbox,
Loman,
Robusto,
Shears,
Umbrella,
Stand,
Louis,
Ferguson,
Telescope,
Lava,
Lamp,
Nimbus,
Rack,
Products,
absence,
combinations,
situation,
Suppose,
Brands,
Types,
repertoire,
three,
duplicative,
multi