Data anomalies
The most fundamental reason for using normalisation is to eliminate undesirable redundancy in a data structure. Where redundancy exists, there is a risk that certain operations will create data anomalies. The term anomaly is used to describe something that is not in line with expectations. In a relational database, we expect that all of the data stored will be a) correct and b) internally consistent.
To illustrate what is meant by redundancy and data anomalies, let's imagine an example in which consultants are assigned to projects. The data about consultants and projects could be stored in a single table as shown below.
consultant(cons_id, cons_name, grade, daily_rate)
project(proj_code, proj_name, start_date, end_date)
assignment(cons_id, proj_code, days)
cons_id | cons_name | grade | daily_rate | days | proj_code | proj_name | start_date | end_date |
---|---|---|---|---|---|---|---|---|
123 | McAlastair | Senior | 750 | 42 | AB66 | Goldfish | 01-OCT-2011 | 30-JUN-2012 |
143 | McBeth | Executive | 800 | 12 | DD25 | Silverbird | 01-JAN-2012 | 31-MAY-2012 |
125 | McCluskey | Junior | 500 | 80 | AB66 | Goldfish | 01-OCT-2011 | 30-JUN-2012 |
163 | McDowell | Senior | 750 | 65 | GC31 | Bronzecat | 15-FEB-2012 | 15-JAN-2013 |
167 | McEwan | Executive | 800 | 10 | GC31 | Bronzecat | 15-FEB-2012 | 15-JAN-2013 |
123 | McAlastair | Senior | 750 | 42 | GC31 | Bronzecat 15-FEB-2012 | 15-JAN-2013 |
From this table, we can see that two consultants are assigned to project Goldfish, one to Silverbird and three to Bronzecat. We can also see from looking at any record the details of the project that the consultant is assigned to. The data is therefore correct, complete and consistent. However, it is also clear that some data appears in the table more than once. For example, there are two sets of information for project Goldfish, one associated with McAlastair and the other with McCluskey.
Insertion anomalies
To insert the details of a new consultant into the CONS_PROJ table, the details of their project assignment must match those for other consultants on the same project. If we were to assign a new consultant to project Silverbird, for example, there is nothing to prevent us using a different project code, start date or end date. Thus there is the risk of an anomaly - that is to say, two inconsistent records containing conflicting data with no indication of which one is correct.
To insert the details of a new project into the CONS_PROJ table, we would need to use null values for the consultant data until a consultant is assigned. This is not possible since the cons_id column is the primary key.
Deletion anomalies
If McBeth leaves the company, we nee to delete the corresponding record from the table. However, since McBeth is the only consultant assigned to project Silverbird, that means deleting all the details about that project. We have then lost some data from the database, and it is no longer complete.
Modification anomalies
If the details of a project changes, we need to update the database. However, where there are several copies of project data, we need to be careful to update all of them so that the data remains consistent. If any related row is missed in the update, we are again left with inconsistent data.
Solution
The cure for all of these potential error situations is to eliminate the redundancy in the single table by splitting it up. In the tables shown below, the same data is stored, but this time each detail is stored in only one row, and consultant data and project data can be handled separately.
consultant(cons_id, cons_name, grade, daily_rate)
project(proj_code, proj_name, start_date, end_date)
assignment(cons_id, proj_code, days)
cons_id | cons_name | grade | daily_rate |
---|---|---|---|
123 | McAlastair | Senior | 750 |
143 | McBeth | Executive | 800 |
125 | McCluskey | Junior | 500 |
163 | McDowell | Senior | 750 |
167 | McEwan | Executive | 800 |
123 | McAlastair | Senior | 750 |
proj_code | proj_name | start_date | end_date |
---|---|---|---|
AB66 | Goldfish | 01-OCT-2011 | 30-JUN-2012 |
DD25 | Silverbird | 01-JAN-2012 | 31-MAY-2012 |
GC31 | Bronzecat | 15-FEB-2012 | 15-JAN-2013 |
cons_id | proj_code | days |
---|---|---|
123 | AB66 | 42 |
143 | DD25 | 12 |
125 | AB66 | 80 |
163 | GC31 | 65 |
167 | GC31 | 10 |
123 | GC31 | 42 |
In this case, both the problem and the solution are quite simple. You would probably have identified consultants and projects as separate entities from the beginning. However, more complicated cases arise in practice which are less obvious. It is these that the rules of normalisation can help to identify and resolve. Having said that, there is still a redundancy problem with one of the tables above. Before going on, try to identify the problem and suggest a solution.