Skip to content

Data anomalies

The most fundamental reason for using normalisation is to eliminate undesirable redundancy in a data structure. Where redundancy exists, there is a risk that certain operations will create data anomalies. The term anomaly is used to describe something that is not in line with expectations. In a relational database, we expect that all of the data stored will be a) correct and b) internally consistent.

To illustrate what is meant by redundancy and data anomalies, let's imagine an example in which consultants are assigned to projects. The data about consultants and projects could be stored in a single table as shown below.

consultant(cons_id, cons_name, grade, daily_rate)

project(proj_code, proj_name, start_date, end_date)

assignment(cons_id, proj_code, days)

cons_id cons_name grade daily_rate days proj_code proj_name start_date end_date
123 McAlastair Senior 750 42 AB66 Goldfish 01-OCT-2011 30-JUN-2012
143 McBeth Executive 800 12 DD25 Silverbird 01-JAN-2012 31-MAY-2012
125 McCluskey Junior 500 80 AB66 Goldfish 01-OCT-2011 30-JUN-2012
163 McDowell Senior 750 65 GC31 Bronzecat 15-FEB-2012 15-JAN-2013
167 McEwan Executive 800 10 GC31 Bronzecat 15-FEB-2012 15-JAN-2013
123 McAlastair Senior 750 42 GC31 Bronzecat 15-FEB-2012 15-JAN-2013

From this table, we can see that two consultants are assigned to project Goldfish, one to Silverbird and three to Bronzecat. We can also see from looking at any record the details of the project that the consultant is assigned to. The data is therefore correct, complete and consistent. However, it is also clear that some data appears in the table more than once. For example, there are two sets of information for project Goldfish, one associated with McAlastair and the other with McCluskey.

Insertion anomalies

To insert the details of a new consultant into the CONS_PROJ table, the details of their project assignment must match those for other consultants on the same project. If we were to assign a new consultant to project Silverbird, for example, there is nothing to prevent us using a different project code, start date or end date. Thus there is the risk of an anomaly - that is to say, two inconsistent records containing conflicting data with no indication of which one is correct.

To insert the details of a new project into the CONS_PROJ table, we would need to use null values for the consultant data until a consultant is assigned. This is not possible since the cons_id column is the primary key.

Deletion anomalies

If McBeth leaves the company, we nee to delete the corresponding record from the table. However, since McBeth is the only consultant assigned to project Silverbird, that means deleting all the details about that project. We have then lost some data from the database, and it is no longer complete.

Modification anomalies

If the details of a project changes, we need to update the database. However, where there are several copies of project data, we need to be careful to update all of them so that the data remains consistent. If any related row is missed in the update, we are again left with inconsistent data.

Solution

The cure for all of these potential error situations is to eliminate the redundancy in the single table by splitting it up. In the tables shown below, the same data is stored, but this time each detail is stored in only one row, and consultant data and project data can be handled separately.

consultant(cons_id, cons_name, grade, daily_rate)

project(proj_code, proj_name, start_date, end_date)

assignment(cons_id, proj_code, days)

cons_id cons_name grade daily_rate
123 McAlastair Senior 750
143 McBeth Executive 800
125 McCluskey Junior 500
163 McDowell Senior 750
167 McEwan Executive 800
123 McAlastair Senior 750
proj_code proj_name start_date end_date
AB66 Goldfish 01-OCT-2011 30-JUN-2012
DD25 Silverbird 01-JAN-2012 31-MAY-2012
GC31 Bronzecat 15-FEB-2012 15-JAN-2013
cons_id proj_code days
123 AB66 42
143 DD25 12
125 AB66 80
163 GC31 65
167 GC31 10
123 GC31 42

In this case, both the problem and the solution are quite simple. You would probably have identified consultants and projects as separate entities from the beginning. However, more complicated cases arise in practice which are less obvious. It is these that the rules of normalisation can help to identify and resolve. Having said that, there is still a redundancy problem with one of the tables above. Before going on, try to identify the problem and suggest a solution.