Data redundancy
In non-database systems, each application has its own private files. This often leads to multiple copies of the same data being stored in different places. If one copy is changed and the others are not, the copies no longer agree and the data as a whole becomes inconsistent. Even if comprehensive checking procedures are in place to ensure that all copies are updated at the same time, this entails a large management overhead.
Thus data redundancy is inefficient and can lead to inconsistencies in the data (loss of data integrity).
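The following minimal Python sketch illustrates how private copies drift apart. The two "files", the customer record and the helper `inconsistent_fields` are all hypothetical and invented for illustration; they stand in for the private files of two separate applications.

```python
# Hypothetical private files of two applications, each holding its own
# copy of the same customer's details.
billing_file = {"C001": {"name": "A. Smith", "address": "1 High Street"}}
shipping_file = {"C001": {"name": "A. Smith", "address": "1 High Street"}}


def inconsistent_fields(customer_id, *files):
    """Return the field names whose values differ between the copies."""
    records = [f[customer_id] for f in files]
    return sorted(
        field
        for field in records[0]
        if any(r[field] != records[0][field] for r in records[1:])
    )


# Both copies agree to begin with.
before = inconsistent_fields("C001", billing_file, shipping_file)

# The billing application records a change of address; shipping is not told.
billing_file["C001"]["address"] = "22 Mill Lane"
after = inconsistent_fields("C001", billing_file, shipping_file)

print(before)  # []
print(after)   # ['address']
```

Keeping the two copies in step would require every application to know about every other copy, which is the management overhead described above.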
The database may be thought of as a unification of several otherwise distinct data files, with any redundancy among those files partially or wholly eliminated. Some of the techniques that will be covered in this module deal specifically with taking a structure that contains redundancy and resolving it through the application of procedural rules.
Redundancy is
- direct if a value is a copy of another
- indirect if the value can be derived from other values
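As an illustration of the two kinds of redundancy, consider the sketch below. The record layout, the field names and the helpers `customer_name` and `total` are assumptions made for this example only.

```python
customers = {"C001": {"name": "A. Smith"}}

# An order line carrying both kinds of redundancy. Prices are in pence.
redundant_order = {
    "customer_id": "C001",
    "customer_name": "A. Smith",  # direct: a copy of customers["C001"]["name"]
    "quantity": 3,
    "unit_price": 250,
    "total": 750,                 # indirect: derivable as quantity * unit_price
}

# The same order with the redundancy removed.
order = {"customer_id": "C001", "quantity": 3, "unit_price": 250}


def customer_name(order, customers):
    """Look the name up in its single home instead of copying it."""
    return customers[order["customer_id"]]["name"]


def total(order):
    """Derive the total on demand instead of storing it."""
    return order["quantity"] * order["unit_price"]


print(customer_name(order, customers))  # A. Smith
print(total(order))                     # 750
```

Nothing is lost by dropping the two redundant fields: both values can still be obtained, but each now has only one place where it can be changed.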
Data integration is generally regarded as an important characteristic of a database. The rule of thumb is that a single piece of data should be stored only once, and if it can be derived from other data, it should not be stored at all. There are some cases, however, where this guideline may be broken to speed up a particular database operation or to simplify a user view. Identifying the need for such exceptions is an advanced topic that requires a detailed understanding of database technology.
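The rule of thumb can be sketched with Python's built-in `sqlite3` module. The table, column and view names here are hypothetical. Each fact is stored once, and the order total is computed when it is asked for rather than being stored.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (
        customer_id TEXT PRIMARY KEY,
        name        TEXT NOT NULL
    );
    CREATE TABLE order_line (
        order_id    INTEGER PRIMARY KEY,
        customer_id TEXT NOT NULL REFERENCES customer(customer_id),
        quantity    INTEGER NOT NULL,
        unit_price  INTEGER NOT NULL   -- pence
    );
    -- The view presents the name and the total without storing either twice.
    CREATE VIEW order_summary AS
        SELECT o.order_id, c.name, o.quantity * o.unit_price AS total
        FROM order_line AS o
        JOIN customer AS c ON c.customer_id = o.customer_id;
""")
conn.execute("INSERT INTO customer VALUES ('C001', 'A. Smith')")
conn.execute("INSERT INTO order_line VALUES (1, 'C001', 3, 250)")

# A change of name is made in one place only...
conn.execute("UPDATE customer SET name = 'A. Jones' WHERE customer_id = 'C001'")

# ...and everything derived from the stored data reflects it.
rows = conn.execute("SELECT order_id, name, total FROM order_summary").fetchall()
print(rows)  # [(1, 'A. Jones', 750)]
```

Storing `total` as a column instead would be one of the exceptions mentioned above: it saves recomputing the value on every read, at the cost of having to keep it in step with `quantity` and `unit_price`.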