Data integrity
At the conceptual level, a database provides a model of the real world using four simple components:
Component | Description | Examples |
---|---|---|
Entities | The important things in the real world that need to be modelled | People, places, objects, events, etc. |
Attributes | Individual items of data associated with an entity | Name, national insurance number, weight, date of manufacture |
Relationships | Ways in which entities are connected | A is part of B, A lives in B, A produces B, A takes place in B, etc. |
Constraints | Rules which place limits on the data that is allowed | Every A must have a B, Only future dates are allowed, etc. |
Relationships and constraints form a set of rules to which data must conform. data integrity is the validity and consistency of the database contents with respect to these rules. Data integrity must be maintained, otherwise meaningless data starts to accumulate. For example, if a database contains two entries for the same entity, it would be possible to modify one copy and leave the other as it is. Afterwards there is no way of knowing which copy is "correct", and we have lost some data integrity. The solution would be to enforce a rule that says only one entry is allowed for a single entity.
Integrity checks on data items can be divided into 4 groups:
1. Type checks
- e.g. ensuring a numeric field is numeric and not a character - this check should be performed automatically by the DBMS.
2. Redundancy checks
- direct or indirect (see data redundancy) - this check is not automatic in most cases and must be added by the database designer
3. Range checks
- e.g. to ensure a data item value falls within a specified range of values, such as checking dates so that say (age > 0 AND age < 110).
4. Comparison checks
- in this check a function of a set of data item values is compared against a function of another set of data item values. For example, the max salary for a given set of employees must be less than the min salary for the set of employees on a higher salary scale.
A record type may have constraints on the total number of occurrences, or on the insertions and deletions of records. For example in a patient database there may be a limit on the number of x-ray results for each patient or the details of a patients visit to hospital must be kept for a minimum of 5 years before it can be deleted.
Centralised control of the database helps maintain integrity, and permits the DBA to define validation procedures to be carried out whenever any update operation is attempted (update covers modification, creation and deletion of data).