Backups
Before thinking about how and when to back up the database, it is useful to consider why the database needs to be backed up. When faced with a situation in which the entire database has been rendered unavailable, such as a major act of sabotage or terrorism, or more probably a natural disaster or hardware failure, there is no alternative but to restore the database from a backup. The critical process is database recovery rather than taking the backup in the first place. The backup strategy should therefore be determined primarily by the process that you plan to follow to bring the database back online after a major failure. In order to be confident that the recovery process will be successful, it is not sufficient just to back up the database regularly, although that is a major element of the overall disaster recovery strategy. It is also important to test the recovery plan from time to time under as realistic a set of circumstances as possible.
Ideally, every database transaction would be applied to parallel systems so that if one system fails, the other is still operational. Relational database products are sufficiently advanced to do this; however, there are significant barriers. Parallel running is a very expensive option. Database suppliers realise that parallel operation brings benefits in terms of data security, and the licence costs are therefore typically double. In addition, there is a resource overhead to making multiple updates. On busy systems, a small delay associated with each transaction may be sufficient to reduce the overall performance of the whole system. This might mean further costs for more powerful hardware. Finally, even having two parallel systems does not guard against the possibility of both systems failing at the same time, or the second system failing soon after the first, which amounts to the same thing. In this case, a backup strategy is still required.
In practice very few organisations can afford to maintain parallel systems just to protect their data, and most rely on carefully maintained backup and recovery procedures for the security of the data.
Types of backup
The simplest and most obvious type of backup is the full database backup. The complete contents of the database are extracted, compressed and stored in an offline file which can be placed in secure storage until it is needed for recovery. Although this sounds like it should be sufficient protection, there are some serious practical issues to take into consideration.
A full database backup can take a long time, and in order to maintain the integrity of the data while the backup is taking place, it is common to prevent users from making any transactions. This can be a major inconvenience for 24-hour operations like e-commerce companies where there is a strong business pressure to minimise the time that the system is unavailable (downtime).
A further disadvantage of full backups is their size. Because each one contains the entire contents of the database, holding several of them can take up a significant amount of storage space, and they are difficult and time-consuming to move from one location to another. The reason for keeping several backups at a time is discussed below.
A major principle of disaster recovery is that backup files must not be stored close to the system itself in case they are also affected by the same disaster (e.g. flood or fire). Many companies employ storage specialists to store backup files in secure locations on their behalf. If the files are large, this may increase the cost of using these facilities.
A complementary type of backup is the incremental backup. Whereas the full backup contains all system configuration information, user accounts, database objects and data, the incremental backup contains only the contents of the transaction log (introduced in week 9). That amounts to saving only the changes that have been applied to the database since the last backup. The transaction log is flushed after each backup, so as long as there is one recent full backup, further incremental backups can be taken which contain only the more recent changes, and which are significantly smaller and easier to handle. If the transaction log is implemented as an external file, the time taken to make an incremental backup could be reduced to the time taken to copy it to another location.
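The relationship between the two types of backup can be illustrated with a minimal Python sketch. It is a toy in-memory model rather than the behaviour of any particular database product: the class ToyDatabase and its methods are hypothetical names introduced only for illustration, and the "database" is simply a dictionary.

```python
import copy


class ToyDatabase:
    """Hypothetical in-memory stand-in for a database with a transaction log."""

    def __init__(self):
        self.data = {}  # current committed state
        self.log = []   # changes committed since the last backup

    def commit(self, key, value):
        # Apply the change and record it in the transaction log.
        self.data[key] = value
        self.log.append((key, value))

    def full_backup(self):
        # Copy the complete contents, then flush the log.
        snapshot = copy.deepcopy(self.data)
        self.log = []
        return snapshot

    def incremental_backup(self):
        # Copy only the logged changes, then flush the log.
        changes = list(self.log)
        self.log = []
        return changes


db = ToyDatabase()
db.commit("alice", 100)
db.commit("bob", 50)
full = db.full_backup()          # contains every record
db.commit("alice", 120)
incr = db.incremental_backup()   # contains only the change made since `full`
```

In this sketch the incremental backup holds a single change while the full backup holds the whole dictionary, which is why incremental backups tend to be much smaller.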
System recovery
When recovering a database from a series of backups, the first step would be to recreate the database from the most recent full backup. That would effectively reset the database to its state on the date the backup was taken. If one or more incremental backups are also available, their changes are applied in chronological order so that data integrity is maintained. Recall that the reason that transactions are durable is that once changes are committed, further changes may build on them. It therefore becomes impossible to identify and reverse an earlier change. This is also true during database recovery.
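The recovery sequence described above might be sketched as follows. This is an illustration under simplifying assumptions: the database is modelled as a dictionary, each incremental backup is a list of (key, value) changes in commit order, and the function name restore and the day numbers are hypothetical.

```python
def restore(full_backup, incremental_backups):
    """Rebuild the database state from a full backup plus incrementals.

    incremental_backups is a list of (taken_on, changes) pairs, where
    changes is a list of (key, value) pairs in commit order.
    """
    # Step 1: reset the database to its state when the full backup was taken.
    state = dict(full_backup)
    # Step 2: apply the incremental backups in chronological order.
    for _, changes in sorted(incremental_backups, key=lambda b: b[0]):
        for key, value in changes:
            state[key] = value
    return state


full = {"stock": 10}
incrementals = [
    (3, [("stock", 5)]),                # taken on day 3
    (2, [("stock", 8), ("price", 2)]),  # taken on day 2
]
recovered = restore(full, incrementals)
```

Sorting by the date each backup was taken ensures that the day 3 change to "stock" overwrites the day 2 change rather than the other way round.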
One factor to consider in a disaster recovery strategy is the length of time that might be required to recover the database. Restoring from a single full backup would take the least amount of time; however, the time required to take the backup in the first place might also be significant. For a 24-hour operation, for example, taking the database down once a day to perform a full backup may not be feasible because the cost of the lost business might be prohibitive. In that case, a strategy with a minimum number of full backups and a series of incremental backups might be chosen. The decision on which strategy to choose comes down to weighing up the conflicting factors.
Even after restoring a full backup and a series of incremental backups, some transactions may still be lost; however, if the current transaction log is still available, it can be used to recover the most recent database changes, i.e. those made since the most recent backup (full or incremental). If used correctly, this ensures that in the event of a full system failure, the only data that is lost consists of those transactions that were actually in progress at the time, and which had not yet been committed to the database.
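A small sketch may help to show how the current transaction log could be replayed on top of a restored database. The log format used here is an assumption made for illustration: each entry is either (transaction_id, "set", key, value) or (transaction_id, "commit"), and the function name replay_log is hypothetical. Real products use their own log formats.

```python
def replay_log(state, log):
    """Apply only the committed transactions in the log to a restored state."""
    # A transaction counts as committed only if the log holds its commit record.
    committed = {txn for txn, op, *_ in log if op == "commit"}
    recovered = dict(state)
    for txn, op, *args in log:
        if op == "set" and txn in committed:
            key, value = args
            recovered[key] = value
    return recovered


restored = {"balance": 100}           # state after applying all backups
current_log = [
    (1, "set", "balance", 80),
    (1, "commit"),
    (2, "set", "balance", 0),         # in progress at the time of failure
]
final_state = replay_log(restored, current_log)
```

Transaction 2 has no commit record, so its change is discarded, which corresponds to losing only the transactions that were in progress at the time of the failure.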
Backup strategies
The disadvantage of using an incremental backup is that it extends the time required to perform a database recovery. If we have a full backup from day 1, for example, and incremental backups from the next two days, and there is a system failure on the third day, there are three steps required to recover the system. First, the system must be recovered using the full backup, and then the incremental backups must both be applied in the correct order.
A further consideration is that backup files themselves may suffer damage and become unusable. It is therefore common practice to maintain a series of backups so that it is possible to recover the system from the most recent usable backup file.
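Selecting the most recent usable backup from a series could be sketched as below. How damage is detected is not specified here, so the sketch assumes, purely for illustration, that a SHA-256 checksum was recorded when each backup was taken. The dictionary layout and function names are hypothetical; hashlib is part of the Python standard library.

```python
import hashlib


def checksum(payload: bytes) -> str:
    return hashlib.sha256(payload).hexdigest()


def most_recent_usable(backups):
    """Return the newest backup whose payload still matches its checksum.

    backups is a list of dicts with the keys 'taken_on' (ISO date string),
    'payload' (bytes) and 'checksum' (recorded when the backup was taken).
    """
    for backup in sorted(backups, key=lambda b: b["taken_on"], reverse=True):
        if checksum(backup["payload"]) == backup["checksum"]:
            return backup
    return None  # no usable backup remains


backups = [
    {"taken_on": "2024-01-01", "payload": b"week 1", "checksum": checksum(b"week 1")},
    {"taken_on": "2024-01-08", "payload": b"week 2", "checksum": checksum(b"week 2")},
    # The newest file has been damaged since its checksum was recorded.
    {"taken_on": "2024-01-15", "payload": b"wee", "checksum": checksum(b"week 3")},
]
chosen = most_recent_usable(backups)
```

With only one backup on file, damage to that file would leave nothing to recover from, which is the reason for keeping a series.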
Planning a backup strategy is therefore a question of balancing the advantages and disadvantages of the different types of backup against the company's priorities. Considerations can include:
- the work schedule of the business
- the degree to which the company depends on database availability
- the cost of third-party services
- the time required to recover from a disaster
- etc.
A range of solutions is possible, and the examples below illustrate some of the possibilities.
Example 1: Mission-critical 24-hour operation
The priority in this case is to minimise downtime as much as possible. Given sufficient budget, the system could use parallel databases, but a last-resort backup strategy would still be required. Because a full backup is time-consuming and the risk of needing to perform a full system recovery would be significantly reduced through the use of parallel databases, the time between full backups could be quite long. The assumption is that any disaster large enough to cause both database servers to fail is likely to affect the company in other ways too, and a general return to business will therefore take a long time anyway. The time required to restore the system under these circumstances would therefore not be an issue.
We might therefore decide in this case to perform a full backup once a month at the least busy time, and to take incremental backups once a day in the intervening period. Although a full system recovery might be very time-consuming with many incremental steps, we assume that this would be acceptable in exceptional circumstances. An external contractor will be used to store backup files in a secure location.
Example 2: University
Although the database systems are central to the administration of the university, the core business does not depend on them, and extended periods of downtime are therefore acceptable. Backup files could be stored with an external contractor, but given the need to keep costs as low as possible, it might be better to reach a cooperative arrangement with another university so that the backup files for one partner are stored on the premises of the other. The assumption is that most disasters will only affect one site, and that any event which compromises both sites is likely to have far wider impact than just the administrative databases. The risk to the backup files is therefore deemed acceptable. Whether an external contractor or a partner university undertakes to store the backup files, their size and volume may still be an issue, and therefore a balance needs to be found between full and incremental backups.
In this case, we might decide to take a full backup once a week at the weekend, and an incremental backup every day in the intervening period. Two sets of backup files could be maintained for extra resilience covering the most recent two-week period.
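To see what this schedule implies for recovery, the sketch below lists the backups that would have to be restored after a failure on a given day. It rests on assumptions that go beyond the text: the weekly full backup is taken on Sunday, each backup is taken at the end of the day, and the failure happens before that day's backup. The function names are hypothetical.

```python
from datetime import date, timedelta


def backup_type(day):
    # Assumption: the weekend full backup is taken on Sunday (weekday() == 6).
    return "full" if day.weekday() == 6 else "incremental"


def restore_chain(failure_day):
    """List the backups needed to recover, oldest first."""
    day = failure_day - timedelta(days=1)  # most recent completed backup
    chain = []
    while True:
        chain.append((day, backup_type(day)))
        if backup_type(day) == "full":
            break
        day -= timedelta(days=1)
    return list(reversed(chain))


# A failure on Wednesday 10 January 2024 needs Sunday's full backup
# followed by Monday's and Tuesday's incremental backups.
chain = restore_chain(date(2024, 1, 10))
```

Under these assumptions the longest chain is one full backup plus six incremental backups, for a failure on a Sunday before that day's full backup is taken.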
Example 3: Small business
Assuming that the business depends on the database systems only for administration, and that the staff are capable of keeping alternative records if necessary, the time needed to perform backup and recovery operations is not a major issue. In this case the database is also likely to be quite small, and a full backup is unlikely to take very long. It is still good practice to store at least one recent backup in a secure location, so a minimal contract with a storage provider would be a good idea.
Here we might decide to take a full backup after the close of business each evening. The backup from the last working day of the week could be entrusted to a secure storage specialist to provide a recovery route in case all other backups are damaged. The maximum data that could be lost in this situation would be one week's worth of transactions, but given the option to re-enter that data, we judge that the costs saved on the minimal backup strategy outweigh the risk.