Performance

Another aspect of the DBA's responsibilities is to maintain the performance of the database. Although the structural design is determined by a systematic and logical design process, practical issue may arise later that prevent it from responding as quickly as it should. The most likely reason for a degradation in performance over time is the accumulation of large amounts of data. Financial institutions, stock trading platforms and e-commerce firms all process thousands of transactions a day, and the volume of data can quickly escalate. Ideally, these issues should be addressed at the design stage, but occasionally the problem only surfaces later.

An obvious approach to resolving performance issues is to invest in more powerful hardware. This can be expensive however, and significant time will be necessary to install the new system, configure it appropriately, install the required software, configure the DBMS, migrate the data from the old system to the new system, and parallel run with the old system until any remaining issues have been identified and resolved. It is not cheap, not quick, and may even fail to solve the problem. Until the precise cause of the performance drop is identified, putting money into new hardware is just guessing.

Archiving

Another obvious route for the DBA to take is to reduce the amount of data in the database. It is good practice to archive old data on a regular basis in any case, and there may be further advantages to doing so. For example, the company may wish to analyse database transactions over a long period of time. This will often provide useful strategic information to business managers; however, such analyses consume a lot of computing resource and are not normally possible while the system is in operation. Old data is therefore often extracted and loaded into a second database known as a data warehouse. Apart from the occasions when new data is added, the contents of the data warehouse is fixed. This differentiates it from the operational database which is known as a online transaction processing (OLTP) system. The structure of the data warehouse, or online analytical process (OLAP) system, can therefore be optimised for fast data retrieval rather than to accommodate the storage and updating of records.

Notice that data is rarely ever completely deleted from a database system. Even if the old data is not required for loading into a data warehouse, it will typically be maintained in offline storage of some kind for a particular length of time. For certain types of data such as financial accounts, electoral results and records of company formation there are legal requirements on how long data must be maintained. This paper by Watson Hall information security specialists neatly summarises the UK legal regulations on data retention.

For practical reasons, it may not always be desirable to remove old data from the online system. In a system where existing records can act as references for new transactions such as a helpdesk or medical records database, removing data may reduce the overall effectiveness of operation.

Indexes

As mentioned in week 7, the speed of a query can sometimes be improved by adding and index. When considering this option, the DBA needs to identify a query that is too slow and examine the WHERE clause to determine whether an index would be of use. Oracle and several other database platforms automatically create indexes for primary and unique keys, but other indexes must be added explicitly. If the WHERE clause of the query uses a non-key column to filter the table contents, an index on that column could help.

For example, in the hospital example from earlier, the patient_id column is the primary key. A query which filters patient records by last_admission_date however may begin to run slowly once a large number of records accumulates. This issue may have been identified and addressed at design time, but if it was not, a full table scan will be required every time the query is run. Creating an index on last_admission_date would therefore improve performance.

Designing an appropriate index is not always that simple, since several columns can be included in a single index and success may depend on getting the right combination; however, further detail in this area is beyond the scope of the module.

In the simple case discussed here, the index could be created using the following SQL statement:

1	`CREATE INDEX p_index1 ON patient(last_admission_date);`

patient
patient_id
first_name
last_name
gender
date_of_birth
last_admission_date
assigned_consultant
date_of_death