# Statistics

Mathematical methods are often perceived as difficult, and that is especially true for statistics. However, just as it is possible to use a software application without knowing how it is constructed, using a statistical method can be relatively straightforward. The important things are knowing:

- What you are trying to achieve, and
- Which statistical test will do the job for you

A *statistic* is a numerical value which conveys some information about a group of items. If the
group includes all possible items, it is called a *population*, and if it is a subset of all possible
items, it is known as a *sample*.
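As a minimal illustration in Python, the mean is one example of a statistic, and it can be calculated for either a population or a sample. The marks below are made-up data, and taking the first five values is only a convenient stand-in for proper random sampling.

```python
import statistics

# Hypothetical data: exam marks for every student in a small class,
# treated here as the whole population of interest.
population = [62, 71, 58, 80, 67, 75, 69, 73, 64, 77]

# A sample is a subset of the population. For simplicity this takes the
# first five marks; in practice a sample is usually chosen at random.
sample = population[:5]

# The mean is a statistic: a single number summarising a group of items.
population_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)

print(f"Population mean: {population_mean}")
print(f"Sample mean:     {sample_mean}")
```

Here the sample mean (67.6) differs from the population mean (69.6): a statistic calculated from a sample is only an estimate of the corresponding population value.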

The remaining pages in this section describe some common statistical tests and how to implement them in Excel.

## Overview

When you take a single measurement, it is subject to a certain amount of natural variation. Sources of variation can include:

- The accuracy of the measurement equipment
- Differences between test subjects
- Environmental differences from one measurement to the next
- etc.

Statistics starts from the premise that there is a "true" value, and that a set of
actual measurements will cluster around it. The way in which the measurements
arrange themselves around the true value is known as a *distribution*. Many naturally occurring
distributions have been observed, but one of the simplest and most useful is the *normal distribution*,
which has a distinctive symmetrical bell shape, as shown below.
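The idea of measurements clustering around a true value can be simulated. This sketch assumes a hypothetical true length of 10.0 cm and models the combined sources of variation as normally distributed random error with a standard deviation of 0.2 cm; both numbers are arbitrary choices for illustration.

```python
import random
import statistics

# Assumed values for illustration only.
TRUE_VALUE = 10.0  # the notional "true" length in cm
SCATTER = 0.2      # spread of the random measurement error in cm

random.seed(1)  # fixed seed so the run is repeatable

# Simulate 1,000 measurements: the true value plus random normal error.
measurements = [random.gauss(TRUE_VALUE, SCATTER) for _ in range(1000)]

# Because the measurements cluster around the true value, their mean is close to it.
average = statistics.mean(measurements)
print(f"Mean of 1,000 measurements: {average:.3f} (true value {TRUE_VALUE})")

# Crude text histogram with bins 0.1 cm wide from 9.6 to 10.4 cm.
# Each '#' stands for 10 measurements.
edges = [9.6 + 0.1 * i for i in range(9)]
for lower, upper in zip(edges, edges[1:]):
    count = sum(lower <= m < upper for m in measurements)
    print(f"{lower:.1f}-{upper:.1f} cm | {'#' * (count // 10)}")
```

The mean should land very close to 10.0, and the bars should rise to a single peak in the middle and fall away symmetrically on either side.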

In the diagram, the vertical line in the centre of the distribution represents the notional true value. Most of the measured values are close to the true value, hence the high peak, and measurements become less common the further they lie from the true value.
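How quickly measurements thin out away from the centre can be quantified. Using the standard deviation (defined in the next paragraph) as the unit of distance, this minimal sketch uses the standard library's `statistics.NormalDist` (Python 3.8 or later) to compute the proportion of a normal distribution lying within 1, 2 and 3 standard deviations of the mean.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1, so
# distances from the centre are measured in standard deviations.
dist = NormalDist(mu=0.0, sigma=1.0)


def fraction_within(k: float) -> float:
    """Proportion of values lying within k standard deviations of the mean."""
    return dist.cdf(k) - dist.cdf(-k)


for k in (1, 2, 3):
    inner = fraction_within(k)
    print(f"within {k} SD: {inner:.2%}   beyond {k} SD: {1 - inner:.2%}")
```

The proportions come out at roughly 68%, 95% and 99.7%, so only about 0.3% of values lie more than three standard deviations from the centre.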

Because of its characteristics, including its symmetry, the normal distribution is completely defined by two
commonly understood statistics. The *mean* is the value indicated by the central vertical line in
the diagram and approximates the true value. The *standard deviation* represents the horizontal
spread of the distribution: a large standard deviation gives a short, wide curve, and a small
standard deviation gives a tall, narrow one. You can see the effect of different values of the
mean and standard deviation using this demo from Wolfram Mathematica.
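Both statistics are easy to calculate. This sketch uses Python's `statistics` module on a hypothetical set of repeated readings; in Excel, the corresponding worksheet functions are AVERAGE and STDEV.S (or STDEV.P when the data are the whole population). It then shows why a larger standard deviation gives a shorter curve: the peak height of a normal distribution is $1/(\sigma\sqrt{2\pi})$, because the area under the curve always stays equal to 1.

```python
import math
import statistics
from statistics import NormalDist

# Hypothetical repeated measurements of the same quantity.
readings = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3]

# Mean: the centre of the distribution, our estimate of the true value.
mean = statistics.mean(readings)
# Sample standard deviation (n - 1 divisor, like Excel's STDEV.S).
spread = statistics.stdev(readings)
print(f"mean = {mean:.3f}, standard deviation = {spread:.3f}")

# Peak height of the curve (its value at the mean) for several spreads.
for sigma in (0.5, 1.0, 2.0):
    peak = NormalDist(mu=0.0, sigma=sigma).pdf(0.0)
    # Check against the closed-form peak height 1 / (sigma * sqrt(2*pi)).
    assert math.isclose(peak, 1 / (sigma * math.sqrt(2 * math.pi)))
    print(f"sigma = {sigma}: peak height = {peak:.3f}")
```

Doubling the standard deviation halves the peak height while the curve spreads out twice as far, which is the "short, wide" versus "tall, narrow" contrast described above.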

For simple purposes, this is all you need to know about the normal distribution, although there is a lot more to learn if you want to get into the details. You can find out more here. An important point is that, for many purposes, we can assume our data are normally distributed without any great loss of accuracy.
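Assuming normality is useful because the two fitted statistics are then enough to answer practical questions. As a sketch under that assumption, with hypothetical fill weights from a packaging line and an arbitrary 497 g threshold, `NormalDist.from_samples` estimates the mean and standard deviation from the data, and the fitted distribution then estimates how often a pack falls below the threshold.

```python
from statistics import NormalDist

# Hypothetical data: fill weights (grams) from a packaging line.
weights = [498.2, 501.5, 499.8, 500.9, 502.3, 497.6, 500.4, 499.1, 501.0, 499.7]

# Assume the weights are normally distributed and estimate the two
# defining statistics (mean and sample standard deviation) from the data.
model = NormalDist.from_samples(weights)
print(f"fitted mean = {model.mean:.2f} g, fitted SD = {model.stdev:.2f} g")

# Under that assumption, estimate the proportion of packs under 497 g.
p_under = model.cdf(497.0)
print(f"estimated proportion under 497 g: {p_under:.1%}")
```

The estimate is only as good as the normality assumption; if the real data were strongly skewed, the tail probability could be noticeably wrong.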