Statistics

Mathematical methods are often perceived as difficult, and that is especially true for statistics. However, just as it is possible to use a software application without knowing how it is constructed, it can be relatively straightforward to use a statistical method without following all of the mathematics behind it. The important things are knowing:

  1. What you are trying to achieve, and
  2. Which statistical test will do the job for you

A statistic is a numerical value which conveys some information about a group of items. If the group includes all possible items, it is called a population, and if it is a subset of all possible items, it is known as a sample.
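The distinction between a population and a sample can be sketched in a few lines of Python. This is only an illustration: the item weights, the population size of 10,000 and the sample size of 50 are all made-up assumptions, not values from this section.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch gives the same numbers each run

# Hypothetical population: every one of 10,000 made-up item weights (grams).
population = [random.gauss(500, 20) for _ in range(10_000)]

# A sample is a subset of the population: here, 50 items picked at random.
sample = random.sample(population, 50)

# The mean is one example of a statistic; it can be computed for either group.
population_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)

print(f"population mean: {population_mean:.1f}")
print(f"sample mean:     {sample_mean:.1f}")
```

The two means should come out close to each other but not identical, which is why a statistic calculated from a sample is treated as an estimate of the corresponding population value.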

The rest of the pages in this section will describe some common statistical tests, and how to implement them using Excel.

Overview

When you take a single measurement, it is subject to a certain amount of natural variation. Sources of variation can include

  • The accuracy of the measurement equipment
  • Differences between test subjects
  • Environmental differences from one measurement to the next

Statistics starts from the premise that there is a "true" value, and that a set of actual measurements will cluster around the true value. The way in which the measurements arrange themselves around the true value is known as a distribution. Many naturally-occurring distributions have been observed, but one of the simplest and most useful is the normal distribution, which has a distinctive symmetrical appearance as shown in Fig. 1.

Figure 1: Normal distribution - the vertical blue line indicates the mean

In the figure, the vertical line in the centre of the distribution represents the notional true value. Most of the measured values are close to the true value - hence the high peak - and the number of measurements falls away as the distance from the true value increases.
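The shape described above can be reproduced with simulated data. The following is a minimal sketch, assuming a hypothetical true value of 100 and normally distributed measurement noise with a standard deviation of 5; neither number comes from this section.

```python
import random

random.seed(1)  # fixed seed so the sketch is repeatable

TRUE_VALUE = 100.0  # hypothetical "true" value
SPREAD = 5.0        # hypothetical standard deviation of the measurement noise

# Simulate 10,000 measurements scattered around the true value.
measurements = [random.gauss(TRUE_VALUE, SPREAD) for _ in range(10_000)]


def count_in_band(values, low, high):
    """Count values whose distance from TRUE_VALUE is >= low and < high."""
    return sum(1 for v in values if low <= abs(v - TRUE_VALUE) < high)


# Three bands of equal width, each further from the true value than the last.
near = count_in_band(measurements, 0, 5)
middle = count_in_band(measurements, 5, 10)
far = count_in_band(measurements, 10, 15)

print(f"0-5 away:   {near}")
print(f"5-10 away:  {middle}")
print(f"10-15 away: {far}")
```

The count should drop from one band to the next, matching the high central peak and thinning tails of the figure.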

Because of its characteristics - including symmetry - the normal distribution is fully described by two commonly-understood statistics: the mean and the standard deviation. The mean is the value indicated by the central vertical line in the diagram, and approximates the true value. The standard deviation represents the horizontal spread of the distribution. A large standard deviation gives a short, fat version of the normal distribution, and a small one gives a tall, thin version. You can see the effect of different values of the mean and standard deviation using this demo from Wolfram Mathematica.
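The effect of the standard deviation on the shape of the curve can also be checked numerically. This sketch uses `NormalDist` from Python's standard `statistics` module (Python 3.8 or later); the mean of 0 and the three standard deviations are arbitrary values chosen for illustration.

```python
from statistics import NormalDist

MEAN = 0.0  # arbitrary mean; changing it shifts the curve left or right

peaks = {}       # height of the curve at the mean, for each standard deviation
within_one = {}  # proportion of values lying within 1 unit of the mean

for sd in (0.5, 1.0, 2.0):
    dist = NormalDist(mu=MEAN, sigma=sd)
    peaks[sd] = dist.pdf(MEAN)
    within_one[sd] = dist.cdf(MEAN + 1) - dist.cdf(MEAN - 1)
    print(f"sd={sd}: peak height={peaks[sd]:.3f}, "
          f"within 1 unit of mean={within_one[sd]:.1%}")
```

A smaller standard deviation should give a higher peak and a larger proportion of values close to the mean (tall and thin), while a larger one gives a lower peak with the values spread more widely (short and fat).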

For simple purposes, this is all you need to know about the normal distribution. However, there is a lot more to know if you want to get into the details. You can find out more here. An important point is that for many purposes, we can assume that our data is normally distributed without any great loss of accuracy.