# Statistical tests

Statistical tests evaluate the *probability* that there is a large enough deviation from
expected observations for it to be considered *significant*. Probability is expressed as a
number from 0 to 1 where 0 corresponds to impossibility and 1 corresponds to certainty.

All statistical tests involve the calculation of a characteristic number or *statistic* with
a known distribution. The simplest to work with is the normal distribution, and
most tests assume that the data is normally distributed. If the data deviates
from normal by a large amount, the test is no longer valid. With the normal distribution,
most observed values will be close to the mean value. Because of natural variation, however,
there will always be some extremely small or extremely large observed values.
The extreme observations are found in the *tails* of the distribution as shown below.

One tail | Two tails |
---|---|

One tail | Two tails |

In the diagram, the shaded area of the curve in the left-hand panel corresponds to the top 5% of
values. That is, 95% of observed values are expected to be below the threshold value. To observe
a value in the shaded area is rare, and is therefore more likely to be the result of a real
difference rather than natural variation. A statistical test reports the probability of the
calculated statistic occurring. If is below the *critical value* that corresponds to the 95%
threshold, we say that we are *95% confident" that the observed difference is real, or more
precisely, the result is *significant at the 95% level*. For a greater degree of confidence,
a 98% threshold is sometimes applied.

In the right-hand panel of the diagram, the shaded area is split between the two tails of the
distribution, but still corresponds to the extreme 5% of expected values. When performing a test,
you often have the option to perform a *one-tailed* or *two-tailed* test. A one-tailed test is
appropriate if the direction of the difference is already known beforehand, and a two-tailed
test is used if the difference could be in either direction.

## Hypothesis testing

It is important to recognise that statistical tests are designed to identify results with
*low probability* - that is, those that differ from the expectation. Because of this, the procedure
is to assume that all differences are the result of natural variation, *unless* a test returns
a statistic with an extremely low probability. In statistical terminology, the expected case where
only natural variation plays a part is called the *null hypothesis*. The *alternative hypothesis*
describes the low-probability situation where there is a mathematically significant result. For
example, If we wanted to find out whether there is a significant difference in the number of
social media contacts between make and female students, the null hypothesis would be that there is
no significant difference, and the alternative hypothesis is the difference is significant. If our
test returns a result below the critical value at the 95% level, we say that the ** null hypothesis
is rejected** and that the

**.**

*alternative hypothesis is accepted*## Variables

Any measured quantity is referred to as a *variable* in statistics. This could be a physical
measurement like length, or an attitudinal concept such as dedication. Some tests are designed
to evaluate the relationship between variables. Usually, this involves exploring whether one
variable can be used to predict another. In such a case, the known quantity is called the
*independent variable*, and the predicted quantity is called it *dependent variable*.

When setting up an experiment to measure the two quantities, it is important to exclude
unwanted influences on the situation under test. For example, if you were trying to
ascertain whether volume of network traffic could be used as a reliable predictor of the
number of cyberattacks, it would be important to choose times when network traffic is not
artificially raised such as immediately after a new software release or a noteworthy press
release on the company website. These unwanted influences are known as *confounding variables*
and need to be taken into account in the experimental design.

## Test conditions

A common situation in an experimental study is to set up two contrasting situations to find out
whether there is a significant difference between the two. For example, the purpose of the study
might be to compare the usability of different search engines. To do that, you would need to gather a
group of test subjects and ask them to carry out a prescribed set of search activities on each search
engine. In experimental/statistical terminology, the test subjects are exposed to a set of different
*test conditions*. In this example, a condition is defined by the search engine being used.

## Degrees of freedom

Many tests are sensitive to the amount of data that is provided. The concept of *degrees of freedom*
captures the idea that the more points there are in your dataset, the more possible combinations of
values there are that need to be considered. Imagine a dataset of just one value: there is no
variance because there is only a single fixed value. As soon as a second point is added, however,
the difference between the two values is a source of variance. A third point introduces even more
variance. Usually, degrees of freedom is equal to the number of data values minus one.

## Choosing the right test

If you have a degree in applied mathematics, you will be able to select the test that is appropriate for the situation based on your theoretical knowledge. You might also carry out the test by implementing code based on the mathematical formula. Fortunately, there are several tools available for performing statistical tests that do not require deep mathematical knowledge. However, it is still important to know which test to use. The decision tree below helps with this by taking you through a series of simple questions.

## Types of error

There are two major things that go wrong with statistical tests. Bearing in mind that statistical
tests generally focus on accepting or rejecting the null hypothesis, the first type of error is
when the null hypothesis is true but the test suggests it should be rejected. This is sometimes
known as a *false positive* error, because when the null hypothesis is rejected, it results in
accepting the alternative hypothesis. The second type of error is where the null hypothesis is
actually false, but the test result suggests it should be accepted. This may be referred to as a
*false negative*.