Biostatistics with R

Error, Uncertainty, Accuracy and Precision in data analysis

Error and Uncertainty

In scientific measurements, an error is the difference between an observed (or measured) value and the true value of a quantity. However, in most experiments the true value of a measurement is never available to us. We have to get the true value from previous reliable measurements or from theoretical considerations and computations.

For example, a high school student performs an experiment to measure the density of pure water. She fills many vessels of known volume with water, measures their masses on a balance and computes an average density of water of \(\small{0.98~ g/cm^3 }\). Now, this measured density has to be compared with the true density of water to compute the measurement error. The student takes the true density of water to be \(\small{1.00~ g/cm^3 }\) (to two decimal places) from a data table, which is based on very many measurements conducted by different labs in different countries over decades. From this the student estimates the error in her measurement to be \(\small{1.00 - 0.98~=~0.02~g/cm^3}\).
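This arithmetic is easy to reproduce in R. The individual measurements below are made-up numbers chosen to average to \(\small{0.98~g/cm^3}\); only the reference density comes from the example above.

```r
# Hypothetical density measurements (g/cm^3) from vessels of known volume
measured <- c(0.97, 0.99, 0.98, 0.96, 1.00)
true_density <- 1.00                    # accepted reference value

mean_density <- mean(measured)          # average measured density: 0.98
error <- true_density - mean_density    # estimated measurement error: ~0.02
error
```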

Uncertainty is an interval within which repeated measurements of a quantity are expected to fall.
For example, a \(\small{90\% }\) confidence interval around the sample mean can be treated as an uncertainty on the measurement. In another experiment, a \(\small{68\% }\) confidence interval may be treated as the uncertainty on the measured mean value.

Error and uncertainty are not the same. Error is a deviation from a known value, whereas an uncertainty is an interval that quantifies the extent to which we can believe in the measured value.
In most experiments the error is not known, because of the lack of information on the correct value. Therefore, the uncertainty is generally quoted as a reliability measure of our result.
The uncertainty is generally written symbolically as \(\small{\overline{x} \pm a }\) around the measured mean \(\small{\overline{x} }\). For example, \(\small{32.1 \pm 2.9 }\) etc.
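As a sketch of how such an uncertainty can be obtained in R, the code below simulates a sample (the sample size, mean and spread are invented for illustration) and quotes a \(\small{90\% }\) confidence interval around the sample mean in the \(\small{\overline{x} \pm a }\) form:

```r
set.seed(1)
x <- rnorm(25, mean = 32, sd = 7)   # simulated repeated measurements

xbar <- mean(x)
# 90% confidence interval around the sample mean, via a one-sample t-test
ci <- t.test(x, conf.level = 0.90)$conf.int
half_width <- diff(ci) / 2
cat(sprintf("%.1f +/- %.1f\n", xbar, half_width))
```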

"Mistakes" are not "legitimate errors" :

While performing an experiment or data analysis, we sometimes commit mistakes that can be called blunders. We may use a wrong calibration value, a wrong experimental procedure, a wrong formula etc. These mistakes ("blunders") generally lead to unacceptable results and should be corrected before repeating the experiments or calculations properly again. They do not constitute the "legitimate errors" which we will discuss as part of error analysis.

Systematic versus Random errors :

Systematic errors are reproducible errors that result from things like a faulty calibration or a shift in the starting point. This kind of error introduces a 'bias' in the data, which can be carefully analysed and corrected for. Consider the measurement of our weight on a balance. If we do not set the starting point properly to zero before climbing on the balance, an offset will be added to our measured weight during every measurement that follows. But once we estimate this offset, it can be subtracted from every reading to get the correct value, or can be included in the error analysis. Since the same systematic error occurs in every observation, it cannot be reduced by increasing the number of repeat observations.
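A small R simulation of the balance example (the weight, the offset and the noise level are all invented numbers) shows that averaging many readings does not remove the bias, while subtracting the estimated offset does:

```r
set.seed(7)
true_weight <- 62.0     # hypothetical true weight (kg)
offset <- 0.5           # zero-point offset of the balance (systematic error)
# every reading carries the same offset, plus small random noise
readings <- true_weight + offset + rnorm(20, sd = 0.1)

mean(readings)            # stays about 0.5 kg too high, however many readings
mean(readings) - offset   # correcting for the estimated offset recovers ~62.0
```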

Random errors are due to random fluctuations in the observed data value when we make repeated observations. Their origin is statistical in nature and they can be reduced by increasing the number of repeat observations.
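A quick R simulation (assuming Gaussian noise) illustrates how the random error of the mean, measured by the standard error \(\small{s/\sqrt{n}}\), shrinks as the number of repeat observations grows:

```r
set.seed(42)
for (n in c(10, 100, 1000, 10000)) {
  x <- rnorm(n, mean = 5, sd = 2)   # n repeated noisy observations
  se <- sd(x) / sqrt(n)             # standard error of the mean
  cat(sprintf("n = %5d   mean = %.3f   std. error = %.4f\n", n, mean(x), se))
}
```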

Accuracy and Precision :

The accuracy of a measurement represents the closeness of the result to its true value. Accuracy is governed by the systematic errors and is a measure of the "correctness" of the result with respect to the reference value.

The precision of a measurement is a measure of how exactly the result is determined, regardless of how close it is to the true value. The precision is governed by the random errors and hence is a measure of the reproducibility of a result in repeated measurements.

While considering the uncertainty in an experiment, we should consider the accuracy and precision together. As we increase the number of repeat observations, the precision improves. But once the precision reaches the level of the accuracy, a further increase in measurements will not reduce the uncertainty, since it is now dominated by the accuracy. Similarly, an experiment may be very accurate, but the measurement can still have considerable uncertainty due to low precision arising out of statistical fluctuations.

The number of significant figures in a number represents the precision :

The precision of an experimental result can be represented by the way it is numerically written. The precision is given by the number of significant figures in the numerical value. We say that "the result is precise to this many significant figures". The number of significant figures in a number is determined by a set of rules which are listed in many books and articles on numerical methods.

A simple way to get the number of significant figures in a number is to write it down in scientific notation and count the meaningful digits. See the examples below:

The number \(\small{10 = 1.0 \times 10^1~~~ }\) has 2 significant figures.
The number \(\small{100 = 1.00 \times 10^2~~~ }\) has 3 significant figures.
The number \(\small{1000 = 1.000 \times 10^3~~~ }\) has 4 significant figures.
The number \(\small{2052 = 2.052 \times 10^3~~~ }\) has 4 significant figures.
The number \(\small{101.00 = 1.0100 \times 10^2~~~ }\) has 6 significant figures.
The number \(\small{0.00001055 = 1.055 \times 10^{-5}~~~ }\) has 4 significant figures.
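In R, this counting can be checked by forcing a number into scientific notation with `formatC()`, and `signif()` rounds a value to a chosen number of significant figures (both are base R functions):

```r
# Scientific notation makes the significant digits explicit
formatC(1000, format = "e", digits = 3)     # "1.000e+03": 4 significant figures
formatC(101.00, format = "e", digits = 4)   # "1.0100e+02": 6 significant figures

# Rounding to a given number of significant figures
signif(2052.7, 4)        # 2053
signif(0.00001055, 4)    # 1.055e-05
```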