Biostatistics with R

The weighted mean and its uncertainty

Suppose we sample \(n\) data points \(\small{x_1, x_2, x_3, \ldots, x_n}\) from a population that has a Gaussian distribution with mean \(\small{\mu}\) and standard deviation \(\small{\sigma}\).

The most probable estimate of \(\small{\mu}\), denoted \(\small{\overline{x}}\), and its uncertainty \(\small{\sigma_{\overline{x}}}\) are given in terms of the observed data points by the expressions,

\(~~~~~~~~~~\small{\overline{x} ~=~ \dfrac{1}{n} \sum\limits_{i=1}^n x_i}\)

\(~~~~~~~~~~\sigma_\overline{x}^2 ~=~\small{ \dfrac{\sigma^2}{n} }\)

\(~~~~~~~~~~\sigma_\overline{x} ~=~\small{ \dfrac{\sigma }{\sqrt{n} } }\)

These expressions are derived using the method of maximum likelihood estimation; the derivation is skipped here.
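As a quick numerical check of the two formulas above, here is a minimal Python sketch. It assumes the population standard deviation \(\small{\sigma}\) is known and shared by every point; the function name `mean_and_standard_error` and the four data values are hypothetical, chosen only so the arithmetic is easy to follow.

```python
import math

def mean_and_standard_error(x, sigma):
    """Return the sample mean of x and its uncertainty sigma/sqrt(n).

    Assumes every point comes from the same Gaussian with known sigma.
    """
    n = len(x)
    x_bar = sum(x) / n                  # (1/n) * sum of x_i
    sigma_x_bar = sigma / math.sqrt(n)  # sigma / sqrt(n)
    return x_bar, sigma_x_bar

# Hypothetical data: four points, each with sigma = 2
x_bar, sigma_x_bar = mean_and_standard_error([1.0, 2.0, 3.0, 4.0], sigma=2.0)
print(x_bar, sigma_x_bar)  # 2.5 1.0
```

Note that the uncertainty shrinks only as \(\small{1/\sqrt{n}}\): quadrupling the number of points halves \(\small{\sigma_\overline{x}}\).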

In the above result it was assumed that all the data points come from the same distribution and hence have the same standard deviation \(\small{\sigma}\). However, there are occasions when we have to find the mean of data points \(\small{x_i}\), each with a different uncertainty \(\small{\sigma_i}\).

Again, we can derive the combined mean of data points with different standard deviations, and its uncertainty, using the method of maximum likelihood. Each data point is weighted by the inverse of its own variance, \(\small{1/\sigma_i^2}\), so that a point with a smaller uncertainty is given more weight than a point with a larger uncertainty.

Skipping the derivation, we present the expressions for the weighted mean and its uncertainty in terms of the individual values and their uncertainties:

\(~~~~~~~~~~\small{ \overline{x} ~=~ \dfrac{\sum\limits_{i=1}^n x_i/\sigma_i^2}{\sum\limits_{i=1}^n 1/\sigma_i^2} }\)

\(~~~~~~~~~~\sigma_\overline{x}^2 ~=~\small{ \dfrac{1}{\sum\limits_{i=1}^n 1/\sigma_i^2 } }\)

\(~~~~~~~~~~\sigma_\overline{x} ~=~\small{\sqrt{ \dfrac{1}{\sum\limits_{i=1}^n 1/\sigma_i^2 }} }\)
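The weighted formulas translate directly into code. The Python sketch below is one straightforward implementation, assuming all \(\small{\sigma_i > 0}\); the function name `weighted_mean` is hypothetical. It also shows a useful sanity check: when every \(\small{\sigma_i}\) equals the same \(\small{\sigma}\), the weighted mean reduces to the ordinary mean and the uncertainty reduces to \(\small{\sigma/\sqrt{n}}\).

```python
import math

def weighted_mean(x, sigma):
    """Inverse-variance weighted mean of x and its uncertainty.

    x:     measured values x_i
    sigma: their standard deviations sigma_i (assumed > 0)
    """
    if len(x) != len(sigma):
        raise ValueError("x and sigma must have the same length")
    weights = [1.0 / s**2 for s in sigma]   # w_i = 1 / sigma_i^2
    sum_w = sum(weights)
    x_bar = sum(w * xi for w, xi in zip(weights, x)) / sum_w
    sigma_x_bar = math.sqrt(1.0 / sum_w)    # sqrt(1 / sum of w_i)
    return x_bar, sigma_x_bar

# Equal sigmas: the result matches the unweighted formulas,
# mean = 2.5 and sigma/sqrt(n) = 2/sqrt(4) = 1.0
print(weighted_mean([1.0, 2.0, 3.0, 4.0], [2.0, 2.0, 2.0, 2.0]))  # (2.5, 1.0)
```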

Example-1 : The concentration of a chemical in human blood under certain disease conditions was measured in four different experiments, giving \(~~\small{11.2 \pm 1.9~\mu g/L,~~10.1 \pm 2.4~\mu g/L,~~11.9 \pm 2.1~\mu g/L,~~12.3 \pm 2.5~\mu g/L }\).
Compute the weighted average of these 4 measurements and the uncertainty of the weighted average.

We assign,
\(\small{x_1 = 11.2,~~x_2=10.1,~~x_3=11.9,~~x_4=12.3 }\)
\(\small{\sigma_1 = 1.9,~~\sigma_2~=~2.4,~~\sigma_3 = 2.1,~~\sigma_4 = 2.5 }\)

Using the above formulas for the weighted mean and its uncertainty, we get,

\(~~~~~~~~~~\small{\overline{x} ~=~ \dfrac{\sum\limits_{i=1}^n x_i/\sigma_i^2}{\sum\limits_{i=1}^n 1/\sigma_i^2} ~~=~~\dfrac{ \left(\dfrac{11.2}{1.9^2} + \dfrac{10.1}{2.4^2} + \dfrac{11.9}{2.1^2} + \dfrac{12.3}{2.5^2} \right)}{\left(\dfrac{1}{1.9^2} + \dfrac{1}{2.4^2} + \dfrac{1}{2.1^2} + \dfrac{1}{2.5^2} \right) } ~~=~~ \dfrac{9.522}{0.8374} ~~\approx~~ 11.4 }\)

\(~~~~~~~~~~\small{\sigma_\overline{x}^2 ~=~\dfrac{1}{\sum\limits_{i=1}^n 1/\sigma_i^2 }~~=~~ \dfrac{1}{\left(\dfrac{1}{1.9^2} + \dfrac{1}{2.4^2} + \dfrac{1}{2.1^2} + \dfrac{1}{2.5^2} \right)} ~~=~~ \dfrac{1}{0.8374} ~~\approx~~ 1.19 }\)

\(~~~~~~~~~~\small{\sigma_\overline{x} ~=~ \sqrt{1.19} ~~\approx~~ 1.1 }\)

The weighted mean of the 4 results is \(\small{11.4 \pm 1.1~\mu g/L}\).
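The hand calculation for Example-1 can be reproduced with a short Python script. This is only an illustrative sketch of the same inverse-variance formulas; it prints both the variance of the weighted mean and its standard deviation, since the reported uncertainty is the square root of the variance, not the variance itself.

```python
import math

# Example-1 measurements (micrograms per litre) and their uncertainties
x = [11.2, 10.1, 11.9, 12.3]
sigma = [1.9, 2.4, 2.1, 2.5]

weights = [1.0 / s**2 for s in sigma]            # w_i = 1 / sigma_i^2
sum_w = sum(weights)
x_bar = sum(w * xi for w, xi in zip(weights, x)) / sum_w
variance = 1.0 / sum_w                           # sigma_xbar^2
sigma_x_bar = math.sqrt(variance)                # sigma_xbar

print(f"variance of weighted mean = {variance:.2f}")              # 1.19
print(f"weighted mean = {x_bar:.2f} +/- {sigma_x_bar:.2f} ug/L")  # 11.37 +/- 1.09
```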