Biostatistics with R

Propagation of uncertainties

Suppose we want to estimate a quantity Q which is a function of variables x,y,z,...
We write,

\(~~~~~~~~~~~~~~~~~~~~~\small{Q = f(x,y,z,...)}\)

In the previous section, we derived an expression for the uncertainty dQ of a function \(\small{Q = f(x,y,z,...)}\) in terms of the errors (dx, dy, dz, ...) in the variables as,

\(\small{ dQ~=~ \dfrac{\partial f}{\partial x} dx~+~\dfrac{\partial f}{\partial y} dy~+~\dfrac{\partial f}{\partial z} dz ~+~....}\)

where \(\small{dx, dy, dz, ... }\) are the actual errors on the quantities.


In general, we do not know the actual errors on the measured quantities, since their true values are not known. For each variable, we typically have a mean value and a standard deviation computed from a certain number of samples. We can use the standard deviation of each measured variable as a measure of its uncertainty.


The question is: given the standard deviations of the variables x, y, z, ..., can we estimate the standard deviation of Q as a measure of the uncertainty in its computed value? We will derive a methodology that uses the standard deviations \( \sigma_x, \sigma_y, \sigma_z, ....\) of the underlying distributions.



If we want to skip the derivation, we can jump to the summary formula box that follows to see the result and proceed from that point.



Derivation of a generalized error propagation formula:

We start with the function,
\(\small{Q = f(x,y,z,...)}~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~(1) \)

We already know that
\(\small{ dQ~=~ \dfrac{\partial f}{\partial x} dx~+~\dfrac{\partial f}{\partial y} dy~+~\dfrac{\partial f}{\partial z} dz ~+~....}~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~(2) \)

Let \(\small{x_i, y_i, z_i, ... }\) denote the individual samples from their respective populations.

Let \(\small{\overline{x}, \overline{y}, \overline{z},... }\) denote the mean values over the entire populations. That is, the sample size N covers the entire population, i.e. \(~~\small{N \to \infty ~~}\) when sampling from a distribution. Throughout this derivation, this is assumed of the sample size N.

We have, for a single set of data points \(\small{x_i, y_i, z_i, ... }\),
\(\small{Q_i = f(x_i, y_i, z_i) }\)

We also assume that the best estimate of Q is obtained when the variables take their average values (this may not always be true). We can write,

\(\small{\overline{Q} = f(\overline{x}, \overline{y}, \overline{z}) }\)

We can write,
\(\small{dQ = Q_i - \overline{Q},~~~~dx = x_i -\overline{x},~~~~dy=y_i-\overline{y},~~~~dz = z_i-\overline{z} }\)

Substituting these expressions in equation (2) we get,
\(\small{dQ~=~Q_i-\overline{Q}~=~(x_i-\overline{x})\dfrac{\partial f}{\partial x}~+~(y_i-\overline{y})\dfrac{\partial f}{\partial y}~+~(z_i-\overline{z})\dfrac{\partial f}{\partial z}~+~....}~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~(3) \)

Using these, we will write down the expression for the variance \(\small{\sigma^2_Q }\) in Q:

\(\sigma_Q^2~=~\small{\dfrac{1}{N}\sum\limits_{i=1}^N (Q_i - \overline{Q})^2 }\)

\(~~~~~~=~\small{\dfrac{1}{N} \sum\limits_{i=1}^N \left( (x_i - \overline{x})\dfrac{\partial f }{\partial x } + (y_i - \overline{y})\dfrac{\partial f }{\partial y } + (z_i - \overline{z})\dfrac{\partial f }{\partial z } + .... \right )^2 }\)

\(~~~~~~~ \begin{split} ~=~~\small{ \dfrac{1}{N} \sum\limits_{i=1}^N \left( (x_i - \overline{x})^2\left(\dfrac{\partial f }{\partial x }\right)^2 + (y_i - \overline{y})^2\left(\dfrac{\partial f }{\partial y }\right)^2 + (z_i - \overline{z})^2\left(\dfrac{\partial f }{\partial z }\right)^2 \right. \\ \quad + 2(x_i - \overline{x})(y_i - \overline{y})\left(\dfrac{\partial f}{\partial x}\right) \left(\dfrac{\partial f}{\partial y}\right) + 2(y_i - \overline{y})(z_i - \overline{z})\left(\dfrac{\partial f}{\partial y}\right) \left(\dfrac{\partial f}{\partial z}\right) \\ \quad \left. + 2(z_i - \overline{z})(x_i - \overline{x})\left(\dfrac{\partial f}{\partial z}\right) \left(\dfrac{\partial f}{\partial x}\right)~+~....... \right )~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~(4) } \end{split} \)


We know that,

\( \sigma_x^2 = \small{ \dfrac{1}{N} \sum\limits_{i=1}^{N}(x_i - \overline{x})^2 ~~~~~~~~ }\) Variance in x

\( \sigma_y^2 = \small{ \dfrac{1}{N} \sum\limits_{i=1}^{N}(y_i - \overline{y})^2~~~~~~~~ }\) Variance in y

\( \sigma_z^2 = \small{ \dfrac{1}{N} \sum\limits_{i=1}^{N}(z_i - \overline{z})^2 ~~~~~~~~ }\) Variance in z

\( \sigma_{xy}^2 = \small{ \dfrac{1}{N} \sum\limits_{i=1}^N (x_i - \overline{x})(y_i - \overline{y})~~~~ }\) Covariance between x and y.

\( \sigma_{yz}^2 = \small{ \dfrac{1}{N} \sum\limits_{i=1}^N (y_i - \overline{y})(z_i - \overline{z})~~~~~~ }\) Covariance between y and z

\( \sigma_{zx}^2 = \small{ \dfrac{1}{N} \sum\limits_{i=1}^N (z_i - \overline{z})(x_i - \overline{x})~~~~~~ }\) Covariance between z and x

................... similarly for other variables ............................

Substituting the above expressions into equation (4), we get,

\(\small{ \sigma_Q^2~=~\sigma_x^2 \left( \dfrac{\partial f}{\partial x} \right)^2 + \sigma_y^2 \left( \dfrac{\partial f}{\partial y} \right)^2 + \sigma_z^2 \left( \dfrac{\partial f}{\partial z} \right)^2 + 2\sigma_{xy}^2 \left(\dfrac{\partial f}{\partial x} \right) \left(\dfrac{\partial f}{\partial y} \right) + 2\sigma_{yz}^2 \left(\dfrac{\partial f}{\partial y} \right) \left(\dfrac{\partial f}{\partial z} \right) + 2\sigma_{zx}^2 \left(\dfrac{\partial f}{\partial z} \right) \left(\dfrac{\partial f}{\partial x} \right) +... }~~~~~~~~~~(5) \)

If the variables x, y, z, etc. are independent of each other, their pairwise covariance terms \(\small{\sigma_{xy}^2, \sigma_{yz}^2, \sigma_{zx}^2 }\), etc. approach zero, since the equally probable positive and negative deviations add up to zero on average.

In this case, the above expression for the variance in Q reduces to,

\(\small{ \sigma_Q^2~\approx~\sigma_x^2 \left( \dfrac{\partial f}{\partial x} \right)^2 + \sigma_y^2 \left( \dfrac{\partial f}{\partial y} \right)^2 + \sigma_z^2 \left( \dfrac{\partial f}{\partial z} \right)^2 + .... }~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~(6) \)

Thus, in a formula \(\small{Q = f(x,y,z,...) }\), the uncertainties (expressed as variances) of x, y, z, ... add quadratically to give the variance in the computed quantity Q.




We summarize the important result from the above derivation here:


Propagation formula for uncorrelated (independent) variables:

For a function \(\small{Q = f(x,y,z,...)}\) of uncorrelated (independent) variables x, y, z, ..., we can write the combined uncertainty \(\small{\sigma_Q }\) in Q in terms of the uncertainties \(\small{\sigma_x, \sigma_y, \sigma_z,... }\) in the variables as,

\(\small{ \sigma_Q^2~\approx~\sigma_x^2 \left( \dfrac{\partial f}{\partial x} \right)^2 + \sigma_y^2 \left( \dfrac{\partial f}{\partial y} \right)^2 + \sigma_z^2 \left( \dfrac{\partial f}{\partial z} \right)^2 + .... } \)

\(\small{\sigma_Q \approx \sqrt{ \sigma_x^2 \left( \dfrac{\partial f}{\partial x} \right)^2 + \sigma_y^2 \left( \dfrac{\partial f}{\partial y} \right)^2 + \sigma_z^2 \left( \dfrac{\partial f}{\partial z} \right)^2 + .... } }\)

Note that the variances of the independent variables add quadratically to give the combined variance.
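
This formula is straightforward to apply and check in R. The sketch below is a minimal illustration with assumed means and standard deviations (not data from the text): it compares the standard deviation predicted by the propagation formula with the one obtained by directly simulating the function \(\small{Q = xy + z}\).

    # Monte Carlo check of the propagation formula for independent variables.
    # The means and standard deviations below are assumed, illustrative values.
    set.seed(1)

    N  <- 1e6
    mx <- 10; sx <- 0.3     # mean and sd of x
    my <-  5; sy <- 0.2     # mean and sd of y
    mz <-  2; sz <- 0.1     # mean and sd of z

    f <- function(x, y, z) x * y + z        # Q = f(x, y, z)

    # Standard deviation of Q from direct simulation
    x <- rnorm(N, mx, sx)
    y <- rnorm(N, my, sy)
    z <- rnorm(N, mz, sz)
    sd_sim <- sd(f(x, y, z))

    # Propagation formula: partial derivatives evaluated at the means
    dQdx <- my          # d(xy+z)/dx = y
    dQdy <- mx          # d(xy+z)/dy = x
    dQdz <- 1           # d(xy+z)/dz = 1
    sd_prop <- sqrt(sx^2 * dQdx^2 + sy^2 * dQdy^2 + sz^2 * dQdz^2)

    print(c(simulated = sd_sim, formula = sd_prop))

The two values agree closely because the uncertainties are small compared to the means, which is the regime in which the first-order approximation behind the formula holds.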



Uncertainty propagation in specific formulas


Let us now derive expressions for the uncertainty in a quantity Q which is a function of two independent (uncorrelated) variables x and y. These formulas are handy for computing the uncertainty in many simple estimations. In every case, we will use the propagation formula derived above to arrive at a handy expression.

Uncertainty in addition and subtraction

Let \(~~\small{Q~=~ax \pm by }~~\), where a and b are constants

Then, \(~~~\small{\left(\dfrac{\partial Q}{\partial x}\right)~=~a,~~~~~~ \left(\dfrac{\partial Q}{\partial y}\right)~=~\pm b }\)

The propagation formula \(~~\small{\sigma_Q^2 = \sigma_x^2 \left(\dfrac{\partial Q}{\partial x}\right)^2 + \sigma_y^2 \left(\dfrac{\partial Q}{\partial y}\right)^2 }~~\) becomes,

\(\small{\sigma_Q^2~=~\sigma_x^2 a^2 + \sigma_y^2 b^2 }\)



Note that whether two quantities are added or subtracted, their uncertainties always add quadratically.


Example-1 : The perimeter of a rectangular ground was estimated by measuring its length and width to be \(~\small{L = 52 \pm 0.2~m~~ }\) and \(~\small{W = 32 \pm 0.15~m~~ }\). Compute the uncertainty in the perimeter.

The perimeter of the rectangular field is given by \(~~~\small{S = 2(L+W) = 2L+2W}\)

Given that \(~~\small{\sigma_L = 0.2~m,~~~a = 2,~~~\sigma_W = 0.15~m,~~~b = 2 }\),

\(\small{\sigma_S^2 = \sigma_L^2 a^2 + \sigma_W^2 b^2~~=~~0.2^2\times 2^2 + 0.15^2 \times 2^2~=~0.25~m^2 }\)

Therefore, \(~~\small{\sigma_S~=~\sqrt{0.25} = 0.5~m~~~ }\) is the uncertainty in the perimeter.
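
The same calculation can be done in a couple of lines of R, using the values quoted in this example:

    # Uncertainty in the perimeter S = 2L + 2W (values from the example above)
    sigma_L <- 0.2 ; a <- 2
    sigma_W <- 0.15; b <- 2

    sigma_S2 <- sigma_L^2 * a^2 + sigma_W^2 * b^2   # variance of S
    sigma_S  <- sqrt(sigma_S2)                      # 0.5 m
    print(c(variance = sigma_S2, sigma = sigma_S))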














Uncertainty in multiplication and division

(i) \(~~\) Let \(\small{Q~=~a x y,~~ }\) where a is a constant

We write,\(~~~\small{\left(\dfrac{\partial Q}{\partial x}\right)=ay, }~~~\)\(~~~\small{\left(\dfrac{\partial Q}{\partial y}\right)=ax }\)

The propagation formula \(~~\small{\sigma_Q^2 = \sigma_x^2 \left(\dfrac{\partial Q}{\partial x}\right)^2 + \sigma_y^2 \left(\dfrac{\partial Q}{\partial y}\right)^2 }~~\) becomes,

\(\small{ \sigma_Q^2~=~\sigma_x^2 a^2 y^2 + \sigma_y^2 a^2 x^2 }\)

Dividing throughout by \(\small{Q^2 = a^2x^2y^2}\), we get


\(\small{\dfrac{\sigma_Q^2}{Q^2} = \dfrac{\sigma_x^2}{x^2} + \dfrac{\sigma_y^2}{y^2} }\)

\(\small{\sigma_Q~=~ Q \sqrt{ \dfrac{\sigma_x^2}{x^2} + \dfrac{\sigma_y^2}{y^2} } }\)










(ii) \(~~\) Let \(~~\small{Q = \dfrac{ax}{y}}\)

With this,\(~~~\small{\left(\dfrac{\partial Q}{\partial x}\right)=\dfrac{a}{y}, }~~~\)\(~~~\small{\left(\dfrac{\partial Q}{\partial y}\right)=-\dfrac{ax}{y^2} }\)

Substituting in the propagation formula we get,

\(\small{\sigma_Q^2~=~\sigma_x^2 \dfrac{a^2}{y^2}~+~\sigma_y^2 \dfrac{a^2x^2}{y^4} }\)

Dividing throughout by \(\small{Q^2 = a^2x^2/y^2}\), we get

\(\small{\dfrac{\sigma_Q^2}{Q^2} = \dfrac{\sigma_x^2}{x^2} + \dfrac{\sigma_y^2}{y^2} }\)

\(\small{\sigma_Q~=~ Q \sqrt{ \dfrac{\sigma_x^2}{x^2} + \dfrac{\sigma_y^2}{y^2} } }\)
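
As a check of the division rule, the short R sketch below uses assumed illustrative values for a, x and y and compares the propagated uncertainty with a direct Monte Carlo simulation:

    # Monte Carlo check of the division rule for Q = a*x/y (illustrative values)
    set.seed(1)
    a <- 3
    x_mean <- 20; sigma_x <- 0.4
    y_mean <-  8; sigma_y <- 0.2

    Q       <- a * x_mean / y_mean
    sd_prop <- Q * sqrt(sigma_x^2 / x_mean^2 + sigma_y^2 / y_mean^2)   # formula
    x <- rnorm(1e6, x_mean, sigma_x)
    y <- rnorm(1e6, y_mean, sigma_y)
    sd_sim  <- sd(a * x / y)                                           # simulation
    print(c(formula = sd_prop, simulated = sd_sim))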










We notice that for both multiplication and division, the relative (fractional) uncertainties add quadratically in the same way.

Example : The area of a triangle in terms of the length of its base b and height h is given by \(\small{A = \dfrac{1}{2}bh }\). A set of measurements yielded the mean values \(\small{b = 14~cm }\) and \(\small{h = 23.5~cm }\) with uncertainties \(\small{\sigma_b = 0.7~cm }\) and \(\small{\sigma_h = 0.5~cm }\). Compute the uncertainty in the area.

We have, area of triangle \(~~\small{A = \dfrac{1}{2}bh~=~\dfrac{1}{2} \times 14 \times 23.5~=~ 164.5~cm^2 }\)

\(\small{\sigma_A~=~A \sqrt{\dfrac{\sigma_b^2}{b^2} + \dfrac{\sigma_h^2}{h^2}}~~=~~164.5 \times \sqrt{ \dfrac{0.7^2}{14^2} + \dfrac{0.5^2}{23.5^2} }~~=~~8.94~cm^2 }~~~~\) is the uncertainty in the area.
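
The same computation in R, using the measured values from this example:

    # Uncertainty in the triangle area A = (1/2) b h (values from the example)
    b <- 14;   sigma_b <- 0.7     # base (cm)
    h <- 23.5; sigma_h <- 0.5     # height (cm)

    A       <- 0.5 * b * h
    sigma_A <- A * sqrt(sigma_b^2 / b^2 + sigma_h^2 / h^2)
    print(c(area = A, sigma = sigma_A))    # about 164.5 and 8.94 cm^2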

















Uncertainty in powers

Let \(~~\small{Q = a x^{\pm b} },~~~\) where a and b are constants.

We get\(~~\small{\dfrac{\partial Q}{\partial x} = \pm ab x^{\pm b-1} = \pm\dfrac{bQ}{x} }\)

\( \small{\sigma_Q^2 = \sigma_x^2 \left(\dfrac{\partial Q}{\partial x}\right)^2~=~\sigma_x^2 \dfrac{Q^2 b^2}{x^2} }\)

\(\small{ \dfrac{\sigma_Q^2}{Q^2}~=~b^2 \dfrac{\sigma_x^2}{x^2} }\)

\(\small{\sigma_Q~=~Q b \dfrac{\sigma_x}{x} }\)

Example : If the radius of a metallic sphere is measured to be \(\small{r = 7.61 \pm 0.2~cm },~\) compute the uncertainty in its volume.

Volume of the sphere = \(\small{ V = \dfrac{4}{3} \pi r^3 = \dfrac{4}{3} \times 3.1416 \times 7.61^3 = 1846.0~cm^3 }\)

The uncertainty in the volume = \(\small{\sigma_V = Vb \dfrac{\sigma_r}{r} = 1846.0 \times 3 \times \dfrac{0.2}{7.61} = 145.5~cm^3 }\)
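
In R, using the values from this example (with b = 3 for the cube of the radius):

    # Uncertainty in the volume of a sphere, V = (4/3) pi r^3 (power rule, b = 3)
    r <- 7.61; sigma_r <- 0.2     # radius and its uncertainty (cm)

    V       <- (4/3) * pi * r^3
    sigma_V <- V * 3 * sigma_r / r
    print(c(volume = V, sigma = sigma_V))   # about 1846 and 145.5 cm^3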


















Uncertainty in the exponential function

Consider the exponential relation \(~~\small{Q = ae^{\pm bx}} \)

\(\small{\dfrac{\partial Q}{\partial x} = \pm ab e^{\pm bx} = \pm bQ }\)

\(\small{\sigma_Q^2 = \sigma_x^2 \left(\dfrac{\partial Q}{\partial x}\right)^2 = \sigma_x^2 (\pm bQ)^2 = \sigma_x^2 b^2Q^2 }\)

\(\small{\dfrac{\sigma_Q^2}{Q^2} = b^2 \sigma_x^2 }\)

\(\small{\sigma_Q = Qb\sigma_x }\)
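
Because the exponential is strongly nonlinear, the formula \(\small{\sigma_Q = Qb\sigma_x}\) is a first-order approximation that is reliable only when \(\small{b\sigma_x}\) is small. The R sketch below, with assumed illustrative values of a, b and x, compares the formula with a Monte Carlo simulation:

    # Check of sigma_Q = Q * b * sigma_x for Q = a * exp(b * x)
    # Assumed, illustrative values; the approximation holds when b*sigma_x is small.
    set.seed(1)
    a <- 2; b <- 0.5
    x_mean <- 3; sigma_x <- 0.05

    Q       <- a * exp(b * x_mean)
    sd_prop <- Q * b * sigma_x                                  # propagation formula
    sd_sim  <- sd(a * exp(b * rnorm(1e6, x_mean, sigma_x)))     # Monte Carlo
    print(c(formula = sd_prop, simulated = sd_sim))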

















Uncertainty in raising a constant to a power

When the constant base being raised to a power is not the exponential constant e, we can estimate the error propagation by the following manipulation:

Let \(~~\small{Q~=~a^{\pm bx} }\)

Writing \(\small{a}\) as \(\small{e^{log(a)} }\) (where log denotes the natural logarithm), we get

\(~~~\small{Q~=~(e^{log(a)})^{\pm bx}~=~e^{\pm(b~ log(a)) x} }\)

This is an exponential form. Using the previously derived formula for the propagation of errors in an exponential function, we can write,

\(\small{\sigma_Q~=~Q (b~log(a)) \sigma_x }\)
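
The R sketch below checks this relation with assumed illustrative values of a, b and x; note that log() in R is the natural logarithm, as required here:

    # Check of sigma_Q = Q * b * log(a) * sigma_x for Q = a^(b*x)
    # Assumed, illustrative values; log() in R is the natural logarithm.
    set.seed(1)
    a <- 10; b <- 0.2
    x_mean <- 1.5; sigma_x <- 0.02

    Q       <- a^(b * x_mean)
    sd_prop <- Q * b * log(a) * sigma_x                 # propagation formula
    sd_sim  <- sd(a^(b * rnorm(1e6, x_mean, sigma_x)))  # Monte Carlo
    print(c(formula = sd_prop, simulated = sd_sim))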













Uncertainty in the logarithm

Let \(~~\small{Q~=~a~log(x)},~~~\) where log denotes the natural logarithm.

We get \(~~\small{ \dfrac{\partial Q }{\partial x} = \dfrac{a}{x} }\)

With this,\(~~\small{\sigma_Q^2 = \sigma_x^2 \left (\dfrac{\partial Q }{\partial x}\right )^2~=~ a^2 \dfrac{\sigma_x^2}{x^2} }\)

\(\small{\sigma_Q~=~a{\dfrac{\sigma_x}{x}} }\)
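
A quick numerical check in R, with assumed illustrative values of a and x:

    # Check of sigma_Q = a * sigma_x / x for Q = a * log(x)
    # Assumed, illustrative values; log() is the natural logarithm.
    set.seed(1)
    a <- 2
    x_mean <- 50; sigma_x <- 1.5

    sd_prop <- a * sigma_x / x_mean                      # propagation formula
    sd_sim  <- sd(a * log(rnorm(1e6, x_mean, sigma_x)))  # Monte Carlo
    print(c(formula = sd_prop, simulated = sd_sim))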




