Biostatistics with R

Moments of a distribution

The mean value of a distribution is computed by summing all the data points and dividing by their total number.

If a value \(x\) occurs \(k\) times in the population, then \(k\) is called the frequency of occurrence of \(x\) in the population. Adding \(x\) to the sum \(k\) times while computing the mean is the same as adding the product \(kx\) once. We generalize this concept. Let us assume that the population has \(n\) data points such that the value \(x_1\) occurs \(n_1\) times, the value \(x_2\) occurs \(n_2\) times, and so on up to the value \(x_m\), which occurs \(n_m\) times, so that \(n_1 + n_2 + \cdots + n_m = n\). The expression for the population mean can then be written as,

\( \small{\mu = \dfrac{\text{sum of all data points}}{n} = \dfrac{n_1x_1 + n_2x_2 + n_3x_3+ \cdots +n_mx_m}{n} = \dfrac{\sum\limits_{i=1}^m n_ix_i}{n} }\)

If a value \(x_i\) occurs \(n_i\) times, then \(p(x_i) = \dfrac{n_i}{n} \) is the relative frequency, and hence the probability of occurrence, of \(x_i\) in the population. Therefore,

\( \small{\mu = \dfrac{\sum\limits_{i=1}^m n_ix_i}{n} = \sum\limits_{i=1}^m \dfrac{n_i}{n}x_i = \sum\limits_{i=1}^{m}x_ip(x_i) } \)
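The equivalence between the direct mean and the probability-weighted sum can be checked numerically. The following is a minimal Python sketch, assuming a small made-up population of six values; the names `data`, `freq` and `p` are illustrative only.

```python
from collections import Counter

# Hypothetical population of n = 6 data points (illustrative values only).
data = [2, 2, 3, 5, 5, 5]
n = len(data)

# Frequency n_i of each distinct value x_i.
freq = Counter(data)

# Relative frequency p(x_i) = n_i / n of each distinct value.
p = {x: count / n for x, count in freq.items()}

# Mean computed directly: sum of all data points divided by n.
mean_direct = sum(data) / n

# Mean computed as the sum of x_i * p(x_i) over the distinct values.
mean_weighted = sum(x * px for x, px in p.items())

print(mean_direct, mean_weighted)
```

Both expressions give the same value up to floating-point rounding, and the relative frequencies add up to one.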

By the same logic, the variance of the population is computed as,

\( \small{ \sigma^2 = \dfrac{1}{n}\sum\limits_{i=1}^m n_i (x_i - \mu)^2 = \sum\limits_{i=1}^m \dfrac{n_i}{n} (x_i - \mu)^2 = \sum\limits_{i=1}^m p(x_i)(x_i-\mu)^2 }\)
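The variance formula can be sketched in the same way. The example below assumes the same made-up data set and compares the probability-weighted form with `statistics.pvariance` from the Python standard library, which computes the population variance from the raw data.

```python
from collections import Counter
from statistics import pvariance

# Hypothetical population (illustrative values only).
data = [2, 2, 3, 5, 5, 5]
n = len(data)

# Relative frequency p(x_i) = n_i / n of each distinct value x_i.
p = {x: count / n for x, count in Counter(data).items()}

# Population mean as the sum of x_i * p(x_i).
mu = sum(x * px for x, px in p.items())

# Population variance as the sum of p(x_i) * (x_i - mu)^2.
var_weighted = sum(px * (x - mu) ** 2 for x, px in p.items())

# Reference value computed from the raw data points.
var_reference = pvariance(data)

print(var_weighted, var_reference)
```

Note that this is the population variance (division by \(n\)), not the sample variance, which divides by \(n-1\).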

The expression \( \small{\mu = \sum\limits_{i=1}^m x_ip(x_i)} \) is called the first moment of the population distribution about zero (the origin).

The expression \( \small{\sigma^2 = \sum\limits_{i=1}^m (x_i - \mu)^2 p(x_i) } \) is called the second moment of the population distribution around the mean \(\mu\).

Similarly, the third moment \( \small{\sum\limits_{i=1}^m (x_i - \mu)^3 p(x_i) ~~} \)and the fourth moment \(~~ \small{\sum\limits_{i=1}^m (x_i - \mu)^4 p(x_i) } \) about the mean can be defined, and so on for higher orders.
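Since all these moments share one pattern, a single helper can compute any of them. This is a minimal sketch; the function name `central_moment` and the data are assumptions made for illustration.

```python
from collections import Counter


def central_moment(data, k):
    """Return the k-th moment of the data about its mean.

    Uses the relative frequencies p(x_i) = n_i / n of the distinct values.
    """
    n = len(data)
    p = {x: count / n for x, count in Counter(data).items()}
    mu = sum(x * px for x, px in p.items())
    return sum(px * (x - mu) ** k for x, px in p.items())


# Hypothetical population (illustrative values only).
data = [2, 2, 3, 5, 5, 5]

m2 = central_moment(data, 2)  # second moment about the mean (the variance)
m3 = central_moment(data, 3)  # third moment about the mean
m4 = central_moment(data, 4)  # fourth moment about the mean

print(m2, m3, m4)
```

The first moment about the mean, `central_moment(data, 1)`, is zero up to rounding, which is why the mean itself is defined as a moment about zero.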

    The word moment is used in analogy with the moments of a distribution of particles about their center of gravity.

    If the variable \(x\) takes continuous values, then the discrete probabilities \(p(x_i)\) are replaced by a continuous probability density function \(f(x)\). In this case, the summations in the moments of the distribution are replaced by integrations over \(x\).

    Thus the moments of the continuous probability density function \(f(x)\) are defined as follows:

    First moment : \( \small{ ~~\mu = \int x f(x) dx }\)

    Second moment : \( \small{ ~~\sigma^2 = \int (x - \mu)^2 f(x) dx }\)

    \(n^{th}\) moment about the mean : \( \small{M(n) = \int (x - \mu)^n f(x) dx }\)
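
    As a worked illustration (the uniform density here is an assumed example), let \(f(x) = 1\) for \(0 \le x \le 1\) and zero elsewhere. Then

    \( \small{ \mu = \int_0^1 x\,dx = \dfrac{1}{2} } \)

    \( \small{ \sigma^2 = \int_0^1 \left(x - \dfrac{1}{2}\right)^2 dx = \left[\dfrac{(x - 1/2)^3}{3}\right]_0^1 = \dfrac{1}{12} } \)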

    Given the expression \(f(x)\) for a discrete or continuous probability distribution, the mean \(\small{\mu}\) and variance \(\small{\sigma^2}\) can be computed using the appropriate definitions of moments.
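
When the integrals are not convenient to evaluate by hand, they can be approximated numerically. The sketch below assumes an exponential density \(f(x) = \lambda e^{-\lambda x}\) with \(\lambda = 2\) as an example, and uses a simple midpoint rule; the helper `integrate`, the truncation point `upper` and the step count are illustrative choices, not a recommended method.

```python
import math


def integrate(g, a, b, steps=200_000):
    """Approximate the integral of g over [a, b] with the midpoint rule."""
    h = (b - a) / steps
    return h * sum(g(a + (i + 0.5) * h) for i in range(steps))


rate = 2.0  # assumed rate parameter (lambda) of the example density


def f(x):
    """Exponential density f(x) = rate * exp(-rate * x) for x >= 0."""
    return rate * math.exp(-rate * x)


# The density lives on [0, infinity); the integrals are truncated at a point
# beyond which the remaining tail is of order exp(-40) and can be neglected.
upper = 40.0 / rate

total = integrate(f, 0.0, upper)                             # should be close to 1
mu = integrate(lambda x: x * f(x), 0.0, upper)               # first moment about zero
var = integrate(lambda x: (x - mu) ** 2 * f(x), 0.0, upper)  # second moment about mu

print(total, mu, var)
```

For this density the exact values are \(\mu = 1/\lambda\) and \(\sigma^2 = 1/\lambda^2\), so the numerical results can be compared against 0.5 and 0.25.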