Basic Statistics with R

Moments of a distribution

The mean value of a distribution is computed by summing all the data points and dividing the sum by the number of data points.


As we learnt in the previous section, if a value \(x\) occurs \(m\) times in the population, then \(m\) is called the frequency of occurrence of \(x\) in the population. Summing the value \(m\) times during the computation of the mean is then the same as the multiplication \(mx\). We generalize this concept. Let us assume that the population has \(N\) data points such that the value \(x_1\) occurs \(n_1\) times, the value \(x_2\) occurs \(n_2\) times, ..., up to the value \(x_m\) occurring \(n_m\) times, with \(n_1 + n_2 + n_3 + \dots + n_m = N\). The expression for the population mean can then be written as,


\( \small{\mu = \dfrac{\text{sum of all data points}}{N} = \dfrac{n_1x_1 + n_2x_2 + n_3x_3 + \dots + n_mx_m}{N} = \dfrac{\sum\limits_{i=1}^m n_ix_i}{N} }\)


If a value \(x_i\) occurs \(n_i\) times, then \(p(x_i) = \dfrac{n_i}{N} \) is the empirical probability of occurrence of \(x_i\) in the population. Therefore,


\( \small{\mu = \dfrac{\sum\limits_{i=1}^m n_ix_i}{N} = \sum\limits_{i=1}^m \dfrac{n_i}{N}x_i = \sum\limits_{i=1}^{m}x_ip(x_i) } \)
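The equivalence between the frequency formula and the probability-weighted sum can be checked directly in R. A minimal sketch, using a small made-up population (the values and frequencies below are illustrative, not from the text):

```r
# A small population where value x_i occurs n_i times (illustrative data)
x <- c(2, 5, 7, 10)        # distinct values x_i
n <- c(3, 1, 4, 2)         # frequencies n_i
N <- sum(n)                # total number of data points, N = 10

# Mean via the frequency formula: mu = sum(n_i * x_i) / N
mu_freq <- sum(n * x) / N

# Mean via empirical probabilities: p(x_i) = n_i / N
p <- n / N
mu_prob <- sum(x * p)

# Both agree with R's built-in mean() applied to the expanded population
mu_builtin <- mean(rep(x, times = n))
print(c(mu_freq, mu_prob, mu_builtin))
```

All three computations give the same value, since they are algebraic rearrangements of the same sum.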






By the same logic, the variance of the population is computed as,


\( \small{ \sigma^2 = \dfrac{1}{N}\sum\limits_{i=1}^m n_i (x_i - \mu)^2 = \sum\limits_{i=1}^m \dfrac{n_i}{N} (x_i - \mu)^2 = \sum\limits_{i=1}^m p(x_i)(x_i-\mu)^2 }\)
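The variance formula can be verified the same way. A sketch with the same illustrative values as above; note that R's built-in `var()` divides by \(N-1\) (the sample variance), so it must be rescaled to match the population variance defined here:

```r
# Same illustrative population as before
x <- c(2, 5, 7, 10)
n <- c(3, 1, 4, 2)
N <- sum(n)
p <- n / N
mu <- sum(x * p)

# Population variance as the probability-weighted second moment about mu
sigma2 <- sum(p * (x - mu)^2)

# R's var() uses the (N - 1) denominator, so rescale before comparing
data_expanded <- rep(x, times = n)
sigma2_check <- var(data_expanded) * (N - 1) / N
print(c(sigma2, sigma2_check))
```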




The expression \( \small{\mu = \sum\limits_{i=1}^m x_ip(x_i)} \) is called the first moment of the population distribution about zero.

The expression \( \small{\sigma^2 = \sum\limits_{i=1}^m (x_i - \mu)^2 p(x_i) } \) is called the second moment of the population distribution around the mean \(\mu\).

Similarly, the third moment \( \small{\sum\limits_{i=1}^m (x_i - \mu)^3 p(x_i)} \) and the fourth moment \( \small{\sum\limits_{i=1}^m (x_i - \mu)^4 p(x_i)} \), etc., can be defined.
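Since all the central moments share the same form, they can be computed by a single helper function. A minimal sketch (the function name `central_moment` and the data are illustrative):

```r
# n-th central moment of a discrete distribution,
# given values x and their probabilities p
central_moment <- function(x, p, order) {
  mu <- sum(x * p)                 # first moment about zero
  sum(p * (x - mu)^order)          # order-th moment about mu
}

# Illustrative distribution
x <- c(2, 5, 7, 10)
p <- c(0.3, 0.1, 0.4, 0.2)

central_moment(x, p, 2)   # variance (second moment about the mean)
central_moment(x, p, 3)   # third moment (related to skewness)
central_moment(x, p, 4)   # fourth moment (related to kurtosis)
```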


    The word moment is used in analogy with the moments of a distribution of particles about their center of gravity.


    If the variable \(x\) takes continuous values, then the probability distribution \(p(x)\) is continuous. In this case, the summation terms in the moments of the distribution are replaced by integrations over corresponding variables.

    Thus the moments of a continuous probability distribution function \(f(x)\) are defined as follows:


    First moment : \( \small{ ~~\mu = \int x f(x) dx }\)

    Second moment : \( \small{ ~~\sigma^2 = \int (x - \mu)^2 f(x) dx }\)

    \(n^{th}\) moment : \( \small{M(n) = \int (x - \mu)^n f(x) dx }\)
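These integrals can be evaluated numerically in R with the built-in `integrate()` function. A sketch using the standard normal density `dnorm()` as \(f(x)\), for which the exact moments are known (\(\mu = 0\), \(\sigma^2 = 1\), third central moment \(0\), fourth central moment \(3\)):

```r
# First moment about zero: mu = integral of x * f(x) dx
mu <- integrate(function(x) x * dnorm(x), -Inf, Inf)$value

# Second moment about mu: sigma^2 = integral of (x - mu)^2 * f(x) dx
sigma2 <- integrate(function(x) (x - mu)^2 * dnorm(x), -Inf, Inf)$value

# General n-th central moment by the same pattern
moment_n <- function(order) {
  integrate(function(x) (x - mu)^order * dnorm(x), -Inf, Inf)$value
}

print(c(mu, sigma2, moment_n(3), moment_n(4)))  # approx 0, 1, 0, 3
```

The same pattern works for any density: replace `dnorm` with the density function of interest and, if it has bounded support, adjust the integration limits accordingly.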









    Given the mathematical expression \(f(x)\) of a discrete or continuous probability density distribution, the mean \(\small{\mu}\) and variance \(\small{\sigma^2}\) can be computed using the appropriate definitions of moments. We will encounter this idea in the coming sections.