Mathematical tools for natural sciences

From the point of view of statistics, the experiments we do, or the systems we observe, fall into two categories: *deterministic* and *random* experiments.

A deterministic system results in *predictable outcomes*. For example, in the
absence of appreciable air resistance, we can predict the trajectory of a projectile thrown at an angle
with a known velocity using the equations of Newtonian mechanics. Here the system obeys known laws.

There are experiments which have two or more possible outcomes, and the experiment may result
in *any one of those possible outcomes in a random manner*. They are called *stochastic* systems.
There are no deterministic equations or laws governing these systems, and *they can be studied only with the help of statistics*. For example, we throw a dice and look for the number that shows up. Any one of the six possibilities
can occur, and we cannot predict which. *We can only say that if the dice is perfect, all six numbers have an equal chance of showing up!* This is what makes dice throwing a *random experiment*, as against projectile motion, which is deterministic.

**A random experiment will result in one of two or more possible outcomes.** The set of all possible outcomes of a random experiment is called its *outcome space*.

For example, an experiment involving a single coin toss has two possible outcomes, namely Head (H) and Tail (T). We represent the outcome space as a *set* in set theory notation:
\[\small{S = \{H, T\} }\]

Similarly, for a random experiment that involves the throw of a single dice, the outcome space has six possible outcomes: \[\small{ S = \{1,2,3,4,5,6\} }\]

**A subset of the outcome space is called an event.** Thus, the set
\[ \small{A = \{2,4,6\} }\] is an event (subset) of the outcome space of a dice throw, containing the outcomes that are even in value. The set \[ \small{B = \{1,2,3\} }\] is another event (subset) of the dice throw, containing the outcomes whose values are less than four.
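The sets above translate directly into R vectors; a minimal sketch (the variable names S, A and B follow the text, and `%in%` is R's set-membership operator):

```r
## Outcome space and events of a dice throw, represented as R vectors
S <- c(1, 2, 3, 4, 5, 6)   # outcome space of a single dice throw
A <- c(2, 4, 6)            # event: outcomes that are even in value
B <- c(1, 2, 3)            # event: outcomes with value less than four

## Both events are subsets of the outcome space:
print(all(A %in% S))   # TRUE
print(all(B %in% S))   # TRUE
```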

When a random experiment results in an outcome which is a member of set A, we say that *the event A has occurred*.

The *probability of an event denotes the chance or likelihood of the event occurring*, and is denoted by the symbol p(A). The value of p(A) lies in the range 0 to 1. A value of p(A) = 1 denotes that the event A will certainly occur, and p(A) = 0 denotes that the event A will definitely not occur. A value of p(A) = 0.5 tells us that the event A has an equal chance of occurring or not occurring.

There are two ways in which we can approach probability.

**The first way** involves looking at the outcome space and counting the number of possible ways in which an event A can occur. This is the same as the number of elements in the set A. We define the probability of occurrence of event A as,
\[ \small{p(A) = \frac{number~of~outcomes~favourable~to~event~A}{total~number~of~possible~outcomes} }\]

Suppose we want to compute the probability of *getting a value more than 4 in a dice throw*.
Out of six possible outcomes, two outcomes, namely 5 and 6, satisfy this event. Therefore, we say
\[\small{ p(>4) = \frac{number~of~outcomes~that~result~in~>4}{total~number~of~possible~outcomes} = \frac{2}{6} }\]
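The counting definition above can be sketched in R; the subsetting step below is our own illustration of "counting the favourable outcomes":

```r
## Counting approach to p(>4) for a single dice throw
S <- c(1, 2, 3, 4, 5, 6)   # outcome space

## Outcomes favourable to the event "value more than 4"
A <- S[S > 4]              # the outcomes 5 and 6

## probability = favourable outcomes / total outcomes
p_A <- length(A) / length(S)
print(p_A)                 # 2/6 = 0.3333333
```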

As a second example, we define an experiment in which we toss a pair of coins and note down the result of both. If the first coin shows Head and the second shows Tail, we call this outcome "HT". This way, there are four possible outcomes in this random experiment: \[ \small{S = \{HH, HT, TH, TT\} } \]

Now we ask the question: *If we toss a pair of coins, what is the probability of getting at least one head in the pair?* When we look at the above outcome space, we realize that out of the 4 elements in the space, three of them, namely \[ \small{ \{HH, HT, TH \} }\] result in *at least one head*. Therefore, we write, for a two-coin toss,
\[ \small{ p(at~least~one~head) = \frac{number~of~outcomes~that~result~in~at~least~one~head}{total~number~of~possible~outcomes} = \frac{3}{4} }\]
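The same count can be done in R by enumerating the outcome space with `expand.grid()`; a minimal sketch (the names `outcomes` and `at_least_one_head` are our own):

```r
## Enumerate the outcome space of a two-coin toss
outcomes <- expand.grid(first = c("H", "T"), second = c("H", "T"))
## outcomes now holds the four pairs: HH, TH, HT, TT

## Mark the outcomes in which at least one coin shows Head
at_least_one_head <- outcomes$first == "H" | outcomes$second == "H"

## probability = favourable outcomes / total outcomes
p <- sum(at_least_one_head) / nrow(outcomes)
print(p)   # 3/4 = 0.75
```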

**We can also approach probability from the experimental side.** Suppose an event A has a probability of occurrence p(A). We perform the experiment once and note down *whether it resulted in event A or not*. We then repeat this experiment N times (N trials), and count the number of trials n(A) that resulted in the event A. For example, we toss a coin N times and note down the number of times it resulted in Head (H).

The ratio \(\small{ \displaystyle\frac{n(A)}{N} }\) is the *relative frequency* of occurrence of event A. This ratio is unstable for a small number of trials N, but stabilizes towards a number for very large N.

This number, \( \small{p(A) = \displaystyle\frac{n(A)}{N} } \) for very large N, is called **the probability of occurrence of event A**.

How many trials do we need before the experimentally observed ratio equals the probability obtained before in terms of favourable events? That is, how many coin tosses are needed to get a p(H) exactly equal to 0.5? It turns out that we need an infinite number of tosses!

In the same manner, if we throw a dice a very large number of times, the relative frequency of each one of the 6 outcomes will approach \( \small{\frac{1}{6}} \) as the number of throws approaches infinity!
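This convergence can be checked numerically. A minimal sketch using R's `sample()` function (our own choice here; the coin-toss script later in the text uses `runif()` instead) to throw a dice N times and estimate p(6):

```r
## Relative frequency of rolling a 6, for increasing numbers of throws N
for (N in c(100, 10000, 1000000)) {
    ## simulate N throws of a fair dice
    throws <- sample(1:6, N, replace = TRUE)

    ## relative frequency of the outcome 6
    p6 <- sum(throws == 6) / N
    print(paste("N =", N, " p(6) =", p6))
}
```

As N grows, the printed estimates settle near 1/6 ≈ 0.1667.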

We can realistically simulate the coin toss experiment using the uniform random deviates generated by R's runif() function.

We first generate N uniform deviates. We separate those numbers with value below 0.5, count them, and call them "Head". Similarly, we separate those numbers out of N that are above 0.5, count them, and call them "Tail". The ratio of the number of heads to the total number N gives p(H), the probability of getting a head. We repeat this for increasing N, like N = 10, 100, 10000, 100000, etc. As N increases, p(H) should approach the expected probability (called the "a priori probability") of 0.5.

```r
##### script "coin_toss.r" #################################
## Simulating a coin toss using uniform deviates from runif()

## number of trials in 6 experiments
ntrials = c(10, 100, 1000, 10000, 100000, 1000000)

## print the column names
print(paste("trials ", "heads ", "tails ", "P(head)"))

## find p(H) for each experiment with N trials (i.e., N coin tosses)
for(N in ntrials) {
    ## generate N uniform deviates between 0 and 1, returned as a vector
    nx = runif(N)

    ## if the number is less than or equal to 0.5, call it Head (H)
    nheads = length(subset(nx, nx <= 0.5))

    ## if the number is greater than 0.5, call it Tail (T)
    ntails = length(subset(nx, nx > 0.5))

    ## compute the probability of head
    prob_head = nheads/N

    print(paste(nheads+ntails, " ", nheads, " ", ntails, " ", prob_head))
}
```

Run the above code file in R with source("coin_toss.r") at the command line. Results similar to the following are printed (the numbers won't be the same every time due to randomness):

```
[1] "trials  heads  tails  P(head)"
[1] "10  4  6  0.4"
[1] "100  49  51  0.49"
[1] "1000  512  488  0.512"
[1] "10000  4949  5051  0.4949"
[1] "100000  50191  49809  0.50191"
[1] "1000000  500186  499814  0.500186"
```

From the printout above, we see that as N increases, p(H) stabilizes around 0.5, as expected. The p(head) will tend to 0.5 as N tends to an infinitely large number of trials.