Biostatistics with R

R scripts

So far we have been typing R commands at the R prompt ">", which is convenient for executing a few lines of commands. This approach does not scale to real-life applications, where code spanning many tens of lines has to be written and stored for later use. For this purpose, R allows us to write a script, which is a collection of R statements written in a file. The statements are written one below the other, separated by line breaks, without the R prompt ">" at the beginning of each statement. The script file can be executed from the R prompt with a single command, which runs the statements in the script sequentially.


There are some fundamental differences between an R script and code written in compiled languages like C, C++ and Java.

  • In compiled languages, the code is first processed by the language compiler, which checks its correctness against the language constructs and creates an executable file. (For example, a C compiler on Unix-like systems names it "a.out" by default.) This executable is then run in the operating system environment.
    R does not create an executable from the script. When the script is run, R first parses the whole file and then executes the statements sequentially, from the beginning to the end of the script. A syntax error anywhere in the file is reported at the parsing stage, before any statement is executed. A run-time error (for example, the use of an undefined variable) is flagged only when the faulty statement is reached: execution stops at that statement and control returns to the R prompt. As a consequence, if there are 50 statements in the script with a run-time error in the 44th statement, the 43 correct statements before it will be executed, R will flag the 44th statement, and execution halts at that point. The error message is printed at the R prompt. We can then correct the error and run the script again.


  • In compiled languages, the executable is run from the operating system environment. All the variables and objects created in memory during program execution are deleted once the execution is completed, and control returns to the operating system prompt. In contrast, the variables and objects created by an R script stay in R memory even after the execution of the script is completed. They are erased only when we quit R or delete them by name in R.

    Thus, if we create two global variables through the statements "a=5" and "b=10" inside an R script, they will be present in R memory even after the script is executed, and they can still be used. Any subsequent mention of the variable "a" in R will refer to this value 5. We have to be careful about this.


  • As a consequence, there is practically no difference between executing 10 commands one by one at the R prompt and writing them in a script and executing it. We can in fact copy the commands from an R script one at a time and run them in the R environment! In compiled languages, the executable runs all the commands in sequence and then exits.
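The behaviour described in the points above can be checked with a small experiment. The following is a minimal sketch, not a definitive recipe: the three-statement script and the variable names before_error, broken and after_error are hypothetical, and the script is written to a temporary file so that the example is self-contained.

```r
# Write a three-statement script to a temporary file.
# The second statement fails at run time because 'undefined_name' does not exist.
script_file <- tempfile(fileext = ".R")
writeLines(c(
  "before_error = 5",
  "broken = undefined_name + 1",
  "after_error = 10"
), script_file)

# try() catches the error so that this example can continue after the failure.
result <- try(source(script_file), silent = TRUE)

inherits(result, "try-error")  # TRUE: execution stopped at the second statement
exists("before_error")         # TRUE: the first statement ran and its variable persists
exists("after_error")          # FALSE: the third statement was never reached
```

Without try(), the same error message would simply be printed at the R prompt and control would return to the prompt.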

An R script is very useful for creating long code pipelines consisting of cascading steps of a complex analysis. For example, a pipeline for clinical trial data analysis may start by reading the data from an Excel sheet and then perform data selection, filtering, plotting and statistical tests, and produce tables of conclusions. Like a program in any other language, this pipeline can be reused for similar input data sets.



To create an R script, type the R commands into a text file and save the file with the extension ".R" or ".r". R scripts are conventionally recognised by these extensions. For example, if we create a script called "compute", we save it in a text file called "compute.R" or "compute.r".



To run an R script, call the source() function with the file name as a string argument. Thus, in order to execute the script "compute.R" from the R prompt, type


> source("compute.R")

The code statements will be executed in the order in which they appear in the script, and results will be printed on screen or written into devices as per the script.


In the above example, it is assumed that the script "compute.R" is in the current directory. If the script is not in the current directory, then the full file path should be given inside the double quotes.
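The whole cycle of creating a script file and running it by its full path can be sketched as follows. This is only an illustration under stated assumptions: the contents of "compute.R" (the variables x and x_mean) are hypothetical, and the file is placed in R's temporary directory rather than in a real project directory.

```r
# Build the full path to the script inside R's temporary directory.
script_path <- file.path(tempdir(), "compute.R")

# Create the script file; each string becomes one line of the script.
writeLines(c(
  "x = c(2, 4, 6)",
  "x_mean = mean(x)",
  "print(x_mean)"
), script_path)

# Run the script by passing its full path to source().
source(script_path)   # prints [1] 4
```

In everyday use the script would be written in a text editor, and only the source() call would be typed at the R prompt.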



Setting the path to the R script file

While executing an R script, the path to the location of the script file should be given. This can be done in two ways:


1. The R script file name with its complete path can be passed explicitly to the source() function. For example, on a Linux machine, to source a script called "analysis.r" inside a directory "/home/user/myScripts", type

> source("/home/user/myScripts/analysis.r")


2. Instead of typing the entire path to the file, we can also set the directory containing the file as the current directory using the setwd() function. The corresponding function getwd() returns the name of the current directory.

> setwd("/home/user/myScripts")
> getwd()
[1] "/home/user/myScripts"

Once the path to the script "analysis.r" is set this way, we can run the script by mentioning the file name without the path inside the source() function:

> source("analysis.r")
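A minimal sketch of the second approach is given below. It assumes a temporary directory standing in for "/home/user/myScripts" and a one-line hypothetical "analysis.r"; it also uses the fact that setwd() returns the previous working directory, so that the original directory can be restored afterwards.

```r
# Create a temporary directory to play the role of the scripts directory.
script_dir <- tempfile("myScripts")
dir.create(script_dir)

# A one-line stand-in for the real analysis script.
writeLines("analysis_done = TRUE", file.path(script_dir, "analysis.r"))

# setwd() changes the current directory and returns the previous one invisibly.
old_dir <- setwd(script_dir)

# The script can now be sourced by its file name alone.
source("analysis.r")

# Restore the original working directory.
setwd(old_dir)
```

Restoring the directory is optional, but it avoids surprises when later commands read or write files using relative paths.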


In the case of the Windows operating system, we should use "/" as the separator in the path rather than the usual "\" used by Windows:

> setwd("D:/program files/myScripts")


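Alternatively, a double backslash "\\" can be used as the separator when setting the current directory containing the R script. A single "\" cannot be used, because R treats it as the start of an escape sequence inside a string: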

> setwd("D:\\program files\\myScripts")
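The two ways of writing a Windows path can be compared directly as strings, without needing a Windows machine. The sketch below is illustrative only; the variable names forward, escaped, converted and built are hypothetical, and no directory is actually accessed.

```r
forward <- "D:/program files/myScripts"      # forward slashes
escaped <- "D:\\program files\\myScripts"    # escaped backslashes

# Inside a string, "\\" is stored as a single backslash character.
nchar("\\")            # 1
cat(escaped, "\n")     # D:\program files\myScripts

# Replacing each backslash with "/" gives the forward-slash form.
converted <- gsub("\\", "/", escaped, fixed = TRUE)
identical(converted, forward)   # TRUE

# file.path() joins path components with "/" for us.
built <- file.path("D:", "program files", "myScripts")
identical(built, forward)       # TRUE
```

Building paths with file.path() avoids typing separators by hand, which is one way to sidestep the backslash issue altogether.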

A simple R script

We will now create a simple R script which declares two numbers, multiplies them and prints the result.



Save the following script lines in a text file called test.R


# This is a simple R script
a = 5
b = 6
c = a*b
print(c)

To run this script from the R prompt and get the output, set the path to the script file and type


> source("test.R")
[1] 30

We notice that the first line of the script, starting with '#', was ignored by R. A hash at the beginning of a line flags the line as a comment. Commented lines are ignored by R while executing the script.


As mentioned earlier, after the execution of a script from the R prompt, the global variables created by the script will still stay in the R environment. They may conflict with other variables of the same name, if present. We have to be careful about this fact.
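The conflict, and two possible ways of dealing with it, can be sketched as follows. This is a hedged illustration rather than a recommended workflow: the script is a comment-free copy of the one above written to a temporary file, the starting value 100 and the name script_env are hypothetical, and the code is assumed to be run at the top level of an R session. The second option relies on the "local" argument of source(), which accepts an environment in which the script is evaluated.

```r
# A copy of the simple script, written to a temporary file.
script_file <- tempfile(fileext = ".R")
writeLines(c("a = 5", "b = 6", "c = a*b"), script_file)

a <- 100               # a variable that already exists in the workspace
source(script_file)    # the script silently overwrites it
print(a)               # [1] 5

# Option 1: delete the variables created by the script by name.
rm(a, b, c)

# Option 2: run the script in its own environment, leaving the workspace untouched.
a <- 100
script_env <- new.env()
source(script_file, local = script_env)
print(a)                             # [1] 100
print(get("c", envir = script_env))  # [1] 30
```

The function ls() lists the variables currently in the workspace, which helps in spotting leftovers from earlier scripts.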


Built-in library functions and external functions can be used in an R script. We will learn this in later chapters.