Biostatistics with R

R scripts

So far we have been typing R commands at the R prompt ">", which is a convenient method for executing a few lines of commands. This does not scale to real-life applications, where code spanning many tens of lines has to be written and stored for later use. For this purpose, R allows us to write a script, which is a collection of R statements written in a file. The statements are written one below the other, separated by line breaks, without the R prompt ">" at the beginning of each statement. The script file can be executed from the R prompt with a single command, which in turn executes the statements in the script sequentially.

The R script is very useful for creating long code pipelines consisting of cascading steps of a complex analysis. For example, a pipeline for clinical trial data analysis may start by reading the data from an Excel sheet and then perform data selection, filtering, plotting and statistical tests, and finally produce conclusion tables. Like a program in any other language, this pipeline can be reused on similar input data sets.

To create an R script, type the R commands into a text file and save the file with the extension ".R" or ".r". By convention, R scripts are recognised by these extensions. For example, if we create a script called "compute", we save it in a text file called "compute.R" or "compute.r".

To run an R script, call the source() function with the file name as a string argument. Thus, in order to execute the script "compute.R" from the R prompt, type

> source("compute.R")

The code statements will be executed in the order in which they appear in the script, and results will be printed on screen or written into devices as per the script.
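As a minimal sketch of what "printed on screen" means in practice: when a script is run with source() using its default arguments, the value of a bare expression is not printed automatically as it would be at the prompt, so an explicit print() is needed. The temporary file and the variable name total below are hypothetical and serve only as illustration.

```r
# Create a small throwaway script (hypothetical file, for illustration only).
script_path <- tempfile(fileext = ".R")
writeLines(c("total <- 2 + 3", "total", "print(total)"), script_path)

# Default behaviour: the bare `total` line prints nothing;
# only the explicit print(total) produces output.
source(script_path)

# With echo = TRUE, each statement is echoed and evaluated values are printed,
# much like typing the statements at the prompt.
source(script_path, echo = TRUE)
```

The echo argument is handy while debugging a script, since it shows which statement produced which output.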

In the above example, it is assumed that the script "compute.R" is in the current directory. If the script is not in the current directory, then the full file path should be given inside the double quotes.
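The following sketch illustrates running a script by its full path. It assumes a hypothetical script written into R's temporary directory; file.path() is used to build the path so that the correct separator is chosen for the operating system.

```r
# Write a tiny script into a temporary directory (hypothetical location).
script_dir  <- tempdir()
script_path <- file.path(script_dir, "compute.R")
writeLines(c("x <- 2 + 3", "print(x)"), script_path)

# Show the current working directory; a bare file name such as
# "compute.R" would be looked up relative to this directory.
print(getwd())

# Run the script by its full path, regardless of the working directory.
source(script_path)
```

Alternatively, setwd() can be used to change the current directory first, after which the bare file name is sufficient.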

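An R script is not compiled prior to execution. Run-time errors, if any, are caught only during execution. This has the following consequence: suppose we have 12 statements in a script, and the 8th statement has a run-time error, such as a reference to an undefined variable. When the script is executed, the first 7 correct statements will be executed, and the script will report the error and stop at the 8th statement. Syntax errors behave differently: source() parses the whole file before running it, so a syntax error anywhere in the script is reported before any statement is executed.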
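The stop-at-first-error behaviour described above can be observed with a small sketch. The three-statement script and the names step_one, step_three and undefined_value are hypothetical; tryCatch() is used here only so that the demonstration itself keeps running after the error.

```r
# Build a three-statement script whose second statement fails at run time.
script_path <- tempfile(fileext = ".R")
writeLines(c(
  "step_one <- 10",
  "step_two <- step_one + undefined_value",  # fails: object not found
  "step_three <- 30"
), script_path)

# tryCatch keeps the error from stopping this demonstration.
result <- tryCatch(
  {
    source(script_path)
    "completed"
  },
  error = function(e) "stopped on error"
)

print(result)                 # "stopped on error"
print(exists("step_one"))     # TRUE:  the statement before the error ran
print(exists("step_three"))   # FALSE: the statement after the error did not
```

Note that the variable created before the failing statement remains in the workspace, so a script that stops halfway can leave a partially updated state behind.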

A simple R script

We will now create a simple R script which declares two numbers, multiplies them and prints the result.

Save the following script lines in a text file called "test.R":

# This is a simple R script
a = 5
b = 6
c = a*b
print(c)

To run this script from the R prompt and get the output, type

> source("test.R")
[1] 30

We notice that the first line of the script, which starts with '#', was ignored by R. A hash at the beginning of a line flags the line as a comment. Commented lines are ignored by R while executing the script.

After a script has been executed from the R prompt, the global variables it created remain in the R environment. These may conflict with other variables of the same name, if present: an assignment in the script silently overwrites an existing variable of that name. We have to be careful about this fact.
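One way to avoid such name conflicts, shown here as a sketch rather than a recommendation from the text, is to pass an environment to the local argument of source(), so that the script's variables are created there instead of in the global workspace. The script contents and the names script_env and product are hypothetical.

```r
# A throwaway script that defines a, b and product (hypothetical example).
script_path <- tempfile(fileext = ".R")
writeLines(c("a <- 5", "b <- 6", "product <- a * b"), script_path)

# A workspace variable that shares its name with one used in the script.
a <- 100

# Run the script inside its own environment instead of the global one.
script_env <- new.env()
source(script_path, local = script_env)

print(a)                    # 100: the workspace value is untouched
print(script_env$product)   # 30:  script results are read from script_env
```

With the default call source(script_path), the assignment a <- 5 in the script would instead have replaced the workspace value 100.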

Internal library functions and external functions can be called in an R script. We will learn this in later chapters.