Biostatistics with R

User defined functions in R

A program in any computer language is built from two important components: data and functions.

A function is a self-contained code module that takes one or more data variables as input, implements an algorithm to accomplish a particular task, and returns a result. This closely follows the definition of a mathematical function, which takes one or more variables as input and returns a single value as the result.

The lines of code that accomplish a task are encapsulated in a function inside a program. Whenever that task has to be carried out on some data, a call to the function is made and the data is passed to it. The function carries out the task and returns a result. Typically the function call is just one line of code. Suppose we want to accomplish the same task 10 times with new sets of data. We just have to call the function 10 times with the new data and collect the results. Without functions, we would have to type the lines of code for the task 10 times over!
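As a minimal sketch of this idea, the script below wraps one task in a function and then calls it once per data set. The function name std_error and the three data vectors are hypothetical, made up only for illustration.

```r
# Hypothetical helper: standard error of the mean of a numeric vector.
std_error <- function(x) {
  se <- sd(x) / sqrt(length(x))
  return(se)
}

# Three made-up data sets, used only for illustration.
group_a <- c(4.1, 3.9, 4.4, 4.0)
group_b <- c(5.2, 5.0, 5.5, 4.9, 5.1)
group_c <- c(3.3, 3.8, 3.5)

# One line of code per task, instead of retyping the formula each time.
se_a <- std_error(group_a)
se_b <- std_error(group_b)
se_c <- std_error(group_c)

print(c(se_a, se_b, se_c))
```

If the formula ever needs to change, it is edited in one place (the function body) and every call picks up the change.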

Functions are written in a generic form, without any particular program in mind, so a single function can be useful to many different programs. Each function completes one task. If your code can supply data of the type the function demands, you can call the function from your code.

Many functions can be combined to form a library, which can be attached to a program and used. For example, the functions performing various tasks related to string operations can form a library. Similarly, functions for basic maths operations (like taking the square root or the exponential of a given number) can be combined into a library.
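In R, such libraries are called packages. The short sketch below calls a few maths and string functions that ship with R, and uses library() to attach a package. The stats package is chosen here only because it is distributed with every R installation; the variable names are illustrative.

```r
# Functions from the base package are always available,
# so these calls need no library() line.
root  <- sqrt(16)          # square root
expo  <- exp(0)            # exponential
upper <- toupper("atgc")   # string operation: convert to upper case
len   <- nchar("atgc")     # string operation: count characters

# library() attaches a package so that its functions can be called.
# stats ships with R and is normally attached already, so this is harmless.
library(stats)
med <- median(c(3, 1, 2))

print(list(root, expo, upper, len, med))
```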

Writing an R function

Like other languages, R supports user-defined functions. An R function takes objects and data variables as function arguments and returns an object.

The function in R has the following structure:

    myfunction <- function(argument1, argument2, ...) {
        statements
        return(object)
    }

Here, function is the keyword used for defining a function. The entities named argument1, argument2, etc. are the function arguments. They can be either simple variable types or objects like arrays, data frames and lists. The comma-separated list of function arguments is placed inside a pair of parentheses next to the function keyword.

Inside the function, the arguments passed in are used for computation. The word statements stands for those lines of script. Finally, the function hands the computed object back to the caller through a return statement.

All the lines of code inside the function are enclosed in a pair of curly brackets that follows the argument list. The name myfunction is the name given to the function. When a function call is to be made, the function is called by this name.
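To make the structure concrete, here is a small sketch that fills in each part of the template. The function name bmi, its arguments and the input values are hypothetical. The second version illustrates that, when return() is left out, R returns the value of the last evaluated expression.

```r
# Hypothetical function: body mass index from weight (kg) and height (m).
bmi <- function(weight_kg, height_m) {
  index <- weight_kg / height_m^2   # statements that use the arguments
  return(index)                     # object handed back to the caller
}

# Same computation without return(): the value of the last
# evaluated expression becomes the result of the function.
bmi_short <- function(weight_kg, height_m) {
  weight_kg / height_m^2
}

result <- bmi(70, 1.75)
print(result)
print(bmi_short(70, 1.75))
```

Both styles give the same result. An explicit return() makes the returned object easier to spot in longer functions.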

Once a function is defined, it can be called with the general syntax,

objectName <- myfunction(arg1, arg2, ...)

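where arg1, arg2, ... are variables created inside the program before the call. R does not declare argument types in the function definition, so arg1, arg2, ... must hold values of the kind that the statements inside the function expect for argument1, argument2, ... respectively. Otherwise, R reports an error (or returns an unintended result) when the function runs.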
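The sketch below illustrates what happens when an argument of an unsuitable type is passed. R is interpreted, so the mismatch shows up only when the statements in the function body are executed. The function scale_values and its inputs are hypothetical; tryCatch() is used here only so that the script keeps running after the error.

```r
# Hypothetical function that expects a numeric vector and a number.
scale_values <- function(avec, anum) {
  scaled <- avec / anum
  return(scaled)
}

# Numeric vector: the call works as intended.
ok <- scale_values(c(4, 9, 16), 2)

# Character vector: the division inside the function fails at run time.
# tryCatch() captures the error message instead of stopping the script.
bad <- tryCatch(
  scale_values(c("4", "9", "16"), 2),
  error = function(e) conditionMessage(e)
)

print(ok)
print(bad)
```

The second call stores the text of R's error message in bad rather than a computed vector.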

Example script

We will write a simple script with a function. Given a vector of numbers and a single number as function arguments, the function normalize() performs an arbitrarily defined operation: it divides every element of the vector by the number and takes the square root. It then returns the resulting vector:

    # Define a function called normalize.
    normalize <- function(avec, anum) {
        # Divide every element by anum, then take the square root.
        norvec <- (avec/anum)^0.5
        return(norvec)
    }

    # Define a vector and a number to use as data.
    vec <- c(45.0, 67.0, 81.0, 57.0, 103.0, 122.0, 68.0, 98.0)
    anumber <- 21.5

    # Call the function.
    normalvec <- normalize(vec, anumber)

    # Print the resulting vector returned by the function.
    print(normalvec)

Executing the above script generates the following output:

[1] 1.446728 1.765299 1.940990 1.628239 2.188766 2.382104 1.778424 2.134980
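One possible way to check this output is to compare it with R's built-in sqrt() applied to the same data. The sketch below repeats the definitions so that it runs on its own. It also calls the function with named arguments, an optional calling style in which the order of the arguments no longer matters. The variable names by_position, by_name and reference are illustrative.

```r
# Same function and data as in the example script.
normalize <- function(avec, anum) {
  norvec <- (avec / anum)^0.5
  return(norvec)
}
vec <- c(45.0, 67.0, 81.0, 57.0, 103.0, 122.0, 68.0, 98.0)
anumber <- 21.5

# Arguments matched by position, as in the example script.
by_position <- normalize(vec, anumber)

# Arguments matched by name, so they can be given in any order.
by_name <- normalize(anum = anumber, avec = vec)

# Independent reference computed with the built-in sqrt().
reference <- sqrt(vec / anumber)

# all.equal() compares numbers within a small tolerance.
print(all.equal(by_position, by_name))
print(all.equal(by_position, reference))
```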