A vector in R is an ordered collection of elements. We can have a vector of numbers, character strings or logical values. All the elements of a given vector must be of the same data type. If we create a vector with mixed element types, R converts every element to a common type; in particular, when numbers and strings are mixed, R treats the result as a vector of strings.
Elements of a vector are stored in consecutive memory locations. R provides many built-in functions for manipulating the elements of a vector. Using these functions, we can perform operations such as sorting, slicing, growing, splitting and filtering on a vector. The vector, along with its supporting functions, is the most basic and widely used data structure in R, and many library functions of R require their input data in the form of vectors.
A vector can be defined by placing a comma-separated list of elements inside a pair of parentheses next to the letter 'c' and assigning the result to a variable name, as shown below. We can use either the = or the <- operator for the assignment.
> avec <- c(10.2, 5.5, 6.9, 7.2, 8.1)
> avec
The data type of a vector is decided by the data types of the elements it contains. Thus, if all elements of the vector are numbers, the vector takes the type numeric, and we can do many numerical operations with it. However, if one or more elements of the vector happen to be of type string, the entire vector is treated as a string (character) vector, and we cannot perform numerical operations with it. For example, the vector 'avec' defined above is a numeric vector, while the following two vectors are treated as string vectors:
> avec1 <- c("AEC", "AED", "AAB", "AFC")
> avec2 <- c(10.2, 5.5, "6.9", 7.2, 8.1)
> avec2
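The type that R has chosen for a vector can be inspected with class(). The short sketch below reuses the values from the text; the names nums and total are introduced here only for illustration. It also shows that a string vector whose elements all look like numbers can be converted back with as.numeric().

```r
# Same values as in the text above.
avec  <- c(10.2, 5.5, 6.9, 7.2, 8.1)
avec2 <- c(10.2, 5.5, "6.9", 7.2, 8.1)

class(avec)    # "numeric"
class(avec2)   # "character": every element was converted to a string

# Converting the strings back to numbers makes arithmetic possible again.
# 'nums' and 'total' are illustrative names, not part of the original text.
nums  <- as.numeric(avec2)
total <- sum(nums)
```

Calling sum(avec2) directly would fail, because arithmetic functions do not accept character vectors.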
A vector can be assigned in many ways. We can use the assign() function or the rightward arrow -> instead of the above syntax. Thus, each of the following assignments defines a vector named "vec" with elements (10.2, 5.5, 6.9, 7.2, 8.1):
> vec <- c(10.2, 5.5, 6.9, 7.2, 8.1)
> assign("vec", c(10.2, 5.5, 6.9, 7.2, 8.1) )
> vec = c(10.2, 5.5, 6.9, 7.2, 8.1)
> c(10.2, 5.5, 6.9, 7.2, 8.1) -> vec
The individual elements of a vector can be accessed by giving the element number (index) inside square brackets. Indexing starts at 1.
> x <- c(10,20,30,40,50,60,70,80,90,100,110,120,130)
> x[3]
> x[6]
> y = x[3] + x[6]
> y
In order to access specific elements of a vector, give the element indices as a vector inside the square brackets. Thus, to create a sub-vector 'z' with elements 1, 3 and 6 of the vector x defined above, we write:
> z = x[c(1,3,6)]
> z
We can also access consecutive elements of a vector by specifying the start and end element indices separated by a colon inside square brackets. The following statement creates a subset vector 'z' with elements 4 to 9 of vector x:
> z = x[4:9]
> z
Once a vector is defined, basic mathematical operations like addition, subtraction, multiplication and division performed on it are applied to each of its elements individually, resulting in another vector. See the examples below:
> vstr <- c(1,2,3,4,5,6,7,8,9)
> vstr + 100
> vstr - 100
> vstr*100
> vstr/100
Algebraic operations between two or more vectors are applied to their corresponding elements, and a resulting vector is created. Thus, if we add two vectors of the same length (i.e., both having the same number of elements), their corresponding elements are added to give a new vector. This is illustrated in the following operations between the vectors "vec1" and "vec2":
> vec1 <- c(1.5,2.5,3.5,4.5,5.5,6.5)
> vec2 <- c(10,20,30,40,50,60)
> vec1+vec2
> vec1-vec2
> vec1*vec2
> vec1/vec2
> log(vec2)
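The text above covers vectors of the same length. When the lengths differ, R reuses ("recycles") the elements of the shorter vector until the longer one is covered. The sketch below illustrates this with hypothetical vectors named long and short, which are not part of the original examples.

```r
# Illustrative vectors; the longer length is a multiple of the shorter one.
long  <- c(1, 2, 3, 4, 5, 6)
short <- c(10, 20)

# 'short' is reused as 10, 20, 10, 20, 10, 20 before the addition.
recycled <- long + short
recycled   # 11 22 13 24 15 26
```

If the longer length is not a multiple of the shorter one, R still returns a result but issues a warning, so it is usually safer to keep the lengths compatible.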
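We can start with an empty vector and add elements to it. An empty vector is created by calling c() with no arguments, and elements are appended by combining the existing vector with the new elements:
> avec = c()
> avec = c(avec,"ATG","TTG")
> avec
> avec = c(avec, "TATATA", "TTTTTAA")
> avec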
We can combine two or more vectors to create a new vector as shown here:
> v1 = c(10,20,30)
> v2 = c(100,200,300,400)
> v3 = c(1000,2000,3000)
> combvec = c(v1,v2,v3)
> combvec
An algebraic expression written with a vector is applied to its individual elements, resulting in a new vector. Thus, if 'x' is a vector of numbers, the equation $y = 3x^3 - 4x^2 + 5x - 6$ can be evaluated for each element of 'x', resulting in a corresponding 'y' vector. See below:
> x = c(1, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0)
> y = (3*x^3) - (4*x^2) + (5*x) - 6
> y
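To make "applied to its individual elements" concrete, the following sketch compares the vectorised expression from the code above with an explicit loop that evaluates the same expression one element at a time. The name y_loop is introduced here only for the comparison.

```r
x <- c(1, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0)

# Vectorised form, as in the code above.
y <- (3*x^3) - (4*x^2) + (5*x) - 6

# Explicit loop computing the same expression element by element.
y_loop <- numeric(length(x))
for (i in seq_along(x)) {
  y_loop[i] <- (3*x[i]^3) - (4*x[i]^2) + (5*x[i]) - 6
}

# Both approaches give the same values.
same <- isTRUE(all.equal(y, y_loop))
same
```

The vectorised form is shorter and is the usual way to write such calculations in R.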
It is very easy to generate a sequence of numbers in R. Use the seq() function with the start and end values to generate a sequence in steps of 1:
> sq <- seq(1,50)
> sq
A sequence can also be generated in steps other than 1. For example, use the following syntax to generate a sequence that starts at 1 and does not exceed 50, in steps of 5:
> sq <- seq(1,50,5)
> sq
The sequence can be generated in reverse by swapping the start and end values and flipping the sign of the step:
> sq <- seq(50,1,-5)
> sq
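The arguments of seq() can also be given by name, which can make the calls easier to read. The sketch below repeats the two stepped sequences from above with named arguments and adds two variations that the original text does not cover: the length.out argument, which asks for a fixed number of equally spaced values, and the colon operator as a shorthand for steps of 1.

```r
# Same sequences as above, with named arguments.
by_five   <- seq(from = 1, to = 50, by = 5)    # 1, 6, 11, ..., 46
down_five <- seq(from = 50, to = 1, by = -5)   # 50, 45, 40, ..., 5

# Not in the original text: ask for a number of points instead of a step.
five_pts <- seq(0, 1, length.out = 5)          # 0, 0.25, 0.5, 0.75, 1

# The colon operator is a shorthand for a sequence in steps of 1.
one_to_ten <- 1:10
```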
When data is collected, there is a possibility that one or more entries are missing because the information is not available. As an example, in a data set consisting of the ages of 10 students, the age may be missing for two of them while being available for all the others. See the data set below:
Student # | Age(years) |
---|---|
1 | 17 |
2 | 18 |
3 | 17 |
4 | 19 |
5 | missing data |
6 | 17 |
7 | 20 |
8 | missing data |
9 | 16 |
10 | 22 |
We can choose our own strategy to deal with the missing data in the downstream analysis. For example, we may replace the missing numbers by the average value of the data computed without them, or we may fill the missing numbers with zero. But how do we indicate the missing data in a vector? Vectors in R can handle missing values. A missing value is represented by the symbol NA. For example, the above mentioned data points can be represented as a numeric vector with two NA (missing) values:
> x = c(17, 18, 17, 19, NA, 17, 20, NA, 16, 22)
> x
[1] 17 18 17 19 NA 17 20 NA 16 22
Note that NA is a special symbol, not a string. Most vector operations and functions can handle missing values. For example, if we multiply the vector x created above by a constant, only its genuine numbers are multiplied, and the missing values remain NA:
> x*10
[1] 170 180 170 190 NA 170 200 NA 160 220
We can identify the missing values in a vector and replace them with another value. The function is.na() takes a vector as input and returns a vector of TRUE and FALSE values of the same length. The positions at which the input vector has missing values hold TRUE in the new vector, and all other positions hold FALSE. The logical NOT operation !is.na() returns the complementary values of is.na(). See here:
> x = c(17, 18, 17, 19, NA, 17, 20, NA, 16, 22)
> is.na(x)
[1] FALSE FALSE FALSE FALSE TRUE FALSE FALSE TRUE FALSE FALSE
> !is.na(x)
[1] TRUE TRUE TRUE TRUE FALSE TRUE TRUE FALSE TRUE TRUE
To get all the NA elements of a vector x, we use the vector of TRUE and FALSE values returned by is.na(x) as indices of x:
> x[is.na(x)]
[1] NA NA
To get all the non-NA elements of x, we use the vector of TRUE and FALSE values returned by !is.na(x) as indices of x:
> x[!is.na(x)]
[1] 17 18 17 19 17 20 16 22
The missing values in a vector can be replaced by zeroes (or any other value) as shown below:
> x = c(1.5, 2.6, 4.3, NA, 2.2, 5.9, 6.0, NA, 1.2)
> x[is.na(x)] <- 0
> x
[1] 1.5 2.6 4.3 0.0 2.2 5.9 6.0 0.0 1.2
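The other strategy mentioned earlier, replacing the missing values by the average of the available data, can be sketched in the same way. The example uses the student ages from the table; the names ages, avg and filled are introduced here for illustration, and mean() is called with na.rm = TRUE so that the NA values are left out of the average.

```r
# Ages of the 10 students, with two missing entries.
ages <- c(17, 18, 17, 19, NA, 17, 20, NA, 16, 22)

# Average of the eight available values: 146 / 8 = 18.25
avg <- mean(ages, na.rm = TRUE)

# Work on a copy so that the original data stays untouched.
filled <- ages
filled[is.na(filled)] <- avg
filled
```

Whether the mean, zero or some other value is an appropriate replacement depends on the analysis that follows; the code only shows the mechanics.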
To get the number of elements in a vector
The number of elements in a vector (called the vector length) is returned by the function length().
> xx = c('a','e','r','s','k','g','f')
> L = length(xx)
> L
[1] 7
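Note that length() counts NA entries as elements. A minimal sketch, using a hypothetical vector v, shows how length() can be combined with is.na() to count the missing and the available values separately:

```r
# Illustrative vector with two missing values.
v <- c(1.5, NA, 4.3, NA, 2.2)

n_total   <- length(v)        # 5: NA entries are counted as elements
n_missing <- sum(is.na(v))    # 2: TRUE counts as 1 when summed
n_valid   <- sum(!is.na(v))   # 3

# An empty vector has length 0.
n_empty <- length(c())
```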
Sorting a vector
A vector can be sorted in ascending or descending order by the sort() function, which returns the sorted vector. By default, the vector is sorted in ascending order:
> x = c(12,2,34,67,22,55,123)
> sor = sort(x)
> sor
[1] 2 12 22 34 55 67 123
A vector can be sorted in descending order by setting the boolean parameter decreasing to TRUE:
> x = c(12,2,34,67,22,55,123)
> ys = sort(x, decreasing=TRUE)
> ys
[1] 123 67 55 34 22 12 2
The default call to the sort() function drops the NA values present in the vector:
> y = c(12,2,34,NA,67,29,NA,NA,45,99)
> sr = sort(y)
> sr
[1] 2 12 29 34 45 67 99
If we want to keep the NA values while sorting a vector, we have two choices: the NA values can be placed either at the beginning or at the end of the sorted vector. This is achieved with a boolean parameter called na.last. If it is TRUE, the NA values are placed at the end of the sorted vector. If it is FALSE, the NA values are placed at the beginning of the sorted vector. If this parameter is not given, the NA values are dropped from the sorted vector. See the code below:
> y = c(12,2,34,NA,67,29,NA,NA,45,99)
> sort(y)
[1] 2 12 29 34 45 67 99
> sort(y, na.last=TRUE)
[1] 2 12 29 34 45 67 99 NA NA NA
> sort(y, na.last=FALSE)
[1] NA NA NA 2 12 29 34 45 67 99
Get the maximum and minimum values of a numeric vector
We can get the maximum and minimum values among the elements of a vector by calling the max() and min() functions:
> vec <- c(8.9, 1.5, 3.4, 6.7, 12.8, 7.4)
> max(vec)
[1] 12.8
> min(vec)
[1] 1.5
If one or more elements of a vector are NA, the max() and min() functions return NA as the maximum and minimum values respectively. We can tell max() and min() to drop the NA values before finding the maximum or minimum by setting the parameter na.rm to TRUE. Then only the valid numbers are considered, and the NA values are ignored. See here:
> y = c(12,2,34,NA,67,29,NA,NA,45,99)
> max(y)
[1] NA
> min(y)
[1] NA
> max(y, na.rm=TRUE)
[1] 99
> min(y, na.rm=TRUE)
[1] 2
Create a character vector with indexed strings
We can create a character vector with indexed strings like "X1", "X2", "X3", ... as follows:
> labs <- paste( c("X"), 1:20, sep="")
> labs
[1] "X1" "X2" "X3" "X4" "X5" "X6" "X7" "X8" "X9" "X10" "X11" "X12"
[13] "X13" "X14" "X15" "X16" "X17" "X18" "X19" "X20"
How does this work? The expression 1:20 creates a sequence from 1 to 20 in steps of 1. When this sequence is pasted to the single character string "X", each of the 20 numbers is pasted to it in turn to create the vector elements "X1", "X2", ..., "X20".
Similarly, when the first argument contains two strings, they are used alternately:
> labs <- paste( c("X","Y"), 1:20, sep="")
> labs
[1] "X1" "Y2" "X3" "Y4" "X5" "Y6" "X7" "Y8" "X9" "Y10" "X11" "Y12"
[13] "X13" "Y14" "X15" "Y16" "X17" "Y18" "X19" "Y20"
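As a small variation on the calls above, base R also provides paste0(), which behaves like paste() with sep="". The sketch below checks that the two forms give the same labels and shows a non-empty separator. The "gene" prefix and the variable names are hypothetical and chosen only for illustration.

```r
# paste() with an empty separator and paste0() give the same result.
labs_a <- paste("X", 1:5, sep = "")
labs_b <- paste0("X", 1:5)
same_labels <- identical(labs_a, labs_b)

# A non-empty separator is placed between the pasted parts.
gene_labs <- paste("gene", 1:3, sep = "_")   # "gene_1" "gene_2" "gene_3"
```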
Removing elements from a vector using index
A particular element or a set of elements can be removed from a vector by specifying the element index with a negative sign inside square brackets. See below:
> x = c(5,10,15,20,25,30,35,40)
> yr = x[-2]
> yr
[1] 5 15 20 25 30 35 40
Here, x[-2] has removed the second element of x, which is 10. In order to remove consecutive elements, we give the start and end element locations. For example, to remove elements 2, 3, 4 and 5 from x:
> ya = x[-2:-5]
> ya
[1] 5 30 35 40
Specific elements can be removed by specifying the corresponding indices as a vector inside square brackets:
> x = c(5,10,15,20,25,30,35,40)
> y = x[c(-2,-4,-7)]
> y
[1] 5 15 25 30 40
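Negative indexing returns a new vector and leaves the original unchanged. The sketch below, with illustrative variable names, combines it with length() to drop the first element, the last element, or both, without hard-coding the position of the last element.

```r
x <- c(5, 10, 15, 20, 25, 30, 35, 40)

no_first <- x[-1]                  # drop the first element
no_last  <- x[-length(x)]          # drop the last element
no_ends  <- x[-c(1, length(x))]    # drop both ends

# x itself still has all 8 elements.
n_x <- length(x)
```

To actually shorten x, the result has to be assigned back, for example x = x[-1]. Positive and negative indices cannot be mixed in the same pair of square brackets; R reports an error in that case.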