### R Fundamentals

##### Data frames

A data frame is a data structure similar to a matrix, with the special feature that different columns can have different data types.

A data frame is very useful for combining vectors of the same length but different data types into a single data structure.

As with matrices, all the columns of a data frame must have the same number of rows.

###### Creating a data frame from vectors

A data frame is made up of individual vectors of the same length placed as columns. We can easily create a data frame from vectors using the data.frame() function by passing the vectors as arguments.

In the example below, we create a data frame called "frm1" from three vectors, namely "data1", "data2" and "data3". The created data frame will have columns named "data1", "data2" and "data3":


>   data1 <- c("Iron","Sulphur","Calcium", "Magnecium", "Copper")
>   data2 <- c(12.5, 32.6, 16.7, 20.6, 7.5)
>   data3 <- c(1122, 1123, 1124, 1125, 1126)
>
>   frm1 <- data.frame(data1, data2, data3)
>
>  frm1


      data1 data2 data3
1      Iron  12.5  1122
2   Sulphur  32.6  1123
3   Calcium  16.7  1124
4 Magnecium  20.6  1125
5    Copper   7.5  1126


In the above example, note that the column names of the data frame 'frm1' are just the names of the vector objects themselves. A sequence of indices 1, 2, 3, 4 and 5 has been added as row names by default.
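The two properties described so far (each column keeps its own type, and all columns must have the same number of rows) can be checked directly. The following is a minimal sketch on a small hypothetical data frame; the names `name`, `value`, `flag` and `df` are illustrative and do not come from the example above.

```r
# Hypothetical example vectors of three different types
name  <- c("a", "b", "c")
value <- c(1.5, 2.5, 3.5)
flag  <- c(TRUE, FALSE, TRUE)

df <- data.frame(name, value, flag)

# One class per column: each column keeps its own data type
col_classes <- sapply(df, class)
print(col_classes)

# Number of rows and columns
print(dim(df))

# Vectors whose lengths cannot be reconciled are rejected with an error
result <- tryCatch(
  data.frame(x = 1:3, y = 1:2),
  error = function(e) conditionMessage(e)
)
print(result)
```

Note that data.frame() recycles a shorter vector when its length divides the longer one evenly, so only incompatible lengths such as 3 and 2 produce the error.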

###### Get the row and column names of a data frame

To get the column names of a data frame, call the names() function with the frame name as its argument. This function returns the column names as a vector of strings:


>  names(frm1)

[1] "data1" "data2" "data3"

We can also get the row and column names of a data frame using the rownames() and colnames() functions:


>  rname = rownames(frm1)
>
>  rname

[1] "1" "2" "3" "4" "5"
>

>  cname = colnames(frm1)
>
>  cname

[1] "data1" "data2" "data3"

###### Name the rows and columns of a data frame

The columns of a data frame can be named explicitly using a vector of strings. For the above frame "frm1", we can set the column names with our own vector of strings.


>  names(frm1) <- c("Element", "Proportion", "Product_ID")
>
>  frm1


    Element Proportion Product_ID
1      Iron       12.5      1122
2   Sulphur       32.6      1123
3   Calcium       16.7      1124
4 Magnecium       20.6      1125
5    Copper        7.5      1126


In the above example, we can use colnames(frm1) instead of names(frm1). Both commands produce the same result.

Similarly, the row names can be set from a vector of strings:


>  rownames(frm1) = c("elmt-1","elmt-2","elmt-3","elmt-4","elmt-5")
>
>  frm1


         Element Proportion Product_ID
elmt-1      Iron       12.5       1122
elmt-2   Sulphur       32.6       1123
elmt-3   Calcium       16.7       1124
elmt-4 Magnecium       20.6       1125
elmt-5    Copper        7.5       1126
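Since names() returns an ordinary character vector, a single column can also be renamed by indexing into it. The sketch below uses a hypothetical data frame `df` (not the frame from the text) and also illustrates that row names have to be unique.

```r
# Hypothetical two-column data frame
df <- data.frame(a = 1:3, b = c(2.5, 3.5, 4.5))

# For a data frame, names() and colnames() return the same vector
same <- identical(names(df), colnames(df))

# Rename only the second column by indexing into the names vector
names(df)[2] <- "weight"
print(names(df))

# Row names must be unique; assigning duplicates raises an error
dup <- tryCatch({
  rownames(df) <- c("r1", "r1", "r2")
  "ok"
}, error = function(e) "error")
print(dup)
```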

###### Accessing the elements of a data frame by index

The elements of a data frame are accessed using the same subscript convention as matrices. Thus, frm1[1,3] is the element in the first row and third column, frm1[1,] is the entire first row, and frm1[,2] is the entire second column. Also, frm1[1:3,] gives rows 1, 2 and 3. This is illustrated here using the frame frm1 created above:


>  frm1[1,3]

[1] 1122


>  frm1[1,]

       Element Proportion Product_ID
elmt-1    Iron       12.5       1122


>  frm1[,2]

[1] 12.5 32.6 16.7 20.6 7.5


>  frm1[1:3,]


       Element Proportion Product_ID
elmt-1    Iron       12.5       1122
elmt-2 Sulphur       32.6       1123
elmt-3 Calcium       16.7       1124
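One detail of the subscript convention is worth spelling out: selecting a single column returns a plain vector, while selecting a row returns a data frame. The following sketch, on a hypothetical data frame `df`, shows this along with the `drop = FALSE` argument, column selection by name inside the brackets, and row selection by a logical condition.

```r
# Hypothetical data frame for illustration
df <- data.frame(id = 1:3, score = c(12.5, 32.6, 16.7))

col_vec <- df[, 2]                 # single column: returned as a plain vector
col_df  <- df[, 2, drop = FALSE]   # drop = FALSE keeps it as a data frame
row_df  <- df[1, ]                 # a single row is still a data frame
by_name <- df[2, "score"]          # columns can be selected by name as well

# Logical indexing: keep only the rows where score exceeds 15
high <- df[df[, "score"] > 15, ]
print(high)
```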

###### Accessing a column of a data frame by name

We can also access a column of a data frame by its name, by typing the frame name and the column name separated by a '$' sign. The accessed column is treated as a vector. For example, columns of the data frame 'frm1' can be accessed by their names as shown here:


>  frm1$Element

[1] Iron      Sulphur   Calcium   Magnecium Copper
Levels: Calcium Copper Iron Magnecium Sulphur

The "Levels" line appears because R versions before 4.0.0 convert string columns to factors by default in data.frame(). From R 4.0.0 onwards the default is stringsAsFactors = FALSE, so the same command prints a plain character vector.


>  frm1$Proportion

[1] 12.5 32.6 16.7 20.6 7.5


>  frm1$Product_ID

[1] 1122 1123 1124 1125 1126

Because the column is a vector, it can be used directly in arithmetic:


>  1000*frm1$Proportion

[1] 12500 32600 16700 20600 7500

###### Adding a new column to the data frame

We can add a new column to an existing data frame by creating a vector and assigning it as a new column of the frame. This vector must have the same length as the number of rows of the existing frame. We will add a new column called "symbol" to the existing frame "frm1":


>  frm1$symbol = c("Fe","S","Ca","Mg","Cu")
>
>  frm1


         Element Proportion Product_ID symbol
elmt-1      Iron       12.5       1122     Fe
elmt-2   Sulphur       32.6       1123      S
elmt-3   Calcium       16.7       1124     Ca
elmt-4 Magnecium       20.6       1125     Mg
elmt-5    Copper        7.5       1126     Cu
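A new column does not have to be typed in by hand; it can also be computed from existing columns. The sketch below, on a hypothetical data frame `df`, adds a derived column with `$`, adds another with the `[[ ]]` form (handy when the column name is stored in a variable), and shows that a vector of the wrong length is rejected.

```r
# Hypothetical data frame for illustration
df <- data.frame(element = c("Iron", "Copper"), proportion = c(12.5, 7.5))

# Derived column computed from an existing one
df$scaled <- 1000 * df$proportion

# Equivalent assignment with [[ ]], using a name held in a variable
new_name <- "symbol"
df[[new_name]] <- c("Fe", "Cu")
print(df)

# A vector with the wrong number of elements raises an error
bad <- tryCatch({
  df$extra <- c(1, 2, 3)
  "ok"
}, error = function(e) "error")
print(bad)
```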

###### Removing a column by name from a data frame

A column can be removed from a data frame by accessing it by name and assigning NULL to it. In the following example, we will access the column named "Product_ID" from the frame "frm1" and remove it:


>  frm1


         Element Proportion Product_ID symbol
elmt-1      Iron       12.5       1122     Fe
elmt-2   Sulphur       32.6       1123      S
elmt-3   Calcium       16.7       1124     Ca
elmt-4 Magnecium       20.6       1125     Mg
elmt-5    Copper        7.5       1126     Cu


>
>  frm1$Product_ID <- NULL
>
>  frm1


         Element Proportion symbol
elmt-1      Iron       12.5     Fe
elmt-2   Sulphur       32.6      S
elmt-3   Calcium       16.7     Ca
elmt-4 Magnecium       20.6     Mg
elmt-5    Copper        7.5     Cu

###### To attach a data frame

We learnt to access a column of a data frame by giving the column name along with the frame name, separated by a '$' sign. When more than one data frame in memory shares the same column name(s), this format distinguishes between them. When there is no such naming conflict, it is more convenient to access the column by its name alone, dropping the frame name. We use the attach() command for this.

The attach() function attaches a database (here, a data frame) to the R search path, so that the objects in the database can be accessed simply by giving their names.
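As a minimal sketch of this idea, the code below uses a small hypothetical data frame `df` (the column names `height` and `mass` are illustrative). It pairs attach() with detach(), which removes the frame from the search path again, and also shows with(), a base R alternative that gives the same name-based access for a single expression without changing the search path.

```r
# Hypothetical data frame for illustration
df <- data.frame(height = c(1.5, 1.75), mass = c(50, 70))

# with() evaluates one expression using the columns as variables
ratio <- with(df, mass / height^2)
print(ratio)

# attach() puts the columns on the search path until detach() is called
attach(df)
mean_height <- mean(height)
detach(df)
print(mean_height)
```

An attached frame is a copy, so later changes to `df` are not visible through the attached names; detaching as soon as the work is done helps avoid surprises.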

As an example, for the data frame called "frm1" created before, we will first try to access the column named "symbol" directly, which fails. However, after attaching the frame with the command attach(frm1), we can access the "symbol" column directly by its name:


>  frm1
         Element Proportion symbol
elmt-1      Iron       12.5     Fe
elmt-2   Sulphur       32.6      S
elmt-3   Calcium       16.7     Ca
elmt-4 Magnecium       20.6     Mg
elmt-5    Copper        7.5     Cu
>
>  symbol