Microarrays have been used for the past thirty years to measure the expression of thousands of genes simultaneously in a single sample.
Many types of microarrays exist, including gene expression arrays, exon arrays, tiling arrays, SNP arrays, ChIP-chip arrays and many more. Expression arrays, in particular, measure the amount of mRNA expressed by individual genes under a given condition.
The first draft of the human genome sequence, produced by the Human Genome Project in the year 2000, made it possible to create
microarrays that could measure the expression of thousands of identified human genes in a sample with a single hybridisation.
Microarray experiments produced enormous amounts of data, consisting of expression levels for tens of thousands of genomic regions measured across multiple samples under various conditions. Sophisticated computational methods, statistical algorithms and visualization methods were soon developed to extract meaningful information from these large data sets.
The advent of microarray technology gave a major impetus to the field of "Bioinformatics", which soon branched into various fields of "omics" studies, namely "Genomics", "Transcriptomics", "Proteomics" and "Metabolomics", aided by the different technologies that followed microarrays. In the last two decades, results from tens of thousands of microarray studies have pushed the frontiers of our understanding of gene expression, protein interactions, protein interaction networks, molecular mechanisms of diseases and drug effects.
For the last fifteen years, Next Generation Sequencing (NGS) technology has provided a very accurate way of
performing DNA sequencing of a genome, expression analysis through RNAseq technology, SNP detection and many other tasks. This technology is a powerful replacement for microarrays in gene expression analysis. In the last 10 years, RNAseq experiments have been used more and more for gene expression analysis. The recently developed single cell RNA sequencing technology is the latest addition to this advancement.
Gene Expression Omnibus (GEO) is a public, freely accessible data repository maintained by the National Center for Biotechnology Information (NCBI) at the National Institutes of Health (NIH), United States. It contains tens of thousands of raw and processed data sets obtained using hybridization arrays, chips and microarrays from tens of thousands of experiments throughout the world. Even though NGS technology now dominates over microarrays, many data sets from new experiments are still added every day.
These data sets span hundreds of disease studies and are a treasure trove for researchers, who can re-analyze them from the perspective of their own research on a given disease and gain new insights. Microarrays are still used by various research groups, hospitals and the pharma industry for important studies.
Numerous tools, both commercial and open source, have been developed and used for microarray data analysis.
One important set of microarray analysis tools is found in the open source R/Bioconductor framework, developed and maintained by a large academic community.
The main purpose of this section on microarray analysis is to provide a brief and essential introduction to microarray technology, along with
easy-to-learn, hands-on tutorials on microarray data analysis using the R/Bioconductor framework. A user who is already familiar with
the R language and basic statistics will have a head start in this learning process. Our aim is that, after learning these ideas, the user will
be able to download specific microarray data sets from the GEO database and perform the expression
analysis independently.
Many ideas and concepts used in the mathematical analysis of NGS data originated in microarray data analysis. Therefore, from an
educational point of view, it is very useful to begin expression analysis with microarrays before proceeding to NGS data analysis.
Among the large number of different microarray chips that exist in the market, we have covered only a few dominant ones in these tutorials.
This is due to the limited knowledge and experience of the author of these tutorials in this field. In the future, efforts will be made to include methods of analysis for many more types of microarray chips whose data have been deposited in GEO.
This tutorial assumes that the user is, to the extent possible, well versed in the fundamentals of the experimental and theoretical concepts of biology, biochemistry and bioinformatics related to gene expression measurements. Throughout the tutorial, standard terms like DNA, RNA, transcription, translation, transcript, gene and hybridization will be used under the assumption that the user is familiar with the subject.