rma - An R Package for Microarray Analysis
Tags: microarrays, R-Project, Academics.
This is the homepage for the R rma package. rma is a package for the R open-source statistics and data analysis software, developed to aid the analysis of microarray experimental data.
1. Introduction
This package was born from the need to analyse microarray data in a flexible and customizable way, in order to tackle the practical problems faced by the Molecular Cardiology Laboratory of the Heart Institute (InCor, Instituto do Coração), USP, Brazil.
The project at InCor is funded by FAPESP, as was the development of most of the tools presented here.
2. Main Features
- Database support through Perl DBI and R DBI, so the package can easily be used with many different database applications.
- Object-oriented approach, with classes for spots, sets of spots and sets of intensities.
- Extensive support for various normalization techniques.
- Support for reading raw array image data files.
3. Pre-requisites
The prerequisites for running the package, including the filters that import intensity data and spot identification files, are:
- R >= 1.8.0
- RMySQL
- MySQL >= 4.0
- Perl >= 5.6
- Perl DBI module
- Perl DBD::mysql module
- ImageMagick
4. Obtaining the package and the documentation
The source package can be obtained here, and the automatically generated documentation can be found here. Installation is straightforward and follows the standard procedure for installing an R source package: download the source archive and run R CMD INSTALL on it from the shell.
5. Creating the databases
To use the package fully, you first need to set up the databases required for the analysis. Start by creating a database named 'microarray' in MySQL.
Then, after setting up the appropriate permissions, create two tables inside this database:
mysql> CREATE TABLE lamina (name VARCHAR(50) PRIMARY KEY,pixel_size INT(5),x_origin INT(10),y_origin INT(10),type VARCHAR(100),ratio_form VARCHAR(100));
mysql> CREATE TABLE add_info (name VARCHAR(50),reciproca VARCHAR(50),cy3_desc VARCHAR(200),cy5_desc VARCHAR(200),cy3_control INT(1),lamina VARCHAR(50));
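You can try out the schema without a MySQL server. The sketch below uses Python's built-in sqlite3 module as a stand-in. This is an assumption made for illustration; the package itself expects MySQL. The sketch runs the two statements above unchanged and shows that lamina.name, being the primary key, rejects duplicates. The helper names and the layout name layout_a are hypothetical.

```python
import sqlite3

# The two CREATE TABLE statements from above, unchanged. SQLite stands in
# for MySQL here; it accepts type names such as INT(5) and VARCHAR(50).
SCHEMA = (
    "CREATE TABLE lamina (name VARCHAR(50) PRIMARY KEY, pixel_size INT(5), "
    "x_origin INT(10), y_origin INT(10), type VARCHAR(100), ratio_form VARCHAR(100))",
    "CREATE TABLE add_info (name VARCHAR(50), reciproca VARCHAR(50), "
    "cy3_desc VARCHAR(200), cy5_desc VARCHAR(200), cy3_control INT(1), "
    "lamina VARCHAR(50))",
)


def create_schema(conn: sqlite3.Connection) -> None:
    """Create the lamina and add_info tables on an open connection."""
    for statement in SCHEMA:
        conn.execute(statement)
    conn.commit()


def table_columns(conn: sqlite3.Connection, table: str) -> list:
    """Return the column names of a table in declaration order."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


def register_lamina(conn: sqlite3.Connection, name: str) -> bool:
    """Insert a lamina row (hypothetical helper); False if the name exists.

    lamina.name is the PRIMARY KEY, so a duplicate raises IntegrityError.
    """
    try:
        conn.execute("INSERT INTO lamina (name) VALUES (?)", (name,))
    except sqlite3.IntegrityError:
        return False
    return True


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    print(table_columns(conn, "lamina"))
    print(register_lamina(conn, "layout_a"))  # hypothetical layout name
    print(register_lamina(conn, "layout_a"))  # same name again: rejected
```

In a real installation the same statements are typed at the mysql> prompt, as shown above.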
5.1. Populating the databases with data from experiments
Once the initial tables have been created, you can start populating the database with the scripts provided in the exec/ directory. To insert a file in the GIPO format (Gene Accession List) into the database, use the rma_gipo_parser.pl script. As an example, suppose we want to insert the file 11ka.gal under the reference name 'noruega'. We would run the script as follows:
Intensity files generated by GenePix (GPR files) are read with the script rma_gpr_parser.pl. For example, to insert the data for hybridization rat85, saved in the file rat85.gpr, we would type:
Similarly, the script rma_dapple_parser.pl inserts intensity files generated by Dapple. It is used exactly like rma_gpr_parser.pl.
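The example file 11ka.gal has the extension of a GenePix Array List. GenePix .gpr result files are tab-delimited Axon Text Files (ATF). This page does not describe the GIPO format, so the sketch assumes that the .gal input follows the same ATF layout. It is a minimal Python illustration of the kind of parsing rma_gipo_parser.pl and rma_gpr_parser.pl have to do, not the code of those Perl scripts. It also shows the foreground minus background subtraction behind the "Corrected intensities" in the spot summary of section 6. The function names and the sample file are hypothetical.

```python
import csv
import io


def read_atf(text):
    """Parse Axon Text File (ATF) content into (header, rows).

    Assumed layout: 'ATF<TAB>1.0', then '<n header records><TAB><n columns>',
    then n quoted "Key=Value" header records, a row of column titles and the
    data rows. Returns a dict of header values and a list of row dicts.
    """
    lines = list(csv.reader(io.StringIO(text), delimiter="\t"))
    if not lines or lines[0][0] != "ATF":
        raise ValueError("not an ATF file")
    n_headers, n_columns = int(lines[1][0]), int(lines[1][1])
    header = {}
    for record in lines[2:2 + n_headers]:
        key, _, value = record[0].partition("=")
        header[key.strip()] = value.strip()
    titles = lines[2 + n_headers]
    if len(titles) != n_columns:
        raise ValueError("column count does not match the ATF header")
    rows = [dict(zip(titles, fields)) for fields in lines[3 + n_headers:] if fields]
    return header, rows


def corrected_intensities(row, stat="Median"):
    """Return (Cy3, Cy5) foreground minus background for one GPR row.

    Uses GenePix column names: 532 nm is the Cy3 channel and 635 nm the
    Cy5 channel; stat selects the "Median" or "Mean" columns.
    """
    cy3 = float(row[f"F532 {stat}"]) - float(row[f"B532 {stat}"])
    cy5 = float(row[f"F635 {stat}"]) - float(row[f"B635 {stat}"])
    return cy3, cy5


# A made-up, heavily trimmed GPR file with a single spot; real files carry
# many more header records and columns.
SAMPLE_GPR = (
    "ATF\t1.0\n"
    "1\t7\n"
    '"Type=GenePix Results 3"\n'
    '"Name"\t"ID"\t"F635 Median"\t"B635 Median"\t"F532 Median"\t"B532 Median"\t"Flags"\n'
    "example\tID0001\t900\t150\t1200\t200\t0\n"
)

if __name__ == "__main__":
    header, spots = read_atf(SAMPLE_GPR)
    print(header["Type"], corrected_intensities(spots[0]))
```

Keeping parsing and background correction in separate functions lets the same reader serve both the array list and the intensity files.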
Additional information can also be inserted into the database, especially into the 'add_info' table. This includes which mRNA samples were hybridized to which arrays and which arrays form dye-swap pairs. This can be done directly in the MySQL shell, as in:
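Here is a minimal sketch of such an insertion, again using Python's sqlite3 as a stand-in for the MySQL shell. At the mysql> prompt the same INSERT statement would be typed with literal values. Two readings are assumptions based only on the column names: reciproca as the dye-swap partner array and cy3_control as a flag for the control sample. The helpers record_hybridization and dye_swap_partner are hypothetical.

```python
import sqlite3

# add_info schema from section 5; SQLite stands in for MySQL here.
ADD_INFO = (
    "CREATE TABLE add_info (name VARCHAR(50), reciproca VARCHAR(50), "
    "cy3_desc VARCHAR(200), cy5_desc VARCHAR(200), cy3_control INT(1), "
    "lamina VARCHAR(50))"
)


def record_hybridization(conn, name, reciproca, cy3_desc, cy5_desc, cy3_control, lamina):
    """Insert one add_info row; parameters map one-to-one onto its columns."""
    conn.execute(
        "INSERT INTO add_info (name, reciproca, cy3_desc, cy5_desc, cy3_control, lamina) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (name, reciproca, cy3_desc, cy5_desc, cy3_control, lamina),
    )


def dye_swap_partner(conn, name):
    """Return the reciproca (assumed dye-swap partner) of an array, or None."""
    row = conn.execute(
        "SELECT reciproca FROM add_info WHERE name = ?", (name,)
    ).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(ADD_INFO)
    # Sample descriptions as in the rat88 spot summary of section 6;
    # 'rat89', 1 and 'noruega' are hypothetical values.
    record_hybridization(conn, "rat88", "rat89", "SHR sal", "2c sal", 1, "noruega")
    print(dye_swap_partner(conn, "rat88"))
```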
6. Sample session
Assuming the package was successfully installed and the databases created, a sample session would start with:
We could then load raw data for a given array and do some preliminary descriptive analysis:
> maplot(rat85.raw)
> boxplot.pg(rat85.raw)
Assuming your database contains an array named 'rat85', this would create the two graphics shown below.
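An MA plot conventionally shows M, the log2 ratio of the two channels, against A, the average log2 intensity of the spot. The sketch below computes these coordinates in Python rather than with the package's R code. It assumes M = log2(Cy5/Cy3). The orientation rma actually uses may differ, and the ratio_form column of the lamina table may be what controls it. The function name ma_values is hypothetical.

```python
import math


def ma_values(cy3, cy5):
    """Compute MA-plot coordinates for paired, background-corrected intensities.

    M = log2(Cy5 / Cy3) (orientation assumed) and A = (log2 Cy5 + log2 Cy3) / 2.
    Spots with a non-positive intensity are skipped, since their log is undefined.
    """
    points = []
    for g, r in zip(cy3, cy5):
        if g <= 0 or r <= 0:
            continue
        m = math.log2(r / g)
        a = 0.5 * (math.log2(r) + math.log2(g))
        points.append((m, a))
    return points


if __name__ == "__main__":
    print(ma_values([100.0, 0.0, 250.0], [400.0, 50.0, 125.0]))
```

Plotting M against A rather than one channel against the other makes intensity-dependent dye bias show up as a curve away from M = 0.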


It is also possible to work with individual spots, using the spot class and related functions. We first create a spot on a given array:
Calling summary() on it gives:
Spot summary information
array: rat88 subarray: 30 row: 10 column: 1
gene_id: Rn.9436
description: ESTs, Highly similar to (defline not available 6180013) [H.sapiens]
Corrected intensities (foreground - background):
Mean Median Description
Cy 3 29381 32901 SHR sal
Cy 5 19074 19074 2c sal
We could also get information on the replicates:
Spot summary information
array: rat88 subarray: 30 row: 9 column: 6
gene_id: Rn.9436
description: ESTs, Highly similar to (defline not available 6180013) [H.sapiens]
Corrected intensities (foreground - background):
Mean Median Description
Cy 3 29442 32776 SHR sal
Cy 5 19426 19426 2c sal
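The two spots above are replicates of Rn.9436 on rat88. A common way to combine replicates is to average their log-ratios. This is an illustration only and not necessarily what rma does. The Python sketch below groups spots by gene_id and averages log2(Cy5/Cy3), with the ratio orientation assumed. The tuple input format and the function name are hypothetical. The intensities are the medians from the two summaries above.

```python
import math
from collections import defaultdict
from statistics import mean


def replicate_log_ratios(spots):
    """Average log2(Cy5/Cy3) over replicate spots that share a gene_id.

    'spots' is an iterable of (gene_id, cy3, cy5) tuples holding corrected
    intensities; spots with non-positive intensities are ignored.
    """
    by_gene = defaultdict(list)
    for gene_id, cy3, cy5 in spots:
        if cy3 > 0 and cy5 > 0:
            by_gene[gene_id].append(math.log2(cy5 / cy3))
    return {gene: mean(ratios) for gene, ratios in by_gene.items()}


if __name__ == "__main__":
    # Median corrected intensities of the two Rn.9436 spots on rat88 above.
    spots = [("Rn.9436", 32901, 19074), ("Rn.9436", 32776, 19426)]
    print(replicate_log_ratios(spots))
```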
Or check the spot's image on the array:

We could also try the different normalization techniques implemented in the package. For instance:
> maplot(rat88.global)
Obtaining:

Many other functions are available, covering further normalization procedures as well as the identification of sets of differentially expressed genes and the manipulation of sets of expression values.
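This page does not say how the global normalization behind rat88.global is computed. A widely used form of global normalization is median centring: subtract the array's median log-ratio from every spot so that the median M becomes zero, which removes a constant bias between the two dyes. The Python sketch below illustrates that technique under the same M = log2(Cy5/Cy3) assumption as before. rma may instead use another statistic, such as the mean, or a scale factor. The function name is hypothetical.

```python
import math
from statistics import median


def global_median_normalize(cy3, cy5):
    """Median-centre the log-ratios M = log2(Cy5/Cy3) of one array.

    Spots with non-positive intensities are dropped. The returned M values
    have median zero, correcting a constant intensity bias between dyes.
    """
    m = [math.log2(r / g) for g, r in zip(cy3, cy5) if g > 0 and r > 0]
    if not m:
        return []
    shift = median(m)
    return [x - shift for x in m]


if __name__ == "__main__":
    print(global_median_normalize([100, 100, 100], [50, 100, 400]))
```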
7. License
This package is released under the GNU General Public License 2.0, available at http://www.gnu.org/licenses/gpl.html. Redistributions of this package should also be made under this license.
*Project funded by FAPESP (Fundação de Amparo à Pesquisa do Estado de São Paulo). Grant No 03/02074-0.

Creation date: 2005-04-12.
