RNA-seq

Here is a short tutorial showing how to launch an RNA-seq analysis from HTSstation’s web interface.

New Job

An RNA-seq analysis works from reads aligned to a reference genome, given as BAM file(s) through the BAM URL field (there is one BAM file per run). Each BAM file represents a sample (“run”); several samples that were produced under the same conditions (replicates) form a “group”.

The BAM URLs can be given directly as http:// or ftp:// addresses that are accessible from outside. You can manually add as many groups, and as many runs per group, as you want by using the links Add group of runs and Add run in this group. Each sample will then be labeled group_name.run_index in the output files. Use short group names without spaces (prefer the “_” character to separate words) and without special characters (e.g. “%&?!”). Groups are considered to represent different experimental conditions, while runs typically represent technical replicates. This is important for the statistical analysis (see below).
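
The naming rules above can be checked before filling in the form. The following is a minimal sketch and not part of HTSstation: the function names `is_valid_group_name` and `sample_labels` are hypothetical, and the allowed character set (letters, digits, underscore) is an assumption drawn from the recommendations in this section.

```python
import re


def is_valid_group_name(name):
    """Return True if the name contains only letters, digits and underscores.

    Assumption: "no spaces and no special characters" is interpreted here
    as this restricted character set.
    """
    return re.fullmatch(r"[A-Za-z0-9_]+", name) is not None


def sample_labels(groups):
    """Build group_name.run_index labels from {group_name: number_of_runs}.

    Run indices start at 1, as in the KO.1, KO.2 labels used in this tutorial.
    """
    labels = []
    for name, n_runs in groups.items():
        if not is_valid_group_name(name):
            raise ValueError("invalid group name: %r" % name)
        labels.extend("%s.%d" % (name, i) for i in range(1, n_runs + 1))
    return labels


print(sample_labels({"KO": 3, "WT": 2}))
# ['KO.1', 'KO.2', 'KO.3', 'WT.1', 'WT.2']
```

A name such as "wild type" or "KO%" is rejected by this check, while "wild_type" is accepted.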

If you used the HTSstation mapping module to do the mapping, you can copy the 20-character random key obtained as a result into the Mapping key field, and validate it using the link Add data from Mapping. In that case, all relevant fields will be filled in automatically (see the tutorial of our mapping module for more details about those fields). To add samples from other independent mappings, enter the corresponding keys one at a time and click on Add data from Mapping after each one.

_images/RNAseq_newjob.png

Then select an assembly from the list. Make sure you select the one that was used for the mapping. If your assembly is not listed, please send us an email.

Name your analysis in the Analysis description field. Preferably use a short name without special characters (e.g. “%&?!”). Enter your e-mail address to receive a message when the pipeline completes.

_images/RNAseq_generals.png

Finally, click on the Create button and confirm to launch the job.

_images/RNAseq_create.png

Results

When the job finishes successfully, you will receive an e-mail with a link to the page where you can download the results. Results consist of tab-delimited files containing counts and RPKM values for genes and transcripts, and a differential expression analysis for each pair of groups in the experiment.

Counts tables, named “<type>_expression.tab”, contain columns named “<prefix>.<group>.<run_id>”, where <prefix> is “counts” for raw counts or “rpkm” for counts normalized by transcript size. Genomic features with zero counts in all conditions are not reported.
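
As an illustration of this layout, here is a minimal sketch that reads such a table with Python's standard library and keeps only the columns for one prefix. The in-memory example, its values, the name of the identifier column, and the function `read_values` are all hypothetical; the sketch assumes that the first column identifies the feature and that data columns look like `counts.KO.1`.

```python
import csv
import io

# Hypothetical excerpt of a "<type>_expression.tab" file; the identifier
# column name and all values are made up for illustration.
EXAMPLE = (
    "id\tcounts.KO.1\tcounts.WT.1\trpkm.KO.1\trpkm.WT.1\n"
    "geneA\t10\t4.5\t2.0\t0.9\n"
    "geneB\t0\t7\t0.0\t1.4\n"
)


def read_values(handle, prefix):
    """Return {feature_id: {"group.run": value}} for one column prefix."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    # Keep the columns whose name starts with "<prefix>." and remember
    # the remaining "group.run" part as the sample label.
    wanted = [
        (index, name[len(prefix) + 1:])
        for index, name in enumerate(header)
        if name.startswith(prefix + ".")
    ]
    table = {}
    for row in reader:
        # Values are parsed as floats because counts may be fractional.
        table[row[0]] = {label: float(row[index]) for index, label in wanted}
    return table


counts = read_values(io.StringIO(EXAMPLE), "counts")
print(counts["geneA"])
# {'KO.1': 10.0, 'WT.1': 4.5}
```

With a downloaded file, the `io.StringIO(EXAMPLE)` argument would be replaced by an open file handle.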

_images/RNAseq_output_counts.png

Note

Because the method takes into account reads that map to multiple locations, “counts” are not always integer-valued, although they still represent the fraction of the library mapping to the region.

Differential analysis is performed on raw counts (with DESeq), and the results are summarized in files named “<type>_differential_<comparison>.txt”. The columns “pval” and “padj” contain, respectively, the p-value of each feature and the p-value adjusted for multiple testing.
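
To illustrate what a correction for multiple testing does, here is a minimal sketch of the Benjamini-Hochberg procedure, assuming that this is the adjustment behind “padj” (it is the one DESeq documents for its adjusted p-values). The code is written from the textbook definition and is not taken from HTSstation or DESeq; the function name and the input p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    n = len(pvalues)
    # Indices of the p-values, sorted from the largest to the smallest.
    order = sorted(range(n), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0
    for position, index in enumerate(order):
        rank = n - position  # rank of this p-value in ascending order
        candidate = pvalues[index] * n / rank
        # Enforce monotonicity: an adjusted value never exceeds the one
        # obtained for a larger raw p-value.
        running_min = min(running_min, candidate)
        adjusted[index] = running_min
    return adjusted


# Hypothetical raw p-values for four features.
pvals = [0.01, 0.04, 0.03, 0.20]
padj = benjamini_hochberg(pvals)
print([round(p, 4) for p in padj])
```

Adjusted values are never smaller than the raw ones, which is why fewer features pass a given threshold on “padj” than on “pval”.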

_images/RNAseq_output_diff.png

Warning

Differential expression analysis will not be very reliable if there are no replicates (i.e. only one run per group): in this case all groups will be pooled and the variation between them considered as background biological variability. Prefer fold changes over p-values if you have no or very few replicates.
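
When replicates are missing, features can be ranked by fold change instead of by p-value. Here is a minimal sketch, assuming RPKM values for two groups and a pseudocount of 1 to avoid dividing by zero. The pseudocount, the function name and the values are illustrative assumptions, not HTSstation output.

```python
import math


def log2_fold_change(reference, treated, pseudocount=1.0):
    """Return log2((treated + pseudocount) / (reference + pseudocount))."""
    return math.log2((treated + pseudocount) / (reference + pseudocount))


# Hypothetical RPKM values: feature -> (WT, KO).
rpkm = {"geneA": (3.0, 15.0), "geneB": (7.0, 7.0), "geneC": (31.0, 1.0)}

# Rank features by the magnitude of the change, largest first.
ranked = sorted(
    rpkm,
    key=lambda gene: abs(log2_fold_change(*rpkm[gene])),
    reverse=True,
)
print(ranked)
# ['geneC', 'geneA', 'geneB']
```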

PCA

A useful diagnostic tool is principal component analysis (PCA), which shows for instance whether replicates cluster well together, or whether groups with different treatments can be distinguished in the experiment. The module provides a standard “biplot” showing the 2-dimensional projection of groups according to their global gene expression (in log RPKM), and a diagnostic of the PCA itself: a bar plot of the loadings, in which the first two bars together should account for a large part of the total.

_images/RNAseq_pca_combined.png

Interactive MA-plot

From there you can also create an interactive MA-plot to look for differential transcript expression.

  1. Select the type of genomic features you want to compare (level);
  2. Select the type of normalization to apply to the data (raw for untransformed count data, or RPKM);
  3. Select the two samples you want to compare from your data (Choose runs to compare checkboxes);
  4. Click on the Compute button.

_images/RNAseq_create_maplot.png

On the graph’s page, click on a point you are interested in to display its name in the column on the right. Click on it again to remove it from the list. Click on the name to get information about the selected feature from Ensembl. Note that the graph may take a long time to load and react if there are a lot of features to draw.

_images/RNAseq_maplot.png

Functionalities:

  • If you select several runs from the same group, they will be averaged and considered as a single sample in the MA-plot. For instance, one can select KO.1, KO.2, KO.3, WT.1 and WT.2 to compare groups KO and WT.
  • Use the Zoom in and Zoom out buttons to zoom on the graph. You can restore the original view with the Reset button. Pan the graph by holding the mouse button down and dragging the figure.
  • Use the Search field to locate a gene in the plot by its name or ID.
  • Use the Select surrounding features fields to highlight the genes nearest to the last selection.
  • Use the Clear button to remove all selections.
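
The averaging of runs and the coordinates of a point in an MA-plot can be sketched as follows. The sketch uses the usual definitions M = log2(a/b) and A = (log2(a) + log2(b))/2. How the interface averages runs and handles zero values is not described here, so the arithmetic mean and the pseudocount are assumptions, and the function name and the values are hypothetical.

```python
import math


def ma_point(runs_a, runs_b, pseudocount=0.5):
    """Return (M, A) for one feature from the runs of two groups.

    Runs of the same group are averaged (arithmetic mean, an assumption)
    before computing M = log2(a / b) and A = (log2(a) + log2(b)) / 2.
    """
    a = sum(runs_a) / len(runs_a) + pseudocount
    b = sum(runs_b) / len(runs_b) + pseudocount
    m_value = math.log2(a) - math.log2(b)
    a_value = (math.log2(a) + math.log2(b)) / 2
    return m_value, a_value


# Hypothetical values for one feature in runs KO.1, KO.2, KO.3 and WT.1, WT.2.
ko = [14.0, 16.0, 16.5]
wt = [3.0, 4.0]
m_value, a_value = ma_point(ko, wt)
print(m_value, a_value)
# 2.0 3.0
```

Features far from M = 0 are candidates for differential expression, and A indicates how strongly the feature is expressed overall.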
