Mapping

This page briefly describes the options available in the mapping interface, which is essentially a front end to the Bowtie software with a few additional post-processing options, as described in Leleu M, Lefebvre G, Rougemont J, Brief Funct Genomics, 2010.

Provide Reads

Read files can be specified in several ways. First, they can be obtained as the output of a demultiplexing job by specifying the corresponding key.

For non-demultiplexed jobs, the original Fastq file must be provided either as a reference to the sequencing facility LIMS, or by a URL (which can be a complete file path on the server-side filesystems).

_images/mapseq_newjob.png

Read files are organized by groups (experimental conditions) and runs (replicates). Each group must be given a name, which is used in output file names and reports to refer to it. Use short names without spaces (prefer the “_” character to separate words) and without special characters (e.g. %&?!;,).
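As an illustration of the naming advice above, here is a minimal sketch that checks a group name before a job is submitted. The helper names and the exact allowed character set (letters, digits and underscores) are assumptions made for this example; the application itself may accept or reject other characters.

```python
import re

# Assumption: only letters, digits and underscores are considered safe.
_VALID_NAME = re.compile(r"[A-Za-z0-9_]+")


def is_valid_group_name(name: str) -> bool:
    """Return True if the name contains no spaces or special characters."""
    return _VALID_NAME.fullmatch(name) is not None


def suggest_group_name(name: str) -> str:
    """Replace each run of disallowed characters with a single underscore."""
    return re.sub(r"[^A-Za-z0-9_]+", "_", name.strip()).strip("_")
```

For example, suggest_group_name("WT rep 1!") yields a name that passes is_valid_group_name.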

Paired-end Reads

If reads are provided via a reference to the LIMS or via an SRA link, paired-end mapping will be automatically configured. If they are provided as file uploads, they must be given as a comma-separated pair of fastq files: sample_R1.fastq,sample_R2.fastq.

Choose your reference

_images/mapseq_options.png

Reads will be mapped to a reference sequence set, which is identified by an assembly (species) and a data type, which can be either of:

* genome (full chromosome sequences)
* transcriptome (annotated exon sequences from Ensembl)

The corresponding Bowtie indexes have been generated by the Genrep application from reference sequences from NCBI and Ensembl.

Job description

Please give your analysis a name that can be referred to later, in particular in the email that will be sent to the address provided:

_images/4Cseq_newJob2.png

Bowtie options

Default Bowtie2 options are:

``--end-to-end --sensitive -k 20``

Through a config file, one can force using Bowtie1 instead (see config files). Default Bowtie1 options are:

``--best --strata --chunkmbs 512 -Sam 20``

Output is converted to BAM (file sampleName_complete.bam), then filtered to retain only the mapped reads with at most 5 hits in the reference. The number of hits for each read is indicated in the BAM file by the NH field (in the example below, the first read has 1 hit and the second has 3):

>samtools view sampleName_filtered.bam
R2D2_0060:5:76:1358:1657#0/1    0       chr1     3000122 255     37M     *       0       0       TGTCTTTACCTTATTTGTTCTAAATTTTTTGCAAACT   BCACCBCCCCCCBCCCBBBCCCC=CCCCBBBCCC?BC   XA:i:0  MD:Z:37 NM:i:0  NH:i:1
R2D2_0060:5:29:882:31#0/1       16      chr1     3000219 255     37M     *       0       0       GCATTGGTTAAATGGAAGGACCAGCTGACTAAGGAAT   7%5A8=A@@>@ABBA@A@B9;:'=ABB@>BBCBCBBB   XA:i:1  MD:Z:8A13T14    NM:i:2  NH:i:3
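The hit-count filter described above can be sketched as follows. This is an illustration only, not the application's own code: it assumes tab-separated SAM text as printed by samtools view, and the function names and the max_hits parameter are hypothetical.

```python
from typing import Optional


def nh_count(sam_line: str) -> Optional[int]:
    """Return the value of the NH:i: tag, or None if the tag is absent."""
    fields = sam_line.rstrip("\n").split("\t")
    # The first 11 SAM columns are mandatory; optional tags follow.
    for tag in fields[11:]:
        if tag.startswith("NH:i:"):
            return int(tag[len("NH:i:"):])
    return None


def keep_read(sam_line: str, max_hits: int = 5) -> bool:
    """Keep mapped reads whose NH value is at most max_hits."""
    fields = sam_line.split("\t")
    flag = int(fields[1])
    if flag & 0x4:  # SAM flag bit 0x4 marks an unmapped read
        return False
    hits = nh_count(sam_line)
    return hits is not None and hits <= max_hits
```

Applied to the two records shown above, both reads would be kept, since their NH values (1 and 3) do not exceed 5.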

If the Discard PCR duplicates option is selected, at most n reads per strand-specific genomic position are kept, where n is the 95th percentile of a Poisson distribution with mean equal to the expected genome coverage. These filtered hits are provided as sampleName_filtered.bam.
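To make the threshold concrete, here is a minimal sketch of how n could be computed from a given expected coverage. The function name is hypothetical, and the way the application estimates the expected coverage is not described here, so the mean is simply taken as an input.

```python
import math


def poisson_quantile(mean: float, q: float = 0.95) -> int:
    """Smallest integer n such that P(X <= n) >= q for X ~ Poisson(mean)."""
    if mean < 0:
        raise ValueError("mean must be non-negative")
    term = math.exp(-mean)  # P(X = 0)
    cdf = term
    n = 0
    while cdf < q:
        n += 1
        term *= mean / n  # P(X = n) from P(X = n - 1)
        cdf += term
    return n
```

With an expected coverage of 1 read per position, this gives n = 3, so up to 3 reads starting at the same strand-specific position would be kept. The direct summation is only suitable for moderate means, since exp(-mean) underflows to zero for very large values.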

Mapping report

Each mapping run will generate a mapping report displaying some general statistics about the Bowtie mappings (number of mismatches compared to reference, number of multiple hits, forward/reverse strand balance).

Densities output

For reads mapped to the genome, a read density file can be generated. This provides position-specific read counts normalized by the total number of reads (in units of 10 million), where each multiply mapped read is counted as 1 divided by its number of hits. Reads are extended to a specified length (by default, the read length of the sequencing run). If merge strands is specified as NA, two strand-specific densities are produced. If a number S is given as the merge strands value, a single density is calculated as the average of the two strands after shifting each by S bases in the downstream direction. S should therefore correspond to half the difference between the average sequencing fragment length and the read extension.
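The two quantities in this paragraph can be sketched as small helper functions. This is one reading of the description, not the application's implementation: the function names are hypothetical, and the weight formula assumes that "units of 10 million" means scaling by 1e7 divided by the total number of reads.

```python
def read_weight(n_hits: int, total_reads: int, scale: float = 1e7) -> float:
    """Contribution of one alignment to the density.

    Each alignment of a read with n_hits hits counts 1/n_hits, and the
    result is scaled to a library of `scale` reads.
    """
    return (1.0 / n_hits) * (scale / total_reads)


def strand_shift(fragment_length: float, read_extension: float) -> float:
    """Suggested merge strands value S: half the difference between the
    average fragment length and the read extension."""
    return (fragment_length - read_extension) / 2.0
```

For instance, with an average fragment length of 200 bases and reads extended to 50 bases, strand_shift(200, 50) gives S = 75.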

These densities are available on the results page in sqlite and bigwig formats. If Create GDV project is specified, the files will be uploaded to a new project on GDV.
