This module separates barcoded reads into individual files and prepares them for further analysis (e.g., removes the barcode and truncates the reads to a given length). This version works for single-end data in fastq format.
For details about the underlying method, please refer to the multiplex method page (in preparation).
The fastq file can be retrieved directly from the sequencing facility (see general explanation here) or through a given URL (an http:// or ftp:// address accessible from outside). The image below gives an example for multiple groups and runs of sequences.
Defining several groups (with Add group of runs) means that the demultiplexing will be done separately for each group of runs. When a group is composed of several runs (defined with Add run in this group), all the reads coming from the different datasets are merged and treated as a single fastq file.
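The merging of several runs into one dataset can be pictured as a plain concatenation of the fastq files. The sketch below only illustrates that idea; the function name merge_runs and the file names are hypothetical, and it assumes uncompressed fastq files that each end with a newline. It is not the code used by the pipeline.

```python
import shutil


def merge_runs(run_paths, merged_path):
    """Concatenate the fastq files of one group into a single fastq file.

    Assumption: every input file is uncompressed and ends with a newline,
    so that records from consecutive files do not run into each other.
    """
    with open(merged_path, "wb") as merged:
        for path in run_paths:
            with open(path, "rb") as run:
                # Stream the file in chunks instead of loading it in memory.
                shutil.copyfileobj(run, merged)
```

For example, merge_runs(["run1.fastq", "run2.fastq"], "group_A.fastq") would produce one file holding the reads of both runs, in that order.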
Do not forget to name each group. This name is used to name the result files and appears in the reports. Use short names without spaces (prefer the “_” character to separate words) and without special characters (e.g., %&?! ...).
Name your analysis. Please use short names, without spaces (prefer the “_” character to separate words) and without special characters (e.g., %&?! ...). Finally, submit the relevant information to receive an email upon completion of the pipeline.
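The naming rules can be checked before submission with a small helper. This is a minimal sketch, not part of the module: the function name is_valid_name is hypothetical, and restricting names to letters, digits and “_” is an assumption that is stricter than the wording above. The length is not checked, because no maximum is specified.

```python
import re

# Assumption: only letters, digits and "_" are accepted.
_VALID_NAME = re.compile(r"[A-Za-z0-9_]+")


def is_valid_name(name):
    """Return True if the name has no spaces and no special characters."""
    return _VALID_NAME.fullmatch(name) is not None
```

With this rule, a name such as HoxD13_run1 is accepted, while names containing a space or a character such as % are rejected.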
The demultiplexing process requires one parameter file as well as one primers file per group. Basically, the parameter file contains options for the demultiplexing itself, while the primers file gives details about the primer design. See below for more details about each file.
The current method underlying the demultiplexing process is based on Exonerate, a tool which returns all the pairwise alignments that are above a given threshold (defined by minScore). Here is the command line used for this step:
exonerate --model affine:local -o -4 -e -12 -s minScore
As the alignment score depends on the length of the sequences being aligned, we recommend adjusting minScore to your situation (see Exonerate Manual for more details). In the following example, the score has been set for a sequence of length 22.
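One possible way to scale the score with the primer length is sketched below. It rests on two assumptions that should be checked against the Exonerate Manual: that each matching base contributes 5 to the score, and that a fixed fraction of the maximum score is required. The fraction 0.7 is chosen only because it reproduces the example value (77 for a length of 22) under the first assumption. The function name suggest_min_score is hypothetical, and this is not necessarily how the value was originally derived.

```python
def suggest_min_score(primer_length, identity_fraction=0.7, match_score=5):
    """Suggest a minScore value for a primer of the given length.

    Assumptions (not taken from the module itself):
      - match_score: score contributed by one matching base;
      - identity_fraction: fraction of the maximum score to require.
    """
    if primer_length <= 0:
        raise ValueError("primer_length must be positive")
    max_score = primer_length * match_score
    return round(identity_fraction * max_score)
```

Under these assumptions, a primer of length 22 gives 77 and a primer of length 30 gives 105.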
In addition, to maximize the accuracy of the results, we limit the alignments to the first bases of each read. This window is defined by the first two parameters (Search the primer...). We also limit the length of the barcodes to search for and advise keeping the same length for all of them. This is defined in the primers file (see description below).
The last parameter defines the length of the sequence for each read that will be kept for further analysis, after having removed the barcode.
Search the primer from base i (-n)=2
Search the primer in the next n bps of the reads [i to i+n] (-x 22)=22
Minimum score for Exonerate (-s 77)=77
Length of the reads to align (-l 30)=30
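To check a parameter file before submitting it, its content can be read into a dictionary keyed by option flag. The sketch below assumes one parameter per line, written as a description, the flag in parentheses, then “=” and an integer value. The names parse_params and exonerate_options are hypothetical. The list built by exonerate_options only repeats the options of the command line quoted above, with minScore replaced by the -s value; the input files are left out because they are not described here.

```python
import re

# Matches the trailing "(-flag [default])=value" part of a parameter line.
_PARAM_LINE = re.compile(r"\((-\w)(?:\s+[^)]*)?\)\s*=\s*(\d+)\s*$")


def parse_params(text):
    """Return a dict mapping each flag (e.g. "-s") to its integer value."""
    params = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        match = _PARAM_LINE.search(line)
        if match is None:
            raise ValueError("unrecognized parameter line: " + line)
        params[match.group(1)] = int(match.group(2))
    return params


def exonerate_options(params):
    """Build the exonerate options shown above, with minScore filled in."""
    return ["exonerate", "--model", "affine:local",
            "-o", "-4", "-e", "-12", "-s", str(params["-s"])]
```

Applied to the four example lines, parse_params returns the values 2, 22, 77 and 30 for the flags -n, -x, -s and -l.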
To ensure consistent formatting, we advise using the following template.
A primers file is a fasta file containing information about each barcode expected to be present in the reads. Below is an example of such a file:
>HoxD13|AAAATCCTAGACCTGGTCATG|chr2:74504332-74506317|CATG|CATGGTCAAATTCAAACCCGGAGGGTCTCTCCAGGTTTTT|AAAAACCTGGAGAGACCCTCCGGGTTTGAATTTGACCATG|CATGGCGCGCTGCGCCTCCTCCCTCCTCGCTGTGTTCCGC|GCGGAACACAGCGAGGAGGGAGGAGGCGCAGCGCGCCATG|CATGACCAGGTCTAGGATTTTTAAAAGTTATACAAATTCT|AGAATTTGTATAACTTTTAAAAATCCTAGACCTGGTCATG|Exclude=chr2:74501237-74508317
AAAATCCTAGACCTGGTCA
>HoxD4|AGGACAATAAAGCATCCATAGGCGACATG|chr2:74561329-74562566|CATG
AGGACAATAAAGCATCCAT
The header contains information about each individual primer. The sequence is the primer sequence used during the demultiplexing.
Fields must be separated by the “|” character (pipe, usually Alt+7), without spaces in between, and their order must be respected.
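A primers file can be checked against these rules with a short reader. The sketch below assumes that each record has a header line starting with “>” followed by one or more sequence lines. It treats the first header field as the primer name, which is an assumption based on the example, and keeps the remaining fields in their original order without interpreting them. The function name parse_primers is hypothetical.

```python
def _make_record(header, sequence):
    """Split one header on "|" and attach the primer sequence."""
    if " " in header:
        raise ValueError("spaces are not allowed in the header: " + header)
    fields = header.split("|")
    # Assumption: the first field is the primer name.
    return {"name": fields[0], "fields": fields[1:], "sequence": sequence}


def parse_primers(text):
    """Return one record per fasta entry of a primers file."""
    records = []
    header = None
    sequence_parts = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                records.append(_make_record(header, "".join(sequence_parts)))
            header = line[1:]
            sequence_parts = []
        elif header is None:
            raise ValueError("sequence found before any header: " + line)
        else:
            sequence_parts.append(line)
    if header is not None:
        records.append(_make_record(header, "".join(sequence_parts)))
    return records
```

For the HoxD4 entry of the example, this yields the name HoxD4, three further header fields and the sequence AGGACAATAAAGCATCCAT. A header containing a space raises an error, which helps catch formatting mistakes early.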