Module: bbcflib.snp

From a set of BAM files produced by an alignment on the genome, calls SNPs and annotates them with respect to a set of coding genes on the same genome.

bbcflib.snp.parse_vcf(vcf_line)[source]

cf. http://samtools.sourceforge.net/mpileup.shtml
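As an illustration of the input this function receives, here is a minimal sketch that splits one VCF data line into the eight fixed columns defined by the VCF format. It is not the bbcflib implementation: the helper name split_vcf_line, the returned dictionary layout, and the sample line are assumptions made for this example.

```python
def split_vcf_line(line):
    """Split one tab-separated VCF data line into its fixed fields.

    Hypothetical helper for illustration; not part of bbcflib.
    """
    fields = line.rstrip("\n").split("\t")
    chrom, pos, snp_id, ref, alt, qual, flt, info = fields[:8]
    # INFO is a ';'-separated list of KEY=VALUE pairs or bare flags.
    info_dict = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        info_dict[key] = value if sep else True
    return {
        "chrom": chrom,
        "pos": int(pos),        # 1-based position
        "ref": ref,
        "alt": alt.split(","),  # several alternate alleles are comma-separated
        "qual": qual,
        "info": info_dict,
    }


# Made-up example line; DP and DP4 are INFO tags written by samtools/bcftools.
record = split_vcf_line("chrI\t3684116\t.\tG\tT\t45.0\t.\tDP=167;DP4=40,33,50,44")
```

Here record["info"]["DP4"] holds the per-strand counts of reference and alternate reads as a comma-separated string.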

bbcflib.snp.all_snps(*args, **kwargs)[source]

For a given chromosome, returns a summary file containing all SNPs identified in at least one of the samples. Each row contains: chromosome id, SNP position, reference base, SNP base (with proportions).

Parameters:
  • chrom – (str) chromosome name.
  • vcfs – (dict) VCF files for each sample, dictionary keys are group ids.
  • bams – (dict) BAM files organized like the VCF files.
  • outall – (str) name of the file that will contain the list of all SNPs.
  • assembly – (genrep.Assembly) assembly for the fasta files and ploidy value.
  • headerfile – (str) name of the file with a substitute BAM header to match the fasta files.
  • sample_names – (list of str) list of sample names.
  • mincov – (int) minimum number of reads supporting an SNP at a position for it to be considered. [5]
  • minsnp – (int) minimum percentage of reads supporting the SNP for it to be returned. N.B.: Effectively, half of it on each strand for diploids. [40]
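The two thresholds can be read as a simple filter on read counts. The sketch below shows one plausible interpretation, assuming mincov applies to the number of reads carrying the alternate base and minsnp to their percentage of all reads at the position. The function passes_thresholds is hypothetical, and the per-strand rule for diploids mentioned above is not modeled.

```python
def passes_thresholds(alt_reads, total_reads, mincov=5, minsnp=40):
    """Return True if a candidate SNP meets both thresholds.

    Hypothetical helper: one reading of the mincov/minsnp descriptions,
    not the bbcflib implementation.
    """
    if total_reads <= 0:
        return False
    # mincov: minimum number of reads supporting the SNP (assumed: alt reads).
    if alt_reads < mincov:
        return False
    # minsnp: minimum percentage of reads supporting the SNP.
    return 100.0 * alt_reads / total_reads >= minsnp


strong = passes_thresholds(94, 167)    # about 56% of 167 reads
too_few = passes_thresholds(3, 4)      # high percentage, but only 3 reads
too_rare = passes_thresholds(10, 100)  # enough reads, but only 10%
```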
bbcflib.snp.exon_snps(*args, **kwargs)[source]

Annotates SNPs described in filedict (a dictionary of the form {chromosome: filename} where filename is an output of parse_pileupFile). Adds columns 'gene', 'location_type' and 'distance' to the output of parse_pileupFile. Returns two files: the first contains all SNPs annotated with their position relative to genes in the specified assembly, and the second contains only SNPs found within CDS regions.

Parameters:
  • chrom – (str) chromosome name.
  • outexons – (str) name of the file containing the list of SNPs on exons.
  • allsnps – list of tuples (chr, start, end, ref, alt1..altN) as returned by all_snps(). Ex: [('chr', 3684115, 3684116, 'G', 'G', 'G', 'G', 'T (56% of 167)', 'G'), ...]
  • assembly – genrep.Assembly object
  • sample_names – list of sample names.
  • genomeRef – dict of the form {'chr1': filename}, where filename is the name of a fasta file containing the reference sequence for the chromosome.
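The allsnps tuples mix plain bases with annotated calls such as 'T (56% of 167)'. The sketch below shows how such a row could be unpacked. The cell format is inferred only from the single example above, so the regular expression, the helper parse_allele, and the handling of cells without a percentage are assumptions; real output may contain other forms.

```python
import re

# Matches 'G' or 'T (56% of 167)'; format inferred from the documented example.
ALLELE_RE = re.compile(r"^([ACGTN]+)(?: \((\d+)% of (\d+)\))?$")


def parse_allele(cell):
    """Return (base, percent, coverage); percent and coverage may be None.

    Hypothetical helper for illustration; not part of bbcflib.
    """
    match = ALLELE_RE.match(cell)
    if match is None:
        raise ValueError("unrecognized allele cell: %r" % (cell,))
    base, percent, coverage = match.groups()
    if percent is None:
        return base, None, None
    return base, int(percent), int(coverage)


row = ("chr", 3684115, 3684116, "G", "G", "G", "G", "T (56% of 167)", "G")
chrom, start, end, ref = row[:4]
calls = [parse_allele(cell) for cell in row[4:]]
# Indices of the samples whose call differs from the reference base.
variant_samples = [i for i, (base, _, _) in enumerate(calls) if base != ref]
```

With the example row, only the fourth sample (index 3) carries a variant.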
bbcflib.snp.create_tracks(ex, outall, sample_names, assembly)[source]

Writes BED tracks showing SNPs found in each sample.
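To make the idea of per-sample tracks concrete, the following sketch builds BED lines (chromosome, 0-based start, end, name) for each sample from rows shaped like the allsnps tuples. It is not how create_tracks is implemented: the function bed_lines_per_sample, the 'ref>alt' naming of features, and the sample names are assumptions for this example.

```python
def bed_lines_per_sample(rows, sample_names):
    """Group SNP rows into one list of BED lines per sample.

    Hypothetical helper for illustration; not part of bbcflib.
    """
    tracks = {name: [] for name in sample_names}
    for row in rows:
        chrom, start, end, ref = row[:4]
        for name, call in zip(sample_names, row[4:]):
            alt = call.split()[0]  # drop a trailing '(56% of 167)' annotation
            if alt != ref:
                # BED columns: chrom, start, end, feature name.
                line = "%s\t%d\t%d\t%s>%s" % (chrom, start, end, ref, alt)
                tracks[name].append(line)
    return tracks


rows = [("chr", 3684115, 3684116, "G", "G", "G", "G", "T (56% of 167)", "G")]
names = ["s1", "s2", "s3", "s4", "s5"]  # made-up sample names
tracks = bed_lines_per_sample(rows, names)
```

Each list in tracks could then be written to its own file, one line per SNP.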

bbcflib.snp.snp_workflow(*args, **kwargs)[source]

Main function of the workflow.
