Module: bbcflib.chipseq

This module provides functions to run a basic ChIP-seq analysis from reads mapped on a reference genome. The most important steps are the binding of macs via the ‘add_macs_results’ function and the peak deconvolution algorithm with the ‘run_deconv’ function.

The whole workflow can be run via the function workflow_groups with appropriate options in ‘job.options’.

Below is the script used by the frontend:

from bbcflib import daflims, genrep, frontend, email, common
from bbcflib.mapseq import *
from bbcflib.chipseq import *
M = MiniLIMS( '/path/to/chipseq/minilims' )
ms_limspath = '/path/to/mapseq/minilims'
working_dir = '/path/to/scratch/on/cluster'
hts_key = 'test_key'
assembly_id = 'mm9'
gl = { 'hts_chipseq': {'url': ''},
       'hts_mapseq': {'url': ''},
       'script_path': '/srv/chipseq/lib' }
htss = frontend.Frontend( url=gl['hts_chipseq']['url'] )
job = htss.job( hts_key )
assembly = genrep.Assembly( assembly=assembly_id )
with execution( M, description=hts_key, remote_working_directory=working_dir ) as ex:
    job = get_bam_wig_files( ex, job, ms_limspath, gl['hts_mapseq']['url'], gl['script_path'], via=via )
    files = chipseq_workflow( ex, job, ms_files, assembly, gl['script_path'] )
allfiles = get_files(, M )
print allfiles

Binding for the macs peak caller Takes one (optionally two) bam file(s) and the ‘read_length’ and ‘genome_size’ parameters passed to macs. Returns the file prefix (‘-n’ option of macs)


Runs the ‘’ wrapper script on the ‘scores_fwd’, ‘scores_rev’ and ‘peaks’ input with parameters ‘chromosome_name’ (name of chromosome to process) and ‘read_extension’, using functions from ‘script_path’/deconv_fcts.R. Returns a pdf file and several data tracks.

bbcflib.chipseq.add_macs_results(ex, read_length, genome_size, bamfile, ctrlbam=None, name=None, poisson_threshold=None, alias=None, macs_args=None, via='lsf')[source]

Calls the macs function on each possible pair of test and control bam files and adds the respective outputs to the execution repository.

macs options can be controlled with macs_args. If a dictionary of Poisson thresholds for each sample is given, then the enrichment bounds (‘-m’ option) are computed from them otherwise the default is ‘-m 10,100’.

Returns the set of file prefixes.

bbcflib.chipseq.chipseq_workflow(ex, job_or_dict, assembly, script_path='', logfile=<open file '<stdout>', mode 'w' at 0x2b3dfcadb150>, via='lsf')[source]

Runs a chipseq workflow over bam files obtained by mapseq. Will optionally run macs and ‘run_deconv’.

  • ex – a ‘bein’ execution environment to run jobs in,
  • job_or_dict – a ‘Frontend’ ‘job’ object, or a dictionary with key ‘groups’, ‘files’ and ‘options’ if applicable,
  • assembly – a genrep.Assembly object,
  • script_path – only needed if ‘run_deconv’ is in the job options, must point to the location of the R scripts.

Defaults macs parameters (overriden by job_or_dict['options']['macs_args']) are set as follows:

  • '-bw': 200 (‘bandwith’)
  • '-m': 10,100 (‘minimum and maximum enrichments relative to background or control’)

The enrichment bounds will be computed from a Poisson threshold T, if available, as (min(30,5*(T+1)),50*(T+1)).

Returns a tuple of a dictionary with keys group_id from the job groups, macs and deconv if applicable and values file description dictionaries and a dictionary of group_ids to names used in file descriptions.

bbcflib.chipseq.run_deconv(ex, sql, peaks, chromosomes, read_extension, script_path, via='lsf')[source]

Runs the complete deconvolution process for a set of sql files and a bed file, parallelized over a set of chromosomes (a dictionary with keys ‘chromosome_name’ and values a dictionary with chromosome lengths).

Returns a dictionary of file outputs with keys file types (‘pdf’, ‘sql’ and ‘bed’) and values the file names.

Previous topic

Module: bbcflib.mapseq

Next topic

Module: bbcflib.rnaseq

This Page