Subpackage: bbcflib.bein

bein – LIMS and workflow manager for bioinformatics

Bein contains a miniature LIMS (Laboratory Information Management System) and a workflow manager. It was written for the Bioinformatics and Biostatistics Core Facility of the Ecole Polytechnique Federale de Lausanne. It is aimed at processes just complicated enough that the Unix shell becomes problematic, but not so large as to justify all the machinery of big workflow managers like KNIME or Galaxy.

This module contains all the core logic and functionality of bein.

There are three core classes you need to understand:

execution
The actual class is Execution, but it is generally created with the execution contextmanager. An execution tracks all the information about a run of a given set of programs. It corresponds roughly to a shell script.
MiniLIMS
MiniLIMS represents a database and a directory of files. The database stores metainformation about the files and records all executions run with this MiniLIMS. You can go back and examine the return code, stdout, stderr, imported files, etc. from any execution.
program
The @program decorator provides a very simple way to bind external programs into bein for use in executions.

Executions

bein.execution(*args, **kwds)[source]

Create an Execution connected to the given MiniLIMS object.

execution is a contextmanager, so it can be used in a with statement, as in:

with execution(mylims) as ex:
    touch('boris')

It creates a temporary directory where the execution will work, sets up the Execution object, then runs the body of the with statement. After the body finishes, or if it fails and throws an exception, execution writes the Execution to the MiniLIMS repository and then deletes the temporary directory.

The id field of the Execution is None during the with block; afterwards it is set to the execution ID the block ran as. For example:

with execution(mylims) as ex:
    pass

print(ex.id)

will print the execution ID the with block ran as.
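
The lifecycle described above can be imitated with the standard library. This is a minimal sketch, not bein's implementation: SketchExecution, sketch_execution and the list used in place of a MiniLIMS are all hypothetical names chosen for illustration.

```python
import contextlib
import os
import shutil
import tempfile


class SketchExecution(object):
    """Hypothetical stand-in for bein's Execution; not the real class."""

    def __init__(self, working_directory):
        self.working_directory = working_directory
        self.id = None          # unknown until the run has been recorded
        self.exception = None


@contextlib.contextmanager
def sketch_execution(log):
    """Mimic the lifecycle; `log` (a list) stands in for the MiniLIMS."""
    workdir = tempfile.mkdtemp()
    ex = SketchExecution(workdir)
    try:
        yield ex                # the body of the with statement runs here
    except Exception as error:
        ex.exception = error    # remember the failure, then let it propagate
        raise
    finally:
        log.append(ex)          # record the run whether it succeeded or failed
        ex.id = len(log)        # the ID exists only once the run is recorded
        shutil.rmtree(workdir)  # remove the temporary directory last


log = []
with sketch_execution(log) as ex:
    id_inside = ex.id
    directory_existed = os.path.isdir(ex.working_directory)
```

Inside the block id_inside is None and the directory exists; after the block ex.id is set and the directory is gone.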

On some clusters, such as VITAL-IT in Lausanne, the path to the current directory is different on worker nodes where batch jobs run than on the nodes from which jobs are submitted. For instance, if you are working in /scratch/abc on your local node, the worker nodes might mount the same directory as /nfs/boris/scratch/abc. In this case, running programs via LSF would not work correctly.

If this is the case, you can pass the equivalent directory on worker nodes as remote_working_directory. In the example above, an execution may create a directory lK4321fdr21 in /scratch/abc. On the worker node, it would be /nfs/boris/scratch/abc/lK4321fdr21, so you pass /nfs/boris/scratch/abc as remote_working_directory.
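
The path translation amounts to swapping the parent directory while keeping the name of the execution's directory. The helper remote_path_for below is hypothetical and only illustrates that relationship; how bein applies remote_working_directory internally is not shown here.

```python
import posixpath


def remote_path_for(local_dir, remote_working_directory):
    """Hypothetical helper: rebuild the worker-node path of an execution directory."""
    return posixpath.join(remote_working_directory, posixpath.basename(local_dir))


remote = remote_path_for("/scratch/abc/lK4321fdr21", "/nfs/boris/scratch/abc")
```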

class bein.Execution(lims, working_directory)[source]

Execution objects hold the state of a currently running execution.

You should generally use the execution function described above to create an Execution, since it sets up the working directory properly.

An Execution is run against a particular MiniLIMS object: it records all the information on the programs run during it in that MiniLIMS, fetches files from it, and writes files back to it.

The important methods for the user to know are add and use. Everything else is used internally by bein. add puts a file into the LIMS repository from the execution’s working directory. use fetches a file from the LIMS repository into the working directory.

Execution.add(filename, description='', associate_to_id=None, associate_to_filename=None, template=None, alias=None)[source]

Add a file to the MiniLIMS object from this execution.

filename is the name of the file in the execution’s working directory to import. description is an optional argument to assign a string or a dictionary to describe that file in the MiniLIMS repository.

Note that the file is not actually added to the repository until the execution finishes.

Execution.use(file_or_alias)[source]

Fetch a file from the MiniLIMS repository.

file_or_alias should be an integer assigned to a file in the MiniLIMS repository, or a string giving a file alias in the MiniLIMS repository. The file is copied into the execution’s working directory with a unique filename. use returns the unique filename it copied the file into.

MiniLIMS

class bein.MiniLIMS(path)[source]

Encapsulates a database and directory to track executions and files.

A MiniLIMS repository consists of a SQLite database and a directory of the same name with .files appended, where all files kept in the repository are stored. For example, if the SQLite database is /home/boris/myminilims, then there is a directory /home/boris/myminilims.files with all the corresponding files. You should never edit the repository by hand!

If you create a MiniLIMS object pointing to a nonexistent database, then it creates the database and the file directory.
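
The on-disk layout can be reproduced with sqlite3 and os. This is a sketch under assumptions: open_sketch_repository and the one-table schema are invented for illustration and say nothing about the tables bein actually creates.

```python
import os
import shutil
import sqlite3
import tempfile


def open_sketch_repository(path):
    """Create the database and its sibling .files directory if they are missing."""
    files_dir = path + ".files"
    created = not os.path.exists(path)
    connection = sqlite3.connect(path)
    # Hypothetical schema, only so that the database file has some content.
    connection.execute(
        "CREATE TABLE IF NOT EXISTS file (id INTEGER PRIMARY KEY, description TEXT)")
    connection.commit()
    if not os.path.isdir(files_dir):
        os.mkdir(files_dir)
    return connection, files_dir, created


base = tempfile.mkdtemp()
db_path = os.path.join(base, "myminilims")
connection, files_dir, created = open_sketch_repository(db_path)
connection.close()
layout = sorted(os.listdir(base))   # the database file and the .files directory
shutil.rmtree(base)
```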

The methods documented below fall into six groups: basic file operations, fetching files and executions, deleting files and executions, searching files and executions, file aliases, and file associations.

MiniLIMS.add_alias(fileid, alias)[source]

Make the string alias an alias for fileid in the repository.

An alias can be used in place of an integer file ID in all methods that take a file ID.

MiniLIMS.associate_file(file_or_alias, associate_to, template)[source]

Add a file association from file_or_alias to associate_to.

When the file associate_to is used in an execution, file_or_alias is also used, and named according to template. template should be a string containing %s, which will be replaced with the name associate_to is copied to. So if associate_to is copied to X in the working directory, and the template is "%s.idx", then file_or_alias is copied to X.idx.
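
The template is an ordinary %-format string, so the name given to the associated file follows from plain string formatting. The function name associated_name is hypothetical.

```python
def associated_name(template, copied_name):
    """Fill the single %s in template with the name the main file was copied to."""
    return template % copied_name


index_name = associated_name("%s.idx", "X")
```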

MiniLIMS.associated_files_of(file_or_alias)[source]

Find all files associated to file_or_alias.

Return a list of (fileid, template) of all files associated to file_or_alias.

MiniLIMS.copy_file(file_or_alias)[source]

Copy the given file in the MiniLIMS repository.

A copy of the file corresponding to file_or_alias is made in the MiniLIMS repository, and the file id of the copy is returned. This is most useful to create a mutable copy of an immutable file.

MiniLIMS.delete_alias(alias)[source]

Delete the alias alias from the repository.

The file itself is untouched. This only affects the alias.

MiniLIMS.delete_execution(execution_id)[source]

Delete an execution from the MiniLIMS repository.

MiniLIMS.delete_file(file_or_alias)[source]

Delete a file from the repository.

MiniLIMS.delete_file_association(file_or_alias, associated_to)[source]

Remove the file association from file_or_alias to associated_to.

Both fields can be either an integer or an alias string.

MiniLIMS.export_file(file_or_alias, dst, with_associated=False)[source]

Write file_or_alias from the MiniLIMS repository to dst.

dst can be either a directory, in which case the file will have its repository name in the new directory, or can specify a filename, in which case the file will be copied to that filename. Associated files will also be copied if with_associated=True.
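
The two meanings of dst can be sketched as follows. sketch_export is a hypothetical helper working on ordinary files, and "a1b2c3" is a made-up repository name; associated files are left out.

```python
import os
import shutil
import tempfile


def sketch_export(src, repository_name, dst):
    """Copy src to dst; if dst is a directory, keep the repository name inside it."""
    if os.path.isdir(dst):
        target = os.path.join(dst, repository_name)
    else:
        target = dst
    shutil.copy(src, target)
    return target


repo = tempfile.mkdtemp()
out = tempfile.mkdtemp()
src = os.path.join(repo, "a1b2c3")
with open(src, "w") as handle:
    handle.write("data\n")

into_directory = sketch_export(src, "a1b2c3", out)
as_filename = sketch_export(src, "a1b2c3", os.path.join(out, "result.txt"))
exported = sorted(os.listdir(out))
shutil.rmtree(repo)
shutil.rmtree(out)
```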

MiniLIMS.fetch_execution(exid)[source]

Returns a dictionary of all the data corresponding to the given execution id.

MiniLIMS.fetch_file(id_or_alias)[source]

Returns a dictionary describing the given file.

MiniLIMS.import_file(src, description='')[source]

Add an external file src to the MiniLIMS repository.

src should be the path to the file to be added. description is an optional string or dictionary that will be attached to the file in the repository. import_file returns the file id in the repository of the newly imported file.

MiniLIMS.path_to_file(file_or_alias)[source]

Return the full path to a file in the repository.

It is often useful to be able to read a file in the repository without actually copying it. If you are not planning to write to it, this presents no problem.

MiniLIMS.resolve_alias(alias)[source]

Resolve an alias to an integer file id.

If an integer is passed to resolve_alias, it is returned as is, so this method can be used without worry any time any alias might have to be resolved.
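
The pass-through behaviour for integers can be illustrated with a dictionary standing in for the alias table. sketch_resolve_alias and the alias "genome" are hypothetical.

```python
def sketch_resolve_alias(aliases, file_or_alias):
    """Return integers unchanged; look strings up in the alias table."""
    if isinstance(file_or_alias, int):
        return file_or_alias
    return aliases[file_or_alias]


aliases = {"genome": 7}   # made-up alias table
by_alias = sketch_resolve_alias(aliases, "genome")
by_id = sketch_resolve_alias(aliases, 7)
```

Because both calls give the same result, a method that accepts file_or_alias can resolve its argument first and work with integer IDs from then on.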

MiniLIMS.search_executions(with_text=None, with_description=None, started_before=None, started_after=None, ended_before=None, ended_after=None, fails=None)[source]

Find executions matching the given criteria.

Returns a list of execution ids of executions which satisfy all the criteria which are not None. The criteria are:

  • with_text: The execution’s description or one of the program arguments in the execution contains with_text.
  • with_description: The execution’s description contains with_description.
  • started_before: The execution started running before started_before. This should be of the form “YYYY-MM-DD HH:MM:SS”. Final fields can be omitted, so “YYYY” and “YYYY-MM-DD HH:MM” are also valid date formats.
  • started_after: The execution started running after started_after. The format is identical to started_before.
  • ended_before: The execution finished running before ended_before. The format is the same as for started_before.
  • ended_after: The execution finished running after ended_after. The format is the same as for started_before.
  • fails: If ‘False’, returns only executions that didn’t encounter any error, i.e. execution.exception is null. If ‘True’, returns only executions with errors. Warning: any try/except block inside an execution may cause execution.exception not to be null without making the script itself fail.

MiniLIMS.browse_executions(with_text=None, with_description=None, started_before=None, started_after=None, ended_before=None, ended_after=None)[source]

Prints and returns a set of tuples (ID, description, started at, finished at), one for each execution corresponding to the request. See documentation for search_executions().
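
The way the criteria of search_executions combine (every criterion that is not None must hold) can be sketched over a list of dictionaries. This assumes timestamps are stored as “YYYY-MM-DD HH:MM:SS” strings and compared as text, which is why a truncated value such as "2011-03" works; the document does not say how bein performs the comparison. The sample records are made up.

```python
def matches(execution, with_description=None, started_before=None, started_after=None):
    """Apply only the criteria that are not None; all of them must hold."""
    if with_description is not None and with_description not in execution["description"]:
        return False
    if started_before is not None and not execution["started_at"] < started_before:
        return False
    if started_after is not None and not execution["started_at"] > started_after:
        return False
    return True


# Made-up records for illustration only.
executions = [
    {"id": 1, "description": "align reads", "started_at": "2011-02-27 09:15:00"},
    {"id": 2, "description": "align reads again", "started_at": "2011-03-02 14:00:00"},
    {"id": 3, "description": "call peaks", "started_at": "2011-03-05 08:30:00"},
]

found = [e["id"] for e in executions
         if matches(e, with_description="align", started_after="2011-03")]
everything = [e["id"] for e in executions if matches(e)]
```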

MiniLIMS.search_files(with_text=None, with_description=None, older_than=None, newer_than=None, source=None)[source]

Find files matching given criteria in the LIMS.

Finds files which satisfy all the criteria which are not None. The criteria are:

  • with_text: The file’s external_name or description contains with_text
  • with_description: The file’s description contains with_description
  • older_than: The file’s created time is earlier than older_than. This should be of the form “YYYY-MM-DD HH:MM:SS”. Final fields can be omitted, so “YYYY” and “YYYY-MM-DD HH:MM” are also valid date formats.
  • newer_than: The file’s created time is later than newer_than, using the same format as older_than.
  • source: Where the file came from. Can be one of 'execution', 'copy', 'import', ('execution',exid), or ('copy',srcid), where exid is the numeric ID of the execution that created this file, and srcid is the file ID of the file which was copied to create this one.
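
The two forms of source (a bare kind, or a kind paired with an ID) can be told apart as below. source_matches is a hypothetical helper, and representing each file's origin as a (kind, origin_id) pair is an assumption made for this sketch.

```python
def source_matches(file_source, wanted):
    """file_source is a (kind, origin_id) pair; wanted is a kind or a (kind, id) pair."""
    if wanted is None:
        return True                      # criterion not given, so it is ignored
    if isinstance(wanted, tuple):
        return file_source == wanted     # kind and ID must both match
    return file_source[0] == wanted      # only the kind must match


from_execution_33 = ("execution", 33)
imported = ("import", None)
```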

Programs

class bein.program(gen_args)[source]

Decorator to wrap external programs for use by bein.

Bein depends on external programs to do most of its work. In this sense, it’s a strange version of a shell. The @program decorator makes bindings to external programs only a couple lines long.

To wrap a program, write a function that takes whatever arguments you will need to vary in calling the program (for instance, the filename for touch or the number of seconds to sleep for sleep). This function should return a dictionary containing two keys, 'arguments' and 'return_value'. 'arguments' should point to a list of strings which is the actual command and arguments to be executed (["touch",filename] for touch, for instance). 'return_value' should point to a value to return, or a callable object which takes a ProgramOutput object and returns the value that will be passed back to the user when this program is run.

For example, to wrap touch, we write a one argument function that takes the filename of the file to touch, and apply the @program decorator to it:

@program
def touch(filename):
    return {"arguments": ["touch",filename],
            "return_value": filename}

Once we have such a function, how do we call it? We can call it directly, but @program inserts an additional argument at the beginning of the argument list to take the execution the program is run in. Typically it will be run like:

with execution(lims) as ex:
    touch(ex, "myfile")

where lims is a MiniLIMS object. The ProgramOutput of touch is automatically recorded to the execution ex and stored in the MiniLIMS. The value returned by touch is "myfile", the name of the touched file.

Often you want to start a program without blocking until it returns, so that you can run several in parallel. @program also creates a method nonblocking which does this. The return value is a Future object with a single method: wait(). When you call wait(), it blocks until the program finishes, then returns the same value that you would get from calling the function directly. So to touch two files, and not block until both commands have started, you would write:
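
To make the mechanics concrete, here is a self-contained imitation of such a decorator, including a callable return_value. It is a sketch, not bein's code: sketch_program, SketchOutput and multiply are hypothetical names, a plain list stands in for the execution, and a RuntimeError is raised where bein would raise ProgramFailed.

```python
import subprocess
import sys
from collections import namedtuple

# Stand-in with the fields the document lists for ProgramOutput.
SketchOutput = namedtuple("SketchOutput", "return_code pid arguments stdout stderr")


def sketch_program(gen_args):
    """Hypothetical, blocking-only imitation of a @program-style decorator."""
    def run(ex, *args, **kwargs):
        spec = gen_args(*args, **kwargs)
        process = subprocess.Popen(spec["arguments"], stdout=subprocess.PIPE,
                                   stderr=subprocess.PIPE, universal_newlines=True)
        out, err = process.communicate()
        output = SketchOutput(process.returncode, process.pid, spec["arguments"],
                              out.splitlines(True), err.splitlines(True))
        ex.append(output)                 # record the run in the "execution"
        if output.return_code != 0:
            raise RuntimeError("program failed: %r" % (spec["arguments"],))
        value = spec["return_value"]
        # A callable return_value receives the output; anything else is returned as is.
        return value(output) if callable(value) else value
    return run


@sketch_program
def multiply(a, b):
    # The running Python interpreter serves as a portable external program.
    return {"arguments": [sys.executable, "-c", "print(%d * %d)" % (a, b)],
            "return_value": lambda output: int(output.stdout[0])}


ex = []                                   # a plain list stands in for the execution
product = multiply(ex, 6, 7)
```

The extra first argument is visible in the wrapper: run takes ex, and passes everything else through to the wrapped function.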

with execution(lims) as ex:
    a = touch.nonblocking(ex, "myfile1")
    b = touch.nonblocking(ex, "myfile2")
    a.wait()
    b.wait()
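
The contract of such a Future (start immediately, block only in wait()) can be imitated with a thread. SketchFuture and slow_square are hypothetical; bein's own Future may be built quite differently.

```python
import threading
import time


class SketchFuture(object):
    """Start function(*args) in the background; wait() blocks and returns its value."""

    def __init__(self, function, *args):
        self._value = None
        self._thread = threading.Thread(target=self._run, args=(function,) + args)
        self._thread.start()

    def _run(self, function, *args):
        self._value = function(*args)

    def wait(self):
        self._thread.join()
        return self._value


def slow_square(n):
    time.sleep(0.05)        # pretend this is a long-running program
    return n * n


a = SketchFuture(slow_square, 3)   # returns at once
b = SketchFuture(slow_square, 4)   # both are now running
results = (a.wait(), b.wait())
```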

By default, nonblocking runs local processes, but you can control how it runs its processes with the via keyword argument. For example, on systems using the LSF batch submission system, you can run commands via batch submission by passing the via argument the value "lsf":

with execution(lims) as ex:
    a = touch.nonblocking(ex, "myfile1", via="lsf")
    a.wait()

You can force local execution with via="local".

Some programs do not accept an output file as an argument and only write to stdout. Alternatively, you might need to capture stderr to a file. All the methods of @program accept keyword arguments stdout and stderr to specify files to write these streams to. If they are omitted, then both streams are captured and returned in the ProgramOutput object.
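
Redirecting a stream to a file is the same thing subprocess does when handed an open file. The sketch below shows only that underlying idea; run_to_file is a hypothetical helper and does not use bein's stdout keyword.

```python
import os
import subprocess
import sys
import tempfile


def run_to_file(arguments, stdout_path):
    """Run a command with its stdout written to stdout_path; return the exit code."""
    with open(stdout_path, "w") as sink:
        return subprocess.call(arguments, stdout=sink)


handle, path = tempfile.mkstemp()
os.close(handle)
code = run_to_file([sys.executable, "-c", "print('hello')"], path)
with open(path) as source:
    captured = source.read()
os.remove(path)
```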

Miscellaneous

bein.unique_filename_in(path=None)[source]

Return a random filename unique in the given path.

The filename returned is twenty alphanumeric characters which are not already serving as a filename in path. If path is omitted, it defaults to the current working directory.
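
A function with this behaviour can be written in a few lines. This is a sketch of the described contract rather than bein's source; the name sketch_unique_filename_in is hypothetical.

```python
import os
import random
import string


def sketch_unique_filename_in(path=None):
    """Return twenty random alphanumeric characters not yet used as a name in path."""
    if path is None:
        path = os.getcwd()
    alphabet = string.ascii_letters + string.digits
    while True:
        name = "".join(random.choice(alphabet) for _ in range(20))
        if not os.path.exists(os.path.join(path, name)):
            return name


name = sketch_unique_filename_in()
```

The name is only known to be free at the moment of the check; nothing reserves it until a file is actually created.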

class bein.ProgramOutput(return_code, pid, arguments, stdout, stderr)[source]

Object passed to return_value functions when binding programs.

Programs bound with @program can call a function when they are finished to create a return value from their output. The output is passed as a ProgramOutput, containing all the information available to bein about that program.

bein.return_code

An integer giving the return code of the program when it exited. By convention this is 0 if the program succeeded, or some other value if it failed.

bein.pid

An integer giving the process ID under which the program ran.

bein.arguments

A list of strings which were the program and the exact arguments passed to it.

bein.stdout

The text printed by the program to stdout. It is returned as a list of strings, each corresponding to one line of stdout, and each still carrying its trailing \n.

bein.stderr

The text printed by the program to stderr. It has the same format as stdout.

exception bein.ProgramFailed(output)[source]

Thrown when a program bound by @program exits with a value other than 0.

bein.task(f)[source]

Wrap the function f in an execution.

The @task decorator wraps a function in an execution and handles producing a sensible return value. The function must expect its first argument to be an execution. The function produced by @task instead expects a MiniLIMS (or None) in its place. You can also pass a description keyword argument, which will be used to set the description of the execution. For example:

@task
def f(ex, filename):
    touch(ex, filename)
    ex.add(filename, "New file")
    return {'created': filename}

will be wrapped into a function that is called as:

f(M, "boris", description="An execution")

where M is a MiniLIMS. It could also be called with None in place of M:

f(None, "boris")

which is the same as creating an execution without attaching it to a MiniLIMS. (In this case it will fail, since f tries to add a file to the MiniLIMS.)

The return value is a dictionary with three keys:

  • value is the value returned by the function which @task wrapped.
  • files is a dictionary of all files the execution added to the MiniLIMS, with their descriptions as keys and their IDs in the MiniLIMS as values.
  • execution is the execution ID.

In the call to f above, the return value would be (with some other values for the file ID and 'execution' in practice):

{'value': {'created': 'boris'},
 'files': {'New file': 12},
 'execution': 33}
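
The shape of the wrapper can be sketched without bein. Everything here is a stand-in: sketch_task is hypothetical, dictionaries replace the MiniLIMS and the execution, and IDs are simply positions in a list. Passing None in place of the MiniLIMS is not handled.

```python
import functools


def sketch_task(function):
    """Hypothetical imitation of a @task-style wrapper around function(ex, ...)."""
    @functools.wraps(function)
    def wrapper(lims, *args, **kwargs):
        description = kwargs.pop("description", "")
        ex = {"description": description, "added": []}    # stand-in execution
        value = function(ex, *args, **kwargs)
        files = {}
        for filename, file_description in ex["added"]:
            lims["files"].append(filename)
            files[file_description] = len(lims["files"])  # made-up 1-based file ID
        lims["executions"].append(ex)
        return {"value": value,
                "files": files,
                "execution": len(lims["executions"])}
    return wrapper


@sketch_task
def f(ex, filename):
    ex["added"].append((filename, "New file"))   # plays the role of ex.add(...)
    return {"created": filename}


M = {"files": [], "executions": []}              # stand-in MiniLIMS
result = f(M, "boris", description="An execution")
```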
