Team:Vanderbilt Software/Project/Home


Revision as of 08:21, 26 January 2015 by Hwangas


Program Description

Darwin is a software package for documenting changes to DNA. It allows easy, standardized, and collaborative editing of DNA data up to the genome scale. Because it builds on tested and proven version control software, the entire history of the tracked data is easy to browse and transfer. Darwin, however, is focused specifically on DNA data, which speeds up the tracking process and offers significant security improvements over current systems.

Because Darwin uses existing version control systems, most of the heavy lifting is already done; Darwin can be installed directly on top of the existing software. Darwin's contribution is to parse and format biological data so that it can be used more effectively with these systems. It uses a variety of heuristics to split the data and make the recorded changes more granular, producing change logs that are orders of magnitude more time- and space-efficient than those of other methods.
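To illustrate why splitting sequence data helps a line-based version control system, here is a minimal Python sketch. It is not Darwin's actual heuristic: the fixed line width of 60 and the function names split_sequence and changed_lines are assumptions made only for this example.

```python
def split_sequence(seq: str, width: int = 60) -> list[str]:
    """Break a sequence into fixed-width lines (an assumed formatting rule)."""
    return [seq[i:i + width] for i in range(0, len(seq), width)]


def changed_lines(old: list[str], new: list[str]) -> list[int]:
    """Return the indices of lines that differ, pairing lines by position."""
    return [i for i, (a, b) in enumerate(zip(old, new)) if a != b]


# A toy 200-base sequence and a copy with a single substitution at position 130.
original = "ACGT" * 50
mutated = original[:130] + "T" + original[131:]

old_lines = split_sequence(original)
new_lines = split_sequence(mutated)

# Only the line containing position 130 differs, so a line-based tool
# records one small change instead of rewriting the whole sequence.
print(len(old_lines), changed_lines(old_lines, new_lines))
```

A fixed-width split like this only localizes substitutions. An insertion or deletion shifts every later line, which is one reason a real tool would need smarter splitting rules than this sketch uses.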

Current Status

The program produced by Vanderbilt's 2014 iGEM software team is named darwin. It is a Unix command-line tool that produces what are known as "diffs" between two genomic data files. The command that produces the diff is dwndiff, typically invoked as:

dwndiff <file1> <file2>

This creates a file with the extension .vcscmp, which can be used with typical version control systems.
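For readers unfamiliar with diffs, the following Python sketch shows the general idea using the standard library's difflib module. It produces an ordinary unified diff, not the .vcscmp format, whose layout is not described here. The function name sequence_diff and the sample sequences are made up for illustration.

```python
import difflib


def sequence_diff(old_lines, new_lines, old_name="file1", new_name="file2"):
    """Return a unified diff of two lists of sequence lines."""
    return list(
        difflib.unified_diff(
            old_lines, new_lines, fromfile=old_name, tofile=new_name, lineterm=""
        )
    )


# Two toy "files", each already split into short lines of bases.
old = ["ACGTACGT", "GGCCTTAA", "TTGACCGA"]
new = ["ACGTACGT", "GGCATTAA", "TTGACCGA"]

# Lines starting with "-" were removed and lines starting with "+" were added.
for line in sequence_diff(old, new):
    print(line)
```

The diff records only the line that changed, plus a little surrounding context, which is what makes this representation compact enough to store for every revision.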

The current state of the project can be found in its repository on GitHub. Documentation for the software tool can be found on the Program page.