Team:Vanderbilt Software/Project/Results

From 2014.igem.org

Revision as of 16:52, 26 January 2015 by Hwangas (Talk | contribs)


darwin

Results

We produced software that implements the algorithms described in our novel approach, ran it on specially cleaned input files, and passed the results through a typical version control system. We tested it on the transformations between iterations 3-6 of our wetware team's yeast plasmids. As a control, we compared darwin's output, run through the git version control system, against plain processing with git alone (vanilla_git). We expected darwin to run faster and to show fewer changed lines in the diff output than vanilla_git.

As expected, darwin's preprocessing produced a substantially faster diff, averaging 5.763 seconds per file versus 10.640 seconds for vanilla_git. Contrary to our assumptions, however, that speedup did not appear to depend on a reduction in the number of changed lines reported by git, since darwin actually produced more changed lines. More research is required to determine how the number of changed lines reported by git correlates with the number of lines actually changed in the file.
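To put the two averages in proportion, the short calculation below derives the speedup factor and the relative time saved. It uses only the two timings reported above; the function names are illustrative and are not part of darwin.

```python
# Average diff time per file, in seconds, as reported in the results above.
DARWIN_SECONDS = 5.763
VANILLA_GIT_SECONDS = 10.640


def speedup(baseline: float, improved: float) -> float:
    """How many times faster the improved run is than the baseline."""
    return baseline / improved


def time_reduction(baseline: float, improved: float) -> float:
    """Fraction of the baseline time that was saved."""
    return (baseline - improved) / baseline


factor = speedup(VANILLA_GIT_SECONDS, DARWIN_SECONDS)
saved = time_reduction(VANILLA_GIT_SECONDS, DARWIN_SECONDS)
print(f"speedup: {factor:.2f}x, time saved: {saved:.1%}")
```

By this measure darwin was roughly 1.85 times as fast as vanilla_git, which corresponds to about 46% less time per file.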

Future Work

Future work mostly revolves around turning darwin from a standalone diff-producing executable into an integrated system with a GUI usable by novices. The major challenge is parsing the plethora of file types used to represent DNA, although support for other formats such as SBOL and ApE could be added easily by extracting the actual genetic sequence from the file.
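As a rough illustration of what "extracting the actual genetic sequence" could look like, here is a minimal sketch for one case only. It assumes a GenBank-style flat file in which the bases appear in an ORIGIN block terminated by a line starting with "//". The function name extract_sequence is hypothetical and this is not darwin's parser; other formats, such as SBOL's XML, would each need their own extractor.

```python
import re


def extract_sequence(flat_file_text: str) -> str:
    """Return the bases found in the ORIGIN block of a GenBank-style record.

    Assumes the sequence lines sit between a line starting with 'ORIGIN'
    and a terminating line starting with '//'.
    """
    in_origin = False
    chunks = []
    for line in flat_file_text.splitlines():
        if line.startswith("ORIGIN"):
            in_origin = True
            continue
        if in_origin and line.startswith("//"):
            break
        if in_origin:
            # Drop position numbers and spacing; keep only the letters.
            chunks.append(re.sub(r"[^A-Za-z]", "", line))
    return "".join(chunks).upper()


record = """LOCUS       demo 12 bp
ORIGIN
        1 atggccaaat ga
//
"""
print(extract_sequence(record))
```

Once every supported format is reduced to a plain string of bases like this, the rest of the pipeline would only need to handle one representation.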

In addition, the current algorithms cannot handle introns of any sort in the input DNA sequence and will attempt to split the file into ORFs regardless. The ability to correctly identify such non-coding regions would make darwin far more valuable, able to handle prokaryotes and eukaryotes alike.
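The limitation is easier to see with a concrete ORF scan. The sketch below is a deliberately naive one and is not darwin's actual algorithm: it assumes ORFs run from an ATG start codon to the first in-frame stop codon, and it scans only the three forward reading frames. The name find_orfs is hypothetical. Because it reads codons contiguously, an intron inside a gene would shift or interrupt the frame and produce wrong boundaries, which is the kind of failure described above.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str) -> list[tuple[int, int]]:
    """Return half-open (start, end) index pairs of ATG...stop ORFs.

    Scans the three forward reading frames only and treats the sequence
    as contiguous coding DNA, so it has no notion of introns.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open a new ORF at this start codon
            elif start is not None and codon in STOP_CODONS:
                orfs.append((start, i + 3))  # include the stop codon
                start = None
    return sorted(orfs)


demo = "CCATGAAATAGGG"
for begin, end in find_orfs(demo):
    print(begin, end, demo[begin:end])
```

Handling eukaryotic input would require a separate step that identifies and sets aside non-coding regions before any scan of this sort is applied.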