Team:Vanderbilt Software/Project/darwin

From 2014.igem.org

Revision as of 16:32, 26 January 2015

darwin

Program Description

Version control systems such as git and svn track differences between lines. Since most DNA file formats split a sequence into fixed-length lines, a small edit such as a single insertion shifts the text that follows it and changes many lines at once. darwin avoids this by producing a formatted file that places each open reading frame (ORF) on its own line of text, so that each edit modifies only a single line of the output.
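The effect can be illustrated with a minimal Python sketch. This is not darwin's code: the helper names `wrap` and `changed_lines`, the 10-character line width, and the toy sequences are all assumptions chosen for illustration, and the ORFs are taken as already identified. It counts how many lines a line-based diff would report as changed for the same edit in both layouts.

```python
import difflib


def wrap(seq, width=10):
    """Split a sequence into fixed-length lines (hypothetical width)."""
    return [seq[i:i + width] for i in range(0, len(seq), width)]


def changed_lines(old_lines, new_lines):
    """Count lines a line-based diff would report as changed."""
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    changed = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changed += max(i2 - i1, j2 - j1)
    return changed


if __name__ == "__main__":
    # Toy ORFs, assumed to be already identified.
    old_orfs = ["ATGAAACCCGGGTTTTAA", "ATGCCCTTTAAAGGGTGA", "ATGGGGAAATTTCCCTAG"]
    # Insert one codon ("GCT") into the first ORF.
    new_orfs = ["ATGAAAGCTCCCGGGTTTTAA"] + old_orfs[1:]

    fixed = changed_lines(wrap("".join(old_orfs)), wrap("".join(new_orfs)))
    per_orf = changed_lines(old_orfs, new_orfs)
    print("fixed-width lines changed:", fixed)
    print("ORF-per-line lines changed:", per_orf)
```

With the fixed-width layout the insertion shifts every following line, while the one-ORF-per-line layout confines the change to the line holding the edited ORF.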

Genes can be very long, so even a change confined to one line can be large. To combat this, darwin samples a section of every newly inserted ORF and compares it to nearby ORFs; if the new ORF is similar to another ORF, it is counted as “edited,” and darwin records only the character-by-character changes required to transform the old ORF into the new ORF.
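One way to picture this step is the Python sketch below. It is an illustration under stated assumptions, not darwin's implementation: the function names, the choice to sample the first 30 characters, the search window of two neighbours, and the 0.8 similarity threshold are all hypothetical, and the standard library's `difflib.SequenceMatcher` stands in for whatever comparison darwin actually uses.

```python
import difflib


def similarity(a, b):
    """Similarity ratio in [0, 1] between two strings."""
    return difflib.SequenceMatcher(None, a, b, autojunk=False).ratio()


def find_edited_source(new_orf, old_orfs, index, window=2, sample_len=30, threshold=0.8):
    """Return the index of a nearby old ORF whose sample resembles new_orf's, or None.

    window, sample_len and threshold are hypothetical tuning values.
    """
    sample = new_orf[:sample_len]
    lo = max(0, index - window)
    hi = min(len(old_orfs), index + window + 1)
    best, best_score = None, threshold
    for i in range(lo, hi):
        score = similarity(sample, old_orfs[i][:sample_len])
        if score >= best_score:
            best, best_score = i, score
    return best


def char_edits(old, new):
    """List (op, old_start, old_end, replacement) steps that turn old into new."""
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    return [
        (tag, i1, i2, new[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def apply_edits(old, edits):
    """Rebuild the new ORF from the old ORF plus the recorded edits."""
    out, pos = [], 0
    for _tag, i1, i2, replacement in edits:
        out.append(old[pos:i1])
        out.append(replacement)
        pos = i2
    out.append(old[pos:])
    return "".join(out)
```

Storing only the output of `char_edits` keeps the recorded change proportional to the size of the edit rather than the length of the gene, and `apply_edits` shows that the new ORF can still be recovered from the old one.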

Finally, darwin uses concurrency to speed up the process. File I/O is typically much slower than processing file data already in memory. Splitting the processing into concurrent tasks helps relieve that bottleneck.
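The general idea of overlapping reads with processing can be sketched in Python as follows. This is only an assumed arrangement, not a description of darwin's internals: the block size, the worker count, and the placeholder `process_block` (which merely counts G and C characters) are invented for illustration. Note also that in CPython, threads mainly help when the work waits on I/O rather than on the CPU.

```python
import io
from concurrent.futures import ThreadPoolExecutor


def read_blocks(stream, block_size):
    """Yield successive blocks of text from a file-like object."""
    while True:
        block = stream.read(block_size)
        if not block:
            break
        yield block


def process_block(block):
    """Placeholder work on one block: count G and C characters."""
    return sum(block.count(base) for base in "GC")


def process_stream(stream, block_size=1 << 16, workers=4):
    """Hand each block to a worker as soon as it is read; keep results in order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(process_block, block)
            for block in read_blocks(stream, block_size)
        ]
        return [future.result() for future in futures]


if __name__ == "__main__":
    data = io.StringIO("GGCCAATT" * 4)
    print(process_stream(data, block_size=8))
```

Because each block is submitted as soon as it is read, workers can process earlier blocks while later ones are still being read. The sketch holds every block in memory until the pool finishes, so a real tool handling large files would likely bound the number of blocks in flight.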