Team:Vanderbilt Software/Novel Approach

From 2014.igem.org


Latest revision as of 18:09, 18 January 2015


git, svn, and other version control systems track differences between lines. Because most DNA file formats wrap a sequence into fixed-length lines, inserting even a short stretch of DNA shifts every later base onto a different line, so many lines change at once. darwin avoids this by producing a formatted file that puts each ORF on its own line of text, so each edit to an ORF modifies only a single line of the output.

Fig1. - darwin eliminates extra lines in the output file
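The effect of line layout on a line-based diff can be sketched in a few lines of Python. This is a toy illustration, not darwin's code: the `random_orf` helper, the 60-character wrap width, and the `changed_lines` counter are assumptions made for the example, and Python's standard `difflib` stands in for git's line diff.

```python
import difflib
import random

STOPS = {"TAA", "TAG", "TGA"}
# All 61 sense codons (every codon except the three stop codons).
SENSE = [a + b + c for a in "ACGT" for b in "ACGT" for c in "ACGT"
         if a + b + c not in STOPS]

def random_orf(rng: random.Random, n_codons: int) -> str:
    """Toy ORF (hypothetical): ATG start, random sense codons, TAA stop."""
    return "ATG" + "".join(rng.choice(SENSE) for _ in range(n_codons)) + "TAA"

def wrap_fixed(seq: str, width: int = 60) -> list[str]:
    """FASTA-style layout: one sequence split into fixed-width lines."""
    return [seq[i:i + width] for i in range(0, len(seq), width)]

def changed_lines(old: list[str], new: list[str]) -> int:
    """Count lines a line-based diff reports as removed or added."""
    sm = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    return sum((i2 - i1) + (j2 - j1)
               for tag, i1, i2, j1, j2 in sm.get_opcodes() if tag != "equal")

rng = random.Random(0)
orfs = [random_orf(rng, 100) for _ in range(3)]

# Insert a single base near the start of the first ORF.
edited = list(orfs)
edited[0] = orfs[0][:10] + "G" + orfs[0][10:]

wrapped_diff = changed_lines(wrap_fixed("".join(orfs)), wrap_fixed("".join(edited)))
per_orf_diff = changed_lines(orfs, edited)
print("fixed-width layout, lines changed:", wrapped_diff)
print("one-ORF-per-line layout, lines changed:", per_orf_diff)
```

In the one-ORF-per-line layout the diff reports exactly one removed and one added line, while in the wrapped layout every line from the insertion point onward shifts by one character and is reported as changed.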

Genes can be very long, so storing a whole new ORF line for a small change would still record far more than what actually changed. To avoid this, darwin samples a section of every newly inserted ORF and compares it to nearby ORFs; if the new ORF is similar to an existing one, it is counted as “edited,” and darwin records only the character-by-character changes required to transform the old ORF into the new one.

Fig2. - darwin's unique method of parsing ORFs
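A minimal sketch of the "sample, match, record edits" idea follows. The sampling rule (24 characters from the middle of the ORF), the substring test, the search radius, and all function names are hypothetical choices for illustration; darwin's actual heuristics are not specified here, and `difflib.SequenceMatcher` is used only as a convenient way to derive character-level edits.

```python
import difflib

def sample(orf: str, k: int = 24) -> str:
    """Hypothetical sampling rule: take k characters from the middle of the ORF."""
    mid = max(0, (len(orf) - k) // 2)
    return orf[mid:mid + k]

def find_similar(new_orf: str, old_orfs: list[str], index: int, radius: int = 2):
    """Return the index of a nearby old ORF that contains the sample, else None."""
    probe = sample(new_orf)
    lo, hi = max(0, index - radius), min(len(old_orfs), index + radius + 1)
    for j in range(lo, hi):
        if probe in old_orfs[j]:
            return j
    return None

def char_edits(old: str, new: str) -> list[tuple[int, int, str]]:
    """Character-level edits (old_start, old_end, replacement) turning old into new."""
    sm = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    return [(i1, i2, new[j1:j2])
            for tag, i1, i2, j1, j2 in sm.get_opcodes() if tag != "equal"]

def apply_edits(old: str, edits: list[tuple[int, int, str]]) -> str:
    """Rebuild the new ORF from the old ORF plus the recorded edits."""
    out, pos = [], 0
    for i1, i2, repl in edits:
        out.append(old[pos:i1])
        out.append(repl)
        pos = i2
    out.append(old[pos:])
    return "".join(out)

old_orfs = [
    "ATG" + "GCTAAACCCGGGTTTCAG" * 3 + "TAA",
    "ATGCCCTTTGGGAAACCCTTTGGGTGA",
]
new_orf = old_orfs[0][:4] + "A" + old_orfs[0][5:]  # single substitution at index 4
novel = "ATG" + "CCA" * 20 + "TGA"                 # unrelated ORF, no match expected

match = find_similar(new_orf, old_orfs, index=0)
if match is not None:
    edits = char_edits(old_orfs[match], new_orf)
    print("edited ORF", match, "edits:", edits)
```

Only the short edit list is stored for a matched ORF, and `apply_edits` recovers the full sequence from it. A plain substring test is the simplest possible similarity check; it misses matches whenever the edit falls inside the sampled window, which is one reason a real tool might sample several windows.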

Finally, darwin uses concurrency to speed up processing. File I/O is typically much slower than processing data already in memory, so splitting the work into concurrent stages lets computation proceed while further data is still being read, easing that bottleneck.

Fig3. - Representation of darwin's block processor increasing processing speed
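One common way to hide I/O latency is a producer/consumer pipeline: one thread reads fixed-size blocks while the main thread processes blocks already in memory. The sketch below assumes that structure; the block size, queue depth, and the `gc_count` stand-in for darwin's per-block work are hypothetical. In CPython, blocking file reads release the GIL, so reading and processing can overlap even with threads.

```python
import io
import queue
import threading

BLOCK_SIZE = 64 * 1024  # hypothetical block size
_DONE = object()        # sentinel marking the end of the input

def reader(f, blocks: queue.Queue, block_size: int) -> None:
    """Producer: read the file block by block and hand each block to the consumer."""
    try:
        while True:
            data = f.read(block_size)
            if not data:
                break
            blocks.put(data)  # blocks when the queue is full, bounding memory use
    finally:
        blocks.put(_DONE)     # always signal completion, even on a read error

def gc_count(block: bytes) -> int:
    """Stand-in for the real per-block work: count G and C bases."""
    return block.count(b"G") + block.count(b"C")

def process_file(f, block_size: int = BLOCK_SIZE) -> int:
    """Consumer: process blocks while the reader thread keeps reading ahead."""
    blocks: queue.Queue = queue.Queue(maxsize=8)
    t = threading.Thread(target=reader, args=(f, blocks, block_size))
    t.start()
    total = 0
    while True:
        item = blocks.get()
        if item is _DONE:
            break
        total += gc_count(item)
    t.join()
    return total

if __name__ == "__main__":
    demo = io.BytesIO(b"ATGC" * 1000)
    print(process_file(demo, block_size=100))  # prints 2000
```

The bounded queue keeps the reader from racing arbitrarily far ahead of the processor; additional worker threads or processes could consume from the same queue if per-block work, rather than I/O, became the bottleneck.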
