Team:Vanderbilt Software/Project/Results

From 2014.igem.org

(Difference between revisions)
 
(One intermediate revision not shown)
Line 5: Line 5:
<head lang="en">
<head lang="en">
     <meta charset="UTF-8">
     <meta charset="UTF-8">
-
     <title>Home</title>
+
     <title>darwin</title>
     <style>
     <style>
         .tab_left {
         .tab_left {
-
             visibility: hidden;
+
             visibility: visible;
         }
         }
         .tab_right {
         .tab_right {
-
             visibility: visible;
+
             visibility: hidden;
         }
         }
         #home_tab_left {
         #home_tab_left {
Line 19: Line 19:
         #home_tab_right {
         #home_tab_right {
             visibility: hidden;
             visibility: hidden;
 +
        }
 +
        #user_guide_tab_left {
 +
            visibility: hidden;
 +
        }
 +
        #user_guide_tab_right {
 +
            visibility: visible;
 +
        }
 +
        #api_reference_tab_left {
 +
            visibility: hidden;
 +
        }
 +
        #api_reference_tab_right {
 +
            visibility: visible;
         }
         }
     </style>
     </style>
Line 48: Line 60:
<div id="openLabBook">
<div id="openLabBook">
     <div id="left_page" class="page">
     <div id="left_page" class="page">
-
         <header>Program Description</header>
+
         <header>Results</header>
-
         <p>Darwin is a software package to document changes to DNA which allows for easy, standardized, and collaborative editing on DNA data up to the genome scale. It builds off of tested and proven version control software, so the entire history of the tracked data is easy to browse and transfer. But Darwin is specifically focused on DNA data, speeding up the tracking process and offering significant security improvements from any current system.</p>
+
         <p>We produced software that implements the algorithms described in our approach, applied it to specially cleaned input files, and ran the processed files through a typical version control system. We tested it on the transformations between iterations 3-6 of our wetware team's yeast plasmids. As a control, we compared darwin's output run through the git version control system against the same files processed with git alone (vanilla_git). We expected darwin to run faster and to show fewer changed lines in the diff output than vanilla_git.</p>
-
         <p>Because Darwin uses existing version control systems, the majority of the heavy lifting is complete already; Darwin can be installed right on top of the existing software. Darwin's contribution is to parse and format the biological data so that it can be used more effectively with these systems. It uses a variety of heuristics to effectively split the data and granularize the changes made to produce change logging orders of magnitude more time- and space-efficient than any other method.</p>
+
        <p>As expected, darwin's preprocessing produced a considerably faster diff, averaging 5.763 seconds per file versus 10.640 seconds for vanilla_git (about 1.8 times faster). However, contrary to our assumptions, that speedup did not appear to come from a reduction in the changed lines reported by git, since darwin actually produced more changed lines. More research is required to determine how the number of changed lines reported by git relates to the number of lines actually changed in the file.</p>
-
 
+
        <header>Future Work</header>
 +
        <p>Future work mostly revolves around bringing darwin from a standalone diff-producing executable to an integrated system with a GUI usable by novices. The major challenge is parsing the many file types used to represent DNA, although support for other formats such as SBOL and ApE could be added easily by extracting the actual genetic sequence from the file.</p>
 +
         <p>In addition, the current algorithms cannot handle introns of any sort in the input DNA sequence and will attempt to split the file into ORFs regardless. The ability to correctly identify such non-coding regions would make the software far more valuable, able to handle prokaryotic and eukaryotic sequences alike.</p>
     </div>
     </div>
 +
     <div id="right_page" class="page">
     <div id="right_page" class="page">
-
        <header>Current Status</header>
 
-
        <p>The project in its current status can be found in its repository on <a href="https://github.com/igemsoftware/Vanderbilt_2014">github</a>. Documentation on the produced software tool can be found on the Program page</p>
 
     </div>
     </div>
</div>
</div>
Line 62: Line 75:
<div id="right_button" class="button"></div>
<div id="right_button" class="button"></div>
 +
<script type="text/javascript" src="https://2014.igem.org/Team:Vanderbilt/subpagebuilder?action=raw&ctype=text/javascript"></script>
 +
 +
<script>
 +
    var builder = new SubPageBuilder();
 +
    var img1 = builder.createPhoto("https://static.igem.org/mediawiki/2014/b/be/Diff_speed.png", "Speed of file processing in darwin vs. vanilla_git", 480, 480, "Speed of file processing in darwin vs. vanilla_git");
 +
    var img2 = builder.createPhoto("https://static.igem.org/mediawiki/2014/5/53/Diff_lines_changed.png", "Line changes in darwin vs. vanilla_git", 480, 480, "Line changes in darwin vs. vanilla_git");
 +
    document.getElementById("right_page").appendChild(img1);
 +
    document.getElementById("right_page").appendChild(img2);
 +
</script>
</body>
</body>
</html>
</html>

Latest revision as of 16:52, 26 January 2015

darwin

Results

We produced software that implements the algorithms described in our approach, applied it to specially cleaned input files, and ran the processed files through a typical version control system. We tested it on the transformations between iterations 3-6 of our wetware team's yeast plasmids. As a control, we compared darwin's output run through the git version control system against the same files processed with git alone (vanilla_git). We expected darwin to run faster and to show fewer changed lines in the diff output than vanilla_git.

As expected, darwin's preprocessing produced a considerably faster diff, averaging 5.763 seconds per file versus 10.640 seconds for vanilla_git (about 1.8 times faster). However, contrary to our assumptions, that speedup did not appear to come from a reduction in the changed lines reported by git, since darwin actually produced more changed lines. More research is required to determine how the number of changed lines reported by git relates to the number of lines actually changed in the file.
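
One reason a changed-line count can diverge from the amount of sequence actually edited is that a line-based diff only sees whole lines, so the count depends on how the sequence is broken into lines. The sketch below is an illustration only and is not darwin's implementation: it uses Python's standard `difflib` in place of git, and the helper names `split_fixed` and `changed_lines`, the line width, and the example sequence are all hypothetical.

```python
import difflib


def split_fixed(seq, width):
    """Split a sequence into fixed-width lines (FASTA-style wrapping)."""
    return [seq[i:i + width] for i in range(0, len(seq), width)]


def changed_lines(old_lines, new_lines):
    """Count added plus removed lines in a line-based unified diff."""
    count = 0
    for line in difflib.unified_diff(old_lines, new_lines, lineterm="", n=0):
        # Skip the file headers and hunk markers; count only +/- content lines.
        if line.startswith(("+++", "---", "@@")):
            continue
        if line.startswith(("+", "-")):
            count += 1
    return count


if __name__ == "__main__":
    # Hypothetical example: a single base inserted near the start.
    old = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"
    new = old[:5] + "T" + old[5:]

    # Whole sequence on one line: the edit touches exactly one line.
    print("one line:", changed_lines([old], [new]))

    # Fixed-width wrapping: the insertion shifts every following line.
    print("wrapped: ", changed_lines(split_fixed(old, 10), split_fixed(new, 10)))
```

The same one-base edit is reported as a different number of changed lines under each layout, which is why the line count alone may be a poor proxy for how much of the file changed.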

Future Work

Future work mostly revolves around bringing darwin from a standalone diff-producing executable to an integrated system with a GUI usable by novices. The major challenge is parsing the many file types used to represent DNA, although support for other formats such as SBOL and ApE could be added easily by extracting the actual genetic sequence from the file.
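
As a rough sketch of what "extracting the actual genetic sequence" could look like, the function below pulls the bases out of a GenBank-style flat file, where the sequence sits between an `ORIGIN` line and a closing `//`. This assumes the input follows that layout (an assumption here, including for ApE files); SBOL is XML-based and would need a separate parser. The function name `extract_sequence` is hypothetical and not part of darwin.

```python
def extract_sequence(text):
    """Return the bare sequence from GenBank-style text (ORIGIN ... //)."""
    bases = []
    in_origin = False
    for line in text.splitlines():
        if line.startswith("ORIGIN"):
            # Sequence lines begin after this marker.
            in_origin = True
            continue
        if line.startswith("//"):
            # End-of-record marker.
            break
        if in_origin:
            # Drop the position numbers and spaces, keep only letters.
            bases.extend(ch for ch in line if ch.isalpha())
    return "".join(bases).upper()


if __name__ == "__main__":
    sample = (
        "LOCUS       example\n"
        "ORIGIN\n"
        "        1 atggccattg taatgggccg\n"
        "       21 ctgaaagggt\n"
        "//\n"
    )
    print(extract_sequence(sample))
```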

In addition, the current algorithms cannot handle introns of any sort in the input DNA sequence and will attempt to split the file into ORFs regardless. The ability to correctly identify such non-coding regions would make the software far more valuable, able to handle prokaryotic and eukaryotic sequences alike.
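
To make the intron limitation concrete, here is a minimal sketch of a naive ORF scan. It is not darwin's algorithm; the function name `find_orfs`, the `min_codons` parameter, and the forward-strand-only search are assumptions made for illustration. It treats the first in-frame stop codon after an `ATG` as the end of a gene, so a stop codon that falls inside an intron would end the ORF early.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=2):
    """Return (start, end) pairs of forward-strand ORFs, end exclusive.

    An ORF here runs from an ATG to the first in-frame stop codon,
    stop included. Introns are not recognized in any way.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


if __name__ == "__main__":
    seq = "CCATGAAATAGGG"
    for start, end in find_orfs(seq):
        print(start, end, seq[start:end])
```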