Team:Vanderbilt Software/Project/User Guide

From 2014.igem.org

Revision as of 17:13, 26 January 2015

darwin

User Guide

It's important to understand the difference between the software darwin and the utility dwndiff. Over the summer, the team produced dwndiff, a command-line utility that produces "diff" files suitable for use with genomic data. darwin, on the other hand, is the overall vision of the software team: a solution for tracking changes to all sorts of genomic data. Given input files in FASTA, GenBank, or ApE format, dwndiff quickly produces output files that are much easier to use with an external version control system such as git, svn, or hg.

Using dwndiff is somewhat complex at first, so let's start with the most obvious resource: the "help" command:

$ dwndiff --help
      Usage: dwndiff [OPTION...] FILE...
      dwndiff is a command-line tool to format DNA sequences for use in version
      control systems. It is intended to be used as a backend for version control
      tools which act upon biological data. It defaults to 'format' mode, where it
      formats a given DNA sequence (through stdin, or the given files) to a special
      format, appending the '.vcsfmt' suffix. If given the '-c' option, however, it
      will take two files, convert them to .vcsfmt format if necessary, and produce a
      .vcscmp file which can be used in version control software.

      -p, --preformat_loc=DIR    Location of pre-existing .vcsfmt files for given
                                 input DNA
      -w, --write                Write output to file(s) instead of stdout
      -c, --compare              If two files given, produce unix diff-compatible
                                 comparison .vcscmp file
      -v, --verbose              Produce verbose output
      -?, --help                 Give this help list
          --usage                Give a short usage message
      -V, --version              Print program version

      Mandatory or optional arguments to long options are also mandatory or optional
      for any corresponding short options.
    

Let's break this down into multiple cases:

  1. Given a genomic file, convert it to a version-controllable format, which we call vcsfmt.
  2. Given two vcsfmt files, find the difference between them and produce a file to be put under version control, which we call vcscmp.
  3. Reproduce the original input file from a vcsfmt or a vcscmp file.

1. Take a genomic file and produce the vcsfmt file

Run dwndiff as follows:

dwndiff <file>... [-w]

This is dwndiff's default 'format' mode, as described in the help text above. It takes in one or more files (or reads from stdin) and produces an output, which can be written to file or stdout. If written to file with -w, it will have the ".vcsfmt" suffix.
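
For the first case in the list above (converting a genomic file to vcsfmt), the help text describes a default 'format' mode that appends the '.vcsfmt' suffix. The following is a minimal Python sketch of how a wrapper script might assemble that command line. It only builds the argument list and does not run dwndiff. The helper names `format_command` and `expected_vcsfmt` are hypothetical, and the assumption that the output keeps the full input name with ".vcsfmt" appended is inferred from the help text, not confirmed.

```python
from pathlib import Path


def format_command(files, write=True):
    """Build the argv for dwndiff's default 'format' mode (not executed here)."""
    files = [str(f) for f in files]
    if not files:
        raise ValueError("format mode needs at least one input file")
    argv = ["dwndiff", *files]
    if write:
        argv.append("-w")  # -w/--write: write output to file(s) instead of stdout
    return argv


def expected_vcsfmt(path):
    """Assumed output name: the input file name with '.vcsfmt' appended."""
    p = Path(path)
    return p.with_name(p.name + ".vcsfmt")


if __name__ == "__main__":
    print(format_command(["genome.fasta"]))
    print(expected_vcsfmt("genome.fasta"))
```

With dwndiff installed, the returned list could be passed to `subprocess.run(argv, check=True)`.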

2. Take two vcsfmt files and produce the vcscmp "diff"

Run dwndiff as follows:

dwndiff <file1> <file2> -c [-w -p]

As before, this takes in files and produces an output, which can be written to file or stdout. If written to file, it will have the ".vcscmp" suffix. Unlike the previous command, this mode accepts only two input files, and the input files must already have been preformatted by dwndiff. The resulting output, whether sent to stdout or written to file, is the difference between the two inputs; when reviewed in a version control system such as git or svn, it will appear extremely small.
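
As a rough illustration of compare mode, the sketch below assembles the `-c` command line for two inputs and flags any input that does not yet carry the ".vcsfmt" suffix. It builds the argument list only and does not run dwndiff. The helper names `compare_command` and `not_preformatted` are hypothetical, and the long option `--preformat_loc=DIR` is taken from the help text rather than from the short `-p` form shown in the command above.

```python
def not_preformatted(files):
    """Return the inputs that do not end in '.vcsfmt' (a naming check only)."""
    return [str(f) for f in files if not str(f).endswith(".vcsfmt")]


def compare_command(file1, file2, write=False, preformat_loc=None):
    """Build the argv for dwndiff's compare mode (not executed here)."""
    argv = ["dwndiff", str(file1), str(file2), "-c"]
    if write:
        argv.append("-w")  # write the .vcscmp output to file instead of stdout
    if preformat_loc is not None:
        # Directory holding pre-existing .vcsfmt files, per the help text.
        argv.append(f"--preformat_loc={preformat_loc}")
    return argv


if __name__ == "__main__":
    inputs = ["old.fasta.vcsfmt", "new.fasta.vcsfmt"]
    print(not_preformatted(inputs))
    print(compare_command(*inputs, write=True))
```

Taking exactly two positional parameters mirrors the two-file limit of this mode, so passing a third file fails in Python before dwndiff is ever invoked.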

3. Recover original file from vcsfmt or vcscmp file

Run dwndiff as follows:

dwndiff <file> -u

This faithfully reconstructs the original genomic file from the specially-formatted version dwndiff had previously produced. The file is produced in the same directory as the input file, with the ".vcsfmt" or ".vcscmp" suffix removed, as appropriate.
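
The naming rule above (same directory, suffix removed) can be sketched in Python as follows. This is an illustration under that stated rule, not dwndiff's own code. The helper names `recovered_path` and `recover_command` are hypothetical, and the command is only assembled, never run.

```python
from pathlib import Path

# Suffixes that dwndiff output files carry, per the text above.
DWNDIFF_SUFFIXES = (".vcsfmt", ".vcscmp")


def recovered_path(path):
    """Return the path the recovered file should have: same directory, suffix removed."""
    p = Path(path)
    if p.suffix not in DWNDIFF_SUFFIXES:
        raise ValueError(f"{p.name} does not end in .vcsfmt or .vcscmp")
    return p.with_suffix("")  # drops only the final suffix


def recover_command(path):
    """Build the argv for the recovery invocation shown above (not executed here)."""
    return ["dwndiff", str(path), "-u"]


if __name__ == "__main__":
    print(recover_command("genome.fasta.vcsfmt"))
    print(recovered_path("genome.fasta.vcsfmt"))
```

A script could compare `recovered_path(...)` against the files already in the directory before running the command, to avoid overwriting an existing copy of the original.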