Team:Vanderbilt Software/User Guide

From 2014.igem.org

Revision as of 02:10, 18 October 2014 by Cosmicexplorer



User Guide

It's important to understand the difference between the software darwin and the utility dwndiff. Over the summer, the team produced dwndiff, a command-line utility that produces "diff" files suited to genomic data. darwin, on the other hand, is the software team's overall vision: a solution for tracking changes to all sorts of genomic data. Given input files, dwndiff quickly produces output files that an external version control system, such as git, svn, or hg, can handle far more easily than the raw genomic data.
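To see why a line-oriented output format matters, recall that tools like git, svn, and hg compare files line by line. The sketch below is a minimal illustration, not dwndiff's code; the 100-base sequence, the 10-base line width, and the helper names `chunk` and `bases_on_changed_lines` are hypothetical. It counts how many bases sit on lines that differ after a single substitution, first with the whole sequence on one line and then with it wrapped into short lines.

```python
def chunk(seq, width):
    """Split a sequence into fixed-width lines (hypothetical layout)."""
    return [seq[i:i + width] for i in range(0, len(seq), width)]


def bases_on_changed_lines(old_lines, new_lines):
    """Total bases on lines that differ. Valid for substitutions only,
    where both versions have the same number of lines."""
    return sum(len(a) for a, b in zip(old_lines, new_lines) if a != b)


old = "ACGT" * 25                  # 100-base example sequence
new = old[:50] + "T" + old[51:]    # one substitution at position 50 (G -> T)

# Whole sequence on a single line: the entire line counts as changed.
print(bases_on_changed_lines([old], [new]))                    # 100
# Wrapped at 10 bases per line: only one short line differs.
print(bases_on_changed_lines(chunk(old, 10), chunk(new, 10)))  # 10
```

With the sequence on a single line, a line-based diff has to record the whole line as removed and re-added, so its size grows with the sequence rather than with the edit.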

Using dwndiff can seem complex at first, so the most obvious place to start is its built-in help:

$ ./dwndiff --help
Usage: dwndiff [OPTION...] FILE...
dwndiff is a command-line tool to format DNA sequences for use in version
control systems. It is intended to be used as a backend for version control
tools which act upon biological data. It defaults to 'format' mode, where it
formats a given DNA sequence (through stdin, or the given files) to a special
format, appending the '.vcsfmt' suffix. If given the '-c' option, however, it
will take two files, convert them to .vcsfmt format if necessary, and produce a
.vcscmp file which can be used in version control software.

  -p, --preformat_loc=DIR    Location of pre-existing .vcsfmt files for given
                             input DNA
  -w, --write                Write output to file(s) instead of stdout
  -c, --compare              If two files given, produce unix diff-compatible
                             comparison .vcscmp file
  -v, --verbose              Produce verbose output
  -?, --help                 Give this help list
      --usage                Give a short usage message
  -V, --version              Print program version

Mandatory or optional arguments to long options are also mandatory or optional
for any corresponding short options.
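The help text above describes format mode's input and output conventions: read the sequence from stdin or from the named files, print to stdout by default, and with -w write each result to a file named by appending the '.vcsfmt' suffix. The Python sketch below mimics only those conventions and is not how dwndiff is implemented. `format_sequence` is a hypothetical placeholder, since this guide does not describe the actual .vcsfmt layout, and the sketch assumes plain sequence input without FASTA headers.

```python
import sys
from pathlib import Path

SUFFIX = ".vcsfmt"


def output_name(input_path):
    """Format mode appends the '.vcsfmt' suffix to the input file name."""
    return input_path + SUFFIX


def format_sequence(text, width=60):
    """HYPOTHETICAL stand-in for the real conversion: drop whitespace and
    rewrap the sequence at a fixed width. The true .vcsfmt layout differs."""
    seq = "".join(text.split())
    return "\n".join(seq[i:i + width] for i in range(0, len(seq), width)) + "\n"


def run(paths, write=False):
    """Read stdin when no files are given (always printing the result);
    otherwise format each file and, like -w, write FILE.vcsfmt when
    write=True, or print to stdout when it is False."""
    if not paths:
        sys.stdout.write(format_sequence(sys.stdin.read()))
        return
    for p in paths:
        formatted = format_sequence(Path(p).read_text())
        if write:
            Path(output_name(p)).write_text(formatted)
        else:
            sys.stdout.write(formatted)
```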
    

Let's break this down into two cases:

  1. Given a genomic file, convert it to a version-controllable format, which we call vcsfmt.
  2. Given two vcsfmt files, find the difference between them and produce a file to be put under version control, which we call vcscmp.
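Once both inputs are in a line-oriented form (case 1), case 2 reduces to an ordinary line-based comparison. As a rough sketch, Python's standard `difflib.unified_diff` can stand in for that step. The help text only says the .vcscmp file is "unix diff-compatible", so treating it as a unified diff, along with the default file names used here, is an assumption.

```python
import difflib


def compare(old_text, new_text, old_name="old.vcsfmt", new_name="new.vcsfmt"):
    """Return a unified diff between two already-formatted texts.
    Stand-in for the .vcscmp step; the real format may differ."""
    return "".join(difflib.unified_diff(
        old_text.splitlines(keepends=True),
        new_text.splitlines(keepends=True),
        fromfile=old_name,
        tofile=new_name,
    ))


old = "ACGTACGTAC\nGTACGTACGT\n"
new = "ACGTACGTAC\nTTACGTACGT\n"
print(compare(old, new), end="")
```

The printed diff marks only the second line as removed and re-added, which is the kind of compact, reviewable change a version control system stores well. Identical inputs produce an empty diff.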