Team:Vanderbilt Software/User Guide
From 2014.igem.org
|
||
User Guide |
||
It's important to understand the difference between the software darwin and the utility dwndiff. Over the summer, the team produced dwndiff, which is a command-line utility for producing what are called "diff" files suitable for use with genomic data. darwin, on the other hand, is the overall vision of the software team: to produce a solution for tracking changes to all sorts of genomic data. Given input files in FASTA, GenBank, or ApE format, the dwndiff utility can speedily produce output files which can be used with an external version control system much more easily. Examples of such version control systems include git, svn, and hg. Using dwndiff is somewhat complex at first, but let's start with the most obvious solution: the "help" command: $ dwndiff --help Usage: dwndiff [OPTION...] FILE... dwndiff is a command-line tool to format DNA sequences for use in version control systems. It is intended to be used as a backend for version control tools which act upon biological data. It defaults to 'format' mode, where it formats a given DNA sequence (through stdin, or the given files) to a special format, appending the '.vcsfmt' suffix. If given the '-c' option, however, it will take two files, convert them to .vcsfmt format if necessary, and produce a .vcscmp file which can be used in version control software. -p, --preformat_loc=DIR Location of pre-existing .vcsfmt files for given input DNA -w, --write Write output to file(s) instead of stdout -c, --compare If two files given, produce unix diff-compatible comparison .vcscmp file -v, --verbose Produce verbose output -?, --help Give this help list --usage Give a short usage message -V, --version Print program version Mandatory or optional arguments to long options are also mandatory or optional for any corresponding short options. Let's break this down into multiple cases:
Convert genomic file to version-controllable format (vcsfmt)Run dwndiff as follows: dwndiff <file> [-w] This produces a specially-formatted file which preserves the information of the original genomic file: the input file can be completely restructured from the output. The important point about the specially-formatted file is that it applies a variety of optimizations to run more quickly under most normal version control systems. If the '-w' option is specified, then dwndiff writes out the output file to a file named the same as the original, but with ".vcsfmt" appended to the end of the filename. Otherwise, it will write the output to stdout. If '-w' is given as an option, then multiple files can be given as input, and dwndiff will produce vcsfmt versions of all of them. Take two vcsfmt files and produce vcscmp "diff"Run dwndiff as follows: dwndiff <file1> <file2> -c [-w -p] As before, this takes in files and produces an output, which can be written to file or stdout. If written to file, it will have the ".vcscmp" suffix. The difference between this and the previous command is that dwndiff can only have two input files for this mode of operation, and the input files must already have been preformatted by dwndiff. The resulting output, whether sent to stdout or to the specified file, will be the difference between the two, which, when reviewed by a version control system such as git or svn, will appear extremely small. Recover original file from vcsfmt or vcscmp fileRun dwndiff as follows: dwndiff <file> -u This faithfully reconstructs the original genomic file from the specially-formatted version dwndiff had previously reproduced. The file is produced in the same directory as the input file, with the ".vcsfmt" or ".vcscmp" suffix removed, as appropriate. Using this in a Version Control SystemThe dwndiff utility still leaves a lot of work for the user to do. The user must scan in the file to track to produce a vcsfmt file, then track that using an external version control system (many suuch options are noted above). The user must then perform the same operation on that file whenever it is changed to produce a vcsfmt file, and then run dwndiff again to produce a .vcscmp file which will be tracked, and so on for each successive iteration. When another user wishes to use the changed file, they must obtain the vcscmp file from the version control system, then run dwndiff with the "unzip" option to obtain the original file. Future versions will automate this nonsense and reduce the hassle. |