Team:Vanderbilt Software/Project/darwin



Revision as of 16:21, 26 January 2015


Program Description

Version control systems such as git and svn focus on differences between lines. Since most DNA file formats split DNA into fixed-length lines, a small edit can change many lines at once; for example, inserting a single base shifts every line that follows. darwin does away with that by producing a formatted file representing each ORF on its own line of text, so that each edit modifies only a single line of the output text.
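The effect described above can be illustrated with a small sketch. This is not darwin's own code: the ORF strings, the 10-character line width, and the helper names `wrap_fixed` and `changed_lines` are all assumptions made for the example. It compares how many lines a line-based diff reports when the same insertion is made in a fixed-width file and in a one-ORF-per-line file.

```python
import difflib

def wrap_fixed(seq, width=10):
    """Split a sequence into fixed-length lines, as many DNA formats do."""
    return [seq[i:i + width] for i in range(0, len(seq), width)]

def changed_lines(old_lines, new_lines):
    """Count lines a line-based diff reports as removed or added."""
    diff = difflib.ndiff(old_lines, new_lines)
    return sum(1 for line in diff if line.startswith(("- ", "+ ")))

# Hypothetical ORFs, invented for illustration only.
orfs_old = ["ATGAAACCCGGGTTTTAA", "ATGCCCCCCAAATGA", "ATGGGGTTTAAACCCTAG"]
# Insert three bases into the first ORF; the other two are untouched.
orfs_new = ["ATGGCTAAACCCGGGTTTTAA", "ATGCCCCCCAAATGA", "ATGGGGTTTAAACCCTAG"]

# Fixed-width layout: the insertion shifts every following line.
fixed_changes = changed_lines(wrap_fixed("".join(orfs_old)),
                              wrap_fixed("".join(orfs_new)))

# One ORF per line: only the edited ORF shows up in the diff.
per_orf_changes = changed_lines(orfs_old, orfs_new)

print("fixed-width lines changed:", fixed_changes)
print("per-ORF lines changed:", per_orf_changes)
```

In the per-ORF layout the diff touches one old line and one new line, while the fixed-width layout reports a change on every line after the insertion point.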

Genes can be very long. To combat this, darwin will sample a section of every newly inserted ORF and compare it to nearby ORFs; if the new ORF is similar to another ORF, it is counted as “edited,” and darwin only records the character-by-character changes required to transform the old ORF into the new ORF.
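A minimal sketch of this idea follows, using Python's standard `difflib.SequenceMatcher`. The sample length, the similarity threshold, the search window, and the function names `find_similar`, `char_edits`, and `apply_edits` are assumptions chosen for illustration; the page does not say how darwin samples an ORF or how it scores similarity.

```python
from difflib import SequenceMatcher

def find_similar(new_orf, old_orfs, position, window=2,
                 sample_len=12, threshold=0.8):
    """Return the index of a nearby old ORF whose sample resembles the
    new ORF's sample, or None if nothing reaches the threshold.
    All parameters here are assumed values, not darwin's."""
    sample = new_orf[:sample_len]
    lo = max(0, position - window)
    hi = min(len(old_orfs), position + window + 1)
    best_index, best_score = None, threshold
    for i in range(lo, hi):
        candidate = old_orfs[i][:sample_len]
        score = SequenceMatcher(None, sample, candidate,
                                autojunk=False).ratio()
        if score >= best_score:
            best_index, best_score = i, score
    return best_index

def char_edits(old, new):
    """List the character-level edits that turn old into new."""
    ops = SequenceMatcher(None, old, new, autojunk=False).get_opcodes()
    return [(tag, i1, i2, new[j1:j2])
            for tag, i1, i2, j1, j2 in ops if tag != "equal"]

def apply_edits(old, edits):
    """Rebuild the new ORF from the old ORF and its recorded edits."""
    out, cursor = [], 0
    for _tag, i1, i2, text in edits:
        out.append(old[cursor:i1])  # unchanged stretch before the edit
        out.append(text)            # inserted or replacement characters
        cursor = i2                 # skip deleted or replaced characters
    out.append(old[cursor:])
    return "".join(out)

# Hypothetical data, invented for illustration only.
old_orfs = ["ATGAAACCCGGGTTTTAA", "ATGCCCCCCAAATGA"]
new_orf = "ATGAAACCCGGGTTTCATTAA"

match = find_similar(new_orf, old_orfs, position=0)
if match is not None:
    edits = char_edits(old_orfs[match], new_orf)
    print("edited ORF", match, "with edits", edits)
```

Only the short sample is compared while searching, so the cost of finding a match does not grow with the full length of the gene; the full character-by-character comparison runs once, against the single ORF that matched.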

Finally, darwin uses concurrency to help speed up the process. File I/O is typically extremely slow, much slower than processing file data already in memory. Running the processing concurrently helps to relieve that bottleneck.