Team:Vanderbilt Software/Background

From 2014.igem.org


Latest revision as of 17:35, 18 January 2015



Coding

Programmers need to track even minute changes to their code exhaustively, especially as a project scales up. Typical tools for this process include git and svn, which compute "diffs" that capture the differences between versions of a file and share them with collaborators over a network.
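To make the idea of a diff concrete, the sketch below uses Python's standard-library `difflib` module to compare two versions of a small file. git and svn use their own diff implementations, so `difflib` appears here only as an illustration. The file name `greet.py` is hypothetical.

```python
import difflib

# Two versions of a small file, as lists of lines with their newlines kept.
old = ["def greet():\n", "    print('hello')\n"]
new = ["def greet():\n", "    print('hello, world')\n"]

# unified_diff yields file headers, a hunk header, the changed lines
# (prefixed with '-' or '+'), and a few unchanged context lines.
diff = list(difflib.unified_diff(old, new, fromfile="a/greet.py", tofile="b/greet.py"))
print("".join(diff))
```

Only the edited line and a little surrounding context appear in the output, which keeps diffs small for typical code changes.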

However, no tool currently exists to support genome-scale changes in a useful way. Version control tools built for source code can store DNA data, but they are not designed for that kind of input and handle it extremely inefficiently. darwin is a way around that. It uses existing version control tools as a backend, so it takes advantage of their strengths, but it first preprocesses the DNA data so that these tested and proven version control methods work well on it. The result is a much more efficient and secure tracking system, built from the ground up to scale to genome-size files. darwin is completely open-source, so any security or optimization issues can be identified and fixed quickly, to the benefit of all users of the tool. It is also agnostic to the version control backend used, so it can be deployed on a wide variety of systems.
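darwin's actual preprocessing is not specified on this page, so what follows is only a hypothetical sketch of why preprocessing can matter for a line-based backend. If a genome is stored as one long line, a single-base substitution forces a line-based diff to rewrite the entire sequence. Splitting the sequence into fixed-width lines first confines the change to one short line. The helper `to_lines`, the 60-base width, and the random test sequence are all illustrative assumptions, not darwin's method or real data.

```python
import difflib
import random


def to_lines(seq: str, width: int) -> list[str]:
    """Hypothetical preprocessing step: split a sequence into fixed-width lines."""
    return [seq[i:i + width] + "\n" for i in range(0, len(seq), width)]


def changed_bases(old: list[str], new: list[str]) -> int:
    """Count bases on lines that a line-based diff marks as removed or added."""
    total = 0
    for line in difflib.unified_diff(old, new):
        if line.startswith(("---", "+++")):
            continue  # file header lines, not sequence content
        if line.startswith(("-", "+")):
            total += len(line[1:].rstrip("\n"))
    return total


# A random 1,000-base "genome" (illustrative data only).
rng = random.Random(0)
genome = "".join(rng.choice("ACGT") for _ in range(1000))

# Substitute a single base at position 500, guaranteeing it actually changes.
new_base = "A" if genome[500] != "A" else "C"
mutant = genome[:500] + new_base + genome[501:]

# Stored as one line, the whole sequence is removed and re-added in the diff.
single = changed_bases([genome + "\n"], [mutant + "\n"])
# Stored as 60-base lines, only the line containing position 500 changes.
chunked = changed_bases(to_lines(genome, 60), to_lines(mutant, 60))
print(single, chunked)  # 2000 120
```

The line width is a trade-off: shorter lines localize a substitution more tightly but add per-line overhead. An insertion or deletion would still shift every later fixed-width line, which is one reason a real tool might choose a different preprocessing scheme.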

Biology

The workings of any organism are rooted in its proteins, which are encoded in its genome. Every protein can be traced back to an open reading frame (ORF): a stretch of base pairs that begins with a start codon, ends with a stop codon, and divides evenly into codons (sets of three base pairs). Each codon other than a stop codon corresponds to an amino acid, and through the processes of transcription and translation, the cell manufactures proteins from this sequence.
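The reading-frame logic above can be sketched in a few lines of Python. This is a simplified, hypothetical helper, not part of darwin. It scans only the forward strand, treats ATG as the only start codon, and uses the standard genetic code written in the common compact 64-character form (codons enumerated in TCAG order, with `*` marking a stop codon).

```python
from itertools import product
from typing import Optional

# Standard genetic code in compact form: codons are enumerated with bases
# in TCAG order (first position varying slowest); '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
START_CODON = "ATG"


def first_orf(seq: str) -> Optional[str]:
    """Return the first ORF (start codon through in-frame stop codon) on the forward strand, or None."""
    seq = seq.upper()
    start = seq.find(START_CODON)
    while start != -1:
        # Walk codon by codon from the start codon until an in-frame stop.
        for i in range(start, len(seq) - 2, 3):
            if CODON_TABLE.get(seq[i:i + 3]) == "*":
                return seq[start:i + 3]
        start = seq.find(START_CODON, start + 1)  # no in-frame stop; try the next ATG
    return None


def translate(orf: str) -> str:
    """Translate an ORF into one-letter amino acid codes, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(orf) - 2, 3):
        aa = CODON_TABLE.get(orf[i:i + 3], "X")  # 'X' for codons containing ambiguous bases
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


orf = first_orf("GGATGGCTTGGTAAGC")
print(orf, translate(orf))  # ATGGCTTGGTAA MAW
```

Practical ORF finders also scan the reverse complement strand and usually apply a minimum length cutoff.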

While DNA is a relatively stable molecule, various events can cause mutations in the genetic code. Mutations are classified as insertions, deletions, and substitutions, and are further distinguished by how they affect protein translation downstream. A mutation can be synonymous, meaning that although a base pair changes, the codon still codes for the same amino acid, so the protein is unchanged; or nonsynonymous, meaning the amino acid changes, which may alter the makeup of the protein. Lastly, frameshift mutations, caused by insertions or deletions whose length is not a multiple of three, shift the reading frame so that every downstream codon is read differently, generating an entirely different open reading frame.
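These categories can be told apart mechanically when both the original and the mutated sequence are in-frame coding sequences. The sketch below is a hypothetical illustration with illustrative function names (`translate`, `classify`). It assumes that equal-length sequences differ only by substitutions, it does not locate where a mutation occurred, and it reports a change to a stop codon simply as nonsynonymous.

```python
from itertools import product

# Standard genetic code in compact TCAG order; '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(cds: str) -> str:
    """Translate every complete codon in frame 0 ('*' = stop, 'X' = unknown codon)."""
    return "".join(CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, len(cds) - 2, 3))


def classify(ref: str, alt: str) -> str:
    """Classify how alt differs from ref, assuming both start in the same reading frame."""
    ref, alt = ref.upper(), alt.upper()
    delta = len(alt) - len(ref)
    if delta % 3 != 0:
        return "frameshift"  # indel length not a multiple of three
    if delta > 0:
        return "in-frame insertion"
    if delta < 0:
        return "in-frame deletion"
    if ref == alt:
        return "no change"
    # Same length: treated as substitution(s); compare the encoded proteins.
    return "synonymous" if translate(ref) == translate(alt) else "nonsynonymous"


ref = "ATGGCTTGGTAA"                  # encodes M-A-W, then stop
print(classify(ref, "ATGGCCTGGTAA"))  # synonymous (GCT and GCC both encode A)
print(classify(ref, "ATGGCTTGTTAA"))  # nonsynonymous (TGG W becomes TGT C)
print(classify(ref, "ATGCTTGGTAA"))   # frameshift (one base deleted)
```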