Team:Vanderbilt Software/Project/Background

From 2014.igem.org

Revision as of 16:11, 26 January 2015 by Hwangas

Background

Project Motivation

Programmers need to track even minute changes to their code exhaustively, especially as a project scales up. Without that tracking, far too much time would be spent just merging in changes from other programmers and far too little time actually coding. It's reasonable to expect an analogous need in synthetic biology, where complicated structures are modified in the utmost detail. But because most synthetic biologists work on logical functions that are small compared with the hulking data structures programmers wield, introducing a version control system has seemed like overkill. As the complexity of synthetic life increases exponentially, though, not using some sort of change-tracking tool will soon be a grave mistake: biologists will get caught up in the same logical complexity that plagues professional programmers. And as synthetic biology grows as a field, with increasing support from government and enterprise, more people will get involved in these projects, so an inability to collaborate effectively could hamper the entire industry.

However, no tool currently supports genome-scale changes in a useful way. Existing methods of tracking gene changes don't scale well and don't incorporate the power of the version control tools programmers are used to. Programming version control tools could be used on DNA data, but they aren't designed for that sort of input and handle it extraordinarily inefficiently. Darwin is a way around that. It's built on top of version control tools, so it takes advantage of their strengths, but it manipulates the DNA data to suit these tested and proven methods of version control, producing a much more efficient and secure tracking system. Darwin is completely open-source, so any security or optimization issues can be identified and fixed quickly, to the benefit of all users of the tool. It's also agnostic to the type of version control backend used, so it can be deployed on a wide variety of systems.
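The page doesn't spell out where that inefficiency comes from. One plausible contributor, assumed here purely for illustration, is that tools like git compare text line by line, while sequence files are often wrapped at a fixed width (60 bases per line below, a common FASTA convention). A substitution then changes a single line, but a one-base insertion shifts every later base and so changes every later line. The sketch uses Python's standard difflib as a stand-in for a real version control backend; the helpers wrap and changed_lines are hypothetical names.

```python
import difflib
import random

def wrap(seq, width=60):
    """Split a sequence into fixed-width lines, as wrapped FASTA files do."""
    return [seq[i:i + width] + "\n" for i in range(0, len(seq), width)]

def changed_lines(old_lines, new_lines):
    """Count lines a line-based diff reports as deleted or inserted."""
    matcher = difflib.SequenceMatcher(None, old_lines, new_lines, autojunk=False)
    total = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            total += (i2 - i1) + (j2 - j1)
    return total

# A random 6,000-base sequence (100 lines of 60 bases), seeded for repeatability.
rng = random.Random(0)
original = "".join(rng.choice("ACGT") for _ in range(6000))

# Substitution at index 100: swap in a different base, length unchanged.
new_base = "C" if original[100] == "A" else "A"
substituted = original[:100] + new_base + original[100 + 1:]

# Insertion at index 100: every later base moves one position to the right.
inserted = original[:100] + "A" + original[100:]

print("substitution:", changed_lines(wrap(original), wrap(substituted)))
print("insertion:   ", changed_lines(wrap(original), wrap(inserted)))
```

With this layout the substitution costs 2 changed lines (one removed, one added), while the single-base insertion costs 199 (99 old lines removed, 100 new lines added), assuming no 60-base line happens to repeat by chance.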

Coding

Programmers need to track even minute changes to their code exhaustively, especially as a project scales up. Typical tools for this process include git and svn, which compute "diffs" (records of the differences between versions of a file) and exchange changes with collaborators over a network.
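For readers coming from biology, here is what a diff in the common "unified" format looks like. git and svn have their own diff implementations; Python's standard difflib module is used here only as an illustration because it produces the same style of output. The file names v1/greet.py and v2/greet.py are made up.

```python
import difflib

# Two versions of a tiny file, as lists of lines ("\n" kept on each line).
old = "def greet():\n    print('hello')\n    return 0\n".splitlines(keepends=True)
new = "def greet():\n    print('hello, world')\n    return 0\n".splitlines(keepends=True)

# unified_diff yields headers, a hunk marker, and only the changed lines
# plus surrounding context: the compact record a version control tool keeps.
diff = list(difflib.unified_diff(old, new, fromfile="v1/greet.py", tofile="v2/greet.py"))
print("".join(diff), end="")
# --- v1/greet.py
# +++ v2/greet.py
# @@ -1,3 +1,3 @@
#  def greet():
# -    print('hello')
# +    print('hello, world')
#      return 0
```

Lines starting with "-" were removed, lines starting with "+" were added, and lines starting with a space are unchanged context.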

However, no tool currently supports genome-scale changes in a useful way. Programming version control tools can be used on DNA data, but they aren't designed for that sort of input and handle it extraordinarily inefficiently. darwin is a way around that. It uses version control tools as a backend, so it takes advantage of their strengths, but it preprocesses the DNA data to suit these tested and proven methods of version control, producing a much more efficient and secure tracking system designed from the ground up to scale to genome-size files. darwin is completely open-source, so any security or optimization issues can be identified and fixed quickly, to the benefit of all users of the tool. It's also agnostic to the type of version control backend used, so it can be deployed on a wide variety of systems.
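To give a feel for what "preprocessing" can buy with a line-based backend, the sketch below rewrites a sequence as one base per line before diffing. This layout is a hypothetical example chosen for illustration, not necessarily what darwin does, and to_tracked_form and from_tracked_form are invented names. It calls difflib.SequenceMatcher with autojunk=False because, with only four distinct lines, difflib's popularity heuristic would otherwise ignore every base when looking for matches and could report a far larger diff than necessary.

```python
import difflib
import random

def to_tracked_form(seq):
    """Hypothetical preprocessing: store one base per line."""
    return [base + "\n" for base in seq]

def from_tracked_form(lines):
    """Inverse of to_tracked_form: rebuild the original sequence."""
    return "".join(line.rstrip("\n") for line in lines)

def changed_lines(old_lines, new_lines):
    """Count lines a line-based diff reports as deleted or inserted."""
    # autojunk=False: with only A, C, G and T as lines, the default
    # heuristic would treat every line as "popular" and skip it as an anchor.
    matcher = difflib.SequenceMatcher(None, old_lines, new_lines, autojunk=False)
    return sum(
        (i2 - i1) + (j2 - j1)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    )

rng = random.Random(0)
original = "".join(rng.choice("ACGT") for _ in range(600))
inserted = original[:100] + "A" + original[100:]  # one-base insertion

old_lines = to_tracked_form(original)
new_lines = to_tracked_form(inserted)
assert from_tracked_form(new_lines) == inserted  # the layout loses nothing
print("insertion:", changed_lines(old_lines, new_lines))  # prints "insertion: 1"
```

The insertion now costs a single added line instead of touching everything downstream. The trade-off is that the file has one line per base, so a real tool working on genome-size files would have to balance diff size against file size and diffing time.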

Biology

The workings of any organism are rooted in its proteins, which are encoded in its genome. Every protein can be traced back to an open reading frame: a stretch of base pairs that begins with a start codon, ends with a stop codon, and divides evenly into codons (sets of three base pairs). Every codon other than the stop codons corresponds to an amino acid, and through the processes of transcription and translation the cell manufactures proteins from these instructions.
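The open reading frame and codon ideas above can be sketched in a few lines. This sketch assumes the standard genetic code (NCBI translation table 1), treats ATG as the only start codon, and searches the forward strand only; real ORF finders also scan the reverse-complement strand. The function names and the example sequence are made up for illustration.

```python
# Standard genetic code (NCBI table 1). Codons are enumerated with bases in
# T, C, A, G order; "*" marks the three stop codons.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_first_orf(dna):
    """Return the first ATG...stop stretch on the forward strand, or None."""
    dna = dna.upper()
    start = dna.find("ATG")
    while start != -1:
        # Step through codons in this frame until an in-frame stop appears.
        for i in range(start, len(dna) - 2, 3):
            if dna[i:i + 3] in STOP_CODONS:
                return dna[start:i + 3]
        start = dna.find("ATG", start + 1)
    return None

def translate(orf):
    """Translate codon by codon; the stop codon appears as '*'."""
    return "".join(CODON_TABLE[orf[i:i + 3]] for i in range(0, len(orf) - 2, 3))

orf = find_first_orf("CCATGGCTTGGTAAGG")
print(orf, translate(orf))  # prints "ATGGCTTGGTAA MAW*"
```

The ORF found is ATG GCT TGG TAA: its length is a multiple of three, and it translates to methionine, alanine, tryptophan, then stop.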

While DNA is a relatively stable molecule, various events can cause mutations in the genetic code. Mutations are described both by how they change the DNA (insertions, deletions, and substitutions) and by how they affect protein translation downstream. A substitution can be synonymous, meaning that although a base pair has changed, the codon still codes for the same amino acid, so the resulting protein is unchanged; or nonsynonymous, meaning the codon now codes for a different amino acid, which may affect the makeup of the protein. Lastly, frameshift mutations, caused by insertions or deletions whose length is not a multiple of three, shift the reading frame so that every downstream codon is read differently, generating an entirely different open reading frame.
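These categories can be checked mechanically by translating a sequence before and after a change. The sketch below assumes the standard genetic code (NCBI translation table 1) and uses a made-up ORF; classify_substitution and is_frameshift are hypothetical helper names.

```python
# Standard genetic code (NCBI table 1), bases ordered T, C, A, G; "*" = stop.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate(seq):
    """Translate whole codons from position 0; leftover bases are ignored."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))

def classify_substitution(orf, pos, new_base):
    """Label a single-base substitution as synonymous or nonsynonymous."""
    mutant = orf[:pos] + new_base + orf[pos + 1:]
    return "synonymous" if translate(mutant) == translate(orf) else "nonsynonymous"

def is_frameshift(length_change):
    """An insertion or deletion shifts the frame unless its length is a multiple of 3."""
    return length_change % 3 != 0

orf = "ATGGCTTGGTAA"                       # Met, Ala, Trp, stop
print(classify_substitution(orf, 5, "C"))  # GCT -> GCC, still alanine
print(classify_substitution(orf, 3, "A"))  # GCT -> ACT, alanine -> threonine
deleted = orf[:4] + orf[5:]                # delete one base
print(is_frameshift(len(deleted) - len(orf)), translate(deleted))
```

The three lines printed are "synonymous", "nonsynonymous", and "True MVG": after the one-base deletion every downstream codon is read differently, and the original stop codon is no longer in frame.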