Team:StanfordBrownSpelman/Modelling
From 2014.igem.org
(Difference between revisions)
Line 80: | Line 80: | ||
<h6> | <h6> | ||
Gene synthesis as a tool for biological engineering presents both opportunities and challenges. One opportunity presented is the ability to optimize codon usage in a gene to match that of a host organism. Compared to traditional cloning methods, this can increase protein yields in the host organism by several fold. However, while there exist a large number of freely-usable programs that perform codon optimization, there is no guarantee that the sequences these programs provide will be able to be synthesized. Specifically, in the case of genes with repetitive amino acid sequences, these programs will often generate outputs that contain too many repeated short DNA sequences to be synthesized commercially. | Gene synthesis as a tool for biological engineering presents both opportunities and challenges. One opportunity presented is the ability to optimize codon usage in a gene to match that of a host organism. Compared to traditional cloning methods, this can increase protein yields in the host organism by several fold. However, while there exist a large number of freely-usable programs that perform codon optimization, there is no guarantee that the sequences these programs provide will be able to be synthesized. Specifically, in the case of genes with repetitive amino acid sequences, these programs will often generate outputs that contain too many repeated short DNA sequences to be synthesized commercially. | ||
- | |||
</h6> | </h6> | ||
Line 106: | Line 105: | ||
<div class="row"> | <div class="row"> | ||
<div id="subheader" class="small-8 small-centered columns"> | <div id="subheader" class="small-8 small-centered columns"> | ||
- | <h6> Note that this sequence is not simply the same sequence repeated multiple times, but instead contains several motifs on the order of 10 - 20 amino acids in length that occur several times. When this sequence was run through the codon optimization program for expression in ''E. coli'' provided by a major DNA synthesis firm, the resulting output could not be synthesized by the very same firm: the "optimized" DNA sequence contained too many recurring short (> 8 nucleotide) DNA sequences to allow for synthesis. | + | <h6> Note that this sequence is not simply the same sequence repeated multiple times, but instead contains several motifs on the order of 10 - 20 amino acids in length that occur several times. When this sequence was run through the codon optimization program for expression in ''E. coli'' provided by a major DNA synthesis firm, the resulting output could not be synthesized by the very same firm: the "optimized" DNA sequence contained too many recurring short (> 8 nucleotide) DNA sequences to allow for synthesis.<br></br> |
- | + | Manually correcting for repeats in the codon-optimized DNA sequence is a sub-optimal solution: not only is this process time-consuming, but it has the tendency to undo the codon-optimization: if a sequence of amino acids occurs several times, one may be forced to use all possible codon-combinations to represent this sequence to avoid nucleotide-sequence repetition. Unless corrected for by skewing codon usage elsewhere in the sequence, this will tend to make the codon usage more uniform than is optimal for the expression vector. Additionally, any changes made in either correcting for repeats or re-correcting for codon usage may in turn introduce additional repeats. <br></br> | |
- | + | ||
- | Manually correcting for repeats in the codon-optimized DNA sequence is a sub-optimal solution: not only is this process time-consuming, but it has the tendency to undo the codon-optimization: if a sequence of amino acids occurs several times, one may be forced to use all possible codon-combinations to represent this sequence to avoid nucleotide-sequence repetition. Unless corrected for by skewing codon usage elsewhere in the sequence, this will tend to make the codon usage more uniform than is optimal for the expression vector. Additionally, any changes made in either correcting for repeats or re-correcting for codon usage may in turn introduce additional repeats. | + | |
- | + | ||
- | <br></br> | + | |
</h6> | </h6> | ||
</div> | </div> | ||
Line 134: | Line 129: | ||
<div id="subheader" class="small-8 small-centered columns"> | <div id="subheader" class="small-8 small-centered columns"> | ||
<h5><center>Availability and Usage</h5> | <h5><center>Availability and Usage</h5> | ||
- | <h6>DoubleOptimizer may be downloaded <a href = " | + | <h6>DoubleOptimizer may be downloaded <a href = "http://drive.google.com/a/brown.edu/file/d/0B6Q5Eo65G4cPZC1SZWEzbUtrYUU/view?usp=sharing"> here </a> <br></br> |
- | DoubleOptimizer is a command line utility, provided as a Java jar file. It can be invoked from command line on any system with Java installed, with the following syntax: | + | DoubleOptimizer is a command line utility, provided as a Java jar file. It can be invoked from command line on any system with Java installed, with the following syntax: |
- | java -jar DoubleOptimizer.jar seq.txt codons.txt [Optional flags] | + | <div class="sub6">java -jar DoubleOptimizer.jar seq.txt codons.txt [Optional flags] </div> |
where "seq.txt" is a DNA sequence, stored as a plain text file, and "codons.txt" is a file containing the desired codon distribution to match. It should be formatted as plain text, according to the following example template: <br></br> | where "seq.txt" is a DNA sequence, stored as a plain text file, and "codons.txt" is a file containing the desired codon distribution to match. It should be formatted as plain text, according to the following example template: <br></br> | ||
- | GCG .36 | + | <div class="sub6">GCG .36 |
GCC .27 | GCC .27 | ||
GCA .21 | GCA .21 | ||
Line 226: | Line 221: | ||
GTC .22 | GTC .22 | ||
GTA .15 | GTA .15 | ||
- | + | </div> | |
- | + | ||
(Note that the above example is actually the codon usage distribution of <i>E. coli</i>.) <br></br> | (Note that the above example is actually the codon usage distribution of <i>E. coli</i>.) <br></br> | ||
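To make the expected layout of "codons.txt" concrete, the template can be read as one codon and one frequency per line, separated by whitespace. The following is a minimal sketch under that assumption; it is not DoubleOptimizer's own parser, and the function name `parse_codon_table`, the skipping of blank lines, and the error handling are illustrative choices.

```python
def parse_codon_table(text):
    """Parse lines of the form 'GCG .36' into a {codon: frequency} dict."""
    table = {}
    for line_no, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # assumption: blank lines are ignored
        if len(fields) != 2:
            raise ValueError(f"line {line_no}: expected 'CODON FREQUENCY', got {line!r}")
        codon = fields[0].upper()
        if len(codon) != 3 or set(codon) - set("ACGT"):
            raise ValueError(f"line {line_no}: {codon!r} is not a DNA codon")
        table[codon] = float(fields[1])  # float() accepts '.36' without a leading zero
    return table


# The first three lines of the template above.
sample = """GCG .36
GCC .27
GCA .21
"""
table = parse_codon_table(sample)
print(table)
```

Reading the file into a dictionary keyed by codon makes it straightforward to compare a candidate sequence's codon usage against the target distribution.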
Line 239: | Line 233: | ||
The following optional flags may be used to change the program's behavior:<br></br> | The following optional flags may be used to change the program's behavior:<br></br> | ||
- | * | + | <div class="sub6"> *-A <br></br> </div> |
This allows for an amino-acid sequence, specified in single-letter code, to be used as input instead of a DNA sequence. The initial sequence statistics displayed will be for a uniform random reverse translation of the given amino acid sequence. <br></br> | This allows for an amino-acid sequence, specified in single-letter code, to be used as input instead of a DNA sequence. The initial sequence statistics displayed will be for a uniform random reverse translation of the given amino acid sequence. <br></br> |
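As an illustration of what a uniform random reverse translation might look like, the sketch below picks one synonymous codon with equal probability for each residue. It assumes the standard genetic code and is not taken from DoubleOptimizer; the names `reverse_translate_uniform` and `translate` are hypothetical.

```python
import random
from itertools import product

BASES = "TCAG"
# Standard genetic code, with codons enumerated in TCAG order (TTT, TTC, TTA, ...).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Invert the table: amino acid -> list of synonymous codons.
AA_TO_CODONS = {}
for codon, aa in CODON_TO_AA.items():
    AA_TO_CODONS.setdefault(aa, []).append(codon)


def reverse_translate_uniform(protein, rng=random):
    """Choose one synonymous codon uniformly at random for every residue."""
    return "".join(rng.choice(AA_TO_CODONS[aa]) for aa in protein.upper())


def translate(dna):
    """Translate a DNA string whose length is a multiple of three."""
    return "".join(CODON_TO_AA[dna[i:i + 3]] for i in range(0, len(dna), 3))


protein = "mklfkclvpv"  # first ten residues of X777_06170
dna = reverse_translate_uniform(protein, random.Random(0))
print(dna)
print(translate(dna))
```

Because every synonymous codon is equally likely, the resulting DNA ignores the host's codon preferences, which is why the statistics for such a sequence serve only as a starting point.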
Revision as of 18:59, 10 October 2014
Double Optimizer
Gene synthesis as a tool for biological engineering presents both opportunities and challenges. One opportunity is the ability to optimize codon usage in a gene to match that of a host organism. Compared to traditional cloning methods, this can increase protein yields in the host organism severalfold. However, while a large number of freely usable programs perform codon optimization, there is no guarantee that the sequences these programs produce can actually be synthesized. Specifically, for genes with repetitive amino acid sequences, these programs often generate outputs that contain too many repeated short DNA sequences to be synthesized commercially.
As an example, the hypothetical protein X777_06170 from the ant species Cerapachys biroi has an amino acid sequence that appears to be somewhat repetitive:
001 mklfkclvpv vvlllikdss arpglirdfv ggtvgsilep fqilkpkdsy adanshasah
061 nlggtfslgp vslggglssa sasssasang gglasasska daqaggygyg gsnanaqasa
121 sanaqgggyg nggihgiypg qqgvhggnpf lggagsnana naiananaqa naggnngglg
181 syggyqqggn ypidsstgpi gnnpflsggh gdgnanaaan anagasaign gggpidvnnp
241 flhggaansg agginyqpgn aggiilsekp lglptiypgq hppayldsig spgansnaga
301 napcsecgss gatilgyegq glggikesgs sgatilgyeg qglggikesg ssgatilgye
361 gqglggikes gssgatilgs ydgqgpsgat ilgdyngqgl ggikessgvt vlgdyegqgl
421 ggisgphggh gqaganagan ananagatvg ssggvlggvg dhggyhgyng hdgssglnlg
481 gygggsnana qassnalass ggsssatsda lsnahssggs alanssskas angsgsanan
541 ahassnassg shglgsktsa ssqasasadt rdmlifs
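One way to make "somewhat repetitive" concrete is to list every substring of a fixed length that occurs more than once. The following is a minimal sketch, not part of DoubleOptimizer; the function names and the window length of 20 are illustrative assumptions. The same function can be applied to a DNA string with a short window to look for the kind of recurring nucleotide sequences that prevent synthesis.

```python
from collections import defaultdict


def repeated_kmers(seq, k, min_count=2):
    """Map each length-k substring occurring at least min_count times
    (overlaps allowed) to the list of its 0-based start positions."""
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    return {kmer: pos for kmer, pos in positions.items() if len(pos) >= min_count}


def clean_genbank_block(text):
    """Strip position numbers and whitespace from a GenBank-style sequence block."""
    return "".join(ch for ch in text if ch.isalpha())


# Residues 301-380 of the sequence shown above, copied verbatim.
fragment = clean_genbank_block("""
301 napcsecgss gatilgyegq glggikesgs sgatilgyeg qglggikesg ssgatilgye
361 gqglggikes gssgatilgs
""")

repeats = repeated_kmers(fragment, k=20)
for kmer, pos in sorted(repeats.items(), key=lambda item: item[1][0]):
    print(kmer, pos)
```

Positions are reported relative to the start of the fragment; add the fragment's starting residue number to map them back onto the full sequence.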