Team:StanfordBrownSpelman/Modelling
From 2014.igem.org
<div id="header" class="small-8 small-centered columns">
<h3><center><a href="https://2014.igem.org/Team:StanfordBrownSpelman/Modelling">Double Optimizer</a></center></h3>
<h7><center>A utility for simultaneous codon and gene synthesis optimization</center><br></br></h7>
<h6>
Gene synthesis as a tool for biological engineering presents both opportunities and challenges. One opportunity is the ability to optimize the codon usage of a gene to match that of a host organism. Compared with expressing a gene obtained by traditional cloning methods, this can increase protein yields in the host organism severalfold. However, although many freely usable programs perform codon optimization, there is no guarantee that the sequences they produce can actually be synthesized. In particular, for genes with repetitive amino acid sequences, these programs often generate outputs that contain too many repeated short DNA sequences to be synthesized commercially.
<br></br>
</h6>
</div>
</div>
<div class="row">
<div id="subheader" class="small-8 small-centered columns">
<h6> As an example, the hypothetical protein X777_06170 from the ant species <i>Cerapachys biroi</i> has an amino acid sequence that appears to be somewhat repetitive:
<div class="sub5"> 1 mklfkclvpv vvlllikdss arpglirdfv ggtvgsilep fqilkpkdsy adanshasah</div>
<div class="sub5"> 61 nlggtfslgp vslggglssa sasssasang gglasasska daqaggygyg gsnanaqasa</div>
<div class="sub5">121 sanaqgggyg nggihgiypg qqgvhggnpf lggagsnana naiananaqa naggnngglg</div>
<div class="sub5">181 syggyqqggn ypidsstgpi gnnpflsggh gdgnanaaan anagasaign gggpidvnnp</div>
<div class="sub5">241 flhggaansg agginyqpgn aggiilsekp lglptiypgq hppayldsig spgansnaga</div>
<div class="sub5">301 napcsecgss gatilgyegq glggikesgs sgatilgyeg qglggikesg ssgatilgye</div>
<div class="sub5">361 gqglggikes gssgatilgs ydgqgpsgat ilgdyngqgl ggikessgvt vlgdyegqgl</div>
<div class="sub5">421 ggisgphggh gqaganagan ananagatvg ssggvlggvg dhggyhgyng hdgssglnlg</div>
<div class="sub5">481 gygggsnana qassnalass ggsssatsda lsnahssggs alanssskas angsgsanan</div>
<div class="sub5">541 ahassnassg shglgsktsa ssqasasadt rdmlifs</div>
</h6>
</div>
</div>

<div class="row">
<div id="subheader" class="small-8 small-centered columns">
<h6> Note that this sequence is not simply one sequence repeated several times; instead, it contains several motifs, roughly 10 to 20 amino acids long, that each occur several times. When this sequence was run through the codon optimization program for expression in <i>E. coli</i> provided by a major DNA synthesis firm, the same firm could not synthesize the resulting output: the "optimized" DNA sequence contained too many recurring short (longer than 8 nucleotides) DNA sequences to allow for synthesis.
<br></br>
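To make the synthesis problem concrete, the Java sketch below lists the k-mers (here 8-mers) that occur more than once in a DNA sequence, treating a k-mer and its reverse complement as the same motif. It is our illustration, not code from DoubleOptimizer or from any synthesis firm; the class and method names are hypothetical, and real synthesis screens may use different length thresholds and rules.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Sketch: list k-mers that occur more than once in a DNA sequence, counting a
 *  k-mer and its reverse complement as the same motif (hypothetical names). */
class RepeatedKmers {

    /** Reverse complement of an A/C/G/T string; rejects any other character. */
    static String reverseComplement(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = s.length() - 1; i >= 0; i--) {
            switch (s.charAt(i)) {
                case 'A': sb.append('T'); break;
                case 'T': sb.append('A'); break;
                case 'C': sb.append('G'); break;
                case 'G': sb.append('C'); break;
                default: throw new IllegalArgumentException("Not a DNA base: " + s.charAt(i));
            }
        }
        return sb.toString();
    }

    /** Strand-independent key: the lexicographically smaller of a k-mer and its reverse complement. */
    static String canonical(String kmer) {
        String rc = reverseComplement(kmer);
        return kmer.compareTo(rc) <= 0 ? kmer : rc;
    }

    /** Maps each canonical k-mer seen more than once to its number of occurrences. */
    static Map<String, Integer> repeatedKmers(String dna, int k) {
        String seq = dna.toUpperCase();
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i + k <= seq.length(); i++) {
            counts.merge(canonical(seq.substring(i, i + k)), 1, Integer::sum);
        }
        Map<String, Integer> repeats = new TreeMap<>();   // sorted for readable output
        counts.forEach((kmer, n) -> {
            if (n > 1) repeats.put(kmer, n);
        });
        return repeats;
    }

    public static void main(String[] args) {
        // A 10-nt motif, a spacer, then the motif's reverse complement.
        String dna = "ATGGCTAGCA" + "TTTT" + "TGCTAGCCAT";
        System.out.println(repeatedKmers(dna, 8));
        // prints {ATGGCTAG=2, GCTAGCCA=2, GGCTAGCA=2}
    }
}
```

Even though the second copy of the motif appears only as its reverse complement, each of its three 8-mers is reported as occurring twice, which is the kind of recurrence that can block synthesis.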
<br></br>
</h6>
</div>
</div>

<!-- ====== Solution: Double Optimizer ====== -->
<div id="subheader" class="small-8 small-centered columns">
<h5><center>Solution: Double Optimizer</center></h5>
<h6>DoubleOptimizer is a software tool we created to optimize the codon usage of a gene so that it both matches a given codon usage distribution and avoids repeated nucleotide sequences. Given a DNA or amino acid sequence and a desired codon distribution, DoubleOptimizer produces, within minutes, an equivalent sequence (encoding the same protein) that has substantially less DNA sequence repetition while still closely matching the desired codon usage.
</h6>
</div>
<div id="subheader" class="small-8 small-centered columns">
<h5><center>Availability and Usage</center></h5>
<h6>DoubleOptimizer may be downloaded <a href="https://drive.google.com/a/brown.edu/file/d/0B6Q5Eo65G4cPZC1SZWEzbUtrYUU/view?usp=sharing">here</a>. <br></br>
DoubleOptimizer is a command-line utility distributed as a Java jar file. It can be run from the command line on any system with Java installed, using the following syntax: <br></br>
java -jar DoubleOptimizer.jar seq.txt codons.txt [Optional flags] <br></br>
where "seq.txt" is a plain text file containing the DNA sequence, and "codons.txt" is a plain text file containing the desired codon distribution to match, formatted as in the following example: <br></br>
GCG .36
GTA .15
<br></br>
(The example above is the codon usage distribution of <i>E. coli</i>.) <br></br>
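A codon distribution file in this format can be read with a few lines of Java. This is a hypothetical sketch, not DoubleOptimizer's parser: it assumes every non-blank line is a codon followed by a fraction, as in the example lines shown, and it does not handle the amino acid groupings used for non-canonical codon assignments, whose file syntax is not shown on this page.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch: read "CODON fraction" lines (e.g. "GCG .36") into a map (hypothetical names). */
class CodonFileParser {

    static Map<String, Double> parse(List<String> lines) {
        Map<String, Double> fractions = new LinkedHashMap<>();  // keeps file order
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty()) continue;                        // tolerate blank lines
            String[] parts = line.split("\\s+");
            if (parts.length != 2 || !parts[0].matches("[ACGTacgt]{3}")) {
                // Anything other than "codon fraction" is outside this sketch's assumed format.
                throw new IllegalArgumentException("Unrecognized line: " + raw);
            }
            fractions.put(parts[0].toUpperCase(), Double.parseDouble(parts[1]));
        }
        return fractions;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java CodonFileParser codons.txt");
            return;
        }
        System.out.println(parse(Files.readAllLines(Paths.get(args[0]))));
    }
}
```

`Double.parseDouble` accepts values without a leading zero, such as `.36`, so the example lines parse as written.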
DoubleOptimizer supports non-canonical codon assignments: the amino acid-codon groupings can be specified however the user wants in the codon distribution file.<br></br>
When executed, DoubleOptimizer first displays the input sequence with repetitive regions highlighted. It also reports the fraction of the sequence that initially consists of repetitive regions (by default, regions of 8 or more nucleotides that occur more than once in the sequence, counting occurrences as the reverse complement), and a chi-squared goodness-of-fit value against the desired codon distribution (lower is better).<br></br>
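This page does not give DoubleOptimizer's exact formula, but a common way to score codon fit is Pearson's chi-squared statistic: for each codon c encoding amino acid a, the expected count is the number of a residues in the sequence times the target fraction of c, and the statistic sums (observed - expected)^2 / expected over codons. The Java sketch below assumes the standard genetic code (NCBI translation table 1) and target fractions that sum to 1 within each amino acid (the <i>E. coli</i> example values appear to be of this kind); the names are ours, and non-canonical groupings are not handled.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch: Pearson chi-squared fit of a coding sequence's codon usage to per-amino-acid
 *  target fractions, assuming the standard genetic code (hypothetical names). */
class CodonChiSquared {
    private static final String BASES = "TCAG";
    // NCBI translation table 1, codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG ('*' = stop).
    private static final String AAS =
        "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG";

    static Map<String, Character> standardCode() {
        Map<String, Character> code = new HashMap<>();
        int i = 0;
        for (char a : BASES.toCharArray())
            for (char b : BASES.toCharArray())
                for (char c : BASES.toCharArray())
                    code.put("" + a + b + c, AAS.charAt(i++));
        return code;
    }

    static double chiSquared(String dna, Map<String, Double> target) {
        Map<String, Character> code = standardCode();
        Map<String, Integer> codonCounts = new HashMap<>();
        Map<Character, Integer> aaCounts = new HashMap<>();
        String seq = dna.toUpperCase();
        for (int i = 0; i + 3 <= seq.length(); i += 3) {
            String codon = seq.substring(i, i + 3);
            Character aa = code.get(codon);
            if (aa == null) throw new IllegalArgumentException("Invalid codon: " + codon);
            codonCounts.merge(codon, 1, Integer::sum);
            aaCounts.merge(aa, 1, Integer::sum);
        }
        double chi2 = 0.0;
        for (Map.Entry<String, Double> e : target.entrySet()) {
            // Expected count = (residues of this codon's amino acid) x (target fraction).
            int aaTotal = aaCounts.getOrDefault(code.get(e.getKey()), 0);
            double expected = aaTotal * e.getValue();
            if (expected <= 0) continue;   // amino acid absent or zero target: no term
            int observed = codonCounts.getOrDefault(e.getKey(), 0);
            chi2 += (observed - expected) * (observed - expected) / expected;
        }
        return chi2;
    }

    public static void main(String[] args) {
        Map<String, Double> target = new HashMap<>();
        target.put("GCG", 0.5);
        target.put("GCA", 0.5);
        // Four alanine codons, two GCG and two GCA: a perfect fit.
        System.out.println(chiSquared("GCGGCGGCAGCA", target)); // 0.0
    }
}
```

One design caveat: this sketch skips codons whose target fraction is zero, so it does not penalize their use; a stricter score would need a separate penalty for such codons.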
DoubleOptimizer then computes and displays the optimized sequence (by default, the best sequence it can find within 10 seconds of computation time). Again, repetitive regions are highlighted, and the same measures of repetitiveness and codon fit are reported.<br></br>
The following optional flags change the program's behavior:<br></br>
*'''''-A''''' <br></br>
This allows an amino acid sequence, given in single-letter code, to be used as input instead of a DNA sequence. The initial sequence statistics are then displayed for a uniform random reverse translation of the given amino acid sequence, in which each codon is chosen with equal probability from the codons for its amino acid. <br></br>
Example:<br></br>
java -jar DoubleOptimizer.jar aaseq.txt codons.txt -A <br></br>
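A uniform random reverse translation, as used for the initial statistics with -A, can be sketched in Java as follows: each residue is replaced by one of its synonymous codons, chosen with equal probability. The sketch assumes the standard genetic code (NCBI translation table 1) and one-letter input without spaces or position numbers; the class and method names are hypothetical, not DoubleOptimizer's.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

/** Sketch: uniform random reverse translation under the standard genetic code. */
class ReverseTranslator {
    private static final String BASES = "TCAG";
    // NCBI translation table 1, codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG ('*' = stop).
    private static final String AAS =
        "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG";

    private static final Map<String, Character> CODON_TO_AA = new HashMap<>();
    private static final Map<Character, List<String>> AA_TO_CODONS = new HashMap<>();

    static {
        int i = 0;
        for (char a : BASES.toCharArray())
            for (char b : BASES.toCharArray())
                for (char c : BASES.toCharArray()) {
                    String codon = "" + a + b + c;
                    char aa = AAS.charAt(i++);
                    CODON_TO_AA.put(codon, aa);
                    AA_TO_CODONS.computeIfAbsent(aa, k -> new ArrayList<>()).add(codon);
                }
    }

    /** Picks each codon uniformly at random among the synonymous codons for its residue. */
    static String reverseTranslate(String protein, Random rng) {
        StringBuilder dna = new StringBuilder(protein.length() * 3);
        for (char aa : protein.toUpperCase().toCharArray()) {
            List<String> options = AA_TO_CODONS.get(aa);
            if (options == null) throw new IllegalArgumentException("Unknown residue: " + aa);
            dna.append(options.get(rng.nextInt(options.size())));
        }
        return dna.toString();
    }

    /** Translates DNA back to protein, used here to check the round trip. */
    static String translate(String dna) {
        StringBuilder protein = new StringBuilder(dna.length() / 3);
        for (int i = 0; i + 3 <= dna.length(); i += 3) {
            protein.append(CODON_TO_AA.get(dna.substring(i, i + 3)));
        }
        return protein.toString();
    }

    public static void main(String[] args) {
        String protein = "MKLFKCLVPV";   // first residues of X777_06170 from the example above
        String dna = reverseTranslate(protein, new Random(42));
        System.out.println(dna);
        System.out.println(translate(dna).equals(protein)); // true
    }
}
```

The sequence listing earlier on this page includes position numbers and spaces, which would have to be stripped before passing it to a function like this.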
*'''''-T##''''' <br></br>
This sets the program's run time, in seconds, in place of the default 10 seconds. While 10 seconds should be enough to produce a well-optimized result for most genes of moderate length on a modern desktop computer, longer run times may yield better-optimized results on slower machines or for longer sequences. <br></br>
Example:<br></br>
java -jar DoubleOptimizer.jar aaseq.txt codons.txt -A -T30
<br></br>
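This page does not describe DoubleOptimizer's search strategy. As a hypothetical illustration of how a fixed time budget like -T can work, the Java sketch below runs an "anytime" local search: it repeatedly swaps one random codon for a random synonymous codon, keeps the swap if the score does not get worse, and returns the current sequence when the deadline passes. The class and method names are ours, the standard genetic code is assumed, and the score counts only positions covered by repeated k-mers; a fuller version would add a weighted codon-usage term.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

/** Hypothetical sketch, not DoubleOptimizer's algorithm: a time-budgeted local search
 *  that swaps synonymous codons to reduce repeated k-mers (standard genetic code). */
class AnytimeCodonSearch {
    private static final String BASES = "TCAG";
    private static final String AAS =
        "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG";
    /** Codon -> list of all codons for the same amino acid (one shared list per amino acid). */
    static final Map<String, List<String>> SYNONYMS = new HashMap<>();

    static {
        Map<Character, List<String>> byAa = new HashMap<>();
        int i = 0;
        for (char a : BASES.toCharArray())
            for (char b : BASES.toCharArray())
                for (char c : BASES.toCharArray())
                    byAa.computeIfAbsent(AAS.charAt(i++), k -> new ArrayList<>()).add("" + a + b + c);
        for (List<String> group : byAa.values())
            for (String codon : group) SYNONYMS.put(codon, group);
    }

    /** Reverse complement; assumes the input contains only A, C, G and T. */
    static String reverseComplement(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = s.length() - 1; i >= 0; i--) {
            char ch = s.charAt(i);
            sb.append(ch == 'A' ? 'T' : ch == 'T' ? 'A' : ch == 'C' ? 'G' : 'C');
        }
        return sb.toString();
    }

    /** Score to minimize: positions covered by a k-mer that occurs more than once (either strand). */
    static int repeatedPositions(String seq, int k) {
        int m = seq.length() - k + 1;
        if (m <= 0) return 0;
        String[] canon = new String[m];
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i < m; i++) {
            String kmer = seq.substring(i, i + k);
            String rc = reverseComplement(kmer);
            canon[i] = kmer.compareTo(rc) <= 0 ? kmer : rc;
            counts.merge(canon[i], 1, Integer::sum);
        }
        boolean[] covered = new boolean[seq.length()];
        for (int i = 0; i < m; i++)
            if (counts.get(canon[i]) > 1)
                for (int j = i; j < i + k; j++) covered[j] = true;
        int total = 0;
        for (boolean c : covered) if (c) total++;
        return total;
    }

    /** Random synonymous swaps until the deadline; keeps swaps that do not raise the score. */
    static String optimize(String dna, int k, long budgetMillis, Random rng) {
        char[] seq = dna.toUpperCase().toCharArray();
        int score = repeatedPositions(new String(seq), k);
        int codons = seq.length / 3;
        long deadline = System.currentTimeMillis() + budgetMillis;
        while (score > 0 && codons > 0 && System.currentTimeMillis() < deadline) {
            int c = rng.nextInt(codons);
            String old = new String(seq, 3 * c, 3);
            List<String> options = SYNONYMS.get(old);   // assumes a valid A/C/G/T codon
            options.get(rng.nextInt(options.size())).getChars(0, 3, seq, 3 * c);
            int s = repeatedPositions(new String(seq), k);
            if (s <= score) score = s;                  // accept equal or better
            else old.getChars(0, 3, seq, 3 * c);        // revert a worse swap
        }
        return new String(seq);
    }

    public static void main(String[] args) {
        String dna = "GCTGCTGCTGCTGCTGCTGCTGCT";   // eight identical alanine codons
        String out = optimize(dna, 8, 200, new Random(1));
        System.out.println(repeatedPositions(dna, 8) + " -> " + repeatedPositions(out, 8));
    }
}
```

Because this sketch rescores the whole sequence after every swap, each move costs time proportional to the sequence length times k; updating the score incrementally around the changed codon would let a search try many more swaps within the same budget, which is one reason a longer -T setting can help on long sequences.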
*'''''-L##''''' <br></br>
This sets the minimum length, in nucleotides, of a sequence that is counted as a repeat, in place of the default 8 nucleotides.<br></br>
Example:<br></br>
java -jar DoubleOptimizer.jar seq.txt codons.txt -L7 -T15 <br></br>
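The repeat fraction, and the effect of -L on it, can be sketched as follows. Any repeated region of at least L nucleotides contains only repeated L-mers, and every repeated L-mer is itself such a region, so the positions covered by L-mers that occur more than once (on either strand) are exactly the positions inside repetitive regions. This Java sketch is our illustration of that definition, not DoubleOptimizer's source; the names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch: fraction of a DNA sequence lying in repeats of at least minLen nucleotides,
 *  where an occurrence of the reverse complement also counts (hypothetical names). */
class RepeatFraction {

    /** Reverse complement of an A/C/G/T string; rejects any other character. */
    static String reverseComplement(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = s.length() - 1; i >= 0; i--) {
            int idx = "ACGT".indexOf(s.charAt(i));
            if (idx < 0) throw new IllegalArgumentException("Not a DNA base: " + s.charAt(i));
            sb.append("TGCA".charAt(idx));   // complement of A, C, G, T
        }
        return sb.toString();
    }

    static double repeatFraction(String dna, int minLen) {
        if (minLen < 1) throw new IllegalArgumentException("minLen must be positive");
        String seq = dna.toUpperCase();
        int n = seq.length();
        int m = n - minLen + 1;                 // number of minLen-mers
        if (m <= 0) return 0.0;
        // Count each minLen-mer under a strand-independent (canonical) key.
        String[] canon = new String[m];
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i < m; i++) {
            String kmer = seq.substring(i, i + minLen);
            String rc = reverseComplement(kmer);
            canon[i] = kmer.compareTo(rc) <= 0 ? kmer : rc;
            counts.merge(canon[i], 1, Integer::sum);
        }
        // Mark each position covered by a minLen-mer that occurs more than once.
        boolean[] covered = new boolean[n];
        for (int i = 0; i < m; i++) {
            if (counts.get(canon[i]) > 1) {
                for (int j = i; j < i + minLen; j++) covered[j] = true;
            }
        }
        int total = 0;
        for (boolean c : covered) if (c) total++;
        return (double) total / n;
    }

    public static void main(String[] args) {
        String dna = "ATGGCTAGCA" + "TTTT" + "TGCTAGCCAT";   // motif, spacer, reverse complement
        System.out.println(repeatFraction(dna, 8));   // 20 of 24 positions
        System.out.println(repeatFraction(dna, 11));  // 0.0: no repeat reaches 11 nt
    }
}
```

In this sketch, lowering the minimum length (as -L7 would) can only keep or raise the reported fraction, since every repeated 8-mer also contains repeated 7-mers.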
Revision as of 18:33, 10 October 2014