Team:Penn State/CodonOptimization
From 2014.igem.org
Revision as of 19:12, 15 July 2014
Codon Optimization: Engineering a More Useful Gene at the Codon Level
Project Summary
Codons are groups of three nucleotides that specify a single amino acid, which is then added to a growing polypeptide chain during translation. Even though each codon specifies only one amino acid, some amino acids are coded by multiple codons. It has been demonstrated that the genome of E. coli shows a statistical preference for some of these degenerate codons over others, and it is hypothesized that the preferred codons are translated more efficiently than non-preferred degenerate codons. We constructed synthetic reporter genes entirely from codons hypothesized to be fast or slow, and characterized them in E. coli, demonstrating that...
Why is this important?
Numerous bioproducts are important in our lives. Examples include medicines, fuels, and industrial chemicals. All of these are derived from biological sources, and the ability to engineer their production is vital to a wide variety of industries. Codon optimization is an important area of research because it has the potential to give engineers an additional point of control over protein synthesis, and proteins (a broad class of macromolecules that includes enzymes) are vital components of countless bioproducts.
Our codon optimization research is also important because it will help future researchers develop more comprehensive models of translation. A better understanding of translation is a foundational advance that will lead to faster, more efficient research in many areas of biology. If, for example, our research shows clearly that certain degenerate codons are preferred because they can be translated more efficiently, this will allow scientists to search for a mechanism that predicts these effects, and will invite engineers to redesign genes to be translated more efficiently.
Background
Codon optimization refers to the idea that the individual codons of a gene in a specific organism can be changed in order to alter the behavior of that organism. This relies on an understanding of the central dogma of biology, which states that an organism produces proteins by first transcribing genetic material from DNA to RNA, which is then “read” by ribosomes that build proteins based on the sequence of nucleotides in that RNA. The RNA is read three nucleotides at a time, and these three-letter series of nucleotides are called codons. Codons specify to the ribosome which amino acid to add to a growing amino acid chain.
There are 4 nucleotides, so 4³, or 64, codons are possible. Since there are only 20 amino acids, there is redundancy in the codons; that is, some amino acids are specified by multiple codons. There is no ambiguity, however, meaning that each codon specifies only one amino acid. Codons that code for the same amino acid are called degenerate codons, and even though these degenerate codons code for the same amino acid, they do not necessarily lead to the same expression level of the encoded protein.
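The codon counting above can be checked with a short Python sketch. It builds the standard genetic code from the usual compact 64-letter string (bases ordered U, C, A, G; `*` marks a stop codon) and groups codons by the amino acid they specify. The names `CODON_TABLE`, `synonymous_codons`, and `SYNONYMS` are illustrative and not part of the project.

```python
from collections import defaultdict
from itertools import product

BASES = "UCAG"

# Standard genetic code, one letter per codon, in UCAG x UCAG x UCAG order.
# '*' marks a stop codon.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"  # first base U
    "LLLLPPPPHHQQRRRR"  # first base C
    "IIIMTTTTNNKKSSRR"  # first base A
    "VVVVAAAADDEEGGGG"  # first base G
)

# Map every three-letter RNA codon to its amino acid (or '*').
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def synonymous_codons(table):
    """Group codons by the amino acid they encode (degenerate codons)."""
    groups = defaultdict(list)
    for codon, aa in table.items():
        groups[aa].append(codon)
    return dict(groups)


SYNONYMS = synonymous_codons(CODON_TABLE)

if __name__ == "__main__":
    print(len(CODON_TABLE))           # number of possible codons, 4**3
    print(sorted(SYNONYMS["F"]))      # codons for phenylalanine
    print(len(SYNONYMS) - 1)          # amino acids, not counting stop
```

Each codon appears exactly once as a key, which is the "no ambiguity" property, while several keys share the same value, which is the redundancy.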
Our Objectives
1) Find Criteria for Optimizing Genes in E. coli
All coding sequences were designed so that there would be no difference between the amino acid sequence of the variant GFP and that of the original superfolder GFP. This ensured that each gene led to the expression of the same protein.
Previous researchers have determined through a statistical analysis of the entire genome that some degenerate codons occur more often in protein coding sequences, while others occur less often. These are referred to as common and rare codons, respectively. The design of the GFPs using only common or only rare codons was based on the data in this table.
Codon Frequency
Modified from Maloy, S., V. Stewart, and R. Taylor. 1996. Genetic analysis of pathogenic bacteria. Cold Spring Harbor Laboratory Press, NY.
To optimize for common degenerate codons, the most frequent codon for each amino acid was used. To optimize for rare degenerate codons, the least frequent codon was used. For example, if a codon in the original superfolder GFP coded for phenylalanine, the codons UUU and UUC were available. The frequencies of these were taken from the table (UUU: 0.51, UUC: 0.49). For common GFP, UUU was used wherever phenylalanine was required, because it had the higher frequency. For rare GFP, UUC was used.
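A minimal Python sketch of this selection rule follows. It is an illustration under stated assumptions, not the team's design script: only the UUU value (0.51) comes from the text, the UUC value is taken as 1 - 0.51 so that the two fractions sum to 1, and the leucine numbers are made-up placeholders rather than values from the cited table. The function names are hypothetical.

```python
# Fraction of each synonymous codon, keyed by one-letter amino acid code.
# Only UUU = 0.51 is from the text; UUC is assumed to be 1 - 0.51, and the
# leucine values are placeholders, NOT data from the cited table.
CODON_USAGE = {
    "F": {"UUU": 0.51, "UUC": 0.49},
    "L": {"CUG": 0.50, "UUA": 0.13, "UUG": 0.13,
          "CUU": 0.10, "CUC": 0.10, "CUA": 0.04},
    "M": {"AUG": 1.00},
}


def pick_codon(amino_acid, usage, mode):
    """Return the most ('common') or least ('rare') frequent codon."""
    freqs = usage[amino_acid]
    chooser = max if mode == "common" else min
    return chooser(freqs, key=freqs.get)


def recode(protein, usage, mode):
    """Write a protein sequence using only common or only rare codons."""
    if mode not in ("common", "rare"):
        raise ValueError("mode must be 'common' or 'rare'")
    return "".join(pick_codon(aa, usage, mode) for aa in protein)


def translate(rna, usage):
    """Translate an RNA string back to protein using the usage table."""
    codon_to_aa = {c: aa for aa, freqs in usage.items() for c in freqs}
    return "".join(codon_to_aa[rna[i:i + 3]] for i in range(0, len(rna), 3))


if __name__ == "__main__":
    protein = "MFL"
    common = recode(protein, CODON_USAGE, "common")
    rare = recode(protein, CODON_USAGE, "rare")
    # Both variants must still encode the same protein.
    assert translate(common, CODON_USAGE) == translate(rare, CODON_USAGE)
    print(common, rare)
```

Translating each variant back to protein is a cheap way to confirm that recoding changed only the codons and not the amino acid sequence.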
The design of fast and slow GFP genes was based on data from a recent project in which all the genes (coding DNA sequences) of E. coli were ordered by translation initiation rate (TIR), from lowest to highest, and the codon usage profile of each group of genes was statistically analyzed to determine whether each codon is slow or fast. These data are summarized in the following figure.
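Codon Frequency in Fast and Slow Regions of the Genome
Figure: Fast codons show a positive correlation between frequency and TIR; slow codons show a negative correlation.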
In another recent project, all the genes (coding DNA sequences, CDS) of E. coli were divided into five groups based on their naturally occurring TIR, from lowest to highest. The codon usage profile of each group of genes was then statistically analyzed to determine whether each codon is slow or fast. A fast codon is defined as one whose frequency correlates strongly with TIR; otherwise, it is a slow codon. It is hypothesized that the groups of CDS with high TIR hold more “fast” codons, which lead to a higher translation elongation rate and thus higher protein expression, whereas the low-TIR groups hold more “slow” codons, leading to lower expression.
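The classification described above can be sketched in Python as follows. This is only an illustration: it assumes a Pearson correlation between TIR group rank (1 = lowest TIR) and codon frequency, and it labels a codon fast when that correlation is positive. The source does not state which statistic or cutoff was used, and the frequencies below are made-up numbers, not project data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series has no defined correlation
    return cov / sqrt(var_x * var_y)


def classify_codons(freq_by_group):
    """Label each codon 'fast' or 'slow'.

    freq_by_group maps a codon to its frequency in each TIR group,
    ordered from the lowest-TIR group to the highest.
    Assumption: positive correlation means fast, otherwise slow.
    """
    labels = {}
    for codon, freqs in freq_by_group.items():
        ranks = list(range(1, len(freqs) + 1))
        labels[codon] = "fast" if pearson(ranks, freqs) > 0 else "slow"
    return labels


# Hypothetical frequencies across five TIR groups (illustrative only).
EXAMPLE = {
    "GAA": [0.60, 0.63, 0.66, 0.70, 0.74],
    "GAG": [0.40, 0.37, 0.34, 0.30, 0.26],
}

if __name__ == "__main__":
    print(classify_codons(EXAMPLE))
```

A stricter cutoff (for example, requiring the correlation to exceed some threshold) would fit the "high correlation" wording more closely; the zero cutoff here simply follows the positive-versus-negative description in the figure caption.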
2) Apply These Criteria to a Reporter Gene (GFP)
3) Introduce the Synthetic Genes Optimized Using Our Criteria into E. coli
4) Characterize the GFPs by Measuring Fluorescence of the Cells
5) Compare Protein Expression Levels from the Various Genes
Design Methods
The figure below shows the vector pFTV that was altered using inverse PCR.
Inverse PCR