From 2014.igem.org

iGEM teams are encouraged to record the references they use during the course of their research. Post these somewhere on your wiki so that judges and other visitors can see how you thought about your project and which works inspired you.

It's also important to clearly describe your achievements so that judges will know what you tried to do and where you succeeded. Please write your project page such that what you achieved is easy to distinguish from what you attempted.

INFO TO INCLUDE:

Overall project summary
Project Details
Materials and Methods
The Experiments
Results
Data analysis
Conclusions

Codon Optimization: Engineering a More Useful Gene at the Codon Level

Project Summary

Codons are groups of three nucleotides that specify a single amino acid, which is then added to a growing polypeptide chain during translation. Even though each codon specifies only one amino acid, some amino acids are coded by multiple codons. It has been demonstrated that the genome of E. coli shows a statistical preference for some of these degenerate codons over others, and it is hypothesized that the preferred codons are translated more efficiently than non-preferred degenerate codons. We constructed synthetic reporter genes entirely from codons hypothesized to be fast or slow, and characterized them in E. coli, demonstrating that...

Why is this important?

Numerous bioproducts are important in our lives, including medicines, fuels, and industrial chemicals. All of these are derived from biological sources, and the ability to engineer their production is vital to a wide variety of industries. Codon optimization is an important area of research because it has the potential to give engineers an additional point of control over protein synthesis, and proteins (a broad class of macromolecules that includes enzymes) are vital components of countless bioproducts.

Our codon optimization research is also important because it will help future researchers develop more comprehensive models of translation. A better understanding of translation is a foundational advance that will lead to faster, more efficient research in many areas of biology. If, for example, our research shows clearly that certain degenerate codons are preferred because they can be translated more efficiently, this will allow scientists to search for a mechanism that predicts these effects, and will invite engineers to redesign genes to be translated more efficiently.

Background

Codon optimization refers to the idea that the individual codons of a gene in a specific organism can be changed in order to alter the behavior of that organism. This relies on an understanding of the central dogma of biology, which states that an organism produces proteins by first transcribing genetic material from DNA into RNA, which is then “read” by ribosomes that assemble proteins according to the sequence of nucleotides in that RNA. The RNA is read three nucleotides at a time, and each three-letter series of nucleotides is called a codon. Each codon tells the ribosome which amino acid to add to the growing amino acid chain.

There are 4 nucleotides, so 4^3 = 64 codons are possible. Since there are only 20 amino acids, the code is redundant: some amino acids are specified by multiple codons. There is no ambiguity, however, because each codon specifies only one amino acid. Codons that code for the same amino acid are called degenerate codons, and even though they code for the same amino acid, they do not necessarily lead to the same expression level of the encoded protein.
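The counting argument above can be checked with a short Python sketch. It assumes the standard genetic code written in its usual compact form (bases ordered U, C, A, G, with `*` marking stop codons); the names `CODONS`, `GENETIC_CODE`, and `SYNONYMS` are chosen here for illustration only.

```python
from collections import defaultdict
from itertools import product

# Standard genetic code in compact form: the first base varies slowest,
# the third base fastest, each in the order U, C, A, G. "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# All 4^3 = 64 possible codons, in the same order as AMINO_ACIDS.
CODONS = ["".join(triplet) for triplet in product(BASES, repeat=3)]

# No ambiguity: every codon maps to exactly one amino acid (or stop).
GENETIC_CODE = dict(zip(CODONS, AMINO_ACIDS))

# Redundancy: group the degenerate codons by the amino acid they specify.
SYNONYMS = defaultdict(list)
for codon, amino_acid in GENETIC_CODE.items():
    SYNONYMS[amino_acid].append(codon)

if __name__ == "__main__":
    for amino_acid, codons in sorted(SYNONYMS.items()):
        print(amino_acid, len(codons), " ".join(codons))
```

Phenylalanine (`F`) appears with exactly two codons, UUU and UUC, which is the pair used in the design example for objective 1.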

Our Objectives

1) Find Criteria for Optimizing Genes in E. coli

All coding sequences were designed so that there would be no difference between the amino acid profile of the variant GFP and the original superfolder GFP. This ensured that each gene led to the expression of the same protein.

Previous researchers have determined through a statistical analysis of the entire genome that some degenerate codons occur more often than others in protein coding sequences. These are referred to as common and rare codons, respectively. The common and rare GFPs, built using only common or only rare codons, were designed from the data in this table.

Codon Frequency

Modified from Maloy, S., V. Stewart, and R. Taylor. 1996. Genetic analysis of pathogenic bacteria. Cold Spring Harbor Laboratory Press, NY.

To optimize for common degenerate codons, the most frequent codon for each amino acid was chosen. To optimize for rare degenerate codons, the least frequent codon was chosen. For example, if a codon in the original superfolder GFP coded for phenylalanine, the codons UUU and UUC were available. Their frequencies were taken from the table (UUU: 0.51, UUC: 0.49). For common GFP, UUU was used wherever phenylalanine was required, because it had the higher frequency. For rare GFP, UUC was used.
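The selection rule described above can be sketched in Python as follows. This is a minimal illustration, not the script used for the actual designs. The only frequency taken from the text is UUU = 0.51; the UUC value assumes that the two phenylalanine fractions sum to 1, and the lysine and glycine entries are placeholders that do not come from the codon frequency table. The names `CODON_FREQ`, `recode`, and `translate` are hypothetical.

```python
# Relative codon frequencies, keyed by one-letter amino acid code.
# F: UUU = 0.51 is quoted in the text; UUC = 0.49 ASSUMES the pair sums to 1.
# K and G: PLACEHOLDER values for illustration only, not data from the table.
CODON_FREQ = {
    "F": {"UUU": 0.51, "UUC": 0.49},
    "K": {"AAA": 0.75, "AAG": 0.25},
    "G": {"GGU": 0.35, "GGC": 0.40, "GGA": 0.10, "GGG": 0.15},
}


def pick_codon(amino_acid, mode):
    """Return the most frequent ("common") or least frequent ("rare") codon."""
    freqs = CODON_FREQ[amino_acid]
    chooser = max if mode == "common" else min
    return chooser(freqs, key=freqs.get)


def recode(protein, mode):
    """Build a coding sequence for `protein` using only common or rare codons."""
    if mode not in ("common", "rare"):
        raise ValueError("mode must be 'common' or 'rare'")
    return "".join(pick_codon(aa, mode) for aa in protein)


def translate(rna):
    """Translate a coding sequence back, using only the codons in CODON_FREQ."""
    lookup = {codon: aa for aa, freqs in CODON_FREQ.items() for codon in freqs}
    return "".join(lookup[rna[i:i + 3]] for i in range(0, len(rna), 3))


if __name__ == "__main__":
    peptide = "FKG"  # toy peptide, not a GFP fragment
    for mode in ("common", "rare"):
        gene = recode(peptide, mode)
        print(mode, gene, translate(gene))
```

Translating both variants back to the same peptide mirrors the design constraint that every variant GFP keeps the amino acid sequence of the original superfolder GFP.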

The design of the fast and slow GFP genes was based on data from a recent project in which all the genes (coding DNA sequences) of E. coli were divided into groups by translation initiation rate (TIR), from lowest to highest, and the codon usage profile of each group of genes was statistically analyzed to determine whether each codon is slow or fast. These data are summarized in the following figure.

Codon Frequency in Fast and Slow regions of the Genome

In that project, all the genes (coding DNA sequences, CDS) of E. coli were divided into five groups based on their naturally occurring TIR, from lowest to highest. The codon usage profile of each group was then statistically analyzed to determine whether each codon is slow or fast. A fast codon is defined as one whose frequency correlates strongly with TIR; any other codon is classified as slow. It is hypothesized that the groups of CDS with high TIR hold more “fast” codons, which leads to a higher translation elongation rate and thus higher protein expression, whereas the slow regions hold more “slow” codons, leading to lower expression.
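The grouping-and-correlation procedure described above can be sketched as follows. This is a rough illustration under several assumptions that are not stated in the text: codon frequency is computed as a fraction of all codons in a group, the correlation is a Pearson correlation against the group rank (1 to 5) rather than the TIR values themselves, and any positive correlation counts as "fast". The function names and the toy genes are hypothetical.

```python
from collections import Counter


def split_codons(cds):
    """Split a coding sequence into codons, ignoring any trailing partial codon."""
    return [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def classify_codons(genes, n_groups=5):
    """Label codons "fast" or "slow" from (tir, cds) pairs with non-empty CDSs.

    Genes are ranked by TIR and split into n_groups groups. A codon is "fast"
    if its frequency rises with group rank (ASSUMED threshold: r > 0).
    """
    if len(genes) < n_groups:
        raise ValueError("need at least one gene per group")
    ranked = sorted(genes, key=lambda gene: gene[0])
    n = len(ranked)
    profiles = []
    for k in range(n_groups):
        group = ranked[k * n // n_groups:(k + 1) * n // n_groups]
        counts = Counter(c for _, cds in group for c in split_codons(cds))
        total = sum(counts.values())
        profiles.append({codon: v / total for codon, v in counts.items()})
    ranks = list(range(1, n_groups + 1))
    labels = {}
    for codon in set().union(*profiles):
        freqs = [profile.get(codon, 0.0) for profile in profiles]
        labels[codon] = "fast" if pearson(ranks, freqs) > 0 else "slow"
    return labels


# Toy data: ten made-up genes in which AAA becomes more frequent as TIR rises.
TOY_GENES = [(tir, "AAA" * tir + "AAG" * (11 - tir)) for tir in range(1, 11)]

if __name__ == "__main__":
    print(classify_codons(TOY_GENES))
```

In the toy data AAA is labeled fast and AAG slow, because the share of AAA grows from the lowest-TIR group to the highest while the share of AAG shrinks.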

2) Apply These Criteria to a Reporter Gene (GFP)

3) Introduce the Synthetic Genes Optimized Using Our Criteria into E. coli

4) Characterize the GFPs by Measuring Fluorescence of the Cells

5) Compare Protein Expression Levels from the Various Genes

Design Methods

The figure below shows the vector pFTV that was altered using inverse PCR.

Inverse PCR

@@ Line 166: / Line 166: @@
 <p><strong>1) Find Criteria for Optimizing Genes in <i>E. coli</i></strong></p>
-<p>Previous researchers have determined through a statistical analysis of the entire genome that some degenerate codons occur more often in protein coding sequences and some are more infrequent. These are referred to as common and rare codons. The importance of this is that protein expression in cells is limited either by either translation initiation rate (TIR) or translation elongation rate, and it is theorized that commonly occurring codons will have faster elongation rates than degenerate rare codons. Translation initiation rate can be artificially controlled by varying the strength of the ribosome binding site (RBS), which consists of the genetic sequence that precedes the protein coding sequences (CDS) of a gene. This is accomplished through the use of the RBS calculator, and in previous research was used to steadily increase the RBS strength of a gene, GFP mut3b, the expression of which was then characterized. Unexpectedly, expression level of proteins plateaued even as the RBS strength (and thus TIR) was increased. By using the RBS Calculator to increase the translation initiation rate, we can detect when the plateau occurs, which is called the "maximum translation rate capacity." Since this plateau occurs independently of TIR, it is theorized that it is due solely to translation elongation becoming a rate limiting step</p>
+<p>All coding sequences were designed so that there would be no difference between the amino acid profile of the variant GFP and the original superfolder GFP. This ensured that each gene led to the expression of the same protein.</p>
+<p>Previous researchers have determined through a statistical analysis of the entire genome that some degenerate codons occur more often in protein coding sequences and some are more infrequent. These are referred to as common and rare codons. The design of the GFPs using only common and rare codons was based on the data in this table.</p>
-<p>The figure below shows the table of codon frequency over all coding sequences in the genome of <i>E. coli</i> that was used to create our common and rare GFPs </p>
 <p><strong>Codon Frequency</strong></p>
@@ Line 178: / Line 179: @@
 </p>
-<p>In another recent project, all the genes (coding DNA sequences) of E. coli are divided into five groups based on the naturally occurring TIR, from lowest to highest. Then, the codon usage profile of each group of genes is statistically analyzed to determine whether a codon is slow or fast. A fast codon is defined as one with high correlation between TIR and its frequency. Otherwise, it is a slow codon.  It is hypothesized that the groups of CDS with high TIR will hold more “fast” codons, which will lead to higher translation elongation rate and thus higher protein expression, whereas the slow regions will hold more “slow” codons leading to lower expression.</p>
+<p>Modified from Maloy, S., V. Stewart, and R. Taylor. 1996. Genetic analysis of pathogenic bacteria. Cold Spring Harbor Laboratory Press, NY.</p>
-<p>The figure below shows the vector pFTV that was altered using inverse PCR. </p>
+<p>To optimize for common degenerate codons, the most frequent codon for a specific amino acid was taken. To optimize for rare degenerate codons, the least frequent codon was taken. For example, if a codon in the original superfolder GFP coded for Phenylalanine, the codons UUU and UUC were available.  The frequencies of these were taken from the table (UUU-.51, UUC-.59). For common GFP, UUU was used whenever Phenylalanine was desired, because it had the highest frequency. For rare GFP, UUC was used.</p>
-<p><strong>Inverse PCR</strong></p>
+<p>The design of fast and slow GFP genes was based on the data from a recent project where the all the genes (coding DNA sequences) of E. coli were divided by TIR, from lowest to highest and the codon usage profile of each group of genes was statistically analyzed to determine whether a codon is slow or fast. This data is summarized in the following figure.</p>
+<p><strong>Codon Frequency in Fast and Slow regions of the Genome</strong></p>
 <p>
 <figure>
-   <p><image src="https://static.igem.org/mediawiki/2014/d/d7/Slide3.JPG" width="575px"></p>
+   <p><image src="https://static.igem.org/mediawiki/2014/6/6e/Fast_and_Slow_Codons.png" width="575px"></p>
-   <p><fig caption>Caption</figcaption></p>
+   <p><fig caption>Fast codons show a positive correlation between frequency and TIR, slow codons show a negative correlation</figcaption></p>
 </figure>
 </p>
+<p>In another recent project, all the genes (coding DNA sequences) of E. coli are divided into five groups based on the naturally occurring TIR, from lowest to highest. Then, the codon usage profile of each group of genes is statistically analyzed to determine whether a codon is slow or fast. A fast codon is defined as one with high correlation between TIR and its frequency. Otherwise, it is a slow codon.  It is hypothesized that the groups of CDS with high TIR will hold more “fast” codons, which will lead to higher translation elongation rate and thus higher protein expression, whereas the slow regions will hold more “slow” codons leading to lower expression.</p>

Team:Penn State/CodonOptimization

Revision as of 19:12, 15 July 2014

WELCOME TO PENN STATE iGEM 2014!

CODON OPTIMIZATION PROJECT
