Team:Tsinghua-A/Modeling

From 2014.igem.org

(Difference between revisions)
Line 44: Line 44:
    <p><B>Overview</B><br>
    <p><B>Overview</B><br>
In order to lower down the repeatability rate of codons, we use intelligent optimization algorithm.  
In order to lower down the repeatability rate of codons, we use intelligent optimization algorithm.  
-
We exploit amino acid degeneracy and alternate nucleotides to reduce the repetition rate of bases of TALE DNA sequence. (See Figure 1) Then the optimized TALE sequence will be tested in wet lab to verify our conjecture whether TALE DNA sequence of lower repeatability rate works at higher efficiency.[1]<br><br>
+
We exploit amino acid degeneracy and alternate nucleotides to reduce the repetition rate of bases of TALE DNA sequence. (See Figure 1) Then the optimized TALE sequence will be tested in wet lab to verify our conjecture whether TALE DNA sequence of lower repeatability rate works at higher efficiency.<br><br>
<img src="https://static.igem.org/mediawiki/2014/4/4a/Tsinghua-A-condon.jpg" alt="Title"><br>
<img src="https://static.igem.org/mediawiki/2014/4/4a/Tsinghua-A-condon.jpg" alt="Title"><br>
Figure 1. RNA codon Table<br><br>
Figure 1. RNA codon Table<br><br>
 +
For instance, we may change UUC into UUU to make one of the repeats different from others while the amino acid(PhenylalaninePhe) is identical.<br><br>
 +
By this way, we use two types of intelligent optimization algorithm to optimize TALE DNA sequence.<br><br>
-
<B>What can TAL effectors do and how do they function?</B><br>
+
<B>Genetic Algorithm(GA)</B><br>
-
There appears to be a one-to-one correspondence between the identity of two critical amino acids in each repeat and each DNA base in the target sequence. [2]As a result, TAL effectors have attracted great interest as DNA targeting tools.<br><br>
+
<B>-Introduction of Genetic Algorithm</B><br>
-
Their targeting specificity is determined by a central domain of tandem, 33–35 amino acid repeats, followed by a single truncated repeat of 20 amino acids. The majority of naturally occurring TAL effectors examined have between 12 and 27 full repeats. A polymorphic pair of adjacent residues at positions 12 and 13 in each repeat, the ‘repeat-variable di-residue’ (RVD), specifies the target, one RVD to one nucleotide, with the four most common RVDs each preferentially associating with one of the four bases.[3][4]
+
Genetic Algorithm is a search heuristic that mimics the process of natural selection. The heuristic is routinely used to generate useful solutions to optimization and search problems.[1]<br>
-
<img src="https://static.igem.org/mediawiki/2014/d/d7/Tsinghua-A-overview.jpg" alt="Title"><br>
+
GA has lots of applications in many fields, such as in bioinformatics, using GA to optimize DNA successfully made the sequences have better physic-chemical properties for PCR. <br><br>
-
Figure 1. Structure of a naturally occurring TAL effector[5]<br><br>
+
 
-
+
<B>-Our strategy</B><br>
-
<B>What’s the weakness of TAL effector?</B><br>
+
<B> Initialization</B><br> Create a population of hundreds of TALE sequences.<br>
-
This simple code between amino acids in TAL effectors and DNA bases in their target sites might be useful for protein engineering applications. Numerous groups have designed artificial TAL effectors capable of recognizing new DNA sequences in a variety of experimental systems. It has been reported that TAL effectors target genes efficiently in many eukaryocytic cells like mammalian cells and yeasts. However, the former experiments with E. coli indicate that TAL effectors don’t work well in E. coli cells.<br><br><br>
+
<B> Mutation</B><br> Each sequence changes a base randomly under the premise that 1. Amino acids sequence remain unchanged 2. To prevent TALE sequence from being cut, no restriction enzyme site we exploit in experiment is allowed to be created 3. Mutation should not take place on the overhang of each repeat.<br>
-
<b>References</b><br>
+
<B> Crossover</B><br> In GA, crossover is a simulation of the process of synapsis. We randomly choose a point of amino acid, and two sequences exchange their parts from the point we choose to one of the ends. (A figure should go here.)<br>
-
[1] Bogdanove AJ, Schornack S, Lahaye T. TAL effectors: finding plant genes for disease and defense. Curr. Opin. Plant Biol. 2010; 13: 394-401.<br>
+
<B> Fitness Scoring</B><br>  
-
[2] Boch J, Bonas U. Xanthomonas AvrBs3 family-type III effectors: discovery and function. Annu. Rev. Phytopathol. 2010; 48: 419-436.<br>
+
We take two main factors into consideration:<br>
-
[3] Moscou MJ, Bogdanove AJ. A simple cipher governs DNA recognition by TAL effectors. Science. 2009; 326: 1501.<br>
+
1.Repeatability rate of the sequence<br>
-
[4] Boch J, Scholze H, Schornack S, Landgraf A, Hahn S, Kay S, Lahaye T, Nickstadt A, Bonas U. Breaking the code of DNA binding specificity of TAL-type III effectors. Science 2009; 326:1509-1512.<br>
+
To judge the repeatability, we compare the changed sequence with the original one and count the Hamming distance between them. When a restriction enzyme site sequence occurs on the mutated TALE sequence, there will be a penalty.<br>  
-
[5] Tomas Cermak, Adam J. Bogdanove, and Daniel F. Voytas. Efficient design and assembly of custom TALEN and other TAL effector-based constructs for DNA targeting. Nucl. Acids Res. 2011; 39: 7879-7879
+
2.Codon Usage<br>
 +
When the sequence contains too many rare codons, which means the number of the homologous tRNA is extremely low in E.coli, the whole sequence can hardly express. In order to avoid the appearance of rare codons, each occurrence of rare codons will lead to penalty.<br>
 +
<B> Selection</B><br>
 +
In each generation we sort the 200 sequences according to the scores of each single sequence. And we terminate the half of lower scores, the rest of them will have better fitness. Repeat the process of mutation and crossover, the average score of each generation will increase. (See Results.)<br>
 +
We repeat the process for 600 generations.<br><br>
 +
 
 +
<B>Simulated Annealing(SA)</B><br>  
 +
<B>-Introduction of Simulated Annealing</B><br>
 +
Simulated annealing is a simple and general algorithm for finding global minima. It operates by simulating the cooling of a (usually fictitious) physical system whose possible energies correspond to the values of the objective function being minimized. The analogy works because physical systems occupy only the states with lowest energy as the temperature is lowered to absolute zero.<br><br>
 +
 
 +
<B>-Our Strategy</B><br>
 +
<B>Mutation</B><br> We randomly choose 5 points on the sequence to mutate. To each single point, the process is the same with our computational mutation in genetic algorithm.<br>  
 +
<B>Scoring</B><br> The same as fitness scoring in genetic algorithm.<br>
 +
<B>Parameters</B><br> The initial temperature T is 10000.0, after each generation, the temperature reduces to 99% of the former temperature. If the new generation has higher score, it will be accepted. Otherwise it will be accepted at a probability P(A)<br>
 +
P(A)=0.001*e^(-(t^*-t)/T)<br>
 +
t^*---- score of the new generation t---- score of the new generation<br><br>
 +
 
 +
<B>References</B><br>
 +
[1]Wikipedia http://en.wikipedia.org/wiki/Genetic_algorithm<br><br>
   
   
</p><br />
</p><br />
Line 82: Line 102:
    <div class="span10">
    <div class="span10">
   
   
-
    <h2><b>Hypothesis</b></h2>
+
    <h2><b>Results</b></h2>
-
    <p>It has been reported that TAL effectors target genes efficiently in many eukaryocytes like mammalian cells and yeasts. However, the former experiments with E. coli indicate that TAL effectors don’t work well in E. coli cells. There are a variety of factors can possibly be attributed to its low efficiency, and homologous recombination is a highly possible one.<br><br>
+
    <p>
-
Some proteins originated from prokaryocytes, with their sequences paralleling with DNA sequences of E. coli, their expression might be hampered because of homologous recombination. By parity of reasoning, we propose a hypothesis that the inefficiency of TAL effector expression is caused by its sequence resemblance with E. coli genes.<br><br>
+
-
Our solution is based on the codon usage bias, which refers to differences in the frequency of occurrence of synonymous codons in coding DNA.[1] The translational efficiency of heterologous genes can often be improved by optimizing synonymous codon usage to better match the host organism.[2]<br>
+
-
<img src="https://static.igem.org/mediawiki/2014/4/4a/Tsinghua-A-condon.jpg" alt="Title"><br>
+
-
Figure 1. RNA codon table[3]<br><br>
+
-
 
+
-
<b>References</b><br>
+
-
[1] Susanta K. Behura* and David W. Severson, Codon usage bias: causative factors, quantification methods and genome-wide patterns: with emphasis on insect genomes. Biological Review. 2012; 88: 49-61<br>
+
-
[2] Lanza, Amanda M.; Curran, Kathleen A.; Rey, Lindsey G.; et al. A condition-specific codon optimization approach for improved heterologous gene expression in Saccharomyces cerevisiae. BMC Systems Biology. 2014; 8: 33. <br>
+
-
[3] http://bioinfo.bisr.res.in/cgi-bin/project/crat/theory_codon_restriction.cgi
+

Revision as of 19:35, 17 October 2014

1

Algorithms

Overview
To lower the repeat rate of codons, we use intelligent optimization algorithms. We exploit codon degeneracy, substituting synonymous nucleotides, to reduce the repetition of bases in the TALE DNA sequence (see Figure 1). The optimized TALE sequence will then be tested in the wet lab to verify our conjecture that a TALE DNA sequence with a lower repeat rate works more efficiently.

Figure 1. RNA codon Table

For instance, we may change UUC into UUU to make one of the repeats different from the others while the encoded amino acid (phenylalanine, Phe) stays identical.

In this way, we use two types of intelligent optimization algorithm to optimize the TALE DNA sequence.
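The UUC to UUU substitution can be illustrated with a short lookup. This is a minimal sketch, assuming a small hand-written subset of the standard RNA codon table; the names `CODON_TABLE` and `synonymous_codons` are hypothetical and not part of our actual program.

```python
# Illustrative subset of the standard genetic code (RNA codon -> amino acid).
# A real optimizer would need all 64 codons.
CODON_TABLE = {
    "UUU": "Phe", "UUC": "Phe",
    "UUA": "Leu", "UUG": "Leu", "CUU": "Leu",
    "CUC": "Leu", "CUA": "Leu", "CUG": "Leu",
    "GGU": "Gly", "GGC": "Gly", "GGA": "Gly", "GGG": "Gly",
}


def synonymous_codons(codon):
    """Return the other codons in CODON_TABLE that encode the same amino acid."""
    amino_acid = CODON_TABLE[codon]
    return sorted(c for c, aa in CODON_TABLE.items()
                  if aa == amino_acid and c != codon)


print(synonymous_codons("UUC"))  # ['UUU']
```

Any codon returned by `synonymous_codons` can replace the original without changing the protein, which is what makes the repeats distinguishable at the DNA level.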

Genetic Algorithm(GA)
-Introduction of Genetic Algorithm
Genetic Algorithm is a search heuristic that mimics the process of natural selection. The heuristic is routinely used to generate useful solutions to optimization and search problems.[1]
GA has applications in many fields. In bioinformatics, for example, using GA to optimize DNA has produced sequences with better physicochemical properties for PCR.

-Our strategy
Initialization
Create a population of hundreds of TALE sequences.
Mutation
Each sequence changes one base at random, subject to three constraints:
1. The amino acid sequence remains unchanged.
2. To prevent the TALE sequence from being cut, no recognition site of a restriction enzyme used in our experiments may be created.
3. No mutation may take place in the overhang of each repeat.
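The three constraints can be sketched as a single mutation function. This is an illustration under stated assumptions, not our actual code: the codon table is a hypothetical five-amino-acid subset, the forbidden sites are the BsaI and Esp3I recognition sequences on both strands, the overhang is modeled as a set of protected codon indices, and the check is slightly stricter than the rule above because it rejects any candidate that contains a site at all.

```python
import random

# Hypothetical subset of the DNA codon table; a real run needs all 64 codons.
SYNONYMS = {
    "Leu": ["CTG", "CTC", "CTT", "CTA", "TTA", "TTG"],
    "Thr": ["ACC", "ACT", "ACA", "ACG"],
    "Pro": ["CCG", "CCC", "CCT", "CCA"],
    "Gly": ["GGC", "GGT", "GGA", "GGG"],
    "Val": ["GTG", "GTC", "GTT", "GTA"],
}
CODON_TO_AA = {codon: aa for aa, codons in SYNONYMS.items() for codon in codons}
# BsaI (GGTCTC) and Esp3I (CGTCTC) plus their reverse complements.
FORBIDDEN_SITES = ("GGTCTC", "GAGACC", "CGTCTC", "GAGACG")


def translate(seq):
    """Translate an in-frame DNA string into a list of amino acid names."""
    return [CODON_TO_AA[seq[i:i + 3]] for i in range(0, len(seq), 3)]


def mutate(seq, protected=frozenset(), rng=random):
    """Try one synonymous single-base change; return seq unchanged if it is illegal."""
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    idx = rng.randrange(len(codons))
    if idx in protected:  # constraint 3: leave overhang codons alone
        return seq
    old = codons[idx]
    # Constraint 1: only synonymous codons that differ from the old one by one base.
    options = [c for c in SYNONYMS[CODON_TO_AA[old]]
               if sum(a != b for a, b in zip(c, old)) == 1]
    if not options:
        return seq
    codons[idx] = rng.choice(options)
    candidate = "".join(codons)
    # Constraint 2: reject candidates that contain a restriction site.
    if any(site in candidate for site in FORBIDDEN_SITES):
        return seq
    return candidate
```

Returning the unchanged sequence when a move is illegal keeps the function simple; a caller that needs a guaranteed change can call it again.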
Crossover
In GA, crossover simulates the process of synapsis. We randomly choose a point between two amino acids, and two sequences exchange the parts that run from that point to one of the ends. (A figure should go here.)
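A single-point crossover of this kind might look as follows. This is a sketch that assumes both parents are in-frame DNA strings of equal length encoding the same protein; because the cut falls on a codon boundary, both children still encode that protein. The function name and signature are hypothetical.

```python
import random


def crossover(parent_a, parent_b, rng=random):
    """Swap the tails of two equal-length coding sequences at a random codon boundary."""
    if len(parent_a) != len(parent_b) or len(parent_a) % 3 != 0:
        raise ValueError("parents must be in-frame and of equal length")
    n_codons = len(parent_a) // 3
    if n_codons < 2:
        raise ValueError("need at least two codons to cross over")
    # Cut between codons, never at the very start or end.
    point = 3 * rng.randrange(1, n_codons)
    child_a = parent_a[:point] + parent_b[point:]
    child_b = parent_b[:point] + parent_a[point:]
    return child_a, child_b
```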
Fitness Scoring
We take two main factors into consideration:
1. Repeatability rate of the sequence
To judge the repeatability, we compare the changed sequence with the original one and count the Hamming distance between them. When a restriction enzyme site occurs in the mutated TALE sequence, a penalty is applied.
2. Codon usage
When the sequence contains too many rare codons, that is, codons whose corresponding tRNA is extremely scarce in E. coli, the whole sequence can hardly be expressed. To avoid rare codons, each occurrence of a rare codon incurs a penalty.
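The two scoring factors can be combined into one fitness function. The sketch below is an assumption-laden illustration: the penalty weights (`site_penalty`, `rare_penalty`), the short list of rare E. coli codons, and the choice of BsaI and Esp3I sites are placeholders, not the values used in our runs.

```python
# Illustrative set of codons commonly described as rare in E. coli.
RARE_CODONS = {"AGG", "AGA", "ATA", "CTA"}
# BsaI (GGTCTC) and Esp3I (CGTCTC) plus their reverse complements.
FORBIDDEN_SITES = ("GGTCTC", "GAGACC", "CGTCTC", "GAGACG")


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def fitness(candidate, original, site_penalty=50, rare_penalty=5):
    """Hamming distance to the original, minus penalties (weights are hypothetical)."""
    codons = [candidate[i:i + 3] for i in range(0, len(candidate), 3)]
    n_rare = sum(codon in RARE_CODONS for codon in codons)
    n_sites = sum(candidate.count(site) for site in FORBIDDEN_SITES)
    return (hamming(candidate, original)
            - site_penalty * n_sites
            - rare_penalty * n_rare)


print(fitness("CTGCTCTTA", "CTGCTGCTG"))  # 3
```

A larger Hamming distance from the original raises the score, while each restriction site or rare codon lowers it.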
Selection
In each generation we sort the 200 sequences by score. We discard the lower-scoring half, so the remaining sequences have better fitness. As mutation and crossover are repeated, the average score of each generation increases. (See Results.)
We repeat the process for 600 generations.
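The overall loop can be sketched as truncation selection wrapped around the mutation and crossover steps. How the discarded half is refilled is not stated above, so this sketch assumes it is replaced by mutated offspring of randomly paired survivors; `evolve` and its parameters are hypothetical names, and the scoring, mutation and crossover functions are passed in as arguments.

```python
import random


def evolve(population, fitness, mutate, crossover, generations=600, rng=random):
    """Keep the better-scoring half each generation and refill with offspring."""
    pop = list(population)
    size = len(pop)
    if size < 4:
        raise ValueError("need at least 4 individuals so that 2 survivors can pair")
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)   # best first
        survivors = pop[: size // 2]          # discard the lower-scoring half
        children = []
        while len(survivors) + len(children) < size:
            parent_a, parent_b = rng.sample(survivors, 2)
            child_a, child_b = crossover(parent_a, parent_b)
            children.extend([mutate(child_a), mutate(child_b)])
        pop = survivors + children[: size - len(survivors)]
    return max(pop, key=fitness)
```

Because the survivors are carried over unchanged, the best score in the population can never decrease from one generation to the next.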

Simulated Annealing(SA)
-Introduction of Simulated Annealing
Simulated annealing is a simple and general algorithm for finding global minima. It operates by simulating the cooling of a (usually fictitious) physical system whose possible energies correspond to the values of the objective function being minimized. The analogy works because physical systems occupy only the states with lowest energy as the temperature is lowered to absolute zero.

-Our Strategy
Mutation
We randomly choose 5 points on the sequence to mutate. At each point, the process is the same as the mutation step in our genetic algorithm.
Scoring
The same as fitness scoring in genetic algorithm.
Parameters
The initial temperature T is 10000.0. After each generation, the temperature is reduced to 99% of its previous value. If the new generation has a higher score, it is accepted. Otherwise it is accepted with probability P(A):
P(A)=0.001*e^(-(t^*-t)/T)
t^*---- score of the new generation t---- score of the new generation
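The cooling schedule and acceptance rule can be sketched as below. The initial temperature, the 99% cooling factor and the 0.001 prefactor come from the text; which of the two scores in the exponent is the former one is not clear from the definitions above, so this sketch assumes the exponent is (new score minus former score) / T, which is negative for a worse candidate and makes acceptance rarer as the temperature falls. The function names and the `propose` callback are hypothetical.

```python
import math
import random


def acceptance_probability(new_score, old_score, temperature):
    """Always accept improvements; otherwise use the scaled exponential rule."""
    if new_score > old_score:
        return 1.0
    # Assumed sign convention: the exponent is <= 0 for a candidate that is not better.
    return 0.001 * math.exp((new_score - old_score) / temperature)


def anneal(state, score, propose, steps, t0=10000.0, cooling=0.99, rng=random):
    """Run simulated annealing; return (state, score, final temperature)."""
    current, current_score = state, score(state)
    temperature = t0
    for _ in range(steps):
        candidate = propose(current)
        candidate_score = score(candidate)
        p = acceptance_probability(candidate_score, current_score, temperature)
        if rng.random() < p:
            current, current_score = candidate, candidate_score
        temperature *= cooling  # reduce to 99% of the former temperature
    return current, current_score, temperature
```

In our setting `propose` would apply the 5-point mutation and `score` would be the fitness function shared with the genetic algorithm.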

References
[1]Wikipedia http://en.wikipedia.org/wiki/Genetic_algorithm



2

Results



3

TALE Assembly

The TALE assembly strategy uses the Golden Gate cloning method, which is based on the ability of type IIS enzymes to cleave outside of their recognition site. When type IIS recognition sites are placed to the far 5’ and 3’ end of any DNA fragment in inverse orientation, they are removed in the cleavage process, allowing two DNA fragments flanked by compatible sequence overhangs, termed fusion sites, to be ligated seamlessly. Since type IIS fusion sites can be designed to have different sequences, directional assembly of multiple DNA fragments is feasible. Using this strategy, DNA fragments can be assembled from undigested input plasmids in a one-pot reaction with high efficiency.
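Cleavage outside the recognition site can be made concrete with a small example. The sketch below uses BsaI, whose recognition sequence GGTCTC is followed by a cut one base downstream on the top strand and five bases downstream on the bottom strand, leaving a 4-base fusion site. It scans the forward strand only, and the function name is hypothetical.

```python
BSAI_SITE = "GGTCTC"


def bsai_fusion_sites(seq):
    """Return (site_index, 4-base overhang) for each forward-strand BsaI site."""
    results = []
    start = seq.find(BSAI_SITE)
    while start != -1:
        # Skip the recognition site plus one spacer base; the next 4 bases
        # form the single-stranded overhang (the fusion site).
        overhang_start = start + len(BSAI_SITE) + 1
        overhang = seq[overhang_start:overhang_start + 4]
        if len(overhang) == 4:
            results.append((start, overhang))
        start = seq.find(BSAI_SITE, start + 1)
    return results


print(bsai_fusion_sites("AAGGTCTCATGCCTTT"))  # [(2, 'TGCC')]
```

Because the overhang lies outside the recognition sequence, it can be chosen freely, which is what allows directional assembly of several fragments.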

We chose the native TALE AvrBs3 as a scaffold for customized assembly of TALE constructs. The central DNA-binding domain of AvrBs3 is formed by 17.5 tandemly arranged 34-amino-acid repeats, with the last half repeat showing similarity to only the first 20 amino acids of a full repeat. To reduce the risk of recombination events between the 17.5 highly homologous repeat sequences, as mentioned in the hypothesis part, we codon-optimized AvrBs3 according to codon usage.

In a single Golden Gate cloning reaction, cloning efficiency is significantly reduced for the assembly of 17 repeat modules. We therefore split the assembly into two successive steps. In the first cloning step, 10 repeats were assembled in one vector. The preassembly vectors confer spectinomycin resistance and encode a lacZ-α fragment for blue/white selection. A recognition sequence for the type IIS enzyme BsaI was positioned on each side of the lacZ-α fragment. Similarly, repeats 11 to 17 and the NG last repeat were ligated and inserted into another vector. After preassembly of the 10, 7 and last repeats using BsaI, the intermediate blocks were released with Esp3I and cloned into the final assembly vector (modified pTAL1). As explained in the backbone part, we constructed one backbone with a constitutive promoter, which is expressed under normal conditions, and another with a tetracycline-inducible promoter, which is expressed in the presence of tetracycline. Modified pTAL1 confers ampicillin resistance (AmpR) and allows plasmid replication in E. coli. The vector pTAL1 also contains all elements of the final TALE expression construct, including the TALE N- and C-terminal arms, the replication origin, etc., with a lacZ-α fragment between the left and right arms.

During construction of the wild-type control plasmid, we used the modified modules provided by our lab. To test our hypothesis, the designer TALEs were synthesized as oligonucleotides according to the results of the optimization algorithm. We kept the fusion sites for the Golden Gate reaction and split each repeat into 2 parts. After annealing in a thermocycler, the linear repeat sequences can be added to the Golden Gate reaction, ligated with the primary vector, and carried through further ligation to complete the final construct.

Reference
http://www.ncbi.nlm.nih.gov/nuccore/NG_034463.1



4

TALE Expression

Background – pTAL1 vector
Golden Gate is an effective way to assemble TALEs (transcription activator-like effectors), and various eukaryotic expression systems have been established, but few prokaryotic ones. We therefore set out to construct an efficient expression system in Escherichia coli so that we could test our idea. From the literature, we found that most researchers either construct stable cell lines via homologous arms such as attL1 and attL2 or simply introduce exogenous TALE[1]. Given the principle of Golden Gate assembly, we only need to rebuild the final vector pTAL1 (Figure 1) to solve this problem. The vector pTAL1 contains the TALE N-terminus, the TALE C-terminus, lacZ for blue/white screening, and the attL1 and attL2 homologous arms. However, it lacks elements necessary for expression in prokaryotes, such as a promoter, an RBS and a terminator. This is the starting point for establishing a TALE expression system in prokaryotes.

Figure 1. The original vector pTAL1

Constitutive pTAL
We chose 3A assembly (http://parts.igem.org/Help:Assembly/3A_Assembly) to construct our expression system. First, we designed forward and reverse primers with extensions containing EcoRI, XbaI, SpeI and PstI restriction sites to obtain PCR products of pTAL. Then, through simple enzyme digestion and ligation, we ligated pTAL with the terminator, promoter and RBS one by one. Finally, we obtained our constitutive TALE expression vector (Figure 2), which we also submitted as K1311003 (http://parts.igem.org/Part:BBa_K1311003:Design) to parts.igem.org.

Regulative pTAL
As with the constitutive construct, we used the pTAL already ligated with the terminator to build the regulative pTAL. After browsing the iGEM parts website (http://parts.igem.org/Main_Page), we found no regulatory part that could be applied directly, so we combined existing parts into the composite regulatory part we needed. Starting from part C0040 (http://parts.igem.org/Part:BBa_C0040), we added a promoter and RBS (K081005, http://parts.igem.org/Part:BBa_K081005), a terminator (B0015, http://parts.igem.org/Part:BBa_B0015), pTet (TetR repressible promoter, R0040, http://parts.igem.org/Part:BBa_R0040) and an RBS one by one via 3A assembly. We then inserted this large fragment upstream of the pTAL ligated with the terminator (Figure 2). In the end, we successfully built the regulative pTAL and contributed 3 parts of our own this year: the regulative pTAL (K1311004, http://parts.igem.org/Part:BBa_K1311004:Design), K1311005 (http://parts.igem.org/Part:BBa_K1311005) and K1311006 (http://parts.igem.org/Part:BBa_K1311006).




5

Report System

We constructed a reporter system to test the reliability and efficiency of our ‘Marvelous TALE’. In this section, we test the TALE’s DNA-binding ability and report it with a common reporter gene, RFP. We place the TALE’s DNA-binding target sequence inside the expression cassette of the reporter gene, so that a bound TALE can disrupt the expression of the reporter gene. We use iGEM standard parts to build our reporter system.

We designed a standard iGEM part BBa_K1311007 to complete all the tasks. This part contains
Promoter (J23102)-TALE binding site (repeats three times)-RBS-LacI coding sequence-Terminator(B0015)-LacI Regulative Promoter(R0010)-RBS(B0034)-mRFP1(E1010)-Terminator(B0015)
This part converts the binding ability of the TALE protein for its target DNA sequence into a more easily measured parameter, the fluorescence intensity of RFP. When the TALE protein is expressed, it binds to its target, which may interrupt the transcription of LacI. The resulting lack of repressor may lead to the expression of RFP, so stronger fluorescence intensity means better binding ability of the TALE protein. In this part, the TALE recognition site is chosen to be the 18bp sequence (ACCTCATCAGGAACATGTT).

Our Circuit Design
This part converts the binding ability of the TALE protein for its target DNA sequence into a more easily measured parameter, the fluorescence intensity of RFP. When the TALE protein is expressed, it binds to its target, which may interrupt the transcription of LacI. The resulting lack of repressor may lead to the expression of RFP, so stronger fluorescence intensity means better binding ability of the TALE protein.

Validation
We changed the normal RBS sequence of the LacI coding sequence into an RBS containing three tandem TALE binding site sequences, so we have to validate that the LacI protein is still expressed normally and can still inhibit the expression of RFP.
We transformed this plasmid into E. coli DH5α by electroporation and by chemical transformation. We spread the plates and cultured them at 37 °C for more than 15 hours, until small colonies could be seen on the plates. At this point the colonies might look red because of the delay in LacI expression. We picked colonies into 10 mL tubes containing 5 mL LB broth with Chl antibiotic.
After six hours of shaking at 37 °C and 220 rpm, we took the tubes out and diluted the cultures two-, four- and eight-fold. (40 μL of 0.1 M IPTG was added to the positive control group.) After another 12 h of shaking, the fluorescence of the bacteria was measured with a microplate reader and OD600 was measured by spectrophotometry.

Picture of our tubes

(The two red ones on the back are the positive control groups, others are not red)