Team:SJTU-BioX-Shanghai/Part3 TAL Improvement

From 2014.igem.org

(Difference between revisions)
 
(60 intermediate revisions not shown)
Line 3: Line 3:
{{Template:Team:SJTU-BioX-Shanghai/top-nav}}
{{Template:Team:SJTU-BioX-Shanghai/top-nav}}
{{Template:Team:SJTU-BioX-Shanghai/article}}
{{Template:Team:SJTU-BioX-Shanghai/article}}
 +
{{Template:Team:SJTU-BioX-Shanghai/preview}}
<html>
<html>
<style type="text/css">
<style type="text/css">
   .header_logo{ background-image:url("/wiki/images/d/db/SJTU14_gearbox.png");}
   .header_logo{ background-image:url("/wiki/images/d/db/SJTU14_gearbox.png");}
 +
 +
.projtile {
 +
    margin-right:1.167%;
 +
    margin-left:1.167%;
 +
    width:31%;}
 +
 +
#passage{
 +
    width:100%;}
 +
#passage p{
 +
    padding:4%;}
 +
 +
#subtitleyjn1{
 +
    margin-left:20%;}
 +
 +
#subtitleyjn2{
 +
    margin-left:40%;}
 +
</style>
</style>
<div class="content">
<div class="content">
 +
<div class="jiao" >
 +
 +
<div class="projtile_only">
 +
    <h2 id="subtitleyjn1">TAL Improvement</h2></br>
 +
    <h2 id="subtitleyjn2">——More Effective Golden Gate Cloning </h2>
 +
<center><p>This year our team focused not only on our own project and bioparts but also on improving several existing parts designed by iGEM12_Freiburg. When we used their parts and followed their protocols, we found it hard to repeat their success, and we ran into many difficulties while constructing a TALE. After discussion, our team decided to improve the existing TAL parts by finding new and better sticky ends. Moreover, our method can be applied more broadly whenever people want to connect components by the Golden Gate method.</p></center>
 +
</div>
 +
 
 +
    <div class="projtile">
 +
  <a href="#dianweidian3" title="Why we want to improve it?">
 +
    <center><h2>Why improve it?</h2></center></a>
 +
    </div>
 +
     
 +
    <div class="projtile">
 +
  <a href="#dianweidian4"  title="How do we connect certain monomers?">
 +
    <center> <h2>How to connect monomers?</h2></center></a>
 +
    </div>
 +
     
 +
 +
    <div class="projtile">
 +
<a href="#dianweidian5" title="Why not other sticky ends?">
 +
    <center><h2>Why not other sticky ends?</h2></center></a>
 +
    </div>
 +
 +
    <div class="projtile">
 +
<a href="#dianweidian6" title="How to improve the Golden Gate sticky ends? A big Table!">
 +
    <center><h2 style="LINE-HEIGHT:30px; margin-left:7%; margin-right:7% ;" >How to improve the Golden Gate sticky ends?</h2></center></a>
 +
    </div>
 +
 +
    <div class="projtile">
 +
<a href="#dianweidian7" title="Best choice for seven sticky ends on TALE protein">
 +
    <center><h2 style="LINE-HEIGHT:30px; margin-left:7%; margin-right:7% ;">Best choice for seven sticky ends on TALE protein</h2></center></a>
 +
    </div>
 +
 +
    <div class="projtile" id="dianweidian3" >
 +
<a href="#dianweidian8" title="Reconstruct DNA Sequence">
 +
    <center><h2>Reconstruct DNA sequences</h2></br></center></a>
 +
    </div>
 +
 +
 +
</div>
 +
 +
<div style="clear:both;"></div>
 +
 +
<article class="post__article">
<article class="post__article">
-
<h2>Why we want to improve it?</h2><br>
+
<h2 >Why do we want to improve it?</h2>
-
<h3>Whether the Freiburg's design is efficient or not</h3>
+
<h3>Is Freiburg's design efficient?</h3>
-
<p>According to the experimental record of Freiburg, the success rate is higher than 95%(32/33). However, this result, to some degree, lacks statistical significance.</p>
+
<p>According to Freiburg's experimental record, the success rate is higher than 95% (32/33). However, this figure rests on a small sample, so it carries limited statistical weight.</p>
-
<p>In the result section, they emphasize that there is a light band at 1200bp, which they believe could indicate that the Golden Gate connection works well. However, after conducting several experiments by ourselves, we find that the key point to indicate whether Golden Gate connection works is not the band at 1200bp. If the band is not clear and specific in the gel, it indicates the experiment doesn’t go well. We can easily find several light bands under the band of 1200bp. Moreover, the second light band is somewhat lighter than the band at 1200bp. Although the Freiburg can explain the results with the repeatability of the TALE sequence, we suppose that the possibility of the mismatch of the sticky ends still can’t be excluded. Frankly speaking, we try to believe that they really made it, but if the success cannot be repeated, there must be something wrong with their system. You can view <a href="https://2012.igem.org/Team:Freiburg/Project/Experiments">Detail information</a> in iGEM2012 Freiburg wiki.</p>
+
<p>In their results section, they emphasize a light band at 1200bp, which they take as evidence that the Golden Gate connection works well. However, after conducting several experiments ourselves, we found that the mere presence of a band at 1200bp is not what shows whether the Golden Gate connection works: if the band is not clear and specific on the gel, the experiment did not go well. In their gel, several light bands are easy to find below the 1200bp band, and the second light band is somewhat lighter than the band at 1200bp. Although Freiburg explains these results by the repetitiveness of the TALE sequence, we think that mismatched sticky ends still cannot be excluded. Frankly speaking, we would like to believe that they really made it, but if the success cannot be repeated, something is probably wrong with the system. You can view <a href="https://2012.igem.org/Team:Freiburg/Project/Experiments">detailed information</a> on the iGEM 2012 Freiburg wiki.</p>
-
<center><img  alt="Freiburg gel result" src="https://static.igem.org/mediawiki/2014/5/57/Freiburg_result_picture.png"width=400px vspace=20px></img></center>
+
<center><img  alt="Freiburg gel result" src="https://static.igem.org/mediawiki/2014/5/57/Freiburg_result_picture.png"width=400px vspace=20px></img></center>
-
<p><center><strong><span style="font-size:22px;color:blue">The protocol we take to connect the parts of TALE</span></strong></center></p>
+
 
 +
<center><small><strong> Figure 1.4.1 2012 Freiburg gel result</strong></small></center>
 +
<h3>The protocols we took to connect the parts of TALE</h3>
 +
 
 +
<p>1. Freiburg's protocol</p>
 +
<p>2. Restriction enzyme digestion of the plasmid and the TAL direpeats, followed by gel extraction. The molar ratio of plasmid to TALE is 1 to 5, and of TALE to TALE 1 to 1. Ligation with T4 ligase at 22 ℃ overnight.</p>
 +
<p>3. With the same ratios of plasmid and TALE direpeats, add the TALE direpeats one by one and ligate each at 22 ℃ for 30 minutes.</p>
 +
<p>4. Connect two parts at a time to make three intermediates of 400bp, and then mix the plasmids to make the complete TALE.</p>
 +
<p>5. With the same ratios, ligate using a program of 22℃ for 2min and 40℃ for 30s, repeated 25 times.</p>
 +
 +
<h3>The motivation to improve 2012 Freiburg’s parts</h3>
 +
 
 +
<p id="dianweidian4">Unfortunately, all of our attempts failed. We did not manage to make a complete TALE, or even to join two parts. However, when we tried the 5th protocol, we noticed an unexpected and important result. The sequencing result showed that our left adaptor, 1st part and right adaptor had connected together. Why did we get this result? The sticky ends involved are TGAC, GCTC, and ACTC, so GCTC and ACTC might have connected with each other by mistake. In other words, if two sticky ends are very similar, they can probably connect with each other. Although we failed again, this result gave us the confidence to improve 2012 Freiburg's parts.</p>
 +
 
 +
<h2>How do we connect certain monomers?</h2>
 +
<h3>Some advanced tips for TALE protein</h3>
 +
 
 +
<p>1. Given a sample sequence with repeating amino acids:</p>
 +
<center><img src="https://static.igem.org/mediawiki/2014/f/f2/SJTU14_tal_improvment_1.png"></img></center>
 +
 
 +
<center><small><strong> Figure 1.4.2 TALE amino acid sequence</strong></small></center>
 +
<p>XX marks the two residues that determine which base the repeat recognizes. Within one repeat unit, the other amino acids can be identical.</p>
 +
<p>2. A fully functional TALE protein contains a segment before the repetitive units that recognizes the first base, T. It also contains a segment similar to a repetitive unit, but only half its length, that recognizes the last base, T.</p>
 +
<p>3. The length of the recognized sequence is not strictly twelve or fourteen. According to published results, it depends on the number of monomers.</p><br>
 +
 
 +
<p>We can gather 96 bioparts based on Freiburg's design, and each part carries the sticky ends that correspond to its position (1, 2, 3, 4, 5 or 6). By picking the part for the two bases wanted at each position, we are able to design one TALE protein sequence.</p>
 +
 
 +
<h3>Previous Review: Freiburg’s way of connection</h3>
 +
 
 +
<p>The main principle of connection is built upon the idea of Golden Gate cloning (Sanjana, N. E. et al. A transcription activator-like effector toolbox for genome engineering. Nature Protocols 7, 171–192 (2012)).</p>
 +
<p>The key to their procedure is a type IIS restriction enzyme, BsmBI.</p>
 +
<center><img src="https://static.igem.org/mediawiki/2014/5/5b/SJTU14_tal_improvment_2.png" ></img></center>
 +
 
 +
<center><small><strong> Figure 1.4.3 BsmBI recognition site</strong></small></center>
 +
<p>The main feature of this enzyme is that its recognition sequence lies on only one side of the cleavage site. This makes it possible to generate chosen sticky ends without disrupting the rest of the sequence. The sticky end is 4bp long and can be designed freely, so multiple different sticky ends can be combined. That feature looks excellent at first, but we cannot ignore its latent shortcomings.</p>
 +
<p>Let’s analyze the example (AA1) provided by Freiburg. </p>
 +
<center><img src="https://static.igem.org/mediawiki/2014/b/bc/SJTU14_tal_improvment_3.png" width=700px></img></center>
 +
 
 +
<center><small><strong> Figure 1.4.4 DNA sequence of AA1 part</strong></small></center>
 +
<p><small><i>The underlined sequence is recognized by BsmBI. The vertical bar (|) marks the cutting position. In this sample, TGAC is one sticky end, used in combination with six other sticky ends.</i></small></p>
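<p>As a minimal sketch of how the sticky ends follow from the cut position, the Python snippet below reads the 4 nt overhang next to a BsmBI site. It assumes the commonly documented BsmBI geometry (recognition site CGTCTC, top-strand cut 1 nt downstream, bottom-strand cut 5 nt downstream). The function names are hypothetical, and the two test fragments are the end of PART1 and the start of PART2 as listed further down this page.</p>

```python
SITE = "CGTCTC"      # BsmBI recognition site read on the top strand
SITE_RC = "GAGACG"   # its reverse complement, i.e. a site on the bottom strand


def left_overhang(part):
    """Overhang exposed at the 5' end of a part that starts with a BsmBI site."""
    i = part.find(SITE)
    if i < 0:
        raise ValueError("no BsmBI site found")
    start = i + len(SITE) + 1          # skip the site plus one spacer base
    return part[start:start + 4]


def right_overhang(part):
    """Overhang (top-strand notation) at the 3' end of a part ending in a reversed site."""
    i = part.rfind(SITE_RC)
    if i < 5:
        raise ValueError("no reversed BsmBI site with room for an overhang")
    end = i - 1                        # skip one spacer base before the site
    return part[end - 4:end]


# Fragments copied from the PART1 (3' end) and PART2 (5' end) listings.
part1_end = "GTGGCGGAAAATGAGACG"
part2_start = "CGTCTCTAAAACAAGCCCTCG"
print(right_overhang(part1_end), left_overhang(part2_start))
```

<p>Two neighbouring parts are expected to join only if the right overhang of one equals the left overhang of the next, which is the check printed above.</p>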
 +
<h3>Evaluate seven sticky ends designed by 2012 Freiburg</h3>
 +
<p>2012 Freiburg's parts have seven sticky ends:</p>
 +
<p><center>TGAC,GCTC,CTTG,GCTT,ACTG,CCTG,ACTC</center></p>
 +
<p>We all know that two given parts can join under the principle of complementary base pairing. However, is it possible for sticky ends that do not match perfectly to bind together? We found that, in fact, the more similar two sticky ends are, the more likely they are to form new but incorrect base pairs.
 +
Inspired by the BLAST algorithm, we evaluated the similarity of every pair of sticky ends. </p>
 +
<center><img src="https://static.igem.org/mediawiki/2014/8/8e/TAL%E7%B2%98%E6%80%A7%E6%9C%AB%E7%AB%AF%E8%A1%A8%E6%A0%BC.png" width=400px></img></center>
 +
<center><small><strong> Figure 1.4.5 Strict rules score table </strong></small></center>
 +
<p id="dianweidian5">The higher the score, the higher the similarity and the higher the possibility of mismatch.
 +
The table shows that more than 30% of the pairs have a score of 3, which means that the possibility of mismatch cannot be neglected.
 +
 
 +
Even if we use the relatively loose rule to evaluate the similarity, the error rates still cannot be neglected.</p>
 +
<center><img src="https://static.igem.org/mediawiki/2014/9/9b/Tal_%E8%A1%A8%E6%A0%BC%E7%B2%98%E6%80%A7%E6%9C%AB%E7%AB%AF2.png" width=400px></img></center>
 +
<center><small><strong>Figure 1.4.6 Loose rules score table </strong></small></center>
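<p>To make the idea of a pairwise similarity score concrete, here is a small Python sketch that scores the seven Freiburg sticky ends against each other. It uses a simplified, ungapped score (the number of positions carrying the same base), which is our assumption for illustration only. It is not the strict or loose rule behind the two figures, so its numbers are not expected to match those tables exactly.</p>

```python
from itertools import combinations

FREIBURG_ENDS = ["TGAC", "GCTC", "CTTG", "GCTT", "ACTG", "CCTG", "ACTC"]


def identity_score(a, b):
    """Number of positions at which two equal-length sticky ends carry the same base."""
    return sum(x == y for x, y in zip(a, b))


def score_table(ends):
    """Score every unordered pair of sticky ends."""
    return {(a, b): identity_score(a, b) for a, b in combinations(ends, 2)}


table = score_table(FREIBURG_ENDS)
# List the pairs from most similar to least similar.
for (a, b), s in sorted(table.items(), key=lambda kv: -kv[1]):
    print(a, b, s)
```

<p>Even under this crude score, the pair GCTC and ACTC shares three of four bases, which is the pair suspected in the unexpected ligation described earlier.</p>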
 +
<h2> Why not other sticky ends?</h2>
 +
<h3>The reason why Freiburg used these sticky ends</h3>
 +
<p>Since we failed to contact the original designers of these sticky ends, all we can do is look for the likely advantages of these combinations.</p>
 +
<p>Let's look at the amino acid sequence of the TALE direpeat unit:</p>
 +
<p><center>LTPEQVVAIAS(XX)GGKQALETVQRLLPVLCQAHG(34aa)</center></p>
 +
<p id="dianweidian6">The first amino acid is Leu, which is essential for every connection step. There are six different codons for Leu. </p>
 +
<p><center>UUA,UUG,CUU,CUC,CUA,CUG</center></p>
-
<ol><li> Freiburg's protocol</li>
+
<p>The 2012 Freiburg project's sticky ends:</p>
-
<li> Restriction enzyme digestion of plasmid and TAL repeats and gel extraction respectively. By the mole ratio, plasmid to TALE is 1 to 5 and TALE to TALE is 1 to 1. Ligation with T4 ligase in 22 ℃over night.</li>
+
<p><center>(C)TGAC,GCTC,CTTG,GCTT,ACTG,CCTG,ACTC</center></p>
-
<li>The same ratio of plasmid and TALE repeat, but add the TALE repeats one by one and ligation in 22 ℃, 30 minutes</li>
+
<p>Codon degeneracy made it possible to design seven sticky ends. However, since the codons for the same amino acid are highly similar,
-
<li>Every two parts connect at one time, and try to make three intermediates of 400bp, and then mix the plasmid to make the complete TALE.</li>
+
this feature, for experimental scientists, is a double-edged sword.</p>
-
<li>The same ratio and ligation with the program of 22℃ 2min, 40℃ 30,25 repeats.</li></ol>
+
<h2>How to improve the Golden Gate sticky ends? A big Table!</h2>
-
<p><center><strong><span style="font-size:22px;color:blue">The motivation to debug 2012 Freiburg’s parts</span></strong></center></p>
+
 +
<p>Three key questions need to be answered:</p>
 +
<p>1. Is it possible to find perfect match pair?</p>
 +
<p>2. Can we find a given number of sticky ends with the least possibility of mismatch?</p>
 +
<p>3. How to make this sticky-end score table?</p>
-
<p>Unfortunately, all of our attempts failed. We didn’t manage to make a complete TALE, or even make two of them together. However, what is important for us is that when we try the 5th protocol, we notice an unexpected result. When we analyze the sequence result, we find that our left adaptor, 1st part and right adaptor connect together. Why do we get this result? We notice that their sticky end is TGAC, GCTC, and ACTC. That is to say, GCTC and ACTC connect with each other by mistake. In another word, if the sticky ends are very similar, they probably connect with each other. Although we failed again, the result gives us confidence to debug 2012 Freiburg's parts. </p>
+
<h3>Key algorithms derived from the BLAST algorithm</h3>
-
<h2>How do we connect certain monomer? </h2>
+
<p>Loose rule: Match: 1; Mismatch: 0; Gap: 0</p>
-
<p><center><strong><span style="font-size:22px;color:blue">Some advanced tips for TALE protein</span></strong></center></p>
+
<p>Strict rule: Match: 1; Mismatch: 0; Gap: 1</p>
 +
<p>A sticky end is composed of four bases, so there are 4<sup>4</sup> = 256 possible sticky ends, and the scores of all pairs can be represented as a 256*256 table. </p>
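<p>The 256*256 table can be sketched in a few lines of Python. The score used here is again a simplified, ungapped position-by-position match count (an assumption for illustration; the loose and strict rules above also assign a value to gaps, which this sketch ignores).</p>

```python
from itertools import product


def identity_score(a, b):
    """Number of positions at which two 4 nt sticky ends carry the same base."""
    return sum(x == y for x, y in zip(a, b))


# All 4^4 = 256 possible four-base sticky ends, in alphabetical order.
ALL_ENDS = ["".join(p) for p in product("ACGT", repeat=4)]

# TABLE[i][j] is the score between ALL_ENDS[i] and ALL_ENDS[j].
TABLE = [[identity_score(a, b) for b in ALL_ENDS] for a in ALL_ENDS]

print(len(ALL_ENDS), len(TABLE), len(TABLE[0]))
```

<p>Under this simplified score, each sticky end has 189 partners that share at most one position with it, which gives a first impression of how much room there is for choosing dissimilar ends.</p>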
 +
<h3>Find target groups of sticky ends</h3>
 +
<p>To solve the TALE parts problem, we need to find seven sticky ends such that the similarity score of every pair among them is less than or equal to 1.</p>
 +
<center><img src="https://static.igem.org/mediawiki/2014/1/12/Choices_for_group.png" width=400px></img></center>
 +
<center><small><strong>Figure 1.4.7 Sticky ends choices table </strong></small></center>
 +
<p>With the strict algorithm, it is impossible to find seven sticky ends in which every pair has a score of no more than 1, so we have to use the loose algorithm.</p>
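<p>The search for a group of mutually dissimilar sticky ends can be illustrated with a simple greedy pass over all 256 candidates. This is only a sketch under stated assumptions: the score is the simplified ungapped match count rather than either of the two rules above, the function name <code>greedy_pick</code> is hypothetical, and the amino acid constraints of the TALE sequence are not considered. Its output is therefore not expected to reproduce the choices shown in the figure.</p>

```python
from itertools import combinations, product


def identity_score(a, b):
    """Number of positions at which two 4 nt sticky ends carry the same base."""
    return sum(x == y for x, y in zip(a, b))


def greedy_pick(candidates, wanted=7, max_score=1):
    """Keep each candidate whose score against every already chosen end is <= max_score."""
    chosen = []
    for end in candidates:
        if all(identity_score(end, c) <= max_score for c in chosen):
            chosen.append(end)
            if len(chosen) == wanted:
                break
    return chosen


ALL_ENDS = ["".join(p) for p in product("ACGT", repeat=4)]
picked = greedy_pick(ALL_ENDS)
print(picked)
print(max(identity_score(a, b) for a, b in combinations(picked, 2)))
```

<p>A greedy pass does not guarantee the best possible group; an exhaustive or scored search over combinations would be needed to rank alternatives.</p>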
 +
<h3 > Convert four-basepair sticky ends to amino acid pairs</h3>
 +
<p>What matters is whether the two amino acids encoded across a sticky end occur in our target sequence, not the 4bp themselves. So we convert each sticky-end sequence into amino acid pairs.</p>
 +
<center><img src="https://static.igem.org/mediawiki/2014/4/45/Amino_acid_table.png"width=600px  id="dianweidian7"></img></center>
-
<ol>
+
<center><small><strong>Figure 1.4.8 4bp sticky ends convert to Amino acid table </strong></small></center>
-
<li>Given a sample sequence with repeating amino acids:
+
<p>Based on the above table, we are able to calculate the total scores of each combination and find the best one.</p>
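<p>The conversion from a 4bp sticky end to amino acid pairs can be sketched as follows. The snippet assumes the standard genetic code and that a sticky end always overlaps two neighbouring codons; <code>offset</code> (a hypothetical parameter) is the number of bases of the first codon that lie before the sticky end. The function enumerates every amino acid pair that could contain the sticky end at that offset.</p>

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def amino_acid_pairs(overhang, offset):
    """Amino acid pairs whose two codons can contain `overhang` starting at `offset`."""
    if offset not in (0, 1, 2):
        raise ValueError("offset must be 0, 1 or 2")
    pairs = set()
    # Two codons span 6 bases; 4 are fixed by the overhang, 2 are free.
    for fill in product(BASES, repeat=2):
        window = "".join(fill[:offset]) + overhang + "".join(fill[offset:])
        aa1, aa2 = CODON_TABLE[window[:3]], CODON_TABLE[window[3:]]
        if "*" not in (aa1, aa2):      # skip combinations with a stop codon
            pairs.add((aa1, aa2))
    return pairs


print(sorted(amino_acid_pairs("AAAA", 2)))
```

<p>For example, AAAA at offset 2 is compatible with the pair Gly-Lys (GGA AAA), which occurs in the GGK stretch of the repeat unit.</p>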
-
<br/>
+
<h2>Best choice for seven sticky ends on TALE protein</h2>
-
<img src="https://static.igem.org/mediawiki/2014/f/f2/SJTU14_tal_improvment_1.png"></img>
+
<p>Best combination:
-
<p>What XX means is that it determine the certain kind of base. For one unit of repetition, other amino acids can be identical.</p></li>
+
<center>AAAA, AGGG, GTAC, GCTC, TTTT, TCGA, CCCC</center></p>
-
<li>A fully functional TALE protein contains one sequence, that does not have repetitive units, recognizing base T, and similar sequence but is only half length as its end. That is, one complete TALE protein is able to recognize certain number of repetitive units and two bases.</li>
+
<p>Scores Table(Loose rule):</p>
-
<li>The length that can be recognized is not strictly twelve or fourteen. According to the published results, the length and certain sequence are dependent on number and type of monomer.</li>
+
<center><img src="https://static.igem.org/mediawiki/2014/e/e0/%E5%B1%8F%E5%B9%95%E5%BF%AB%E7%85%A7_2014-10-16_%E4%B8%8B%E5%8D%883.24.40.png" width=400px></img></center>
-
</ol>
+
-
<p>We can gather 96 bioparts based on Freiburg, and each part has its counterproductive base on certain location(1,2,3,4,5 or 6). By picking two bases on certain location, we are able to design one TALE protein sequence.</p>
+
-
<br/>
+
-
<p><center><strong><span style="font-size:22px;color:blue">Previous Review: Freiburg’s way of connection</span></strong></center></p>
+
-
<p>The main principles of connection is built upon the idea of Golden Gate Connection.( Sanjana, N. E. et al. A transcription activator-like effector toolbox for genome engineering. Nature Protocols 7, 171–192 (2012).)</p>
+
<center><small><strong>Figure 1.4.9 Best combination score table </strong></small></center>
-
<p>The gust of these procedures is more related to one type of restriction enzyme, type II Restriction Enzyme, especially BsmBI enzyme.</p>
+
-
<center><img src="https://static.igem.org/mediawiki/2014/5/5b/SJTU14_tal_improvment_2.png" ></img></center>
+
-
<p>The main feature of this enzyme is the recognition sequence is on only one side of cleavage site. It provides the way which can be used to get certain incision without damaging the whole sequence. The sticky end has 4bp base, and it could be designed even for combination of multiple sticky end. That feature is fancy at first, but we cannot regardless its latent shortcomings. </p>
+
-
<p>Let’s analyze the example (AA1) provided by Freiburg. </p>
+
-
<center><img src="https://static.igem.org/mediawiki/2014/b/bc/SJTU14_tal_improvment_3.png" width=400px></img><center>
+
-
<p><small><i>The underlined parts are recognized by BsmBI. Vertical bar(|) is the cutting position. As for this sample, TGAC is one sticky end which can combine with other seven sticky ends.</i></small></p>
+
-
<p><center><strong><span style="font-size:22px;color:blue">Evaluate seven sticky ends designed by 2012 Freiburg</span></strong></center></p>
+
-
<p>2012 Freiburg's parts have seven sticky ends:</p>
+
-
<p><center>TGAC,GCTC,CTTG,GCTT,ACTG,CCTG,ACTC<center></p>
+
-
<p>We all know that certain two parts can combine together, under base-pair rule. However, whether it is possible that unpaired sticky ends can bind together? In fact, the more similar they are, the more possibility that can form new but error base pairs.
+
-
Spired by BLAST algorithm, we calculate the similarity of each other sticky ends. </p>
+
-
<center><img src="https://static.igem.org/mediawiki/2014/8/8e/TAL%E7%B2%98%E6%80%A7%E6%9C%AB%E7%AB%AF%E8%A1%A8%E6%A0%BC.png" width=400px></img><center>
+
-
<p>The higher score, the higher similarity, and the higher possibility of mismatch.
+
-
The table shows that more than 30% of pairs’ score is equal to 3, which means that the possibility of mismatch cannot be neglected.
+
-
Even if we employ the relatively loose rule to calculate the similarity, we can still find that error rates cannot be neglected.</p>
+
<p>Position in TALE amino acids sequence:</p>
-
<center><img src="https://static.igem.org/mediawiki/2014/9/9b/Tal_%E8%A1%A8%E6%A0%BC%E7%B2%98%E6%80%A7%E6%9C%AB%E7%AB%AF2.png" width=400px></img><center>
+
<center><img src="https://static.igem.org/mediawiki/2014/e/e1/Sticky3333.png" width="500px" id="dianweidian8"></img></center>
 +
-
<h2> Why not other sticky ends?</h2>
+
<center><small><strong>Figure 1.4.10 Position in TALE amino acids sequence </strong></small></center>
-
<p><center><strong><span style="font-size:22px;color:blue">The Reason why Freiburg used these sticky ends</span></strong></center></p>
+
-
<p>Failed to contact the original designers of these sticky ends, what we can do is just to find feasible advantages of these combinations.</p>
+
-
<p>Review the TALE repeated amino acids sequence:</p>
+
-
<p><center>LTPEQVVAIAS(XX)GGKQALETVQRLLPVLCQAHG(34aa)</center></p>
+
-
<p>The first amino acid is Leu, which is essential for all connection process. There are six different types of base arrangement for Leu, one of the most number of base arrangement. </p>
+
-
<p><center>UUA,UUG,CUU,CUC,CUA,CUG</center></p>
+
-
<p>The counterproductive sticky ends:</p>
+
<h2>Reconstruct DNA sequences</h2>
-
<p><center>(C)TGAC,GCTC,CTTG,GCTT,ACTG,CCTG,ACTC</center></p>
+
-
<p>The useless of Degeneracy has helped to design seven sticky ends. However, since the codons for identical amino acid are highly similar.
+
-
This feature, for experimental scientists, is a double-edged sword.</p>
+
-
<h2>How to improve the Golden Gate sticky ends? A big Table!</h2>
+
-
<br>
+
-
<p>Three basic key questions need to be answered:</p>
+
-
<ol><li>Whether it’s possible to find perfect match pair?</li>
+
-
<li>Whether we can find a certain number of sticky ends with least possibility to be mismatched?</li>
+
-
<li>How to make this sticky-end score table?</li></ol>
+
-
<p><center><strong><span style="font-size:22px;color:blue">Key algorithms derived from BLAST algorithm</span></strong></center></p>
+
<p>Two main factors to reconstruct DNA sequences:<br>
-
<p>Loose rule: Match: 1; Mismatch:-1; Gap: 0</p>
+
1.Use the table of best combination and rearrange the sticky ends according to your demand.<br>
-
<p>Strict rule: Match: 1; Mismatch:0; Gap: -1</p>
+
2.There must be no BsmBI recognition sequence in the reconstructed DNA sequence.<br>
-
<p>The sticky end is composed of four bases, which means that we can design 256 types of sticky ends at most.
+
Final DNA Sequence for NEW TALE protein:</p>
-
The forming pair is represented as a 256*256 table.  </p>
+
<pre>
-
<p><center><strong><span style="font-size:22px;color:blue">Find target groups of sticky ends</span></strong></center></p>
+
1        CTGACCCCGG AACAGGTGGT GGCCATTGCA AGCAACGGTG GTGGCAAGCA GGCCCTGGAG
-
<p>To solve the TALE parts problem, we need find seven sticky ends, and the similarity score(hereafter referred to as Score) of each pair of them are less than or equal to 1.</p>
+
61      ACAGTCCAAC GGCTGCTTCC GGTTCTGTGT CAGGCCCACG GCCTGACTCC AGAACAAGTG
-
<center><img src="https://static.igem.org/mediawiki/2014/1/12/Choices_for_group.png" width=400px></img><center>
+
121      GTTGCTATCG CCAGCCACGA TGGCGGAAAA CAAGCCCTCG AAACCGTGCA GCGCCTGCTT
-
<p>When we select Strict Algorithm to find these ends, it is impossible to find seven sticky ends, that each pair of them has score no more than 1. So we have to select Loose Algorithm.</p>
+
181      CCGGTGCTGT GTCAGGCCCA CGGGCTCACC CCGGAACAGG TGGTGGCCAT CGCATCTAAC
-
<p><center><strong><span style="font-size:22px;color:blue">Four basepair sticky ends convert to amino acid pair</span></strong></center></p>
+
241      AATGGCGGTA AGCAGGCACT GGAAACAGTG CAGCGCCTGC TTCCGGTCCT GTGTCAGGCT
-
<p>What we are caring about is whether two amino acids can be located on my target sequence, rather than the 4bp bases. So we should convert the sticky ends information to 2 amino acids.</p>
+
301      CATGGCCTGA CCCCAGAGCA GGTCGTGGCA ATTGCCTCCA ACATTGGAGG GAAGCAGGCA
-
<center><img src="https://static.igem.org/mediawiki/2014/4/45/Amino_acid_table.png"width=600px></img></center>
+
361      CTGGAGACCG TGCAGCGGCT GCTGCCGGTG CTGTGTCAGG CCCACGGCTT GACCCCGGAA
-
<p>Based on the above table, we are able to calculate the total scores of each combination and find the least one.</p>
+
421      CAGGTGGTGG CCATCGCCTC CAACGGCGGT GGCAAACAGG CGCTGGAAAC AGTTCAACGC
-
<p><center><b><span style="font-size:22px;color:blue">Best choice for seven sticky ends on TALE protein</span></b></center></p>
+
481      CTCCTTCCGG TCCTGTGCCA GGCCCATGGT CTGACTCCAG AGCAGGTTGT GGCAATTGCA
-
<p>Best combination:<br>
+
541      AGCAACATTG GTGGTAAACA AGCTTTGGAA ACCGTCCAGC GCTTGCTGCC AGTACTGTGT
-
<center>AAAA, AGGG, GTAC, GCTC, TTTT, TCGA, CCCC</center></p>
+
601      CAGGCCCACG GGCTTACCCC GGAACAGGTG GTGGCCATTG CAAGCAACGG TGGTGGCAAG
-
<p>Scores Table(Loose rule):</p>
+
661      CAGGCCCTGG AGACAGTCCA ACGGCTGCTT CCGGTTCTGT GTCAGGCCCA CGGCCTGACT
-
<center><img src="https://static.igem.org/mediawiki/2014/e/e0/%E5%B1%8F%E5%B9%95%E5%BF%AB%E7%85%A7_2014-10-16_%E4%B8%8B%E5%8D%883.24.40.png" width=400px></img></center>
+
721      CCAGAACAAG TGGTTGCTAT CGCCAGCCAC GATGGCGGTA AACAAGCCCT CGAAACCGTG
-
<p>Position in TALE amino acids sequence:</p>
+
781      CAGCGCCTGC TTCCGGTGCT CTGTCAGGCC CACGGACTGA CCCCGGAACA GGTGGTGGCC
-
<center><img src="https://static.igem.org/mediawiki/2014/e/e1/Sticky3333.png" width=500px"></img></center>
+
841      ATCGCCTCCA ACATTGGTGG TAAGCAAGCC CTCGAAACTG TGCAGCGGCT GCTTCCAGTC
-
<p><center><b><span style="font-size:22px;color:blue">Reconstruct DNA Sequence</span></b></center></p>
+
901      TTGTGCCAGG CTCACGGCCT GACACCGGAG CAGGTGGTTG CAATCGCGTC TAATATCGGC
-
<p>Two main factors to reconstruct DNA sequence:<br>
+
961      GGCAAACAGG CACTCGAGAC CGTGCAGCGC TTGCTTCCAG TGCTGTGTCA GGCCCACGGC
-
1.Use the table of best combination and rearrange the sticky ends with your demand.<br>
+
1021    CTGACCCCGG AACAGGTGGT GGCCATCGCC TCTAACAATG GCGGCAAACA GGCATTGGAA
-
2.No BsmBI recognition sequence in the reconstruct DNA sequence.<br>
+
1081    ACAGTTCAGC GCCTGCTGCC GGTGTTGTGT CAGGCTCACG GCCTGACTCC GGAGCAGGTT
-
Final DNA Sequence for TALE protein:</p>
+
1141    GTGGCCATCG CAAGCCATGA TGGCGGTAAA CAAGCTCTGG AGACAGTGCA ACGCCTCTTG
-
<pre>
+
1201    CCAGTTTTGT GTCAGGCCCA CGGA                                       
-
1        CTGACCCCGG AACAGGTGGT GGCCATTGCA AGCAACGGTG GTGGCAAGCA GGCCCTGGAG
+
</pre>
-
61      ACAGTCCAAC GGCTGCTTCC GGTTCTGTGT CAGGCCCACG GCCTGACTCC AGAACAAGTG
+
<p>Final Amino acids remain the same:</p>
-
121      GTTGCTATCG CCAGCCACGA TGGCGGAAAA CAAGCCCTCG AAACCGTGCA GCGCCTGCTT
+
<pre>
-
181      CCGGTGCTGT GTCAGGCCCA CGGGCTCACC CCGGAACAGG TGGTGGCCAT CGCATCTAAC
+
1        LTPEQVVAIA SNGGGKQALE TVQRLLPVLC QAHG
-
241      AATGGCGGTA AGCAGGCACT GGAAACAGTG CAGCGCCTGC TTCCGGTCCT GTGTCAGGCT
+
35        LTPEQVVAIA SHDGGKQALE TVQRLLPVLC QAHG
-
301      CATGGCCTGA CCCCAGAGCA GGTCGTGGCA ATTGCCTCCA ACATTGGAGG GAAGCAGGCA
+
69        LTPEQVVAIA SNNGGKQALE TVQRLLPVLC QAHG
-
361      CTGGAGACCG TGCAGCGGCT GCTGCCGGTG CTGTGTCAGG CCCACGGCTT GACCCCGGAA
+
103      LTPEQVVAIA SNIGGKQALE TVQRLLPVLC QAHG
-
421      CAGGTGGTGG CCATCGCCTC CAACGGCGGT GGCAAACAGG CGCTGGAAAC AGTTCAACGC
+
137      LTPEQVVAIA SNGGGKQALE TVQRLLPVLC QAHG
-
481      CTCCTTCCGG TCCTGTGCCA GGCCCATGGT CTGACTCCAG AGCAGGTTGT GGCAATTGCA
+
171      LTPEQVVAIA SNIGGKQALE TVQRLLPVLC QAHG
-
541      AGCAACATTG GTGGTAAACA AGCTTTGGAA ACCGTCCAGC GCTTGCTGCC AGTACTGTGT
+
205      LTPEQVVAIA SNGGGKQALE TVQRLLPVLC QAHG
-
601      CAGGCCCACG GGCTTACCCC GGAACAGGTG GTGGCCATTG CAAGCAACGG TGGTGGCAAG
+
239      LTPEQVVAIA SHDGGKQALE TVQRLLPVLC QAHG
-
661      CAGGCCCTGG AGACAGTCCA ACGGCTGCTT CCGGTTCTGT GTCAGGCCCA CGGCCTGACT
+
273      LTPEQVVAIA SNIGGKQALE TVQRLLPVLC QAHG
-
721      CCAGAACAAG TGGTTGCTAT CGCCAGCCAC GATGGCGGTA AACAAGCCCT CGAAACCGTG
+
307      LTPEQVVAIA SNIGGKQALE TVQRLLPVLC QAHG
-
781      CAGCGCCTGC TTCCGGTGCT CTGTCAGGCC CACGGACTGA CCCCGGAACA GGTGGTGGCC
+
341      LTPEQVVAIA SNNGGKQALE TVQRLLPVLC QAHG
-
841      ATCGCCTCCA ACATTGGTGG TAAGCAAGCC CTCGAAACTG TGCAGCGGCT GCTTCCAGTC
+
375      LTPEQVVAIA SHDGGKQALE TVQRLLPVLC QAHG
-
901      TTGTGCCAGG CTCACGGCCT GACACCGGAG CAGGTGGTTG CAATCGCGTC TAATATCGGC
+
</pre>
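<p>The two conditions for reconstruction (an unchanged amino acid sequence and no internal BsmBI site) can be checked programmatically. The sketch below does so for the first repeat only (bases 1 to 102 of the sequence above), assuming the standard genetic code and the BsmBI site CGTCTC with its reverse complement GAGACG; the function names are hypothetical.</p>

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate a DNA string in frame 1, ignoring any incomplete trailing codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def has_bsmbi_site(dna):
    """True if a BsmBI recognition site occurs on either strand."""
    return "CGTCTC" in dna or "GAGACG" in dna


# First repeat, copied from positions 1-102 of the final DNA sequence.
FIRST_REPEAT = (
    "CTGACCCCGGAACAGGTGGTGGCCATTGCAAGCAACGGTGGTGGCAAGCAGGCCCTGGAG"
    "ACAGTCCAACGGCTGCTTCCGGTTCTGTGTCAGGCCCACGGC"
)

print(translate(FIRST_REPEAT))
print(has_bsmbi_site(FIRST_REPEAT))
```

<p>The same two functions can be applied to the full sequence or to each part after trimming its flanking BsmBI sites.</p>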
-
961      GGCAAACAGG CACTCGAGAC CGTGCAGCGC TTGCTTCCAG TGCTGTGTCA GGCCCACGGC
+
<p>Corresponding part:</p>
-
1021    CTGACCCCGG AACAGGTGGT GGCCATCGCC TCTAACAATG GCGGCAAACA GGCATTGGAA
+
-
1081    ACAGTTCAGC GCCTGCTGCC GGTGTTGTGT CAGGCTCACG GCCTGACTCC GGAGCAGGTT
+
-
1141    GTGGCCATCG CAAGCCATGA TGGCGGTAAA CAAGCTCTGG AGACAGTGCA ACGCCTCTTG
+
-
1201    CCAGTTTTGT GTCAGGCCCA CGGA                                       
+
-
</pre>
+
-
<p>Final Amino acids remain the same:</p>
+
-
<pre>
+
-
1        LTPEQVVAIA SNGGGKQALE TVQRLLPVLC QAHG
+
-
35        LTPEQVVAIA SHDGGKQALE TVQRLLPVLC QAHG
+
-
69        LTPEQVVAIA SNNGGKQALE TVQRLLPVLC QAHG
+
-
103      LTPEQVVAIA SNIGGKQALE TVQRLLPVLC QAHG
+
-
137      LTPEQVVAIA SNGGGKQALE TVQRLLPVLC QAHG
+
-
171      LTPEQVVAIA SNIGGKQALE TVQRLLPVLC QAHG
+
-
205      LTPEQVVAIA SNGGGKQALE TVQRLLPVLC QAHG
+
-
239      LTPEQVVAIA SHDGGKQALE TVQRLLPVLC QAHG
+
-
273      LTPEQVVAIA SNIGGKQALE TVQRLLPVLC QAHG
+
-
307      LTPEQVVAIA SNIGGKQALE TVQRLLPVLC QAHG
+
-
341      LTPEQVVAIA SNNGGKQALE TVQRLLPVLC QAHG
+
-
375      LTPEQVVAIA SHDGGKQALE TVQRLLPVLC QAHG
+
-
</pre>
+
-
<p>Corresponding part:</p>
+
-
<p >
+
<p >
-
<b>PART-left:</b><br>
+
<b>PART-left:</b><br>
-
…CTGACCCCGGAGACG
+
…CTGACCCCGGAGACG
-
</p>
+
</p>
-
<center><p>
+
<p>
-
<b>PART1(150bp):</b><br>
+
<b>PART1(150bp):</b><br>
-
CGTCTCGCCCCGGAACAGGTGGTGGCCATTGCAAGCAACGGTGGTGGCAAGCAGG
+
CGTCTCGCCCCGGAACAGGTGGTGGCCATTGCAAGCAACGGTGGTGGCAAGCAGG
-
CCCTGGAGACAGTCCAACGGCTGCTTCCGGTTCTGTGTCAGGCCCACGGCCTGACT<br>
+
CCCTGGAGACAGTCCAACGGCTGCTTCCGGTTCTGTGTCAGGCCCACGGCCTGACT
-
CCAGAACAAGTGGTTGCTATCGTGGCGGAAAATGAGACG</p></center>
+
CCAGAACAAGTGGTTGCTATCGTGGCGGAAAATGAGACG</p>
-
<center><p>
+
<p>
-
<b>PART2(219bp):</b><br>
+
<b>PART2(219bp):</b><br>
-
CGTCTCTAAAACAAGCCCTCGAAACCGTGCAGCGCCTGCTTCCGGTGCTGTGTCAG
+
CGTCTCTAAAACAAGCCCTCGAAACCGTGCAGCGCCTGCTTCCGGTGCTGTGTCAG
-
GCCCACGGGCTCACCCCGGAACAGGTGGTGGCCATCGCATCTAACAATGGCGGTA
+
GCCCACGGGCTCACCCCGGAACAGGTGGTGGCCATCGCATCTAACAATGGCGGTA
-
AGCAGGCACTGGAAACAGTGCAGCGCCTGCTTCCGGTCCTGTGTCAGGCTCATGG<br>
+
AGCAGGCACTGGAAACAGTGCAGCGCCTGCTTCCGGTCCTGTGTCAGGCTCATGG
-
CCTGACCCCAGAGCAGGTCGTGGCAATTGCCTCCAACATTGGAGGGCGAGACG</p></center>
+
CCTGACCCCAGAGCAGGTCGTGGCAATTGCCTCCAACATTGGAGGGCGAGACG</p>
-
<center><p>
+
<p>
-
<b>PART3(262bp):</b><br>
+
<b>PART3(262bp):</b><br>
-
CGTCTCTAGGGAAGCAGGCACTGGAGACCGTGCAGCGGCTGCTGCCGGTGCTGTG
+
CGTCTCTAGGGAAGCAGGCACTGGAGACCGTGCAGCGGCTGCTGCCGGTGCTGTG
-
TCAGGCCCACGGCTTGACCCCGGAACAGGTGGTGGCCATCGCCTCCAACGGCGGT
+
TCAGGCCCACGGCTTGACCCCGGAACAGGTGGTGGCCATCGCCTCCAACGGCGGT
-
GGCAAACAGGCGCTGGAAACAGTTCAACGCCTCCTTCCGGTCCTGTGCCAGGCCC
+
GGCAAACAGGCGCTGGAAACAGTTCAACGCCTCCTTCCGGTCCTGTGCCAGGCCC
-
ATGGTCTGACTCCAGAGCAGGTTGTGGCAATTGCAAGCAACATTGGTGGTAAACA<br>
+
ATGGTCTGACTCCAGAGCAGGTTGTGGCAATTGCAAGCAACATTGGTGGTAAACA
-
AGCTTTGGAAACCGTCCAGCGCTTGCTGCCAGTACGGAGACG</p></center>
+
AGCTTTGGAAACCGTCCAGCGCTTGCTGCCAGTACGGAGACG</p>
-
<center><p>
+
<p>
-
<b>PART4(224bp):</b><br>
+
<b>PART4(224bp):</b><br>
-
CGTCTCCGTACTGTGTCAGGCCCACGGGCTTACCCCGGAACAGGTGGTGGCCATT
+
CGTCTCCGTACTGTGTCAGGCCCACGGGCTTACCCCGGAACAGGTGGTGGCCATT
-
GCAAGCAACGGTGGTGGCAAGCAGGCCCTGGAGACAGTCCAACGGCTGCTTCCGG
+
GCAAGCAACGGTGGTGGCAAGCAGGCCCTGGAGACAGTCCAACGGCTGCTTCCGG
-
TTCTGTGTCAGGCCCACGGCCTGACTCCAGAACAAGTGGTTGCTATCGCCAGCCA
+
TTCTGTGTCAGGCCCACGGCCTGACTCCAGAACAAGTGGTTGCTATCGCCAGCCA
-
CGATGGCGGTAAACAAGCCCTCGAAACCGTGCAGCGCCTGCTTCCGGTGCTGGGA<br>
+
CGATGGCGGTAAACAAGCCCTCGAAACCGTGCAGCGCCTGCTTCCGGTGCTGGGA<br>
-
GACG
+
GACG
-
</p></center>
+
</p>
-
<center><p>
+
<p>
-
<b>PART5(194bp):</b><br>
+
<b>PART5(194bp):</b><br>
-
CGTCTCCGCTGTGTCAGGCCCACGGACTGACCCCGGAACAGGTGGTGGCCATCGC
+
CGTCTCCGCTGTGTCAGGCCCACGGACTGACCCCGGAACAGGTGGTGGCCATCGC
-
CTCCAACATTGGTGGTAAGCAAGCCCTCGAAACTGTGCAGCGGCTGCTTCCAGTC
+
CTCCAACATTGGTGGTAAGCAAGCCCTCGAAACTGTGCAGCGGCTGCTTCCAGTC
-
TTGTGCCAGGCTCACGGCCTGACACCGGAGCAGGTGGTTGCAATCGCGTCTAATA<br>
+
TTGTGCCAGGCTCACGGCCTGACACCGGAGCAGGTGGTTGCAATCGCGTCTAATA<br>
-
TCGGCGGCAAACAGGCACTCGATGAGACG
+
TCGGCGGCAAACAGGCACTCGATGAGACG
-
</p></center>
+
</p>
-
<center><p>
+
<p>
-
<b>PART6(249bp):</b><br>
+
<b>PART6(249bp):</b><br>
-
CGTCTCATCGAGACCGTGCAGCGCTTGCTTCCAGTGCTGTGTCAGGCCCACGGCC
+
CGTCTCATCGAGACCGTGCAGCGCTTGCTTCCAGTGCTGTGTCAGGCCCACGGCC
-
TGACCCCGGAACAGGTGGTGGCCATCGCCTCTAACAATGGCGGCAAACAGGCATT
+
TGACCCCGGAACAGGTGGTGGCCATCGCCTCTAACAATGGCGGCAAACAGGCATT
-
GGAAACAGTTCAGCGCCTGCTGCCGGTGTTGTGTCAGGCTCACGGCCTGACTCCG
+
GGAAACAGTTCAGCGCCTGCTGCCGGTGTTGTGTCAGGCTCACGGCCTGACTCCG
-
GAGCAGGTTGTGGCCATCGCAAGCCATGATGGCGGTAAACAAGCTCTGGAGACAG<br>
+
GAGCAGGTTGTGGCCATCGCAAGCCATGATGGCGGTAAACAAGCTCTGGAGACAG<br>
-
TGCAACGCCTCTTGCCAGTTTTAGAGACG</p></center>
+
TGCAACGCCTCTTGCCAGTTTTAGAGACG</p>
-
<center><p>
+
<p>
-
<b>PART-right:</b><br>
+
<b>PART-right:</b><br>
-
CGTCTCATTTTGTGTCAGGCCCACGGA...</p></center>
+
CGTCTCATTTTGTGTCAGGCCCACGGA...</p><br>
-
<p>
+
<p>
-
The recognition sequence of the TALE protein:<br>
+
The recognition sequence of the TALE protein:
-
<center><font size="5" color="red">TCGATATCAAGC</font></center></p>
+
<center><font size="5" color="red">TTCGATATCAAGCT</font></center></p>
-
<p>All parts are under artificial synthesis process, so there is few results, which can prove our changes are useful. However, with the principle of complementary base pairing, our chioce should be better than original vision. And if you want our data or use our method to create your own best sticky ends, just contact us!</p>
+
<p>All parts are under artificial synthesis process, so there is few results to present, which can prove our changes are effective. However, with the principle of complementary base pairing, our choice should be better than original version. And if you want our data or use our method to create your own best sticky ends, just contact us!</p>

Latest revision as of 21:53, 17 October 2014

TAL Improvement


——More Effective Golden Gate Cloning

This year our team not only focused on our project and bioparts but also improved several existing parts designed by iGEM12_Freiburg. When we used their parts and followed their protocols, we found it hard to repeat their success, and we met many difficulties on our way to constructing a TALE. After discussion, our team decided to improve the existing TAL parts by finding new and better sticky ends. Moreover, our method can be used more broadly whenever people want to connect components by the Golden Gate method.

Why do we want to improve it?

Whether Freiburg's design is efficient or not

According to Freiburg's experimental record, the success rate is higher than 95% (32/33). However, this result, to some degree, lacks statistical significance.

In the results section, they emphasize that there is a light band at 1200bp, which they believe indicates that the Golden Gate connection works well. However, after conducting several experiments ourselves, we found that the band at 1200bp is not the key indicator of whether the Golden Gate connection works. If the band is not clear and specific in the gel, the experiment has not gone well. We can easily find several light bands below the band at 1200bp. Moreover, the second light band is somewhat lighter than the band at 1200bp. Although Freiburg can explain these results by the repetitiveness of the TALE sequence, we suppose that mismatching of the sticky ends still cannot be excluded. Frankly speaking, we would like to believe that they really made it, but if the success cannot be repeated, there must be something wrong with their system. You can view detailed information in the iGEM2012 Freiburg wiki.

Freiburg gel result
Figure 1.4.1 2012 Freiburg gel result

The protocols we used to connect the parts of the TALE

1. Freiburg's protocol

2. Restriction enzyme digestion of the plasmid and TAL direpeats, followed by gel extraction. The molar ratio of plasmid to TALE is 1 to 5 and of TALE to TALE is 1 to 1. Ligation with T4 ligase at 22 ℃ overnight.

3. With the same ratio of plasmid and TALE direpeats, add the TALE direpeats one by one and connect them at 22 ℃ for 30 minutes.

4. Connect two parts at a time to make three intermediates of 400bp, and then mix the plasmids to make the complete TALE.

5. With the same ratio, connect following the program of 22 ℃ for 2 min and 40 ℃ for 30 s, repeated 25 times.

The motivation to improve 2012 Freiburg’s parts

Unfortunately, all of our attempts failed. We didn’t manage to make a complete TALE, or even to connect two of the parts together. However, when we tried the 5th protocol, we noticed an unexpected and important result. When we analyzed the sequencing result, we found that our left adaptor, 1st part and right adaptor had connected together. Why did we get this result? We noticed that their sticky ends are TGAC, GCTC, and ACTC. That is to say, GCTC and ACTC might have connected with each other by mistake. In other words, if sticky ends are very similar, they can probably connect with each other. Although we failed again, this result gave us the confidence to improve 2012 Freiburg's parts.

How do we connect certain monomers?

Some advanced tips for TALE protein

1. Given a sample sequence with repeating amino acids:

Figure 1.4.2 TALE amino acid sequence

XX marks the amino acids that determine which base a repeat recognizes. Within one repeat unit, the other amino acids can be identical.

2. A fully functional TALE protein contains a segment before the repetitive units that recognizes the first base T. It also contains a similar segment, only half the length of a repetitive unit, which recognizes the last base T.

3. The length that can be recognized is not strictly twelve or fourteen. According to published results, the length of the recognition sequence depends on the number of monomers.


We can gather 96 bioparts based on Freiburg, and each part has its corresponding sticky ends based on its position (1, 2, 3, 4, 5 or 6). By picking one part for every two bases at each position, we are able to design one TALE protein sequence.

Previous Review: Freiburg’s way of connection

The main principle of connection is built upon the idea of Golden Gate connection (Sanjana, N. E. et al. A transcription activator-like effector toolbox for genome engineering. Nature Protocols 7, 171–192 (2012)).

The key point of their procedure is a type IIS restriction enzyme, BsmBI.

Figure 1.4.3 BsmBI recognition site

The main feature of this enzyme is that its recognition sequence lies on only one side of the cleavage site. This makes it possible to obtain chosen sticky ends without breaking the rest of the sequence. The sticky end is 4bp long, and a set of different sticky ends can be designed and combined. That feature looks excellent at first, but we cannot ignore its latent shortcomings.

Let’s analyze the example (AA1) provided by Freiburg.

Figure 1.4.4 DNA sequence of AA1 part

The underlined sequence is recognized by BsmBI. The vertical bar (|) is the cutting position. In this sample, TGAC is one sticky end, which can be combined with six other sticky ends.
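The cutting rule can be sketched in a few lines of Python. This is only an illustration, assuming a part that starts with a forward BsmBI site (CGTCTC) and ends with a reverse one (GAGACG), with one spacer base between each site and its 4bp overhang. The function name bsmbi_overhangs and the shortened test fragment are our own hypothetical choices, not part of Freiburg's toolkit.

```python
from typing import Tuple

FORWARD_SITE = "CGTCTC"   # BsmBI recognition site on the top strand
REVERSE_SITE = "GAGACG"   # its reverse complement, as seen at the end of a part


def bsmbi_overhangs(part: str) -> Tuple[str, str]:
    """Return the (left, right) 4-nt overhangs, both read 5'->3' on the top strand."""
    part = "".join(part.upper().split())
    if not part.startswith(FORWARD_SITE) or not part.endswith(REVERSE_SITE):
        raise ValueError("part must start with CGTCTC and end with GAGACG")
    start = len(FORWARD_SITE) + 1             # skip the site and one spacer base
    left = part[start:start + 4]
    end = len(part) - len(REVERSE_SITE) - 1   # index of the spacer base before the site
    right = part[end - 4:end]
    return left, right


# Hypothetical shortened fragment: only the first and last bases of PART2
# from this page, joined together for illustration.
fragment = "CGTCTCTAAAACAAGCC" + "ATTGGAGGGCGAGACG"
print(bsmbi_overhangs(fragment))
```

Two parts can be expected to join when the right overhang of one equals the left overhang of the next.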

Evaluate seven sticky ends designed by 2012 Freiburg

2012 Freiburg's parts have seven sticky ends:

TGAC, GCTC, CTTG, GCTT, ACTG, CCTG, ACTC

We all know that two parts can join together when their sticky ends pair under the principle of complementary base pairing. However, is it possible that sticky ends which do not match completely still bind together? We found that, in fact, the more similar two sticky ends are, the more likely they are to form new but incorrect base pairs. Inspired by the BLAST algorithm, we evaluated the similarity of every pair of sticky ends.

Figure 1.4.5 Strict rules score table

The higher the score, the higher the similarity and the higher the possibility of a mismatch. The table shows that more than 30% of the pairs have a score equal to 3, which means that the possibility of a mismatch cannot be neglected. Even if we use the relatively loose rule to evaluate the similarity, we still find that the error rate cannot be neglected.

Figure 1.4.6 Loose rules score table

Why not other sticky ends?

The reason why Freiburg used these sticky ends

Having failed to contact the original designers of these sticky ends, all we can do is look for plausible advantages of these combinations.

Let's look at the amino acid sequence of the TALE repeat unit:

LTPEQVVAIAS(XX)GGKQALETVQRLLPVLCQAHG(34aa)

The first amino acid is Leu, which is essential for all of the connections. There are six different codons for Leu.

UUA,UUG,CUU,CUC,CUA,CUG
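As a quick illustration of how these codons relate to Freiburg's sticky ends, the sketch below lists which Leu codons occur inside each end. It is a plain substring check that ignores the reading frame, and the names LEU_CODONS, FREIBURG_ENDS and leu_codons_in are ours. The first end is written with its leading C, following the "(C)TGAC" notation used on this page.

```python
# Leu codons from the text, written as DNA (U replaced by T).
LEU_CODONS = ["TTA", "TTG", "CTT", "CTC", "CTA", "CTG"]

# Freiburg's seven sticky ends; the first one includes the bracketed C.
FREIBURG_ENDS = ["CTGAC", "GCTC", "CTTG", "GCTT", "ACTG", "CCTG", "ACTC"]


def leu_codons_in(end: str) -> list:
    """Return the Leu codons that appear as a substring of the given end."""
    return [codon for codon in LEU_CODONS if codon in end]


for end in FREIBURG_ENDS:
    print(end, leu_codons_in(end))
```

Every one of the seven ends contains at least one Leu codon, which fits the idea that the ends were chosen around the Leu position of the repeat.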

The 2012 Freiburg project's sticky ends:

(C)TGAC, GCTC, CTTG, GCTT, ACTG, CCTG, ACTC

The degeneracy of the genetic code helped in designing these seven sticky ends. However, since the codons for an identical amino acid are highly similar, this feature is a double-edged sword for experimental scientists.

How to improve the Golden Gate sticky ends? A big Table!

Three key questions need to be answered:

1. Is it possible to find a perfectly matched pair?

2. Can we find a certain number of sticky ends with the least possibility of mismatch?

3. How do we make this sticky-end score table?

Key algorithms derived from the BLAST algorithm

Loose rule: Match: 1; Mismatch: 0; Gap: 0

Strict rule: Match: 1; Mismatch: 0; Gap: 1

A sticky end is composed of four bases, which means that we can design 4^4 = 256 types of sticky ends, and their pairwise scores can be represented as a 256*256 table.
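A table of this size is easy to build by program. The sketch below uses a simplified score that counts the positions where two ends carry the same base (match 1, mismatch 0). How gaps enter the loose and strict rules is not spelled out here, so this position-wise score is an assumption and will not necessarily reproduce the figures on this page. The names identity_score, ALL_ENDS and TABLE are ours.

```python
from itertools import product

BASES = "ACGT"


def identity_score(a: str, b: str) -> int:
    """Count the positions where two equal-length sticky ends carry the same base."""
    return sum(x == y for x, y in zip(a, b))


# All 4**4 = 256 possible 4-nt sticky ends.
ALL_ENDS = ["".join(p) for p in product(BASES, repeat=4)]

# 256 x 256 score table stored as a nested dict: TABLE[a][b] -> score.
TABLE = {a: {b: identity_score(a, b) for b in ALL_ENDS} for a in ALL_ENDS}

print(len(ALL_ENDS), len(TABLE["GCTC"]))
print(TABLE["GCTC"]["ACTC"])  # the pair suspected of connecting by mistake
```

Under this simplified score, GCTC and ACTC share three of their four positions.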

Find target groups of sticky ends

To solve the TALE parts problem, we need to find seven sticky ends such that the similarity score of every pair among them is less than or equal to 1.

Figure 1.4.7 Sticky ends choices table

When we use the strict algorithm to find these ends, it is impossible to find seven sticky ends in which every pair has a score of no more than 1. So we have to use the loose algorithm.
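One simple way to look for such a group is a greedy scan over all 256 ends. The sketch below is not the procedure we used to produce the tables on this page. It assumes a position-wise score (match 1, mismatch 0, gaps not considered) and ignores the amino acid constraints of the TALE sequence. The names identity_score and greedy_ends are hypothetical.

```python
from itertools import product


def identity_score(a: str, b: str) -> int:
    """Count the positions where two equal-length sticky ends carry the same base."""
    return sum(x == y for x, y in zip(a, b))


def greedy_ends(count: int = 7, max_score: int = 1) -> list:
    """Scan all 4-nt ends in alphabetical order and keep each one whose score
    against every end kept so far is at most max_score."""
    chosen = []
    for bases in product("ACGT", repeat=4):
        candidate = "".join(bases)
        if all(identity_score(candidate, c) <= max_score for c in chosen):
            chosen.append(candidate)
            if len(chosen) == count:
                break
    return chosen


print(greedy_ends())
```

A greedy scan only returns the first acceptable group it meets. To rank groups, one could enumerate more candidates and compare their total scores.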

Convert four-basepair sticky ends to amino acid pairs

What we care about is which two amino acids a sticky end covers in our target sequence, rather than the 4bp itself. So we should convert the sticky-end sequences to amino acid pairs.
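The conversion can be sketched as follows, assuming the standard genetic code. A 4bp sticky end always overlaps two neighbouring codons, so the sketch tries the three possible reading frames, fills the unknown flanking bases with every possibility and collects the resulting amino acid pairs. The function name aa_pairs is ours, and pairs containing a stop codon are dropped.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def aa_pairs(end: str) -> set:
    """Amino acid pairs (aa1, aa2) whose two adjacent codons can contain `end`."""
    pairs = set()
    for offset in range(3):        # unknown bases of codon 1 before the end
        tail = 2 - offset          # unknown bases of codon 2 after the end
        for left in product(BASES, repeat=offset):
            for right in product(BASES, repeat=tail):
                six = "".join(left) + end + "".join(right)
                pair = (CODON_TABLE[six[:3]], CODON_TABLE[six[3:]])
                if "*" not in pair:    # skip pairs that would need a stop codon
                    pairs.add(pair)
    return pairs


print(("G", "K") in aa_pairs("AAAA"))   # Gly-Lys, as in ...GGKQ...
print(("V", "L") in aa_pairs("GTAC"))   # Val-Leu, as in ...PVLC...
```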

Figure 1.4.8 4bp sticky ends convert to Amino acid table

Based on the above table, we are able to calculate the total scores of each combination and find the best one.

Best choice for seven sticky ends on TALE protein

Best combination:

AAAA, AGGG, GTAC, GCTC, TTTT, TCGA, CCCC

Scores Table(Loose rule):

Figure 1.4.9 Best combination score table

Position in TALE amino acids sequence:

Figure 1.4.10 Position in TALE amino acids sequence

Reconstruct DNA sequences

Two main factors in reconstructing the DNA sequences:
1. Use the table of the best combination and rearrange the sticky ends according to your needs.
2. There must be no BsmBI recognition sequence in the reconstructed DNA sequence.
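The second factor is easy to check by program. The sketch below searches a sequence for the BsmBI recognition site CGTCTC and for its reverse complement GAGACG, which marks a site on the opposite strand. The function name find_bsmbi_sites is hypothetical.

```python
def find_bsmbi_sites(seq: str) -> list:
    """Return sorted (position, strand) tuples for every BsmBI site in seq.

    '+' means CGTCTC on the given strand, '-' means its reverse complement GAGACG.
    Positions are 0-based indexes of the first base of the site.
    """
    seq = "".join(seq.upper().split())
    hits = []
    for site, strand in (("CGTCTC", "+"), ("GAGACG", "-")):
        pos = seq.find(site)
        while pos != -1:
            hits.append((pos, strand))
            pos = seq.find(site, pos + 1)
    return sorted(hits)


# The first 60 bases of the reconstructed sequence contain no site ...
print(find_bsmbi_sites("CTGACCCCGG AACAGGTGGT GGCCATTGCA AGCAACGGTG GTGGCAAGCA GGCCCTGGAG"))
# ... while a part flanked for digestion, such as PART-right, does.
print(find_bsmbi_sites("CGTCTCATTTTGTGTCAGGCCCACGGA"))
```

A reconstructed sequence passes the check when the returned list is empty. The synthesized parts are expected to show sites only at their two ends.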
Final DNA Sequence for NEW TALE protein:

	1        CTGACCCCGG AACAGGTGGT GGCCATTGCA AGCAACGGTG GTGGCAAGCA GGCCCTGGAG
	61       ACAGTCCAAC GGCTGCTTCC GGTTCTGTGT CAGGCCCACG GCCTGACTCC AGAACAAGTG
	121      GTTGCTATCG CCAGCCACGA TGGCGGAAAA CAAGCCCTCG AAACCGTGCA GCGCCTGCTT
	181      CCGGTGCTGT GTCAGGCCCA CGGGCTCACC CCGGAACAGG TGGTGGCCAT CGCATCTAAC
	241      AATGGCGGTA AGCAGGCACT GGAAACAGTG CAGCGCCTGC TTCCGGTCCT GTGTCAGGCT
	301      CATGGCCTGA CCCCAGAGCA GGTCGTGGCA ATTGCCTCCA ACATTGGAGG GAAGCAGGCA
	361      CTGGAGACCG TGCAGCGGCT GCTGCCGGTG CTGTGTCAGG CCCACGGCTT GACCCCGGAA
	421      CAGGTGGTGG CCATCGCCTC CAACGGCGGT GGCAAACAGG CGCTGGAAAC AGTTCAACGC
	481      CTCCTTCCGG TCCTGTGCCA GGCCCATGGT CTGACTCCAG AGCAGGTTGT GGCAATTGCA
	541      AGCAACATTG GTGGTAAACA AGCTTTGGAA ACCGTCCAGC GCTTGCTGCC AGTACTGTGT
	601      CAGGCCCACG GGCTTACCCC GGAACAGGTG GTGGCCATTG CAAGCAACGG TGGTGGCAAG
	661      CAGGCCCTGG AGACAGTCCA ACGGCTGCTT CCGGTTCTGT GTCAGGCCCA CGGCCTGACT
	721      CCAGAACAAG TGGTTGCTAT CGCCAGCCAC GATGGCGGTA AACAAGCCCT CGAAACCGTG
	781      CAGCGCCTGC TTCCGGTGCT CTGTCAGGCC CACGGACTGA CCCCGGAACA GGTGGTGGCC
	841      ATCGCCTCCA ACATTGGTGG TAAGCAAGCC CTCGAAACTG TGCAGCGGCT GCTTCCAGTC
	901      TTGTGCCAGG CTCACGGCCT GACACCGGAG CAGGTGGTTG CAATCGCGTC TAATATCGGC
	961      GGCAAACAGG CACTCGAGAC CGTGCAGCGC TTGCTTCCAG TGCTGTGTCA GGCCCACGGC
	1021     CTGACCCCGG AACAGGTGGT GGCCATCGCC TCTAACAATG GCGGCAAACA GGCATTGGAA
	1081     ACAGTTCAGC GCCTGCTGCC GGTGTTGTGT CAGGCTCACG GCCTGACTCC GGAGCAGGTT
	1141     GTGGCCATCG CAAGCCATGA TGGCGGTAAA CAAGCTCTGG AGACAGTGCA ACGCCTCTTG
	1201     CCAGTTTTGT GTCAGGCCCA CGGA                                       
	

The final amino acid sequence remains the same:

	1         LTPEQVVAIA SNGGGKQALE TVQRLLPVLC QAHG
	35        LTPEQVVAIA SHDGGKQALE TVQRLLPVLC QAHG
	69        LTPEQVVAIA SNNGGKQALE TVQRLLPVLC QAHG
	103       LTPEQVVAIA SNIGGKQALE TVQRLLPVLC QAHG
	137       LTPEQVVAIA SNGGGKQALE TVQRLLPVLC QAHG
	171       LTPEQVVAIA SNIGGKQALE TVQRLLPVLC QAHG
	205       LTPEQVVAIA SNGGGKQALE TVQRLLPVLC QAHG
	239       LTPEQVVAIA SHDGGKQALE TVQRLLPVLC QAHG
	273       LTPEQVVAIA SNIGGKQALE TVQRLLPVLC QAHG
	307       LTPEQVVAIA SNIGGKQALE TVQRLLPVLC QAHG
	341       LTPEQVVAIA SNNGGKQALE TVQRLLPVLC QAHG
	375       LTPEQVVAIA SHDGGKQALE TVQRLLPVLC QAHG
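To check that a reconstructed DNA sequence still encodes the same protein, it can be translated and compared with the amino acid listing. The sketch below assumes the standard genetic code and translates the first 60 bases of the final DNA sequence. The function name translate is ours.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate a DNA string codon by codon; whitespace is ignored and any
    incomplete codon at the end is dropped."""
    dna = "".join(dna.upper().split())
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


first_60 = "CTGACCCCGG AACAGGTGGT GGCCATTGCA AGCAACGGTG GTGGCAAGCA GGCCCTGGAG"
print(translate(first_60))
```

The result should agree with the first 20 residues of the first repeat listed above, LTPEQVVAIA SNGGGKQALE.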
	

Corresponding parts:

PART-left:
…CTGACCCCGGAGACG

PART1(150bp):
CGTCTCGCCCCGGAACAGGTGGTGGCCATTGCAAGCAACGGTGGTGGCAAGCAGG CCCTGGAGACAGTCCAACGGCTGCTTCCGGTTCTGTGTCAGGCCCACGGCCTGACT CCAGAACAAGTGGTTGCTATCGTGGCGGAAAATGAGACG

PART2(219bp):
CGTCTCTAAAACAAGCCCTCGAAACCGTGCAGCGCCTGCTTCCGGTGCTGTGTCAG GCCCACGGGCTCACCCCGGAACAGGTGGTGGCCATCGCATCTAACAATGGCGGTA AGCAGGCACTGGAAACAGTGCAGCGCCTGCTTCCGGTCCTGTGTCAGGCTCATGG CCTGACCCCAGAGCAGGTCGTGGCAATTGCCTCCAACATTGGAGGGCGAGACG

PART3(262bp):
CGTCTCTAGGGAAGCAGGCACTGGAGACCGTGCAGCGGCTGCTGCCGGTGCTGTG TCAGGCCCACGGCTTGACCCCGGAACAGGTGGTGGCCATCGCCTCCAACGGCGGT GGCAAACAGGCGCTGGAAACAGTTCAACGCCTCCTTCCGGTCCTGTGCCAGGCCC ATGGTCTGACTCCAGAGCAGGTTGTGGCAATTGCAAGCAACATTGGTGGTAAACA AGCTTTGGAAACCGTCCAGCGCTTGCTGCCAGTACGGAGACG

PART4(224bp):
CGTCTCCGTACTGTGTCAGGCCCACGGGCTTACCCCGGAACAGGTGGTGGCCATT GCAAGCAACGGTGGTGGCAAGCAGGCCCTGGAGACAGTCCAACGGCTGCTTCCGG TTCTGTGTCAGGCCCACGGCCTGACTCCAGAACAAGTGGTTGCTATCGCCAGCCA CGATGGCGGTAAACAAGCCCTCGAAACCGTGCAGCGCCTGCTTCCGGTGCTGGGA
GACG

PART5(194bp):
CGTCTCCGCTGTGTCAGGCCCACGGACTGACCCCGGAACAGGTGGTGGCCATCGC CTCCAACATTGGTGGTAAGCAAGCCCTCGAAACTGTGCAGCGGCTGCTTCCAGTC TTGTGCCAGGCTCACGGCCTGACACCGGAGCAGGTGGTTGCAATCGCGTCTAATA
TCGGCGGCAAACAGGCACTCGATGAGACG

PART6(249bp):
CGTCTCATCGAGACCGTGCAGCGCTTGCTTCCAGTGCTGTGTCAGGCCCACGGCC TGACCCCGGAACAGGTGGTGGCCATCGCCTCTAACAATGGCGGCAAACAGGCATT GGAAACAGTTCAGCGCCTGCTGCCGGTGTTGTGTCAGGCTCACGGCCTGACTCCG GAGCAGGTTGTGGCCATCGCAAGCCATGATGGCGGTAAACAAGCTCTGGAGACAG
TGCAACGCCTCTTGCCAGTTTTAGAGACG

PART-right:
CGTCTCATTTTGTGTCAGGCCCACGGA...


The recognition sequence of the TALE protein:

TTCGATATCAAGCT

All parts are still being synthesized, so there are few results to present that can prove our changes are effective. However, by the principle of complementary base pairing, our choice should be better than the original version. If you want our data, or want to use our method to create your own best sticky ends, just contact us!