Why do we want to improve it?
Is Freiburg's design efficient?
According to Freiburg's experimental record, the success rate is higher than 95% (32/33). However, with only 33 samples, this result has limited statistical weight.
In their results section, they emphasize a bright band at 1200 bp, which they believe indicates that the Golden Gate connection works well. However, after running several experiments ourselves, we found that the presence of a band at 1200 bp is not the key indicator of whether the Golden Gate connection works. If the band is not clear and specific on the gel, the experiment has not gone well. In their gel, several bright bands are easy to find below the 1200 bp band. Moreover, the second bright band is somewhat brighter than the band at 1200 bp. Although Freiburg explains these results by the repetitiveness of the TALE sequence, we think that mismatching of the sticky ends still cannot be excluded. Frankly, we would like to believe that they really made it, but if the success cannot be repeated, there must be something wrong with their system. Detailed information is available on the iGEM 2012 Freiburg wiki.
The protocols we used to connect the TALE parts
1. Freiburg's protocol
2. Restriction enzyme digestion of the plasmid and the TALE direpeats, followed by gel extraction. By mole ratio, plasmid to TALE is 1 to 5 and TALE to TALE is 1 to 1. Ligation with T4 ligase at 22 °C overnight.
3. With the same ratio of plasmid to TALE direpeats, add the TALE direpeats one by one and connect each at 22 °C for 30 minutes.
4. Connect the parts two at a time to make three intermediates of 400 bp, then mix the plasmids to make the complete TALE.
5. With the same ratio, connect using a cycling program of 22 °C for 2 min and 40 °C for 30 s, repeated 25 times.
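The mole ratios in these protocols have to be converted into masses at the bench. The snippet below is a minimal sketch of that conversion, assuming double-stranded DNA so that moles are proportional to mass divided by length. The function name `insert_mass_ng` and the 3000 bp plasmid length are hypothetical and chosen only for illustration; 219 bp is the length listed for PART2 on this page.

```python
def insert_mass_ng(vector_ng, vector_bp, insert_bp, insert_to_vector=5.0):
    """Mass of insert (ng) that gives the requested insert:vector mole ratio.

    For double-stranded DNA, moles are proportional to mass / length,
    so the mass ratio is the mole ratio multiplied by the length ratio.
    """
    return vector_ng * insert_to_vector * insert_bp / vector_bp


# Hypothetical example: 100 ng of a 3000 bp plasmid, one 219 bp TALE part,
# plasmid : TALE = 1 : 5 as in protocol 2.
print(insert_mass_ng(100, 3000, 219))

# TALE : TALE = 1 : 1, so a second part of 262 bp is scaled by length only.
print(insert_mass_ng(36.5, 219, 262, insert_to_vector=1.0))
```

The same function covers both ratios because an equimolar mix is just the special case where the mole ratio is 1.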
The motivation to improve 2012 Freiburg’s parts
Unfortunately, all of our attempts failed. We did not manage to make a complete TALE, or even to join two parts together. However, what matters for us is that the fifth protocol gave an unexpected result. When we analyzed the sequencing result, we found that our left adaptor, first part and right adaptor had connected together. Why did we get this result? We noticed that their sticky ends are TGAC, GCTC and ACTC. That is to say, GCTC and ACTC, which differ in only one base, might have connected with each other by mistake. In other words, if two sticky ends are very similar, they will probably connect with each other. Although we failed again, the result gave us the confidence to improve 2012 Freiburg's parts.
How do we connect specific monomers?
Some advanced tips for TALE proteins
1. Given a sample sequence with repeating amino acids:
XX marks the amino acids that determine which base a repeat recognizes. Within one repeat unit, the other amino acids can be identical.
2. A fully functional TALE protein contains a segment before the repeat units that recognizes the first base, T. It also contains a segment that is similar to a repeat unit but only half as long, which recognizes the last base, T.
3. The length that can be recognized is not strictly twelve or fourteen. According to published results, the length of the recognition sequence depends on the number of monomers.
Following Freiburg, we can gather 96 bioparts, and each part carries the sticky ends that correspond to its position (1, 2, 3, 4, 5 or 6). By picking, for each position, the part that recognizes the next two bases, we are able to design one TALE protein sequence.
Previous Review: Freiburg’s way of connection
The connection method is built on the idea of Golden Gate connection (Sanjana, N. E. et al. A transcription activator-like effector toolbox for genome engineering. Nature Protocols 7, 171–192 (2012)).
The key to their procedure is a type IIS restriction enzyme, BsmBI.
The main feature of this enzyme is that its recognition sequence lies on only one side of the cleavage site. This makes it possible to generate chosen sticky ends without disrupting the rest of the sequence. The sticky end is 4 bp long, and it can be designed freely so that multiple different sticky ends are used in combination. This feature looks excellent at first, but we cannot ignore its latent shortcomings.
Let’s analyze the example (AA1) provided by Freiburg.
The underlined sequence is recognized by BsmBI. The vertical bar (|) marks the cutting position. In this sample, TGAC is one sticky end, which is used in combination with six other sticky ends.
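To make the cutting rule concrete, here is a minimal sketch that reads the sticky ends off a part sequence. It relies on the known BsmBI cut pattern, CGTCTC(N)1 on the top strand and (N)5 on the bottom strand, which leaves a 4-nt 5' overhang. The function name `bsmbi_overhangs` is our own for illustration, and the two test strings are the first bases of PART1 and the last bases of PART-left as listed further down this page.

```python
SITE = "CGTCTC"      # BsmBI recognition sequence read on the top strand
SITE_RC = "GAGACG"   # the same site when it sits on the bottom strand


def bsmbi_overhangs(seq):
    """Return the 4-nt overhangs (top-strand notation) produced by each BsmBI site.

    BsmBI cuts 1 nt after CGTCTC on the top strand and 5 nt after it on the
    bottom strand, so the overhang is the 4 nt that follow one spacer base.
    """
    seq = seq.upper()
    found = []
    # Site on the top strand: CGTCTC, one spacer base, then the overhang.
    start = seq.find(SITE)
    while start != -1:
        cut = start + len(SITE) + 1
        if cut + 4 <= len(seq):
            found.append(seq[cut:cut + 4])
        start = seq.find(SITE, start + 1)
    # Site on the bottom strand: the overhang, one spacer base, then GAGACG.
    start = seq.find(SITE_RC)
    while start != -1:
        if start - 5 >= 0:
            found.append(seq[start - 5:start - 1])
        start = seq.find(SITE_RC, start + 1)
    return found


print(bsmbi_overhangs("CGTCTCGCCCCGGAACAGG"))  # start of PART1
print(bsmbi_overhangs("CTGACCCCGGAGACG"))      # end of PART-left
```

Both fragments yield the same overhang, CCCC, which is why PART-left and PART1 are expected to join.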
Evaluate seven sticky ends designed by 2012 Freiburg
2012 Freiburg's parts have seven sticky ends:
We all know that two parts with complementary sticky ends can join, following the principle of complementary base pairing. However, is it possible for sticky ends that do not match completely to bind together? In fact, we found that the more similar two sticky ends are, the more likely they are to form new but incorrect base pairs. Inspired by the BLAST algorithm, we evaluated the similarity of every pair of sticky ends.
The higher the score, the higher the similarity and the higher the possibility of mismatch. The table shows that more than 30% of the pairs score 3, which means that the possibility of mismatch cannot be neglected. Even if we use the relatively loose rule to evaluate similarity, we still find that the error rates cannot be neglected.
Why not other sticky ends?
The reason why Freiburg used these sticky ends
Since we were unable to contact the original designers of these sticky ends, all we can do is look for plausible advantages of these combinations.
Let's look at the TALE direpeat unit amino acids sequence:
The first amino acid is Leu, which is essential to every connection step. Leu has six different codons (CTT, CTC, CTA, CTG, TTA and TTG).
The 2012 Freiburg project's sticky ends:
Codon degeneracy made it possible to design seven sticky ends. However, since the codons for the same amino acid are highly similar, this feature is a double-edged sword for experimental scientists.
How to improve the Golden Gate sticky ends? A big Table!
Three key questions need to be answered:
1. Is it possible to find perfectly matched pairs?
2. Can we find a given number of sticky ends with the lowest possibility of mismatch?
3. How do we build this sticky-end score table?
Key algorithms derived from BLAST algorithm
Loose rule: Match: 1; Mismatch: 0; Gap: 0
Strict rule: Match: 1; Mismatch: 0; Gap: 1
A sticky end is composed of four bases, so we can design 256 different sticky ends, and their pairwise scores are represented as a 256*256 table.
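The table can be sketched in a few lines. The code below assumes one particular reading of the loose rule: the two ends are compared position by position, each identical base scores 1, and nothing else contributes. This reading is our assumption for illustration, and the gap handling of the strict rule is not reproduced here. The names `loose_score`, `ENDS` and `TABLE` are hypothetical.

```python
from itertools import product

BASES = "ACGT"


def loose_score(a, b):
    """Count the positions at which two equal-length sticky ends carry the same base."""
    return sum(x == y for x, y in zip(a, b))


# All 4**4 = 256 possible four-base sticky ends.
ENDS = ["".join(p) for p in product(BASES, repeat=4)]

# The 256*256 score table, stored as a dictionary of dictionaries.
TABLE = {a: {b: loose_score(a, b) for b in ENDS} for a in ENDS}

print(len(ENDS), len(TABLE))
print(TABLE["GCTC"]["ACTC"])  # the pair suspected of mis-joining in our fifth protocol
print(TABLE["TGAC"]["GCTC"])
```

Under this reading, GCTC and ACTC score 3 out of a possible 4, while TGAC and GCTC score only 1.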
Find target groups of sticky ends
To solve the TALE parts problem, we need to find seven sticky ends such that the similarity score of every pair among them is less than or equal to 1.
With the strict rule, it is impossible to find seven sticky ends in which every pair scores no more than 1. So we have to use the loose rule.
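A search of this kind can be sketched as a small depth-first search with backtracking. As before, the positional scoring is only our assumed reading of the loose rule, and `find_compatible_ends` is a hypothetical name. The sketch looks at the 4-bp ends alone and ignores the requirement, discussed in the next section, that the ends must also fit the TALE coding sequence, so the set it returns is not the combination we finally chose.

```python
from itertools import product


def loose_score(a, b):
    """Positional identity count (assumed reading of the loose rule)."""
    return sum(x == y for x, y in zip(a, b))


def find_compatible_ends(candidates, size=7, max_score=1):
    """Depth-first search for `size` ends whose pairwise score never exceeds `max_score`.

    Returns the first set found, or None if no such set exists.
    """
    chosen = []

    def extend(start):
        if len(chosen) == size:
            return True
        for i in range(start, len(candidates)):
            cand = candidates[i]
            if all(loose_score(cand, c) <= max_score for c in chosen):
                chosen.append(cand)
                if extend(i + 1):
                    return True
                chosen.pop()  # dead end: undo the choice and try the next candidate
        return False

    return list(chosen) if extend(0) else None


all_ends = ["".join(p) for p in product("ACGT", repeat=4)]
print(find_compatible_ends(all_ends))
```

Passing a restricted candidate list, for example only the ends that the target protein can encode, reuses the same search unchanged.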
Convert four-basepair sticky ends to amino acid pairs
What matters to us is whether the two amino acids covered by a sticky end occur in our target sequence, rather than the 4 bp themselves. So we convert each sticky-end sequence into amino acid pairs.
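One way to picture this conversion is to go in the opposite direction: for a pair of adjacent amino acids, list every 4-base window that some choice of synonymous codons can produce. The sketch below does this with the standard genetic code. The function names `codons_for` and `hostable_ends` are hypothetical, and the approach is an illustration under these assumptions, not necessarily the exact procedure we used to build our table.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codon -> one-letter amino acid ('*' marks a stop codon).
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codons_for(aa):
    """All codons that encode the given one-letter amino acid."""
    return [codon for codon, a in CODON_TABLE.items() if a == aa]


def hostable_ends(aa1, aa2):
    """All 4-base windows that occur in some codon pair encoding aa1 followed by aa2."""
    ends = set()
    for c1, c2 in product(codons_for(aa1), codons_for(aa2)):
        six = c1 + c2
        # A 4-base window can start at codon position 1, 2 or 3 of the first codon.
        ends.update(six[i:i + 4] for i in range(3))
    return ends


print(len(codons_for("L")))               # Leu has six codons
print("TGAC" in hostable_ends("L", "T"))  # e.g. CTG ACC contains TGAC
print("CCCC" in hostable_ends("T", "P"))  # e.g. ACC CCG contains CCCC
```

Restricting the sticky-end search to windows that the target amino acid pairs can host keeps the protein sequence unchanged.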
Based on the above table, we are able to calculate the total scores of each combination and find the best one.
Best choice for seven sticky ends on TALE protein
Best combination:
Scores Table(Loose rule):
Position in TALE amino acids sequence:
Reconstruct DNA Sequence
Two main factors in reconstructing the DNA sequence:
1. Use the table of the best combination and rearrange the sticky ends as you require.
2. The reconstructed DNA sequence must contain no BsmBI recognition sequence.
Final DNA sequence for the TALE protein:
1 CTGACCCCGG AACAGGTGGT GGCCATTGCA AGCAACGGTG GTGGCAAGCA GGCCCTGGAG
61 ACAGTCCAAC GGCTGCTTCC GGTTCTGTGT CAGGCCCACG GCCTGACTCC AGAACAAGTG
121 GTTGCTATCG CCAGCCACGA TGGCGGAAAA CAAGCCCTCG AAACCGTGCA GCGCCTGCTT
181 CCGGTGCTGT GTCAGGCCCA CGGGCTCACC CCGGAACAGG TGGTGGCCAT CGCATCTAAC
241 AATGGCGGTA AGCAGGCACT GGAAACAGTG CAGCGCCTGC TTCCGGTCCT GTGTCAGGCT
301 CATGGCCTGA CCCCAGAGCA GGTCGTGGCA ATTGCCTCCA ACATTGGAGG GAAGCAGGCA
361 CTGGAGACCG TGCAGCGGCT GCTGCCGGTG CTGTGTCAGG CCCACGGCTT GACCCCGGAA
421 CAGGTGGTGG CCATCGCCTC CAACGGCGGT GGCAAACAGG CGCTGGAAAC AGTTCAACGC
481 CTCCTTCCGG TCCTGTGCCA GGCCCATGGT CTGACTCCAG AGCAGGTTGT GGCAATTGCA
541 AGCAACATTG GTGGTAAACA AGCTTTGGAA ACCGTCCAGC GCTTGCTGCC AGTACTGTGT
601 CAGGCCCACG GGCTTACCCC GGAACAGGTG GTGGCCATTG CAAGCAACGG TGGTGGCAAG
661 CAGGCCCTGG AGACAGTCCA ACGGCTGCTT CCGGTTCTGT GTCAGGCCCA CGGCCTGACT
721 CCAGAACAAG TGGTTGCTAT CGCCAGCCAC GATGGCGGTA AACAAGCCCT CGAAACCGTG
781 CAGCGCCTGC TTCCGGTGCT CTGTCAGGCC CACGGACTGA CCCCGGAACA GGTGGTGGCC
841 ATCGCCTCCA ACATTGGTGG TAAGCAAGCC CTCGAAACTG TGCAGCGGCT GCTTCCAGTC
901 TTGTGCCAGG CTCACGGCCT GACACCGGAG CAGGTGGTTG CAATCGCGTC TAATATCGGC
961 GGCAAACAGG CACTCGAGAC CGTGCAGCGC TTGCTTCCAG TGCTGTGTCA GGCCCACGGC
1021 CTGACCCCGG AACAGGTGGT GGCCATCGCC TCTAACAATG GCGGCAAACA GGCATTGGAA
1081 ACAGTTCAGC GCCTGCTGCC GGTGTTGTGT CAGGCTCACG GCCTGACTCC GGAGCAGGTT
1141 GTGGCCATCG CAAGCCATGA TGGCGGTAAA CAAGCTCTGG AGACAGTGCA ACGCCTCTTG
1201 CCAGTTTTGT GTCAGGCCCA CGGA
The final amino acid sequence remains the same:
1 LTPEQVVAIA SNGGGKQALE TVQRLLPVLC QAHG
35 LTPEQVVAIA SHDGGKQALE TVQRLLPVLC QAHG
69 LTPEQVVAIA SNNGGKQALE TVQRLLPVLC QAHG
103 LTPEQVVAIA SNIGGKQALE TVQRLLPVLC QAHG
137 LTPEQVVAIA SNGGGKQALE TVQRLLPVLC QAHG
171 LTPEQVVAIA SNIGGKQALE TVQRLLPVLC QAHG
205 LTPEQVVAIA SNGGGKQALE TVQRLLPVLC QAHG
239 LTPEQVVAIA SHDGGKQALE TVQRLLPVLC QAHG
273 LTPEQVVAIA SNIGGKQALE TVQRLLPVLC QAHG
307 LTPEQVVAIA SNIGGKQALE TVQRLLPVLC QAHG
341 LTPEQVVAIA SNNGGKQALE TVQRLLPVLC QAHG
375 LTPEQVVAIA SHDGGKQALE TVQRLLPVLC QAHG
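Both requirements on the reconstructed sequence, an unchanged protein and no internal BsmBI site, are easy to check by script. The following is a minimal sketch using the standard genetic code. The function names `translate` and `has_bsmbi_site` are our own for illustration, and the check is shown only on the first 102 bases (the first repeat) of the final DNA sequence above.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codon -> one-letter amino acid ('*' marks a stop codon).
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate a coding sequence in frame 1, ignoring any trailing partial codon."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def has_bsmbi_site(dna):
    """True if the BsmBI recognition sequence occurs on either strand."""
    dna = dna.upper()
    return "CGTCTC" in dna or "GAGACG" in dna


# First repeat (bases 1-102) of the final DNA sequence.
first_repeat = ("CTGACCCCGGAACAGGTGGTGGCCATTGCAAGCAACGGTGGTGGCAAGCAGG"
                "CCCTGGAGACAGTCCAACGGCTGCTTCCGGTTCTGTGTCAGGCCCACGGC")

print(translate(first_repeat))
print(has_bsmbi_site(first_repeat))
```

The translation can be compared with the first line of the amino acid listing, and the same two calls can be run on the full sequence once the position numbers and spaces are stripped.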
Corresponding part:
PART-left:
…CTGACCCCGGAGACG
PART1(150bp):
CGTCTCGCCCCGGAACAGGTGGTGGCCATTGCAAGCAACGGTGGTGGCAAGCAGG
CCCTGGAGACAGTCCAACGGCTGCTTCCGGTTCTGTGTCAGGCCCACGGCCTGACT
CCAGAACAAGTGGTTGCTATCGTGGCGGAAAATGAGACG
PART2(219bp):
CGTCTCTAAAACAAGCCCTCGAAACCGTGCAGCGCCTGCTTCCGGTGCTGTGTCAG
GCCCACGGGCTCACCCCGGAACAGGTGGTGGCCATCGCATCTAACAATGGCGGTA
AGCAGGCACTGGAAACAGTGCAGCGCCTGCTTCCGGTCCTGTGTCAGGCTCATGG
CCTGACCCCAGAGCAGGTCGTGGCAATTGCCTCCAACATTGGAGGGCGAGACG
PART3(262bp):
CGTCTCTAGGGAAGCAGGCACTGGAGACCGTGCAGCGGCTGCTGCCGGTGCTGTG
TCAGGCCCACGGCTTGACCCCGGAACAGGTGGTGGCCATCGCCTCCAACGGCGGT
GGCAAACAGGCGCTGGAAACAGTTCAACGCCTCCTTCCGGTCCTGTGCCAGGCCC
ATGGTCTGACTCCAGAGCAGGTTGTGGCAATTGCAAGCAACATTGGTGGTAAACA
AGCTTTGGAAACCGTCCAGCGCTTGCTGCCAGTACGGAGACG
PART4(224bp):
CGTCTCCGTACTGTGTCAGGCCCACGGGCTTACCCCGGAACAGGTGGTGGCCATT
GCAAGCAACGGTGGTGGCAAGCAGGCCCTGGAGACAGTCCAACGGCTGCTTCCGG
TTCTGTGTCAGGCCCACGGCCTGACTCCAGAACAAGTGGTTGCTATCGCCAGCCA
CGATGGCGGTAAACAAGCCCTCGAAACCGTGCAGCGCCTGCTTCCGGTGCTGGGA
GACG
PART5(194bp):
CGTCTCCGCTGTGTCAGGCCCACGGACTGACCCCGGAACAGGTGGTGGCCATCGC
CTCCAACATTGGTGGTAAGCAAGCCCTCGAAACTGTGCAGCGGCTGCTTCCAGTC
TTGTGCCAGGCTCACGGCCTGACACCGGAGCAGGTGGTTGCAATCGCGTCTAATA
TCGGCGGCAAACAGGCACTCGATGAGACG
PART6(249bp):
CGTCTCATCGAGACCGTGCAGCGCTTGCTTCCAGTGCTGTGTCAGGCCCACGGCC
TGACCCCGGAACAGGTGGTGGCCATCGCCTCTAACAATGGCGGCAAACAGGCATT
GGAAACAGTTCAGCGCCTGCTGCCGGTGTTGTGTCAGGCTCACGGCCTGACTCCG
GAGCAGGTTGTGGCCATCGCAAGCCATGATGGCGGTAAACAAGCTCTGGAGACAG
TGCAACGCCTCTTGCCAGTTTTAGAGACG
PART-right:
CGTCTCATTTTGTGTCAGGCCCACGGA...
The recognition sequence of the TALE protein:
All parts are still being synthesized, so we have few results so far that can prove our changes are useful. However, by the principle of complementary base pairing, our choice should be better than the original version. If you want our data, or want to use our method to create your own best sticky ends, just contact us!