Team:Tsinghua-A/Modeling
From 2014.igem.org
1
Algorithms
Overview
In order to lower down the repeatability rate of codons, we use intelligent optimization algorithm.
We exploit amino acid degeneracy and alternate nucleotides to reduce the repetition rate of bases of TALE DNA sequence. (See Figure 1) Then the optimized TALE sequence will be tested in wet lab to verify our conjecture whether TALE DNA sequence of lower repeatability rate works at higher efficiency.
Figure 1. RNA codon Table
For instance, we may change UUC into UUU to make one of the repeats different from others while the amino acid(PhenylalaninePhe) is identical.
By this way, we use two types of intelligent optimization algorithm to optimize TALE DNA sequence.
Genetic Algorithm(GA)
-Introduction of Genetic Algorithm
Genetic Algorithm is a search heuristic that mimics the process of natural selection. The heuristic is routinely used to generate useful solutions to optimization and search problems.[1]
GA has lots of applications in many fields, such as in bioinformatics, using GA to optimize DNA successfully made the sequences have better physic-chemical properties for PCR.
-Our strategy
Initialization
Create a population of hundreds of TALE sequences.
Mutation
Each sequence changes a base randomly under the premise that 1. Amino acids sequence remain unchanged 2. To prevent TALE sequence from being cut, no restriction enzyme site we exploit in experiment is allowed to be created 3. Mutation should not take place on the overhang of each repeat.
Crossover
In GA, crossover is a simulation of the process of synapsis. We randomly choose a point of amino acid, and two sequences exchange their parts from the point we choose to one of the ends.
Figure 2.The process of crossover is a simulation of natural synapsis of chromosome.
Fitness Scoring
We take two main factors into consideration:
1.Repeatability rate of the sequence
To judge the repeatability, we compare the changed sequence with the original one and count the Hamming distance between them. When a restriction enzyme site sequence occurs on the mutated TALE sequence, there will be a penalty.
2.Codon Usage
When the sequence contains too many rare codons, which means the number of the homologous tRNA is extremely low in E.coli, the whole sequence can hardly express. In order to avoid the appearance of rare codons, each occurrence of rare codons will lead to penalty.
Selection
In each generation we sort the 200 sequences according to the scores of each single sequence. And we terminate the half of lower scores, the rest of them will have better fitness. Repeat the process of mutation and crossover, the average score of each generation will increase. (See Results.)
We repeat the process for 600 generations.
Simulated Annealing(SA)
-Introduction of Simulated Annealing
Simulated annealing is a simple and general algorithm for finding global minima. It operates by simulating the cooling of a (usually fictitious) physical system whose possible energies correspond to the values of the objective function being minimized. The analogy works because physical systems occupy only the states with lowest energy as the temperature is lowered to absolute zero.
-Our Strategy
Mutation
We randomly choose 5 points on the sequence to mutate. To each single point, the process is the same with our computational mutation in genetic algorithm.
Scoring
The same as fitness scoring in genetic algorithm.
Parameters
The initial temperature T is 10000.0, after each generation, the temperature reduces to 99% of the former temperature. If the new generation has higher score, it will be accepted. Otherwise it will be accepted at a probability P(A)
References
[1]Wikipedia http://en.wikipedia.org/wiki/Genetic_algorithm
2
Results
3
Simulation
System Model
The model of the report system is shown as the picture above.
We use the following formula to simulate the lac repressor system.
We use Michaelis-Menten equation to simulate the process.
Simulation Results
Figure 999. Simulation of the IPTG-lac repressor system.
We infer that the situation of TALE will be similar with IPTG. When TALE protein binds on the plasmid, the expression of lacI will be repressed. Consequently, the concentration of RFP will be increased. If no TALE proteins
are expressed, the concentration of RFP will be very low.
Figure 1000. Situations that TALE successfully express and not express.
From the simulation, we can conclude that RFP can be a signal of whether our TALE has successfully expressed, moreover, the concentration of RFP, which shows as the fluorescence intensity, has positive correlation with the quantity of TALE protein.
References
[1]Uri Alon,An Introduction to Systems Biology[B],Chapman & Hall/CRC, P244