Team:Tsinghua-A/Logo

From 2014.igem.org




1

Algorithms

Overview

In order to lower down the repeatability rate of codons, we use intelligent optimization algorithm. We exploit amino acid degeneracy and alternate nucleotides to reduce the repetition rate of bases of TALE DNA sequence. (See Figure 1) Then the optimized TALE sequence will be tested in wet lab to verify our conjecture whether TALE DNA sequence of lower repeatability rate works at higher efficiency.

Title

Figure 1. RNA codon Table

For instance, we may change UUC into UUU to make one of the repeats different from others while the amino acid(PhenylalaninePhe) is identical.

By this way, we use two types of intelligent optimization algorithm to optimize TALE DNA sequence.

Genetic Algorithm(GA)

-Introduction of Genetic Algorithm
Genetic Algorithm is a search heuristic that mimics the process of natural selection. The heuristic is routinely used to generate useful solutions to optimization and search problems.[1]
GA has lots of applications in many fields, such as in bioinformatics, using GA to optimize DNA successfully made the sequences have better physic-chemical properties for PCR.

-Our strategy
Initialization
Create a population of hundreds of TALE sequences.
Mutation
Each sequence changes a base randomly under the premise that 1. Amino acids sequence remain unchanged 2. To prevent TALE sequence from being cut, no restriction enzyme site we exploit in experiment is allowed to be created 3. Mutation should not take place on the overhang of each repeat.
Crossover
In GA, crossover is a simulation of the process of synapsis. We randomly choose a point of amino acid, and two sequences exchange their parts from the point we choose to one of the ends.


Figure 2.The process of crossover is a simulation of natural synapsis of chromosome.

Fitness Scoring
We take two main factors into consideration:
1.Repeatability rate of the sequence
To judge the repeatability, we compare the changed sequence with the original one and count the Hamming distance between them. When a restriction enzyme site sequence occurs on the mutated TALE sequence, there will be a penalty.
2.Codon Usage
When the sequence contains too many rare codons, which means the number of the homologous tRNA is extremely low in E.coli, the whole sequence can hardly express. In order to avoid the appearance of rare codons, each occurrence of rare codons will lead to penalty.
Selection
In each generation we sort the 200 sequences according to the scores of each single sequence. And we terminate the half of lower scores, the rest of them will have better fitness. Repeat the process of mutation and crossover, the average score of each generation will increase. (See Results.)
We repeat the process for 600 generations.

Simulated Annealing(SA)

-Introduction of Simulated Annealing
Simulated annealing is a simple and general algorithm for finding global minima. It operates by simulating the cooling of a (usually fictitious) physical system whose possible energies correspond to the values of the objective function being minimized. The analogy works because physical systems occupy only the states with lowest energy as the temperature is lowered to absolute zero.

-Our Strategy
Mutation
We randomly choose 5 points on the sequence to mutate. To each single point, the process is the same with our computational mutation in genetic algorithm.
Scoring
The same as fitness scoring in genetic algorithm.
Parameters
The initial temperature T is 10000.0, after each generation, the temperature reduces to 99% of the former temperature. If the new generation has higher score, it will be accepted. Otherwise it will be accepted at a probability P(A)

Title

-References
[1]Wikipedia http://en.wikipedia.org/wiki/Genetic_algorithm



2

Results

Genetic Algorithms

We output the average scores of each generation. And we can see that the scores range from -4500 to 350, and after about 100 generations, the average score becomes stable.(See Figure.3)


Figure 3. The fitness score's variation among the whole 600 generations.

In order to see the variation of the score more clearer, we output the score of last 50 generations.



Figure 4. The fitness score of the last 50 generations.

The second graph in Figure 4 shows the highest score, lowest score and average score in each generation. We can see that the extreme values are relatively stable. The sequence which has the highest score can be seen as the best sequence that we found.

Simulated Annealing

To compare with GA, we output the scores of each generation.


Figure 5. The score variation.

Because of the parameters we gave, the probablity of accepting the new solution when the score becomes lower is relatively low.


Figure 6. The score of last 50 solutions.

Compared with genetic algorithm, we can see that simulated annealing has lower convergence rate and higher probability randomicity, which can be affected by the temperature, cooling velocity and coefficient that are set.



3

Simulation

System Model

Figure 7. The model of the report system.

We use the following formula to simulate the lac repressor system.


We use Michaelis-Menten equation to simulate the process.



Simulation Results


Figure 8. Simulation of the IPTG-lac repressor system.

We infer that the situation of TALE will be similar with IPTG. When TALE protein binds on the plasmid, the expression of lacI will be repressed. Consequently, the concentration of RFP will be increased. If no TALE proteins are expressed, the concentration of RFP will be very low.


Figure 9. Situations that TALE successfully express and not express.

From the simulation, we can conclude that RFP can be a signal of whether our TALE has successfully expressed, moreover, the concentration of RFP, which shows as the fluorescence intensity, has positive correlation with the quantity of TALE protein.

-References
[1]Uri Alon,An Introduction to Systems Biology[B],Chapman & Hall/CRC, P244