Team:SYSU-China/file/Project/Model/Populationgrowthmodel.html
From 2014.igem.org
Revision as of 00:30, 18 October 2014
Population Growth Model
Introduction
Extreme selection pressure is an essential assumption of the stochastic evolution model, but it can hardly be satisfied in experiments. In reality, any progeny phage released before the selection time set by the experiment can enter the next round with a certain probability. Moreover, the experiment selects phages by their proliferation rates, rather than directly by differences in the evolved sequences they carry. Considering these points, we built a new model to simulate the growth and evolution of the phage population.
First, as in the stochastic evolution model, we simplify the problem of scoring protein-protein interaction to the local alignment of the corresponding DNA sequences by introducing a similarity between sequences. Then, a formula linking similarity and progeny number is constructed by considering two limiting cases. Under several assumptions, the similarity frequency distribution of the evolved population can be obtained and passed on to the next cycle of the experiment. Iterating this process simulates each cycle of the experiment directly.
This model reveals several non-obvious properties of the evolution system: the existence of a plateau at the beginning, the necessity of a flat similarity frequency distribution for evolution, the pros and cons of a high mutation rate, etc. In addition, it provides suggestions for experiments and points out their limitations, such as evolving different proteins in systems with different mutation rates and limiting the background expression of the two-hybrid system.
From the population growth perspective, this model answers the question “Can mutated progeny with a relatively higher proliferation rate be selected in our system?” However, the constructed formula linking protein-protein interaction and progeny number is oversimplified. Further modification should take the complicated chemical kinetics of the system into consideration.
Symbols and Definitions
[Table]
Model Analysis
Estimating the proliferation rate of each phage is the core of the population growth model. The proliferation rate can be influenced by many factors, which cannot all be controlled in experiments or taken into consideration in this model. For simplification, the basic assumption of this model is:
Apart from the antibody-encoding sequences they carry, which are distinct and variable, all other features of the phages are constant and identical.
With this assumption, our model needs to estimate the number of mutated progeny, the influence of mutation, and the proliferation rate of the phages, which depends on the sequence encoding the evolved antibody.
Proliferation Rate & Progeny Number
To estimate the number of progeny each phage produces, the reproductive mechanism of M13 phage must be taken into consideration. According to experiments, approximately 2700 copies of the major coat protein P8 (encoded by GeneVIII) and 5 copies of the minor coat protein P3 (encoded by GeneIII) are required for each M13 phage particle. Insufficiency of either P8 or P3 strongly impedes the assembly of phage particles. In our experiment, M13 phages with GeneIII or GeneVIII knocked out are rescued by a two-hybrid system mediated by antibody-antigen interaction. Here, we take GeneVIII as an example to estimate the proliferation rate and progeny number.
Assuming the infected bacteria are in an ideal state and the absence of GeneVIII does not affect any other biological process of the phage, we can infer that all expression products except P8 are fully available (as in wild-type M13 phage). If the expression of GeneVIII, which is induced by the two-hybrid system mediated by antibody-antigen interaction, is always weaker than in wild-type M13 phage, the production rate of P8 directly constrains the assembly and release of progeny.
For M13 phage, viral particles are continuously assembled and released once the copy number of RF DNA reaches about 200. Denote the production rate of P8 as pRVIIIM (written pRP8 in later equations), the time between the start of P8 production and the release of viral particles in wild-type phage as tWTC, and the extra waiting time between the start of release and the selection time set by the experiment as eWT. Then the total number of progeny released by a single infected bacterium (tRPN) is given by Equ(1).
[Equ.1]
where the first term is the number of phages that can be assembled from the P8 produced during tWTC, and the second term is the number of phages assembled and released during eWT.
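A minimal sketch of the two terms described above, assuming P8 is the only limiting factor, its production rate is constant, and each particle needs about 2700 copies of P8 (the figure quoted in the text). The function name `total_progeny`, the time units, and keeping fractional particle counts (no rounding) are illustrative assumptions; the exact form of Equ(1) may differ.

```python
P8_PER_PHAGE = 2700  # approximate P8 copies per M13 particle (value from the text)


def total_progeny(pRP8: float, tWTC: float, eWT: float) -> float:
    """Hypothetical sketch of tRPN when P8 supply limits assembly.

    pRP8: P8 copies produced per unit time in one infected bacterium (assumed constant)
    tWTC: time from the start of P8 production to the start of release
    eWT:  extra time between the start of release and the selection time
    """
    # First term: particles that the P8 made during tWTC is sufficient to assemble.
    before_release = pRP8 * tWTC / P8_PER_PHAGE
    # Second term: particles assembled and released during eWT.
    during_release = pRP8 * eWT / P8_PER_PHAGE
    return before_release + during_release


print(total_progeny(pRP8=5400.0, tWTC=10.0, eWT=5.0))  # 30.0
```

In this reading tRPN is proportional to the P8 production rate, so doubling the production rate doubles the progeny; that proportionality is what turns a similarity-dependent production rate into a selection pressure.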
Antigen-Antibody Interaction and Production Rate of P8
What our experiment directly selects are phages that produce more progeny before the selection time. Under the basic assumption, the only difference between phages is the antibody they carry, which uniquely determines the production rate of P8. Within a certain range, it is reasonable to infer that the production rate of P8 is positively correlated with the affinity of the antibody. Due to the complexity of the two-hybrid system and the lack of information, it is almost impossible to predict pRVIIIM precisely as a function of affinity. However, this function is essential to the model, so we conjecture a function that captures some properties of the system and use it for further calculation.
First, consider the antibody-antigen interaction. As in the stochastic model, we assume there exists an antibody with the maximum binding energy to the antigen. Denote this antibody as the “target antibody”, its DNA sequence as the “target sequence”, and the maximum binding energy as ϵ_0. By comparing it with the binding energy of an evolved antibody (denoted ϵ), the similarity S between the evolved antibody and the target antibody is defined as:
[Equ.2]
Equ(2) is reasonable but impractical to evaluate directly. To estimate the similarity, we assume that the similarity between proteins can be represented by the similarity between the DNA sequences encoding them. Inspired by the local alignment used in the stochastic model, the similarity between the evolved antibody and the target antibody can be calculated by the following equation:
[Equ.3]
where N_S is the number of sites on the evolved DNA sequence that carry the same base as the corresponding site on the target DNA sequence (such sites are denoted “right sites”); loD is the length of the DNA sequence, which is assumed to be the same for the target sequence and the evolved ones. Equ(3) assumes that the binding energy ϵ is distributed uniformly over the bases of the DNA sequence. Unlike the stochastic model, this model no longer tracks specific sequences, but classifies DNA sequences by their similarity S. In other words, many different specific DNA sequences may share the same similarity S.
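The similarity S = N_S/loD can be computed by comparing the two sequences site by site. A minimal sketch, assuming aligned sequences of equal length with no gaps (which is how N_S is defined here; this is not a full local-alignment algorithm). The function name `similarity` is illustrative.

```python
def similarity(evolved: str, target: str) -> float:
    """S = N_S / loD: fraction of aligned sites carrying the same base as the target."""
    if len(evolved) != len(target):
        raise ValueError("evolved and target sequences must have the same length loD")
    n_s = sum(1 for a, b in zip(evolved, target) if a == b)  # N_S, the "right sites"
    return n_s / len(target)


target = "ATGGCT"
print(similarity("ATGGCA", target))  # 5 of 6 sites match
print(similarity("TTGGCT", target) == similarity("ATGGCA", target))  # True: different sequences, same S
```

The second call illustrates the point above: two different sequences fall into the same similarity class.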
Second, the exact dependence of pRP8 on the similarity S may be very complicated, but its approximate form can be constructed from two limiting cases:
1) When S = 0, namely ϵ = 0, the evolved antibody is so different from the target one that it can hardly bind the antigen. Thus, neglecting the background expression of GeneVIII, the production rate of P8 can be taken as 0.
2) When S → 1, namely ϵ → ϵ_0, the binding energy between the evolved antibody and the antigen is so large that their binding is no longer the main restriction on the expression level of GeneVIII. The production rate of P8 then reaches its maximum (denoted maxVIIIR), which is constrained by other factors of the system.
Inspired by the Boltzmann distribution, we construct an equation satisfying both limiting cases:
[Equ.4]
where k_B is the Boltzmann constant and T is the absolute temperature (in kelvin) of the environment. Denoting ϵ_0/(k_B T) as eng, Equ(4) can be written as:
[Equ.5]
Here we set eng = 3.0, so that at S = 1 the factor 1 - Exp(-3) ≈ 0.95 is close to 1, approximately satisfying limiting case 2.
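The form pRP8 = maxVIIIR[1 - Exp(-eng·S)] (the same form quoted again in the Background Expression section) can be checked numerically against both limiting cases. In this sketch maxVIIIR is normalized to 1 as an assumption, and `production_rate` is an illustrative name.

```python
import math


def production_rate(S: float, eng: float = 3.0, maxVIIIR: float = 1.0) -> float:
    """Equ(5): pRP8 = maxVIIIR * (1 - exp(-eng * S)), with maxVIIIR normalized to 1."""
    return maxVIIIR * (1.0 - math.exp(-eng * S))


for S in (0.0, 0.1, 0.5, 0.9, 1.0):
    print(f"S = {S:.1f}: pRP8/maxVIIIR = {production_rate(S):.3f}")
```

With eng = 3 the rate is exactly 0 at S = 0 (limiting case 1) and about 0.95 of maxVIIIR at S = 1 (limiting case 2), rising monotonically in between.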
Number of mutated progeny in population
The mutation rate (mR) of the DNA polymerase used in our experiment is about 10^-5 per base per replication. Assuming base mutations are independent of each other, the number of mutated bases (numMB) during DNA replication follows a binomial distribution. Denoting the length of the antibody-encoding DNA sequence as loD, the expectation of numMB is:
[Equ.6]
Since loD is large while mR is very small, the binomial distribution can be approximated by a Poisson distribution:
[Equ.7]
where λ is the expectation of numMB given by Equ(6). Taking loD = 300 as an example, the probabilities of having M mutated bases in one DNA replication are shown in the following table.
[Table]
The probability of having M mutated bases decreases rapidly with M. If the population of bacteria in each round of the experiment is about 10^12, progeny phages with more than 4 mutated bases can be neglected.
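The cutoff at 4 mutated bases can be checked with the Poisson probabilities P(M) = λ^M e^(-λ)/M!, using λ = loD·mR. A minimal sketch, assuming loD = 300, mR = 10^-5, and treating each of roughly 10^12 progeny per round as one replication (an assumption about how the population size enters). `poisson_pmf` is an illustrative name.

```python
import math


def poisson_pmf(M: int, lam: float) -> float:
    """P(M mutated bases) under the Poisson approximation of Equ(7)."""
    return lam ** M * math.exp(-lam) / math.factorial(M)


loD, mR = 300, 1e-5
lam = loD * mR  # expected number of mutated bases per replication (0.003)
population = 1e12  # replications per round, assumed order of magnitude

for M in range(6):
    p = poisson_pmf(M, lam)
    print(f"M = {M}: P = {p:.3e}, expected phages = {p * population:.3g}")
```

This gives roughly 3 phages with 4 mutated bases but only about 0.002 with 5 per 10^12 replications, consistent with neglecting M > 4.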
Influence of mutation
Having estimated the number of phages with M mutated bases, we can calculate analytically how these mutations change the similarity S.
Each of the M mutated bases has 3 possible outcomes:
1) A right base is mutated, so ΔN_S = -1 (denoted wrong mutation)
2) A wrong base is mutated to another wrong base, so ΔN_S = 0 (denoted neutral mutation)
3) A wrong base is mutated to the right one, so ΔN_S = +1 (denoted right mutation)
Denoting ΔN_S = i, the probabilities of these outcomes are given by the following formula:
[Equ.8]
where square brackets denote rounding down. The similarity of the mutated phages can then be estimated by:
[Equ.9]
Here we assume that the influence of each mutation is moderate: each mutation changes only one base and thus changes the binding energy by ϵ_0/loD, i.e. changes S by 1/loD.
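The sketch below does not reproduce Equ(8); it is an alternative exact count of P(ΔN_S = i) under stated assumptions: the M mutated sites are distinct and chosen uniformly (so the number of right sites hit is hypergeometric), a mutated right site always becomes wrong, and a mutated wrong site becomes right with probability 1/3, since one of the three alternative bases is the target base. The new similarity then follows the rule just stated, S' = (N_S + i)/loD. Function and variable names are illustrative.

```python
from math import comb


def delta_ns_distribution(N_S: int, loD: int, M: int) -> dict:
    """P(ΔN_S = i) for M mutated bases on a sequence with N_S right sites.

    Assumptions (not taken from Equ(8)): distinct, uniformly chosen mutated sites;
    right site -> wrong (ΔN_S = -1); wrong site -> right with probability 1/3.
    """
    dist = {}
    for r in range(M + 1):  # r: number of right sites hit (hypergeometric)
        w = M - r  # w: number of wrong sites hit
        if r > N_S or w > loD - N_S:
            continue
        p_r = comb(N_S, r) * comb(loD - N_S, w) / comb(loD, M)
        for u in range(w + 1):  # u: wrong sites that become right (binomial, p = 1/3)
            p_u = comb(w, u) * (1 / 3) ** u * (2 / 3) ** (w - u)
            i = u - r
            dist[i] = dist.get(i, 0.0) + p_r * p_u
    return dist


loD, N_S = 300, 270  # similarity S = 0.9
for i, p in sorted(delta_ns_distribution(N_S, loD, M=1).items()):
    print(f"ΔN_S = {i:+d}: p = {p:.4f}, new S = {(N_S + i) / loD:.4f}")
```

For S = 0.9 a single mutation is a wrong mutation with probability 0.9, which illustrates why high-similarity sequences are more likely to lose similarity through mutation.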
Model Implementation
[Table]
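The Introduction describes the simulation as a loop: compute each similarity class's progeny, apply mutation, and pass the resulting similarity distribution to the next cycle. The following is one minimal expected-value version of that loop, not the team's program. Assumptions: progeny number is proportional to the production rate of Equ(5); at most one base mutates per replication (justified by the small λ); a mutated wrong base becomes right with probability 1/3; and finite-population sampling, and hence any effect of pB, is ignored. All names are hypothetical.

```python
import math


def next_generation(freq, loD, mR, eng):
    """One experimental cycle; freq[n] is the fraction of phages with N_S = n."""
    # Selection: progeny per phage assumed proportional to 1 - exp(-eng * S).
    grown = [p * (1.0 - math.exp(-eng * n / loD)) for n, p in enumerate(freq)]
    total = sum(grown)
    if total == 0.0:
        return list(freq)  # nothing reproduces, e.g. every phage has S = 0
    grown = [g / total for g in grown]
    # Mutation (first order): a right site mutates with probability mR (N_S - 1);
    # a wrong site mutates with probability mR and becomes right 1/3 of the time (N_S + 1).
    new = [0.0] * (loD + 1)
    for n, p in enumerate(grown):
        down = n * mR
        up = (loD - n) * mR / 3.0
        new[n] += p * (1.0 - down - up)
        if n > 0:
            new[n - 1] += p * down
        if n < loD:
            new[n + 1] += p * up
    return new


def average_similarity(freq, loD):
    """Mean S weighted by the fraction of phages in each similarity class."""
    return sum(p * n / loD for n, p in enumerate(freq))


def simulate(loD=300, oS=0.1, mR=1e-5, eng=3.0, generations=200):
    freq = [0.0] * (loD + 1)
    freq[round(oS * loD)] = 1.0  # all phages start identical
    history = [average_similarity(freq, loD)]
    for _ in range(generations):
        freq = next_generation(freq, loD, mR, eng)
        history.append(average_similarity(freq, loD))
    return history


history = simulate()
print(f"average similarity: start {history[0]:.3f}, after 200 cycles {history[-1]:.3f}")
```

Tracking fractions rather than sampled counts keeps the sketch deterministic and short; reproducing the dependence on pB would require sampling a finite number of phages each cycle.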
Results and Discussions
Basic Properties
The similarity distribution of each generation is recorded by our program. Here, the evolution of the population in the first 200 generations and the last 6000 generations is illustrated, and the average similarity is calculated. The average similarity is the mean of S weighted by the number of phages with each value of S.
[Fig.1 Fig.2]
These illustrations are very informative, unraveling some non-obvious properties of our system:
1) At the beginning of evolution, the phages are identical. The population needs several generations to accumulate mutated phages before the average similarity increases noticeably. This suggests that a flat distribution of allele frequencies is necessary for the evolution of the population. So in the early period of our experiment, the mutated phages are undetectable and the evolved protein is not much improved. In contrast, once the mutated phages can be detected, the evolved protein may improve relatively quickly.
2) At the end of evolution, a stable final similarity distribution forms. Although the majority of phages carry the target sequence, imperfect sequences also persist stably in the population. As a result, further purification is necessary to obtain the phages carrying the target sequence.
To get an overview of the evolution, we now focus on how the average similarity increases as the experiment proceeds, shown together with the parameters used in the simulation.
[Fig 3]
As Fig. 3 shows, the average similarity remains unchanged at the beginning (denoted the plateau) and then increases rapidly. The evolution decelerates gradually as the average similarity increases.
Parameter Scan
The analytical solution of this model is very complicated, so a numerical solution is necessary. To reduce the computational load while keeping the results comparable, the following values are set as the basic parameters of the simulation.
[Table]
Length of evolved DNA
The length of the antibody-encoding DNA is an important parameter of our model. Increasing loD increases not only the number of mutated progeny (nPM) but also the number of right sites (N_S) required for the same similarity S. The net effect is hard to determine by qualitative analysis, but our model gives a quantitative answer, as shown in Fig. 3 and Fig. 4.
[Fig.4 Fig.5]
Evidently, evolution slows down as the evolved DNA sequence gets longer. Taking the difference in average similarity between consecutive generations gives the increment of average similarity (iAS), which represents the speed of evolution. Fig. 4 makes it more evident that increasing loD lengthens the plateau at the beginning of evolution. Unexpectedly, the speed of evolution also rises rapidly to a peak and then decreases gradually with some fluctuation.
Consequently, large proteins require more time to evolve.
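The average similarity (weighted by the number of phages with each S) and the iAS obtained from it by differencing consecutive generations can be sketched as follows. The function names and the short series of average similarities are hypothetical, chosen only to illustrate a rise to a peak speed followed by a decline, and are not simulation output.

```python
def average_similarity(counts: dict) -> float:
    """Mean S weighted by phage counts; counts maps S to the number of phages with that S."""
    total = sum(counts.values())
    return sum(S * n for S, n in counts.items()) / total


def increments(history):
    """iAS: change in average similarity between consecutive generations."""
    return [b - a for a, b in zip(history, history[1:])]


def peak_generation(history):
    """Generation whose iAS (speed of evolution) is largest."""
    ias = increments(history)
    return max(range(len(ias)), key=ias.__getitem__) + 1


print(average_similarity({0.1: 900, 0.2: 100}))  # ~0.11
avg = [0.100, 0.100, 0.101, 0.105, 0.120, 0.150, 0.170, 0.180, 0.185]  # illustrative only
print(peak_generation(avg))  # 5
```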
Influence of Original Similarity
The parameter scan of the original similarity shows that the larger the original similarity, the longer the plateau, as shown in Fig. 6.
[Fig.6]
Specifically, we focus on an evolved protein with original similarity oS = 0.9.
[Fig.7]
According to Fig. 6, the average similarity first decreases and later increases to approximately 100%. The decrease can be explained by the high probability of wrong mutation in a high-similarity sequence, given by Equ(8). Since the mutated progeny mainly arise from mutation of the original phages rather than from proliferation of the mutants themselves, most mutated phages carry wrong mutations. Only when the right-mutated phages have accumulated to a certain number, so that the advantage of their higher proliferation rate becomes apparent, does the average similarity begin to increase, as shown in Fig. 7.
[Fig.8]
Population of infected bacteria
The population of infected bacteria (pB) is one of the key parameters of our experiment. Note that protein evolves efficiently only when the population of phage is much larger than the population of bacteria. In addition, the size of the selection pool is limited by the number of bacteria available for infection. More properties are shown in Fig. 8 and Fig. 9.
[Fig.9 10]
First, the quantitative result confirms our reasoning: protein evolves faster in a large bacteria population. In addition, the length of the plateau is independent of pB, while the evolution becomes more stable as pB increases.
Mutation Rate
Mutation is the center of our design. The mutation rate in our experiment is estimated as 10^-5. Since the mutation rate can be tuned experimentally, we investigate the influence of mutation rates in the range 10^-6 to 10^-4.
[Fig11 12]
According to Fig. 10, a high mutation rate significantly accelerates evolution at the beginning. However, the final average similarity is negatively correlated with the mutation rate. Plotting the final average similarity against the mutation rate reveals a good linear relation. Linear regression gives the following function, which is also shown in Fig. 12:
$$\mathrm{fAS} = 0.9993 - 564.518\,\mathrm{mR} \tag{10}$$
where fAS stands for the final average similarity.
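Evaluating the fitted line at the ends and middle of the scanned range makes the trade-off concrete: going from 10^-5 to 10^-4 lowers the predicted final average similarity by about 0.05. The fit is only supported for 10^-6 ≤ mR ≤ 10^-4, so this sketch (with the illustrative name `final_average_similarity`) refuses to extrapolate beyond it.

```python
def final_average_similarity(mR: float) -> float:
    """Linear fit from the parameter scan: fAS = 0.9993 - 564.518 * mR."""
    if not 1e-6 <= mR <= 1e-4:
        raise ValueError("the fit was obtained only for 1e-6 <= mR <= 1e-4")
    return 0.9993 - 564.518 * mR


for mR in (1e-6, 1e-5, 1e-4):
    print(f"mR = {mR:.0e}: fAS = {final_average_similarity(mR):.4f}")
# mR = 1e-06: fAS = 0.9987
# mR = 1e-05: fAS = 0.9937
# mR = 1e-04: fAS = 0.9428
```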
[Fig.13]
Consequently, our model helps us design further experiments: 1) At the beginning of evolution, bacteria with a high-mutation-rate DNA replication system can be used to speed up evolution. 2) At the later stage of evolution, bacteria with a relatively low-mutation-rate DNA replication system can be used to raise the final average similarity. 3) The selection pool can be enlarged by using a high-mutation-rate DNA replication system, while proteins that need further improvement should be evolved in a low-mutation-rate DNA replication system.
Binding Energy
In the previous simulations, the binding energy ϵ_0 between the target antibody and the antigen was set to 3 k_B T. For natural proteins, binding energies span a broad range. Can our system evolve proteins with different binding energies, and how does the binding energy influence the evolution process? To answer these questions, eng = 3, 7, 11, 15, 19 are scanned with the basic parameters.
[Fig.14 15]
As Fig. 14 and Fig. 15 show, proteins with a higher maximum binding energy take much longer to evolve.
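One way to see why a larger eng could slow evolution under Equ(5) is to compare the production rate of a phage carrying one extra right site with that of its parent. This is an illustration of Equ(5), not an analysis from the original study; loD = 300 and S = 0.1 are assumed, and the function names are hypothetical. A larger eng saturates 1 - Exp(-eng·S) at lower S, so one extra right site yields a smaller relative advantage and selection acts more weakly.

```python
import math


def production_rate(S: float, eng: float) -> float:
    """Equ(5) with maxVIIIR normalized to 1."""
    return 1.0 - math.exp(-eng * S)


def one_site_advantage(S: float, eng: float, loD: int = 300) -> float:
    """Production-rate ratio of a phage with one extra right site (S + 1/loD) to its parent."""
    return production_rate(S + 1.0 / loD, eng) / production_rate(S, eng)


for eng in (3, 7, 11, 15, 19):
    print(f"eng = {eng:2d}: advantage at S = 0.1 is {one_site_advantage(0.1, eng):.4f}")
```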
Modification
Background Expression
Previously, we built this model assuming that the background expression of GeneVIII can be neglected. Here, we take background expression into consideration by adding a background expression (bE) term to Equ(5):
$$\mathrm{pRP8} = \mathrm{maxVIIIR}\left[1 - e^{-\mathrm{eng}\cdot S} + \frac{\mathrm{bE}}{\mathrm{maxVIIIR}}\right] \tag{11}$$
where bE/maxVIIIR indicates how significant the background expression is compared with the gene expression mediated by the two-hybrid system. The evolution processes starting from oS = 0.1 and oS = 0.9 are shown below, with different strengths of background expression scanned.
[Fig.16 17]
Fig. 16 and Fig. 17 suggest that the protein can still evolve with non-negligible background expression, but a high level of background expression strongly impedes the evolution process. From this perspective, since the promoter of GeneVIII is stronger than that of GeneIII, a system engineered with GeneVIII may be more efficient for protein evolution than one engineered with GeneIII.
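A short sketch of why background expression weakens selection, assuming the modified production rate is Equ(5) plus a constant term, so that pRP8/maxVIIIR = 1 - Exp(-eng·S) + bE/maxVIIIR and the rate at S = 0 equals bE rather than 0. It compares a phage with S = 0.9 to one with S = 0.1 for a few hypothetical values of bE/maxVIIIR; `production_rate_bg` is an illustrative name.

```python
import math


def production_rate_bg(S: float, bE_ratio: float, eng: float = 3.0) -> float:
    """pRP8 / maxVIIIR = 1 - exp(-eng * S) + bE / maxVIIIR (additive background term assumed)."""
    return 1.0 - math.exp(-eng * S) + bE_ratio


for bE_ratio in (0.0, 0.1, 0.5, 1.0):
    ratio = production_rate_bg(0.9, bE_ratio) / production_rate_bg(0.1, bE_ratio)
    print(f"bE/maxVIIIR = {bE_ratio:.1f}: high-S / low-S production ratio = {ratio:.2f}")
```

As bE grows, the production advantage of the high-similarity phage shrinks toward 1, so selection for higher similarity becomes weaker.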
Significant Impact of Mutation
Equ(5) is constructed by considering two limiting cases. In principle, other forms of the relation can be constructed, depending on the properties we care about. Equ(5) neglects the significant impact of individual mutations, and the evolution speed it predicts is lower than that observed in related experiments (about 150 generations). Here, we construct another form of the relation between similarity and the production rate pRP8 for high initial similarity to reveal the significant impact of mutation.
[Equ.12]
where k is a constant describing how much a mutation at a single site changes the production rate pRP8. With k = 1.2, the evolution of the population is as follows:
[Fig.18]
Clearly, the average similarity increases much faster than in the previous model, even faster than observed in related experiments. Evidently, the evolution speed depends strongly on the specific formula we construct. A more reliable relation can be obtained only by considering the complicated chemical reactions of our system.
Conclusion
This model describes evolution from a population growth perspective. Based on several non-rigorous assumptions, the population model manages to simulate the evolution process. Through data visualization, it helps us better understand the system we designed by revealing properties of evolution and providing suggestions for experiments:
1) At the beginning of the experiment, phages with right-mutated sequences are almost undetectable due to the low concentration of mutated phages.
2) A flat frequency distribution of the various antibody-encoding DNA sequences is indispensable for population evolution, and the time needed to form such a distribution results in the initial plateau of the population's average similarity. After the initial plateau, the population evolves much faster.
3) The larger the original similarity, the longer the plateau. For a large original similarity, the average similarity decreases at the beginning of the initial plateau and increases only when the number of right-mutated phages is large enough to show their proliferation advantage.
4) Increasing the number of bacteria used for infection accelerates evolution.
5) At the beginning of evolution, bacteria with a high-mutation-rate DNA replication system can be used to enlarge the library and thus speed up evolution; in the later stage of evolution (or to further improve an already decent antibody), bacteria with a relatively low-mutation-rate DNA replication system can be used to raise the final average similarity.
6) A stable similarity frequency distribution forms at the end of evolution, which suggests that further purification is necessary to extract the phages carrying the target sequence.
Meanwhile, the quantitative analysis shows that some assumptions of this model are also realistic requirements for the experiment:
1) The population of phages should be much larger than the number of bacteria used in each round of the experiment.
2) The background expression of the two-hybrid system must be weaker than the expression of the corresponding gene in wild-type phage.
In conclusion, the population model shows that our design is practical within certain limits. Antibodies with higher similarity will be selected and continuously improved because they increase the progeny number of their carriers. However, our model does not reproduce the evolution speed observed in related experiments. Further modification must consider the chemical kinetics of the two-hybrid system and the M13 life cycle.