Team:UT-Tokyo/Counter/Project/Modeling

From 2014.igem.org

Modeling is an attempt to describe, in a precise way, an understanding of the elements of a system of interest, their states, and their interactions with other elements.

The purpose of our modeling team is to peel back the layer of appearance of the device and reveal its underlying nature. We worked with the experiment team to improve the device. To that end, we set three fundamental themes, which divide the modeling work into three parts. First, we confirmed whether our circuit realizes the intended reactions: this is Part 1. When the counter is extended to a device that counts to larger numbers, the device may skip the next state because of leaky expression of the sigma factor, so we examined how the leak of the sigma factor influences the state-transition device: this is Part 2. Third, as an improvement of our genetic circuit, we designed a new construct that removes the constraint on the induction time and checked how the circuit works: this is Part 3. Finally, we discussed what appropriate modeling looks like and which issues come up frequently, in order to find the best modeling strategy, and wrote down how we constructed our model: this is the Guide for modeling.

In Part 1 (deterministic model, stochastic model), we examined whether the device outputs much higher fluorescence after the 2nd induction than after the 1st induction. In addition, we checked the key function of our device, the reset function. We took two approaches.

・Deterministic model: In this model, intermolecular interactions such as DNA-protein and protein-ligand interactions are described as differential equations, and the concentrations of products (multimolecular complexes) can be calculated from those of the reactants. This model is intuitive and simple, and hence popular for estimating the results of experiments.

・Stochastic model: Due to a cell's small volume (down to a few femtoliters for bacterial cells) and typically low biomolecular concentrations, the absolute number of molecules of a given signaling species in a cell may be quite small (typically on the order of 10^0 to 10^3). At such low numbers of reacting molecules, the reactions are heavily influenced by chance events, such as the chance encounter of reactant molecules, so we also used a stochastic model. The most common formulation of stochastic models for biochemical networks is the chemical master equation (CME). We used the Gillespie algorithm to solve the CME. Other methods for solving the CME are mentioned in Part 4.

In Part 2 (leaky expression analysis), taking the influence of the sigma factor leak into consideration, we designed a model in which the induction time can be changed. We concluded that the leak of the sigma factor can be ignored when the induction time is much longer than the degradation time of the sigma factor.

In Part 3 (improvement of the counter), we start from the observation that the current sigma-Re-counter depends strongly on pulse length, so we devised genetic circuits that would not be affected by the pulse length of the arabinose induction. We modeled this construct to test whether it can be realized.

The first construct we came up with for the counter is described as follows.

These generators have the following functions:

・production of the cis-repressed mRNA of a sigma factor from a constitutive promoter.

・production of taRNA by the 1st arabinose induction, and activation of the cis-repressed mRNA by the taRNA.

・positive feedback loops formed by sigma factors.

・production of the cis-repressed mRNA of GFP driven by sigma factors.

・activation of the GFP mRNA by the 2nd arabinose induction.

・degradation of sigma factors upon IPTG induction.

The counter consists of these functions. Our first task is to confirm theoretically that these functions cooperate with each other: that GFP is expressed significantly only after the 2nd arabinose induction, and that the counter is reset by IPTG.

In this simulation, we used two types of models, the deterministic model and the stochastic model. First, we simulated the counter with the deterministic model, which is popular and intuitive. Second, we simulated it with the stochastic model, which takes the scarcity and fluctuation of reactants into account.

The formulation of the models and the results of the simulations are described below.

First of all, we constructed the deterministic model to estimate the behavior of the counter. In this model, biochemical reactions are described as differential equations, and the concentrations of reaction products can be calculated from those of the reactants. This model is intuitive and simple, and hence popular for estimating the results of experiments. We were able to obtain some parameters for modeling the counter from previous works.[→ see parameter section]

We simplified the mathematical model before describing the time evolution of the mRNA and protein concentrations as differential equations. First, we assumed that the reaction between taRNA (trans-activating RNA) and crRNA (cis-repressor RNA) in the riboregulator is much faster than transcription or translation, and is therefore at equilibrium. This reduction in the number of parameters enables us to use the equilibrium constant as a single parameter and helps prevent overfitting when we fit this model to raw data.

equation (1):

The concentrations of the mRNAs and of the taRNA-crRNA complexes are thus represented as stated above. The subscript denotes the gene encoded by the mRNA. We assumed that the affinities between crRNA and taRNA on different mRNAs are equal. The dissociation constant of the equilibrium reaction is therefore given as follows.

equation (2):

Using the dissociation constant, the concentrations of reaction products such as [mcr_cr-σ] can be described as functions of those of taRNA and of the mRNAs of σ and GFP. We denote by X, A and B the total quantities of taRNA, sigma mRNA and GFP mRNA, respectively.

equation (3):

equation (4):

equation (5):

Using equations (3)-(5) and the equilibrium constant, the concentrations of free taRNA, free mRNAs and taRNA-mRNA complexes are described as follows. These are all the simplifications we made.

equation (6):

equation (7):

equation (8):

equation (9):

equation (10):
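The equilibrium simplification can be sketched numerically. The block below is a minimal illustration, assuming the mass balances X = [taRNA] + complexes, A = [free σ mRNA] + [σ complex], B = [free GFP mRNA] + [GFP complex] and a single dissociation constant K_d shared by both mRNAs, as stated in the text. The function name and the example totals are hypothetical; only K_d = 80 nM comes from the parameter section. With equal affinities, the free taRNA concentration x solves x² + (K_d + A + B − X)x − K_d·X = 0.

```python
import math

def riboregulator_equilibrium(X, A, B, Kd):
    """Equilibrium of taRNA binding to two cis-repressed mRNAs.

    X, A, B: total taRNA, sigma mRNA and GFP mRNA (nM).
    Kd: shared dissociation constant (nM).
    Names are illustrative; this is a sketch, not the authors' code.
    """
    M = A + B
    b = Kd + M - X
    # Positive root of x^2 + b*x - Kd*X = 0 (free taRNA).
    free_ta = (-b + math.sqrt(b * b + 4.0 * Kd * X)) / 2.0
    # Each complex follows from Kd = free_ta * free_mRNA / complex.
    complex_sigma = A * free_ta / (Kd + free_ta)
    complex_gfp = B * free_ta / (Kd + free_ta)
    return {
        "free_taRNA": free_ta,
        "free_sigma_mRNA": A - complex_sigma,
        "free_gfp_mRNA": B - complex_gfp,
        "complex_sigma": complex_sigma,
        "complex_gfp": complex_gfp,
    }

# Example totals are hypothetical; Kd = 80 nM is the value quoted in the text.
state = riboregulator_equilibrium(X=100.0, A=50.0, B=50.0, Kd=80.0)
```

Solving the quadratic once per time step avoids simulating the fast binding reaction explicitly, which is the point of the equilibrium assumption.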

Finally, we built differential equations for the concentrations of the reaction products, including the mRNA of sigma that has no riboregulator (it forms the positive feedback loop). This model assumes a relationship between the promoter and the amount of transcriptional product produced per unit time: the amount is proportional to the number of promoters if the promoter is expressed constitutively, and it can be approximated by the Hill equation if the promoter is controlled by an inducer. We also assumed that the degradation of proteins and RNAs follows first-order kinetics. Some of the parameters used were taken from references.[1]~[9]

We aimed to determine the parameters for sigma through experiment, and in the meantime used provisional parameters determined with reference to another promoter.

equation (11):

equation (12):

equation (13):

The transcriptional activity of the mRNA encoding the sigma factor, initiated by the positive feedback loop, and that of the mRNA encoding anti-sigma, induced by IPTG, are described as follows.

equation (14):

equation (15):

In our project, the purpose of IPTG induction was to produce enough anti-sigma to reset the counter, and the sensitivity of P_lac (lactose promoter) was not our main interest. Therefore, we used a simple equation, (15), to describe how the lactose promoter behaves. The activity of P_lac depends on the concentration of IPTG, but we treated it as a fixed number in this model.

Taking into account that translation is coupled with transcription in prokaryotes, we assumed that the rate of change of the amount of translational product is linear in the concentration of the transcriptional product. This linearity does not depend on the kind of translational product. We also assumed that anti-sigma binds to sigma irreversibly to form an inert complex, and that the rate of this reaction is proportional to the product of their concentrations.

equation (16):

equation (17):

equation (18):

Using the differential equations above, we simulated the behavior of the counter by Euler's method.
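As a minimal sketch of Euler's method on one equation of this kind, the block below integrates a single RNA species that is produced from an inducible promoter (Hill equation) and degraded with first-order kinetics. It is not the full system (11)-(18). The function names, the Hill parameters and the maximal rate are hypothetical values chosen for illustration; only the pulse window 5000-5120 sec and the degradation rate 0.010 /sec are taken from this page.

```python
def hill(inducer, K, n):
    """Hill activation term in [0, 1)."""
    return inducer**n / (K**n + inducer**n)

def euler_simulate(v_max, K, n, deg, inducer_at, t_end, dt):
    """Integrate dm/dt = v_max * hill(I(t)) - deg * m with the forward Euler method.

    inducer_at: function t -> inducer concentration (hypothetical helper).
    Returns a list of (time, concentration) pairs.
    """
    m = 0.0
    out = [(0.0, m)]
    steps = int(round(t_end / dt))
    for i in range(steps):
        t = i * dt
        rate = v_max * hill(inducer_at(t), K, n) - deg * m
        m += dt * rate                      # Euler update
        out.append(((i + 1) * dt, m))
    return out

def arabinose_pulse(t):
    """Inducer present only during the first induction window (arbitrary units)."""
    return 1.0 if 5000.0 <= t < 5120.0 else 0.0

trace = euler_simulate(v_max=0.1, K=1.0, n=2, deg=0.010,
                       inducer_at=arabinose_pulse, t_end=8000.0, dt=1.0)
```

The forward Euler update is only stable here when dt is small compared with 1/deg, so a step of 1 sec is safe for a degradation rate of 0.010 /sec.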

Parameter

Here, we explain how we determined the parameters of the deterministic model. The PoPS (polymerases per second) of P_Const is 0.03[3], so its promoter activity is 0.03/(6.0× 10²³× 1.0× 10^-15)[M/sec] = 0.051[nM/sec]. The switch point and the Hill coefficient of P_BAD are given in [4]. The PoPS of P_BAD is 5/60[5], so its RPU (relative promoter unit) is (5/60)/(0.03) = 2.78. We set the RPU of P_lac to 2 when it is induced with IPTG. We do not consider leaky expression from P_lac.

The average half-life of mRNA is 2-5 min[1], so we set the degradation rate of mRNA to 0.010[/sec]. The half-life of GFP is treated as infinite[9], so we set the degradation rate of GFP to 0.0[/sec]. The degradation of the sigma factor[20] is fast, so we set its rate to 0.001[/sec]. The degradation rate of anti-sigma is unknown, so we set it to 6.0× 10^-6[/sec], the average degradation rate of proteins[11]. The equilibrium constant of equation (1) is 80.0[nM][2]. The rate constant of the association of sigma and anti-sigma is unknown; we assumed that this reaction is fast and set it to 10.0[/M sec]. The copy number of the plasmid is 100~300[7][8], so we set it to 200. When the number of lacZ mRNA molecules is 62, the protein synthesis rate is 20[/sec][11], so we set the translation rate to 20/62 = 0.32, assuming that it does not depend on the kind of mRNA.
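The unit conversions in this section can be checked with a few lines. The sketch below uses the rounded constants quoted in the text (6.0× 10²³ for Avogadro's number and 1.0× 10^-15 L for the cell volume); the variable names are illustrative. With these rounded inputs the promoter activity evaluates to about 0.050 nM/sec, close to the 0.051 nM/sec quoted above.

```python
AVOGADRO = 6.0e23        # 1/mol, rounded as in the text
CELL_VOLUME_L = 1.0e-15  # L, average E. coli cell volume used in the text

pops_const = 0.03        # polymerases per second for P_Const [3]
pops_bad = 5.0 / 60.0    # polymerases per second for P_BAD [5]

# One molecule per cell corresponds to 1/(N_A * V) mol/L; convert M to nM.
activity_const_nM_per_s = pops_const / (AVOGADRO * CELL_VOLUME_L) * 1e9

# Relative promoter unit of P_BAD with respect to P_Const.
rpu_bad = pops_bad / pops_const

# Translation rate per mRNA, from 20 proteins/sec produced by 62 lacZ mRNAs [11].
translation_rate = 20.0 / 62.0

print(f"P_Const activity: {activity_const_nM_per_s:.3f} nM/sec")
print(f"RPU of P_BAD:     {rpu_bad:.2f}")
print(f"translation rate: {translation_rate:.2f}")
```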

The summary of the parameters of this model is given in Table 1.

Table 1

Simulations

arabinose induction time: 5000-5120[sec], 15000-15120[sec], IPTG induction time: t > 22500[sec]

The unit of the vertical axis is [nM], and that of the horizontal axis is [sec]. We can see GFP expression only after the 2nd induction, and rapid degradation of sigma after IPTG induction. This means that the counter behaves correctly in theory.

Sensitivity analysis

Sensitivity analysis examines how the output value changes when a certain parameter changes.

By conducting sensitivity analysis, we can learn how each parameter affects the system. Results for some specific parameters are shown below. The parameter we changed is shown on the horizontal axis. The vertical axis is the ratio of the GFP expressed after the first induction to the GFP expressed over the entire experiment. When we changed one parameter, the other parameters were kept fixed.

Fig.2 Horizontal axis: pulse length of the arabinose induction

Fig.3 Horizontal axis: K_σ

Fig.4 Horizontal axis: V_σ

Comparing Fig.2 with Fig.3, K_σ seems to have less influence on the system than the pulse length of the arabinose induction. From Fig.4, V_σ should be neither too large nor too small in order to minimize leak_GFP/GFP.
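The one-parameter-at-a-time procedure described above can be sketched as a small helper. The model used here is a deliberately simple stand-in (the steady-state level of a species with constant production and first-order degradation), not the counter model; `one_at_a_time` and `steady_state_level` are hypothetical names.

```python
def one_at_a_time(model, baseline, name, values):
    """Vary one parameter over `values` while all others stay at `baseline`."""
    results = []
    for value in values:
        params = dict(baseline)   # copy so the baseline is never modified
        params[name] = value
        results.append((value, model(**params)))
    return results

def steady_state_level(production, degradation):
    """Toy output: steady state of dm/dt = production - degradation * m."""
    return production / degradation

baseline = {"production": 0.051, "degradation": 0.010}
sweep = one_at_a_time(steady_state_level, baseline,
                      "degradation", [0.005, 0.010, 0.020])
for value, output in sweep:
    print(f"degradation = {value:.3f} /sec -> output = {output:.2f} nM")
```

In the actual analysis the `model` argument would be a function that runs the full simulation and returns the ratio plotted on the vertical axis.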

Stochastic Model

Formulation of the Model

When many molecules are present, modeling usually uses ordinary differential equations, but some in vivo reactions involve only a few molecules. For example, transcription involves the cell's genomic DNA, which is present as one copy, or plasmids, which are present at about 200 copies [7][8] in a cell of Escherichia coli. The average volume of an E. coli cell is about 1.0× 10^-15[L][15], so the concentration of the genomic DNA is about 1.7[nM] and that of the plasmids is about 200 times higher. These concentrations are clearly low. Reactions like this are strongly affected by fluctuations due to the limited copy numbers of the reactants, so we need to take these fluctuations into account by using stochastic methods. We also introduce a delay effect.

First we explain the Gillespie algorithm, which is often used in stochastic simulations. In the Gillespie algorithm, we track not the concentrations of molecules but their numbers. Reactions are also viewed as discrete, essentially instantaneous physical events. What we have to determine when using the Gillespie algorithm is (A) when the next reaction is going to occur and (B) which type of reaction it will be. Let us look more closely at the Gillespie algorithm using the following set of reaction formulas:

equation (19):

Let n₁, n₂, and n₃ denote the respective copy numbers of the components X₁, X₂, and X₃. Note that they are all integers. First we have to determine how likely each reaction is to happen, which depends on the copy numbers of the components. In stochastic simulations, we use a parameter called the stochastic rate constant, often written as c. We assume that each possible combination of reactant molecules has the same probability c per unit time to react. In other words, c× dt gives the probability that a particular combination of reactant molecules will react in a short time interval [t,t+dt). We write the stochastic rate constant of reaction j as c_j. Considering all combinations of reactant molecules, the probability that reaction 0 occurs in [t,t+dt) is c₀× n₁× n₂× dt. We now define the propensity function, often written as a, as the function whose product with dt gives the probability that a particular reaction will occur in the next infinitesimal time dt. From here on, the propensity function of reaction j is written a_j. For reaction 0 this gives the following equation:

equation (20):

Notice that c_j is a constant parameter, but a_j changes as the state changes. In the same way, a₁ = c₁× n₃.

First we answer question (A): when is the next reaction going to occur? To simplify, we first assume that only reaction 0 can occur. Set the current time to 0, and define P(t) as the probability that reaction 0 does not occur in [0,t). Then from the definition of a₀, we obtain the equation P(t+dt) = P(t)(1-a₀× dt), because the probability that reaction 0 does not occur in [0,t+dt) is the product of the probability that it does not occur in [0,t) and the probability that it does not occur in [t,t+dt). Using P(t+dt) = P(t) + (dP(t)/dt)× dt, we get:

equation (21):

Because the probability that reaction 0 occurs in an interval of length zero is zero, P(0)=1. Solving the above ordinary differential equation, we get:

equation (22):

If r₁ is a uniform random number from [0,1], the time of the next reaction is determined by solving P(t) = r₁. Using (22), we get t = -(1/a₀)× log r₁, where log is the natural logarithm.
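For reference, the single-reaction derivation can be written out in one place. The block below is a reconstruction from the surrounding text (the equation images are not reproduced on this page), assuming that only reaction 0 with propensity a₀ is possible.

```latex
\begin{align}
P(t+dt) &= P(t)\,(1 - a_0\,dt)
  \;\Longrightarrow\; \frac{dP(t)}{dt} = -a_0\,P(t), \qquad P(0) = 1, \\
P(t) &= e^{-a_0 t}, \\
P(t) = r_1 &\;\Longrightarrow\; t = -\frac{1}{a_0}\ln r_1 = \frac{1}{a_0}\ln\frac{1}{r_1}.
\end{align}
```

Since 0 < r₁ ≤ 1 gives ln r₁ ≤ 0, the sampled time is non-negative, and a larger propensity a₀ gives a shorter waiting time on average (the mean is 1/a₀).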

Now we suppose there are N types of reactions. Let a₁,a₂,…,a_N denote the respective propensity functions of reactions 1,2,…,N. By the same argument as before:

equation (23):

Let dt be so small that we can ignore terms of second and higher order in dt. Equation (23) then becomes:

equation (24):

Solving (24), with a = Σ_{j=1}^{N} a_j, we get:

equation (25):

Setting τ as the time of the next reaction, we get:

equation (26):

Second, we answer question (B): which type of reaction will it be? We have determined the time of the next reaction, so what is left is to determine which kind of reaction occurs. This order may seem strange, but in the Gillespie algorithm the time of the next reaction is determined first, and the kind of reaction second. It is natural to set the probability that reaction j occurs to a_j/a. If r₂ is a uniform random number from [0,1], j is the only number that satisfies the inequalities below:

equation (27):

In the case a₀ ≧ a× r₂, the reaction that occurred is reaction 0.

Now we can run the Gillespie algorithm by following the next steps.(t_MAX is the finish time of the simulation.)
1.Initialize the system at t = 0 with initial numbers of molecules for each species, n₀,… ,n_s
2.For each j = 0,1,…,r, calculate a_j(n) based on the current state n using (20)
3.Calculate the exit rate a(n) = Σ_{j=0}^{r} a_j(n).
4.Compute a sample τ of the time until the next reaction using (26)
5.Update the time t = t +τ
6.Compute a sample j of the reaction index using (27)
7.Update the state n according to the reaction j.
8.If t < t_MAX, return to Step 2
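The steps above can be sketched in Python for a two-reaction system. The propensities a₀ = c₀n₁n₂ and a₁ = c₁n₃ are the ones given in the text. The reaction formulas themselves are not shown on this page, so the sketch assumes reaction 0 is X₁ + X₂ → X₃ and reaction 1 is X₃ → X₁ + X₂. The rate constants and initial counts are arbitrary illustration values.

```python
import math
import random

def gillespie(n_init, c0, c1, t_max, seed=0):
    """Direct-method Gillespie simulation of an assumed reversible binding system.

    Reaction 0: X1 + X2 -> X3 (propensity c0*n1*n2)
    Reaction 1: X3 -> X1 + X2 (propensity c1*n3)  [products are an assumption]
    Returns a list of (t, n1, n2, n3) tuples.
    """
    rng = random.Random(seed)
    n1, n2, n3 = n_init
    t = 0.0
    trajectory = [(t, n1, n2, n3)]
    while True:
        a0 = c0 * n1 * n2                # step 2: propensities
        a1 = c1 * n3
        a = a0 + a1                      # step 3: exit rate
        if a == 0.0:
            break                        # no reaction can fire any more
        r1 = 1.0 - rng.random()          # uniform in (0, 1], avoids log(0)
        tau = -math.log(r1) / a          # step 4: waiting time
        if t + tau > t_max:
            break                        # next event falls after the end time
        t += tau                         # step 5
        if a * rng.random() < a0:        # step 6: choose the reaction
            n1, n2, n3 = n1 - 1, n2 - 1, n3 + 1
        else:
            n1, n2, n3 = n1 + 1, n2 + 1, n3 - 1
        trajectory.append((t, n1, n2, n3))   # step 7: state updated
    return trajectory

traj = gillespie((50, 40, 0), c0=0.001, c1=0.1, t_max=100.0, seed=1)
```

Unlike step 8 in the list, this sketch stops before executing a reaction whose time would exceed t_MAX, so every recorded event lies inside the simulated window.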

The stochastic rate constant can be determined from the parameters we used in the deterministic model (if we modeled the reaction in the deterministic model). When there are many reactant molecules, stochastic simulations must show results similar to those of deterministic simulations. For this reason, the stochastic rate constant c can be calculated from the chemical reaction rate constant k. See [10] for the derivation. Here we just state the result.

For a unimolecular reaction, c is numerically equal to k, whereas for a bimolecular reaction, c equals k/(N_A× V) if the species of the reactant molecules are different, or 2k/(N_A× V) if they are the same. V is the volume of the system and N_A is Avogadro's constant.
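The conversion rule can be written as a small helper. This is a sketch using the rounded constants quoted elsewhere on this page (N_A = 6.0× 10²³ and V = 1.0× 10^-15 L); the function name and its arguments are illustrative.

```python
AVOGADRO = 6.0e23        # 1/mol, rounded as in the text
CELL_VOLUME_L = 1.0e-15  # L

def stochastic_rate_constant(k, order, same_species=False,
                             volume_l=CELL_VOLUME_L):
    """Convert a deterministic rate constant k into a stochastic one c.

    order 1: k in 1/sec, c = k.
    order 2: k in 1/(M sec), c = k/(N_A V), or 2k/(N_A V) for identical reactants.
    """
    if order == 1:
        return k
    if order == 2:
        factor = 2.0 if same_species else 1.0
        return factor * k / (AVOGADRO * volume_l)
    raise ValueError("only unimolecular and bimolecular reactions are handled")

# Association of sigma and anti-sigma, using k = 10.0 /(M sec) from the text.
c_assoc = stochastic_rate_constant(10.0, order=2)
# First-order mRNA degradation, k = 0.010 /sec.
c_deg = stochastic_rate_constant(0.010, order=1)
```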

However, these results should not be taken to imply that the mathematical forms of the propensity functions are just heuristic extrapolations. The propensity functions are grounded in molecular physics, and the formulas of deterministic chemical kinetics are approximate consequences of the formulas of stochastic chemical kinetics, not the other way around.

The Gillespie algorithm is clear and useful, so it is widely used. However, it is not well suited to describing transcription and translation, because they are slow, complex reactions involving many kinds of reactant molecules. If we treat transcription from plasmids as one reaction and assume a plasmid copy number of 200, the propensity function a equals the stochastic rate constant multiplied by 200 (200× c). The average waiting time until a transcript appears is then only about one two-hundredth of the average time of a single transcription. On the time scale of the average transcription time this is not a big problem, but it may not be good for simulating a system, like the one in our project, in which the time needed for transcription and translation cannot be shortened. We therefore introduce time-delay into the Gillespie algorithm based on [12]~[14]. The mathematical correctness of this algorithm is proved in [14]. Time-delay means treating reactions as follows:

Furthermore, transcription and translation are too complex to list all of their reactions step by step. Therefore it is better to treat them as time-delays than as reaction formulas.
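One way to add a delay to the Gillespie algorithm is sketched below. It is a simplified rejection-style variant written for illustration and is not necessarily the exact scheme of [12]~[14]. Transcription initiation fires like an ordinary reaction, but its product (one mRNA) is only released after a fixed delay. In this toy version the promoter is not consumed, and the only other reaction is mRNA degradation. All names and parameter values are hypothetical.

```python
import heapq
import math
import random

def delayed_ssa(n_dna, c_init, c_deg, delay, t_max, seed=0):
    """Gillespie simulation with a fixed delay between initiation and mRNA release."""
    rng = random.Random(seed)
    t, mrna = 0.0, 0
    pending = []                       # min-heap of scheduled release times
    history = [(t, mrna)]
    while True:
        a_init = c_init * n_dna        # initiation propensity
        a_deg = c_deg * mrna           # degradation propensity
        a = a_init + a_deg
        tau = -math.log(1.0 - rng.random()) / a if a > 0 else math.inf
        if pending and pending[0] <= t + tau:
            # A delayed product is released before the sampled reaction:
            # jump to the release time, update the state, then resample.
            t = heapq.heappop(pending)
            if t > t_max:
                break
            mrna += 1
        else:
            t += tau
            if t > t_max:
                break
            if a * rng.random() < a_init:
                heapq.heappush(pending, t + delay)   # product appears later
            else:
                mrna -= 1
        history.append((t, mrna))
    return history

hist = delayed_ssa(n_dna=200, c_init=0.01, c_deg=0.01,
                   delay=30.0, t_max=300.0, seed=2)
```

Discarding the sampled waiting time when a scheduled release comes first is valid because the propensities stay constant between events, so the exponential waiting time can be redrawn from the release time without bias.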

Now we begin to model our project, the sigma Re-counter. In our model, there are five types of reactions: transcription, translation, association and dissociation of crRNA and taRNA, association and dissociation of sigma and anti-sigma, and degradation. We introduce time-delay only into transcription and translation. Below, we explain how we treat each of these reactions in general.

We first explain the transcription model for mRNA[11]. When RNA polymerase binds to the promoter region, the two first form the RNAP・promoter closed complex. In this state, the complex can dissociate. With a certain probability, however, the closed complex turns into the open complex, which does not dissociate. After the RNA polymerase and the promoter region form the open complex, the elongation of mRNA starts. At this stage the promoter region is cleared and the RBS (ribosome binding site) is synthesized, so we model this step as reaction3'. It is difficult to model the elongation of mRNA by elementary reaction formulas because it is a complex reaction, so we model it by time-delay. The reaction formulas of transcription can then be described as the following reactions:

Combining reaction3' and reaction3, we get:

In the above reaction formula, DNA denotes the promoter region, and mRNA denotes RBS.

In our project, we also have to model the transcription of taRNA. Because taRNA functions only when it is completely transcribed, we model it with the following reactions:

For translation, we refer to the translational model of [8]. As with the transcriptional model, we model it as follows:

Combining reaction2' and reaction2, we get:

We model the association and disassociation of crRNA and taRNA as a reversible reaction:

We know that anti-sigma acts directly on sigma[20], but a large part of their relationship is still unknown. Based on this direct interaction, we model it with the following reactions:

We model the degradation as the following reaction:

In summary, the reaction formulas of our model are as follows: