Team:Valencia Biocampus/Modeling

From 2014.igem.org

(Difference between revisions)

Revision as of 17:31, 13 October 2014

Modeling

Presentation

We derive here some basic formulae needed to model the dependence of bio-bricks expression with temperature, pH or UV radiation. Our observable is the relative change of the intensity of fluorescence per cell due to the variation of a given external factor.

Many processes are involved in protein synthesis. Our goal is to identify the main processes that capture the response to changes and study the dynamics of the model. With this plan in mind, we simplify our problem by assuming that a) expression is linearly proportional to the number of cells, and b) mean number plasmids per cell remains constant.

Protein production

The mRNA and protein production of a cell can be described by the kinetic equations:

$$ \begin{eqnarray} \frac{d N_{mRNA} }{dt}&=& P \cdot r_T - N_{mRNA} \cdot \lambda_{mRNA} \nonumber ,\\ \frac{d N_{Pr} }{dt}&=& N_{mRNA} \cdot r_P - N_{Pr} \cdot \lambda_{Pr}, \nonumber \end{eqnarray} $$

where $N_{\it{mRNA}} (N_{Pr} )$ is the number of biobrick mRNA (protein) molecules per cell, $\lambda_{mRNA} $ ($\lambda_{Pr}$) is the mRNA (protein) decay rate, $r_T$ ($r_P$) is the transcription (translation) rate, P is the mean number of plasmids per cell.

Assuming stationary messenger and protein content during the cell cycle, $\frac{d N_{mRNA} }{dt}=0$ and $\frac{d N_{Pr} }{dt}=0$, we get

$$ \begin{equation} N_{Pr} = \frac{ r_T \cdot r_P }{ \lambda_{mRNA}\cdot \lambda_{Pr} } P . \end{equation} $$

This equation describes the amount of total synthesized protein as a function of calculable rates and the number of free plasmids. It can be easily shown that, after many generations, the mean number of proteins per cell is 1.5 times the protein production during a cell cycle.

Enzymatic activity

External agents may modify the protein folding efficiency, and therefore change the fluorescence emission. The fraction of proteins correctly folded (active) in equilibrium at a given temperature can be calculated from the folding-unfolding equilibrium reactions:

$$ \begin{equation} K_{eq} = \frac{N^{eq}_A}{N_{Pr}- N^{eq}_A} = e^{-\frac{\Delta G_F}{RT } } \nonumber, \end{equation} $$ where R is the gas constant. This equation leads to $$ \begin{equation} N^{eq}_A = \frac{N_{Pr}}{1+e^{\frac{\Delta G_F}{RT } } } \nonumber. \end{equation} $$

The free energy of folding $\Delta G_F$ can be estimated as a function of length and temperature by [1] \begin{equation} \Delta G_F (L,T) = \Delta H(L) + \Delta C_p (L)(T-T_h) - T\Delta S(L) - \Delta C_p (L)T\ln\frac{T}{T_S} \nonumber, \end{equation} with $T_h=373.5$K, $T_s=385$K and \begin{eqnarray} \Delta H(L) &=& -5.03 L - 41.6 (kJ/mol)\nonumber, \\ \Delta C_p(L) &=& -0.062 L + 0.53 (kJ/mol)\nonumber, \\ \Delta S(L) &=& -16.8 L -85 (J/mol)\nonumber. \end{eqnarray}

The enzymatic activity would be \begin{equation} A_{c} = k e^{-\frac{E_a}{RT } } \frac{N_{Pr}}{1+e^{\frac{\Delta G_F}{RT } } } . \end{equation} where k is the Arrhenius constant and $E_a$ is the activation energy of the folding reaction.

Stability

Living organisms have genetic and biochemical tools to respond to stress. These tools normally up-regulate some stress-specific genes, and, directly or indirectly, down-regulate the rest of normal genes (most of them housekeeping genes). On the other hand, every E. coli component works worse under conditions different from their optimum (for example, nutrients uptake or ATP production), so that, every E. coli function will be reduced. We model the relative expression of biobricks by variations of temperature, pH and salt concentration.

Dependence on Temperature

The expression level of our heterologous gene can be calculated by equations 1 and 2. Our main goal is to find which of the terms will change most significantly with the different stress factors. We will be interested in the activity relative to a reference one, which we take as the activity of E. coli bacteria grown on the LB medium at 38.7 ^oC, pH 7 and 1% of NaCl.

$$ \begin{equation} \frac{A_{c}}{A^0_{c}} = \prod_i \frac{V_i}{V^0_i} . \end{equation} $$ where $V$ is any of the factors contributing to the activity and the index $0$ refers to the reference case.

Transcription rate

Among several factors, we first discuss the transcription rate. In the range of temperatures (or pH) that we have studied, the polymerase is sufficiently stable to assume that the elongation rate is not affected and the interaction energies do not change significantly (for example, the promoter specific interaction do only change 16% from 33 ^oC to 44 ^oC). We did not find any significant difference between the nucleotide compositions of E. coli under stress [2]. Therefore, main variations in the transcription rate are due to the probability of RNAP-promoter bounds $$ \begin{equation} r_T=\frac{1}{\frac{1}{r_I}+\frac{1}{r_E}}Prob_B \simeq Prob_B. \nonumber \end{equation} $$ On the other hand, the down-regulation effect affects the expression level of our Biobrick, because it works as a housekeeping gene (with a normal -35 and -10 promoter for the RNA polymerase). For example, E. coli has a common σ factor for housekeeping genes (σ70), and, another one (σ32) for temperature stress. In the case of temperature stress, the bacteria will synthetize σ32, which will bind RNA polymerase (RNAP) cores that won’t be free for σ70, and so they will not be able to synthetize mRNA of the normal housekeeping genes.

Therefore, we just need to find a relation between the amount of σ factors and the expression level, which we derive next with thermodynamic arguments, and also the variation of σ factors with temperature, which can be determined experimentally.

The RNAP is bound to a promoter (unspecific DNA) with $\epsilon_{S}$ ($\epsilon_{NS}$) free energy. There are $N_{NS}$ nonspecific sites on the E. coli genome (sequences of 41bp) and R free E$\sigma$70, with $R \ll N_{NS}$ . The combinatory number of unbound states, with energy R$\epsilon_{NS}$, is $$ \begin{equation} \frac{N_{NS}!}{R!(N_{NS}-R)!} \simeq \frac{N^R_{NS}}{R!} . \nonumber \end{equation} $$ Similarly, the combinatory number with one bound state, with energy $\epsilon_{S}$+(R-1)$\epsilon_{NS}$, is $$ \begin{equation} \frac{N_{NS}!}{(R-1)!(N_{NS}-R-1)!} \simeq \frac{N^{R-1}_{NS}}{(R-1)!} . \nonumber \end{equation} $$ Then, the probability of finding a RNAP bound to a promoter is with the enter $$ \begin{equation} Prob_B = \frac {\frac{R}{N_{NS}} e^{-\frac{\Delta\epsilon}{k_BT }}}{1+\frac{R}{N_{NS}} e^{-\frac{\Delta\epsilon}{k_BT }}}, \end{equation} $$ with $\Delta\epsilon = \epsilon_{S} - \epsilon_{NS}$, calculated with the energy matrix developed in [3] and $k_B$ is the Bolztmann constant. R $\simeq$ 1000 and $N_{NS} \simeq 10^7$ for E. coli.
Using the measured variation of free E$\sigma$70, in arbitrary units, with temperature, in $^o$C, [4] $$ \begin{equation} E\sigma70 = 0.027 T^2 - 2.12 T + 42.2, \nonumber \end{equation} $$ we can determine the relative change of transcription rate $$ \begin{equation} \frac{r_T}{r^0_T} = \frac{Prob_B}{Prob^0_B} = \frac { 1+\frac{R^0}{N_{NS}} e^{-\frac{\Delta\epsilon}{k_BT }} } { \frac {R^0}{R}+\frac{R^0}{N_{NS}} e^{-\frac{\Delta\epsilon}{k_BT }}} , \end{equation} $$ with $$ \begin{equation} \frac{R^0}{R} = \frac {0.027 (T^0)^2 - 2.12 T^0 + 42.2} {0.027 T^2 - 2.12 T + 42.2} . \end{equation} $$

Synthesis rate

Modeling synthesis rate follows the lines discussed in transcription rate, but the working regime is quite different. The energies of the mRNA-ribosome sequences interactions are much larger than in the RNAP-promoter ones. The average difference of free energies (computed with the RBS calculator for random sequences of E. coli genes) is $\Delta\epsilon = -20.79 kT$, which leads to minor variations of the ribosomal protein with the temperature [5].

Therefore, $$ \begin{equation} \frac{r_P}{r^0_P} = \frac{1+\frac{r^0_E}{r^0_I}}{\frac{r^0_E}{r_E}+\frac{r^0_E}{r^0_I}}, \end{equation} $$ where $r_E$ ($r_I$) is the elongation (initiation) rate and the superscript denotes our reference case. In E. coli, most of the ribosomes are translating (4 active every one free) [6], thus $\frac{r^0_E}{r^0_I} \simeq\frac{1}{4}$ and $$ \begin{equation} \frac{r_P}{r^0_P} = \frac{\frac{5}{4}}{\frac{r^0_E}{r_E}+\frac{1}{4}}. \end{equation} $$

We discuss next the dependence of elongation rates with temperature. We use the relation of amino acid concentration with the temperature [7]. Using data from [8], the elongation rate does not depend on the ternary complex concentration. The fastest steps are: 1) arrival of the ternary complex (0.01 s), 2) dipeptide bond formation (0.03s) and 3) translocation (0.022s). $$ \begin{equation} r_E = \frac{1}{\frac{1}{A_1}e^{\frac{E^1_a}{RT}}+\frac{1}{A_2}e^{\frac{E^2_a}{RT}}+\frac{1}{A_3}e^{\frac{E^3_a}{RT}}}, \end{equation} $$ where $E^i_a$ is (32, 75.2, 29260) kJ/mol and $A^{-1}_i$ is (2.5 $\cdot 10^7$, 1.6 $\cdot 10^{14}$, 3.9 $\cdot 10^6$).

Other Stress factors: pH and salt concentration

In contrast with temperature variations, E. coli internal pH and salt concentration hardly vary. Therefore, we focus on the intrinsic E. coli element that may be affected by these stresses, and influence Biobrick expression. The main factor is the number of free RNAP, due to the synthesis of stress response genes. We estimate the ratio of free RNAP under different conditions by the ratio of their growth rates $$ \begin{equation} \frac{r_T}{r^0_T} = \frac{Prob_B}{Prob^0_B} = \frac { 1+\frac{R^0}{N_{NS}} e^{-\frac{\Delta\epsilon}{k_BT }} } { \frac {R^0}{R}+\frac{R^0}{N_{NS}} e^{-\frac{\Delta\epsilon}{k_BT }}} , \end{equation} $$ with $$ \begin{equation} \frac{R^0}{R} = \frac{growth^0} {growth} . \end{equation} $$

Conclusion & Results

Our model includes two main contributions to the changes of the enzymatic activity due to changes in temperature: transcription and synthesis rates. In the case of the GFP, the folding free energy term is small and the relative enzymatic activity can be simplified to our master equation for stability $$ \begin{equation} \frac{Ac}{Ac^0} = \frac{r_T}{r^0_T} \frac{r_P}{r^0_P} . \end{equation} $$

Standardization

We can use the relative variation of transcription rate between strains or organisms, derived in our previous section

$$ \begin{equation} \frac{r_T}{r'_T} = \frac{Prob_B}{Prob'_B} = \frac { 1+\frac{R'}{N_{NS}} e^{-\frac{\Delta\epsilon}{k_BT }} } { \frac {R'}{R}+\frac{R'}{N_{NS}} e^{-\frac{\Delta\epsilon}{k_BT }}} , \end{equation} $$

to study how standardized is a biobrick in different OTU. There are very small genetic difference between strains which makes our model of limited use for comparing them. In the case of organisms, we can compare N_NS (genome size) of different organisms and by assuming equal energy and that the number of free enzymes R is inversely proportional to the number of genes, we can estimate how much the biobrick expresses from organism to organism.

Organism	genes	R/R (E. coli)	Genome size (Mb)	N_NS(E. coli)/N_NS	FORMULA HERE
E. coli	4418	1	4.7	1	1
Bacilllus Subtillis	4200	1.05	4.2	1.12	1.176
Mycoplasma	683	6.47	0.79	5.95	38.5

Even with this crude approximation, we find that the simplest organisms would be more stable and standard between their strains.

Orthogonality

Why modelling plasmid dynamics?

Synthetic biology aims to engineer a living cell to perform well-defined tasks. In order to achieve these aims, biologists introduce genes into cells using self-replicating DNA carriers called plasmids. Therefore, the levels of these molecules must be tightly regulated for our system to behave correctly in a predictable fashion. Plasmid dynamics have been studied exhaustively over these years using mathematical models: from how plasmid growth is regulated to plasmid partitioning after cell division. However, little is known about how two identical plasmids interact each other.
In this part of our modelling project aims to shed light upon this issue using previous models of ColE1 plasmids and the properties when two different plasmids with same origin of replication are introduced in a cell: Do they behave independently, that is, in an orthogonal way? To address this question, we employed both deterministic and stochastic models to create a theoretical framework on this field. Also, we developed some approximations to discover some important features of the system.

Deterministic model

This model is taken from (Paulsson, 2001). Here, the plasmid levels are regulated both by a RNA molecule that inhibits plasmid replication and the dilution of plasmids as a result of cell growth and division>:

$$ \frac{d P}{dt} = r \cdot P \cdot R(s) - \frac{\log 2}{\tau} \cdot P $$ $$ \frac{d s}{dt} = \alpha \cdot P - \beta \cdot s $$ where:
$P$ is the number of plasmids
$s$ is the number of RNA inhibitors
$r$ is the plasmid growth rate
$\tau$ is the cell growth rate
$\alpha$ is the RNA inhibitor synthesis rate
$\beta$ is the RNA inhibitor decay rate
$R(s)$ is the repression effect of RNA inhibitor: $$ R(s) = \frac{1}{(1+\frac{s}{s_0})^n} $$ where $s_0$ is the value of RNA inhibitor to repress half of the replication events and $n$ is the cooperativity

Plasmids levels are regulated both by a negative feedback and cell growth. In order to take into account the presence of several plasmids, it has been modified to obtain this set of equations:

$$ \frac{d P_i}{dt} = r_i \cdot P_i \cdot R(s) - \frac{\log 2}{\tau} \cdot P_i $$ $$ \frac{d s}{dt} = \alpha \cdot \sum \limits_{i=1}^n{P_i} - \beta \cdot s $$

The steady state solution is then:

$$ s_{ss} = R^{-1}(\frac{\log 2}{\tau \cdot r_i})$$ $$ \sum \limits_{i=1}^n{P_{i,ss}} = (\frac{ \beta \cdot s_{ss}}{\alpha})$$
In a simplified way, if we had two identical plasmids: $$ P_{1,ss} + P_{2,ss} = K $$

It is noticeable that plasmid levels are not independent and has a determined value: they depend on each other. However, the value for a system is not known directly using this formula. A hypothesis to explain this solution is that small perturbations can increase the relative level of one plasmid, favouring its growth. As they are repressed as a function of the total number of plasmids but they growth depending on their number, the system is prone to enhance one over another. $$ \frac{d P_1}{d P_2} = \frac{r_1 \cdot P_1}{r_2 \cdot P_2} $$
If both growth rate are constant, both plasmids are going to growth only depending of its level, then, the slope could be calculated using the initial conditions:
$$ m = \frac{P_{01}}{P_{02}} $$

Quasi-steady state approach

To simplify this set of equations, inhibitor dynamics are thought to be faster than plasmid growth, so RNA levels are a direct representation of plasmid levels.

$$ \frac{d P_i}{dt} = r_i \cdot P_i \cdot R(P_i) - \frac{\log 2}{\tau} \cdot P_i $$

Dependence on initial conditions:

To test the previous hypothesis, we integrated the set of two equations using numerical methods as we changed the initial number of each plasmid. As expected, small differences in proportions were amplified.

Stochastic model

The previous model is suitable when the system is not noisy. However, this is not the case and small variations in plasmid levels could be highly amplified. Hence, we modeled the effect of noise in plasmid growth and how it affects. In order to obtain we used the Stochastic Simulation Algorithm (SSA) developed by Gillespie (Erbran et al, 2007 is a good paper about the topic). In order not to enter into details and giving a general overview of the process, SSA simulates by iterations which processes of a given set of reactions occur at which time using a random number generator.

First of all, we developed a code using this algorithm, considering the growth of two plasmids and cell division as independent process. The steps are the following:

Calculation of probabilities: $$ \alpha_{P_i \rightarrow P_i+1} = \frac{1}{r_i \cdot P_i \cdot R(P_i)}$$

Calculation of the timestep when the next reaction occurs: $$ t = \sum \limits_{i=1}^{n}{\alpha_{P_i \rightarrow P_i+1}} \cdot \log(\frac{1}{\theta_1})$$ where $\theta_1$ is a random number from a uniform distribution U(0,1).

Calculation the next reaction. The next reaction taking place will be the one which: $$ \sum \limits_{i=1}^{j}{\alpha_{P_i \rightarrow P_i+1}} < \theta_2 \leq \sum \limits_{i=1}^{j+1}{\alpha_{P_i \rightarrow P_i+1}} $$

Plasmid partition after cell division at t = $\tau$

Here is a couple of examples of what can be obtained for two different conditions: the first one takes places when $P_1=P_2=10$ and the second one when $P_1 = 10$ and $P_2 = 50$

As expected, the deterministic model gives a mean version of the dynamics. However these are not the single cell dynamics, which show oscillatory behaviour. Although the levels of each plasmid can vary, their sum remains unchaged (black line)

A lineage model

Aiming to obtain a better description of our system and perform a more exhaustive analysis, we developed a modeling approach to study the plasmids level in a lineage from a single cell.
As tracking bacteria and their plasmid levels is an important issue, we implemented a tree plot in our code using a matlab package
Partition of plasmids...

Also we are able to track the plasmids levels of each bacteria at the time of their division.

The distribution will be the following for the begining and for the ending of the simulation:

Analysis of plasmids levels...

A step beyond: Resistance

Also we are able to track the plasmids levels of each bacteria at the time of their division.

Approaches for less computational expensive

Although these simulations are powerful and helpful and yield informative results, they are computationally expensive. This could become an important pitfall when dealing with a huge number of cells or long times. We could bypass the problem using an approach if we are working with populations as explained in (Paulsson, 2001). In order to do so, we added a degradation process for each plasmids following this equation:
$$ \alpha_{P_i \rightarrow P_i-1} = \frac{log(2)}{\tau} \cdot P_i$$ In this new system we are observing the global dynamics for a cell population. If there is no resistance, we could obtain the following dynamics, where a plasmid is lost:

This new approach could lead to study cell dynamics in a population.

Conclusions

In this work we aimed to bring the attention to a relevant issue in synthetic biology: plasmid dynamics. Yet they are widely used, we lack of great knowledge about the interaction between two plasmids in the same cell. Plasmids are used because they can perform the tasks we want. However, are their behaviour is the expected one? A relevant example is Marguet et al., 2010 where they obtained an emergent behaviour as result of an unexpected relation between the bacterial physiology and the plasmid dynamics.
In first place, we look for studies with two plasmids in a cell but we were unable to find, although there are interesting work in single plasmid dynamics. Using these models we switch between deterministic and stochastic models to learn about the unspecific interaction between two plasmids in a cell. Therefore, we have developed a new framework to work on this topic. Moreover, we have studied this trying to shed light on this issue.
As we could have learnt, we found that this system is unstable and the only way to keep the plasmid levels is introducing a resistance. However, it is not a good way to maintain those levels: bacteria which have non-desired levels of plasmids are killed. This leads to a lower yield. Also, there is an important variability, which can drive the system away from the expected behaviour. Hence, new...

Open License

References

Ghosh K., Dill K. (2010). Cellular proteomes have broad distributions of protein stability. Biophys J 99: 3996–4002.
Soini, J. et al. (2005) Transient increase of ATP as a response to temperature up-shift in Escherichia coli. Microb. Cell Fact. 4, 9.
Brewster R.C., Jones D.L., Phillips R.(2012) Tuning promoter strength through RNA polymerase binding site design in Escherichia coli. PLoS Comput Biol 8: e1002811.
Skelly S. et al. (1987) Correlation between the 32-kDa s factor levels and in vitro expression of Escherichia coli heat shock genes. Proc. Natl. Acad. Sci. USA 84:8365–8369.
Zaritsky, A. (1982). Effects of growth temperature on ribosomes and other physiological properties of Escherichia coli. J. Bacteriol. 151:485-486.
Bremer, H., Dennis, P. P. (1996) Modulation of chemical composition and other parameters of the cell by growth rate. Neidhardt, et al. eds. Escherichia coli and Salmonella typhimurium: Cellular and Molecular Biology, 2nd ed. chapter 97.
Piperno J.R., Oxender D.L. (1968) Amino acid transport systems in Escherichia coli K-12. J Biol Chem. 243(22):5914–5920.
Thompson R.C., Dix D.B., Karim A.M. (1986) The reactions of ribosomes with elongation factor Tu·GTP complexes. Aminoacyl-tRNA-independent reactions in the elongation cycle determine the accuracy of protein synthesis. J Biol Chem, 261:4868–4874.
Bilgin N. et al. (1992) Kinetic properties of Escherichia coli ribosomes with altered forms of S12. J Mol Biol. 224 (4):1011-27.
Grigorenko B.L. et al. (2008) Mechanism of the chemical step for the guanosine triphosphate (GTP) hydrolysis catalyzed by elongation factor Tu. Biochim Biophys Acta 1784:1908–1917.
Gindulyte,A. et al. (2006) The transition state for formation of the peptide bond in the ribosome. Proc. Natl. Acad. Sci. U. S. A., 103:13327–13332.
Wilson,K. S., Noller, H.F. (1998) Mapping the position of translational elongation factor EF-G in the ribosome by directed hydroxyl radical probing. Cell, 92: 131–139.
Erban R, Chapman SJ and Maini PK (2007) A PRACTICAL GUIDE TO STOCHASTIC SIMULATIONS OF REACTION-DIFFUSION PROCESSES link
Klumpp E (2011) Growth-Rate Dependence Reveals Design Principles of Plasmid Copy Number Control PLoS One Vol 6 Issue 5 link
Marguet P, Tanouchi Y, Splitz E, Smith C and You L (2010) Oscillations by Minimal Bacterial Suicide Circuits Reveal Hidden Facets of Host-Circuit Phisology PLoS One Vol 5 Issue 7 link
Paulsson J and Ehrenberg M (2001) Noise in a minimal regulatory network: plasmid copy number control Cambridge Press link

Retrieved from "http://2014.igem.org/Team:Valencia_Biocampus/Modeling"

@@ Line 507: / Line 507: @@
          <img src="https://static.igem.org/mediawiki/2014/7/73/VB2014_Det_PhaseMap.jpg" width="340" height="255" alt="Simple Linear Regression Model" class="thumbnail" style="display: inline; float: none;"/>
      </a>
-</div>
-<div class="thumbnail col-sm-9 center-block">
-  <a href="https://static.igem.org/mediawiki/2014/d/d6/VB2014_Det_FinalDistribution.jpg" rel="lightbox">
-    <img src="https://static.igem.org/mediawiki/2014/d/d6/VB2014_Det_FinalDistribution.jpg" width="255" height="192" alt="Figure 1">
-  </a>
-  <a href="https://static.igem.org/mediawiki/2014/7/73/VB2014_Det_PhaseMap.jpg" rel="lightbox">
-        <img src="https://static.igem.org/mediawiki/2014/7/73/VB2014_Det_PhaseMap.jpg" width="255" height="192" alt="Figure 1">
-    </a>
-  <div class="caption">
-    <strong>Figure 1:</strong>
-    Growth of <em>E. coli</em> strains DH5-alpha (left) and XL1-blue (right) under different temperatures (30ºC-50ºC) set in a thermal cycler. The two strains display clearly different temperature profiles.
-  </div>
 </div>
 <br>

Team:Valencia Biocampus/Modeling

From 2014.igem.org

Revision as of 17:31, 13 October 2014

Modeling

Presentation

Protein production

Enzymatic activity

Stability

Dependence on Temperature

Transcription rate

Synthesis rate

Other Stress factors: pH and salt concentration

Conclusion & Results

Standardization

FORMULA HERE

Orthogonality

Why modelling plasmid dynamics?

Deterministic model

Quasi-steady state approach

Dependence on initial conditions:

Stochastic model

A lineage model

A step beyond: Resistance

Approaches for less computational expensive

Conclusions

Open License

References