Team:TU Delft-Leiden/Modeling/Techniques/Deterministic

From 2014.igem.org

Revision as of 13:26, 12 September 2014 by Anton

Deterministic Modeling Theory

The three main systems that form part of our iGEM project, the landmine promoter, the formation of (conductive) curli, and the assembly of the Extracellular Electron Transport (EET) pathway, all involve biological mechanisms that are not yet fully understood. To gain more insight into these mechanisms, we apply the technique of deterministic modeling. Deterministic modeling, as opposed to stochastic modeling, does not involve randomness, and therefore yields “exact” solutions. This section gives a general outline of the strategy used to set up and analyze a deterministic model.

The first step in deterministic modeling should be to get a clear view of what you want to model. In all our cases, we want to obtain a model that predicts the amount of a certain protein or protein complex formed at a certain time. Once this modeling goal is established, you should get a detailed understanding of the system at hand. What compounds are involved? How do the different compounds react with each other? Are localization and transport an important part of the system? Answering these questions (and many more) will mostly be achieved by doing extensive literature research.

Once a clear overview of the model is established, you should decide which processes are most important. Although it is tempting to describe your system in as much detail as possible, this will make your model cluttered and difficult to work with and analyze. Besides that, every reaction you include introduces at least one extra parameter, such as a reaction rate. Finding exact values for those parameters is nigh-impossible, and even making an educated guess is not as easy as it seems. To decide whether a process is important or not, a good strategy is to see if certain steps are described as rate-limiting. Rate-limiting steps are steps that take a lot of time and therefore have a big influence on the time behavior of your system. Another class of important processes are those in which new compounds are formed. However, not every reaction step in such a process should be described as a separate reaction. For example, if A turns into B, B turns into C, and C turns into D (A -> B -> C -> D), this can be summarized as A -> D directly. The reaction rate of this summarized reaction can be estimated as the lowest rate in the process (the rate-limiting step).
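The lumping of A -> B -> C -> D into A -> D can be checked numerically. The sketch below is only an illustration: the rate constants are hypothetical values in arbitrary units, chosen so that B -> C is clearly rate-limiting, and a fixed-step Runge-Kutta integrator written in plain Python stands in for a library ODE solver.

```python
import math

def rk4_step(f, y, dt):
    """Advance the state y by one classical Runge-Kutta (RK4) step of size dt."""
    k1 = f(y)
    k2 = f([yi + 0.5 * dt * ki for yi, ki in zip(y, k1)])
    k3 = f([yi + 0.5 * dt * ki for yi, ki in zip(y, k2)])
    k4 = f([yi + dt * ki for yi, ki in zip(y, k3)])
    return [yi + dt / 6.0 * (a + 2 * b + 2 * c + d)
            for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]

# Hypothetical rate constants (arbitrary units): B -> C is the slow step.
K_AB, K_BC, K_CD = 10.0, 0.1, 10.0

def chain(y):
    """Right-hand side of the full chain A -> B -> C -> D."""
    a, b, c, d = y
    return [-K_AB * a, K_AB * a - K_BC * b, K_BC * b - K_CD * c, K_CD * c]

y = [1.0, 0.0, 0.0, 0.0]   # start with only A present
dt, n_steps = 0.01, 1000   # integrate up to t = 10
for _ in range(n_steps):
    y = rk4_step(chain, y, dt)

t_end = dt * n_steps
d_full = y[3]
# Lumped model A -> D, using the lowest rate constant of the chain.
d_lumped = 1.0 - math.exp(-min(K_AB, K_BC, K_CD) * t_end)
print(f"[D] full chain: {d_full:.3f}, lumped A -> D: {d_lumped:.3f}")
```

With these values the two results differ by less than 0.02. The agreement gets worse as the other steps become comparable in speed to the slowest one, so the lumping is an approximation rather than an identity.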

Once you have decided which processes are important enough to include in your model, you should write down reaction equations for the processes. A generic example of a reaction equation looks like this:

$A + B \xrightarrow{k} C$ (1)

This reaction equation tells us that compounds A and B react at a reaction rate k to form compound C. Writing down such equations from the information you found during research is the most important step of modeling. This is where you write down what you think is an accurate description of how the system works in real life.

After establishing a system of reaction equations, the next step is to convert this system to a system of coupled Ordinary Differential Equations (ODEs). ODEs are a broad class of differential equations that have in common that they contain one or more unknown functions of a single independent variable, together with their derivatives. In the deterministic modeling of biological systems, the independent variable is time: for every compound, you write down how its concentration (the dependent variable) changes in time (the time derivative of the dependent variable). To make this clearer, we will write down the system of ODEs describing reaction (1) step by step.

We will first consider the concentration of compound A ([A]) as the dependent variable. A reacts with B to form C; this means A is removed from the system by this reaction. For this reaction to occur, at least one A and one B are needed. The rate at which the reaction occurs is therefore proportional to the concentrations of both A and B, and of course to the reaction rate k. Written down as an ODE, this gives:

$\frac{d[A]}{dt} = -k[A][B]$

For compound B, exactly the same happens, so the differential equation is similar:

$\frac{d[B]}{dt} = -k[A][B]$

For the increase of [C], it can easily be seen that this equals the decrease of A (or B), and therefore the ODE will read:

$\frac{d[C]}{dt} = k[A][B]$

The system of ODEs we have arrived at is of course extremely simple. In real deterministic modeling of biological systems, usually a lot more compounds are involved which might react in different ways, yielding more and longer ODEs. It is important to keep track of all compounds and to make sure that you have a closed system.

When you have written down a system of ODEs, there are a couple of different strategies you can pursue. Perhaps the easiest is to write a MATLAB script containing your system of ODEs and solve it with the function ode45. This solves the system numerically, step by step, and gives you the concentration of all compounds as a function of time. Although this is easy, it can be quite computationally intensive. A more elegant way, which can be applied to small systems, is to solve the system by hand to obtain a closed-form solution. Depending on your system and your math skill, this can be easy, hard, or impossible. A third option is to search for steady-state solutions. Steady state means that the system does not change anymore, i.e. all time derivatives are zero. Finding steady-state solutions is rather easy; however, for a lot of systems no meaningful steady-state solution exists. The generic system described above has a steady-state solution, but the only thing it tells you is that nothing happens when [A] or [B] is zero.
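The first two strategies can be compared on reaction (1). The sketch below is written in Python rather than MATLAB and uses a fixed-step Runge-Kutta scheme instead of the adaptive method behind ode45. The rate constant and initial concentrations are hypothetical. For equal initial concentrations the system reduces to d[A]/dt = -k[A]^2, which has the closed-form solution [A](t) = [A]_0 / (1 + k [A]_0 t).

```python
def rk4_step(f, y, dt):
    """Advance the state y by one classical Runge-Kutta (RK4) step of size dt."""
    k1 = f(y)
    k2 = f([yi + 0.5 * dt * ki for yi, ki in zip(y, k1)])
    k3 = f([yi + 0.5 * dt * ki for yi, ki in zip(y, k2)])
    k4 = f([yi + dt * ki for yi, ki in zip(y, k3)])
    return [yi + dt / 6.0 * (a + 2 * b + 2 * c + d)
            for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]

K = 1.0  # hypothetical rate constant, arbitrary units

def rhs(y):
    """ODEs for A + B -> C: A and B are consumed, C is produced."""
    a, b, c = y
    rate = K * a * b
    return [-rate, -rate, rate]

a0 = b0 = 1.0              # equal starting concentrations (assumption)
y = [a0, b0, 0.0]
dt, n_steps = 0.01, 500    # integrate up to t = 5
for _ in range(n_steps):
    y = rk4_step(rhs, y, dt)

t_end = dt * n_steps
a_numeric = y[0]
a_closed = a0 / (1.0 + K * a0 * t_end)   # closed-form solution for [A]_0 = [B]_0
print(f"[A] numeric: {a_numeric:.6f}, closed form: {a_closed:.6f}")
```

The sum [A] + [C] stays equal to [A]_0 throughout the integration, which is a quick way to check that the system is closed.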

Once you have found a solution to your system, you would like it to show (approximately) the same results as found during the lab work or in literature. To realize this, you need to choose your unknown parameters in such a way that your modeling results match the data. This is called fitting. When you have found an analytical solution to your system, you can use a range of MATLAB functions to do this, such as nlinfit. If your system is more complex, or if you want to fit data depending on something other than the independent variable (time in most cases), this will usually not work. Although this sounds quite non-scientific, the best approach in such a case is to guess your parameters and adjust them in an iterative fashion until you have found a fit that matches your data. Following this approach, you will most probably not be able to determine the exact value of a parameter, but obtaining the order of magnitude of a parameter or the ratio between different parameters will nevertheless give you valuable insight into the system.

Deterministic Modeling of the Landmine Promoters

An important part of our iGEM project is a promoter sensitive to landmines, first described by Yagur-Kroll et al. [1]. Two of the promoters described in that paper, ybiJ and yqjFB2A1, will be used in our project. Not much is known about these promoters other than the fact that they have a DNT/TNT-dependent response curve (see figure 1). Our goal was to find a model able to represent the response curves of both promoters.

[add picture here]

Our first approach was to solve a system of Ordinary Differential Equations (ODEs) resembling the transcription and translation of a gene activated by the DNT-sensitive promoter. The ODEs were derived from the following system of reactions:

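$P_R + \mathrm{DNT} \underset{k_{off}}{\overset{k_{on}}{\rightleftharpoons}} P_A$ (1)

$P_A \xrightarrow{s_A} P_A + \mathrm{mRNA}$ (2)

$P_R \xrightarrow{s_R} P_R + \mathrm{mRNA}$ (3)

$\mathrm{mRNA} \xrightarrow{s_P} \mathrm{mRNA} + R$ (4)

$\mathrm{mRNA} \xrightarrow{d_m} \emptyset$ (5)

$R \xrightarrow{d_P} \emptyset$ (6)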

In these reaction equations, P_R and P_A denote repressed and active promoters, respectively. DNT denotes a DNT molecule, mRNA an mRNA molecule transcribed from the gene behind the promoter, and R the reporter protein translated from that mRNA. k_on and k_off are the rates at which a promoter goes from the repressed to the active state and vice versa. s_A is the transcription rate from an active promoter. s_R is the transcription rate from a repressed promoter; this is also referred to as leakage. s_P is the translation rate. d_m and d_P are the mRNA and protein degradation rates.

Reaction (1) describes the activation of a promoter in the presence of DNT. It is assumed here that the activation mechanism can be described as the binding of DNT to the repressed promoter, with the resulting complex being an active promoter. Reaction (2) describes transcription. Reaction (3) describes transcription through leakage, i.e. transcription from a repressed promoter. Reaction (4) describes translation. Reactions (5) and (6) describe mRNA and reporter protein degradation, respectively.

This system of reactions leads to the following system of ODEs:

$\frac{d[P_R]}{dt} = -k_{on}[P_R][\mathrm{DNT}] + k_{off}[P_A]$ (7.1)

$\frac{d[P_A]}{dt} = k_{on}[P_R][\mathrm{DNT}] - k_{off}[P_A]$ (7.2)

$\frac{d[\mathrm{DNT}]}{dt} = -k_{on}[P_R][\mathrm{DNT}] + k_{off}[P_A]$ (7.3)

$\frac{d[\mathrm{mRNA}]}{dt} = s_A[P_A] + s_R[P_R] - d_m[\mathrm{mRNA}]$ (7.4)

$\frac{d[R]}{dt} = s_P[\mathrm{mRNA}] - d_P[R]$ (7.5)

A system of ordinary differential equations like this can be solved in MATLAB using the function ode45. The result is a set of curves describing the concentration of each compound in time. However, when choosing parameters with a realistic order of magnitude, the time needed to reach steady state, i.e. the point at which the reporter protein concentration no longer changes, is so long that the script becomes extremely computationally intensive. Because time dependence is not relevant when reproducing the dose-response curves from the Yagur-Kroll paper [1], we decided to discard the ODE method and switch to analytic steady-state methods.
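For illustration, the system (7.1) to (7.5) can be integrated with a few lines of Python. This is a sketch under stated assumptions, not the script used in the project: it uses a fixed-step Runge-Kutta scheme instead of ode45, and all rate constants and initial concentrations are hypothetical values in arbitrary units, picked so that the system settles quickly (unlike the realistic values mentioned above).

```python
def rk4_step(f, y, dt):
    """Advance the state y by one classical Runge-Kutta (RK4) step of size dt."""
    k1 = f(y)
    k2 = f([yi + 0.5 * dt * ki for yi, ki in zip(y, k1)])
    k3 = f([yi + 0.5 * dt * ki for yi, ki in zip(y, k2)])
    k4 = f([yi + dt * ki for yi, ki in zip(y, k3)])
    return [yi + dt / 6.0 * (a + 2 * b + 2 * c + d)
            for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]

# Hypothetical rate constants (arbitrary units); not fitted or literature values.
K_ON, K_OFF = 1.0, 1.0
S_A, S_R, S_P = 1.0, 0.1, 1.0
D_M, D_P = 1.0, 0.5

def promoter_rhs(y):
    """Right-hand side of equations (7.1)-(7.5)."""
    p_r, p_a, dnt, mrna, r = y
    binding = K_ON * p_r * dnt - K_OFF * p_a   # net activation flux
    return [-binding,                           # (7.1) repressed promoter
            binding,                            # (7.2) active promoter
            -binding,                           # (7.3) free DNT
            S_A * p_a + S_R * p_r - D_M * mrna, # (7.4) mRNA
            S_P * mrna - D_P * r]               # (7.5) reporter protein

y = [1.0, 0.0, 5.0, 0.0, 0.0]  # [P_R], [P_A], [DNT], [mRNA], [R] at t = 0
dt, n_steps = 0.01, 4000       # integrate up to t = 40
for _ in range(n_steps):
    y = rk4_step(promoter_rhs, y, dt)

p_r, p_a, dnt, mrna, r = y
# Steady state obtained by setting the derivatives in (7.4) and (7.5) to zero.
r_steady = S_P / D_P * (S_A * p_a + S_R * p_r) / D_M
print(f"[R] at t = 40: {r:.6f}, steady-state prediction: {r_steady:.6f}")
```

The slowest process in the system sets how long the integration must run. Here that is protein degradation, so lowering D_P by several orders of magnitude while keeping the other rates fixed increases the number of steps needed accordingly.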

The goal of this steady-state approach is to find a relation between the concentration of reporter protein ([R]) and the concentration of DNT ([DNT]) with as few parameters as possible. To obtain such a relation, we started from equation (7.2), which describes the change in time of the concentration of activated promoters. We used a steady-state assumption to simplify this relationship, which means that we assumed that the concentration of active promoters does not change in time, i.e. $d[P_A]/dt = 0$. This assumption can be justified by examining figure 2a in [1], which clearly shows that after a certain amount of time the reporter protein level reaches a constant value. This yields the following:

$[P_A] = \frac{k_{on}}{k_{off}}[P_R][\mathrm{DNT}] = \frac{1}{K_d}[P_R][\mathrm{DNT}]$ (8)

Here we used $K_d = k_{off}/k_{on}$, which is called the dissociation constant. The next assumption we make is that the total amount of promoter ([P_T]) is constant. This makes sense, since an active promoter can become a repressed promoter and vice versa, but it is not possible to make new promoters or let promoters vanish. We overlook the fact that during cell replication the promoter level will rise due to DNA replication; we assume that this is a minor contribution. This assumption yields the following:

$[P_T] = [P_A] + [P_R] \;\Rightarrow\; [P_R] = [P_T] - [P_A]$ (9)

If we plug equation (9) into equation (8), we obtain the following:

$[P_A] = \frac{1}{K_d}\left([P_T] - [P_A]\right)[\mathrm{DNT}]$ (10)

This can be rewritten in the following form:

$\frac{[P_A]}{[P_T]} = \frac{[\mathrm{DNT}]}{K_d + [\mathrm{DNT}]}$ (11)

This equation represents the fraction of active promoters from the total amount of promoters.
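For completeness, the algebra between equations (10) and (11) can be written out as follows. The last line also gives the fraction of repressed promoters, which follows from equation (9).

```latex
\begin{align*}
[P_A] &= \frac{1}{K_d}\left([P_T] - [P_A]\right)[\mathrm{DNT}] \\
K_d\,[P_A] + [P_A][\mathrm{DNT}] &= [P_T][\mathrm{DNT}] \\
[P_A]\left(K_d + [\mathrm{DNT}]\right) &= [P_T][\mathrm{DNT}] \\
\frac{[P_A]}{[P_T]} &= \frac{[\mathrm{DNT}]}{K_d + [\mathrm{DNT}]},
\qquad
\frac{[P_R]}{[P_T]} = 1 - \frac{[P_A]}{[P_T]} = \frac{K_d}{K_d + [\mathrm{DNT}]}
\end{align*}
```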

The change of concentration of reporter protein ([R]) can be given by the following ODE:

$\frac{d[R]}{dt} = \frac{[P_A]}{[P_T]}k_1 + \frac{[P_R]}{[P_T]}k_2 - d_P[R]$ (12)

In this equation, k_1 is the rate of protein production from an active promoter and k_2 is the rate of protein production from a repressed promoter. With this equation, the transcription/translation process is wrapped up in one equation instead of two, as in equations (7.4) and (7.5). This can be justified by the fact that [justification to be added]. Plugging in equation (11) and again making the steady-state assumption, namely that from a certain moment on the concentration of reporter protein no longer changes ($d[R]/dt = 0$), we arrive at the following expression for the equilibrium concentration of the reporter protein:

$[R]_{eq} = \frac{1}{d_P}\,\frac{k_1[\mathrm{DNT}] + K_d k_2}{K_d + [\mathrm{DNT}]}$ (13)
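Equation (13) is easy to evaluate directly, and its limits make the meaning of the parameters visible. The sketch below uses hypothetical parameter values in arbitrary units; the function name is ours.

```python
def reporter_steady_state(dnt, k1, k2, k_d, d_p):
    """Equilibrium reporter concentration according to equation (13)."""
    return (k1 * dnt + k_d * k2) / (d_p * (k_d + dnt))

# Hypothetical parameter values (arbitrary units), for illustration only.
k1, k2, k_d, d_p = 10.0, 0.1, 2.0, 0.5

basal = reporter_steady_state(0.0, k1, k2, k_d, d_p)      # no DNT: leakage only
half = reporter_steady_state(k_d, k1, k2, k_d, d_p)       # [DNT] = K_d
saturated = reporter_steady_state(1e9, k1, k2, k_d, d_p)  # DNT in large excess

print(f"basal: {basal:.3f}, at K_d: {half:.3f}, saturated: {saturated:.3f}")
```

Without DNT the expression reduces to k_2/d_P, at [DNT] = K_d it sits exactly halfway between the two extremes, and for large [DNT] it approaches k_1/d_P.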

For the protein degradation rate d_P we used the fixed value of $4 \times 10^{-6}$, which is estimated from the data in [2]. To obtain the values for K_d, k_1 and k_2, we fitted equation (13) to the data from the Yagur-Kroll paper [1]. To do this, we used the nlinfit function in MATLAB.

[add picture here]

From these figures, it can be seen that the fits do not match the data. This is a strong indication that the model described by equation (13) is not correct for the system at hand. We therefore went on to explore another model. Since a one-to-one interaction between DNT and the promoter did not yield good results, we hypothesized that this might be a case of cooperative interaction. This means that the promoter activation reaction needs multiple DNT molecules. This is described by equation (14):

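$P_R + \mathrm{DNT} + \mathrm{DNT} + \cdots \underset{k_{off}}{\overset{k_{on}}{\rightleftharpoons}} P_A$ (14)

This can be rewritten as:

$P_R + n\,\mathrm{DNT} \underset{k_{off}}{\overset{k_{on}}{\rightleftharpoons}} P_A$ (15)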

In this relation, n is the number of DNT molecules that react with a repressed promoter to form an activated promoter. This parameter is also called the Hill coefficient. This reaction yields the following ODE for the concentration of active promoters:

$\frac{d[P_A]}{dt} = k_{on}[P_R][\mathrm{DNT}]^n - k_{off}[P_A]$ (16)

This equation is nearly identical to equation (7.2), except for the Hill coefficient appearing as an exponent. The analysis needed to find an equation relating the equilibrium concentration of the reporter protein to the amount of DNT is also the same as the analysis done to obtain equation (13). Therefore, we give only the end result:

$[R]_{eq} = \frac{1}{d_P}\,\frac{k_1[\mathrm{DNT}]^n + k_2 K_d^n}{K_d^n + [\mathrm{DNT}]^n}$ (17)
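The intermediate steps can be sketched as follows. One point is an assumption about notation that the text does not state explicitly: for equation (17) to follow from equation (16), K_d^n has to be read as shorthand for k_off/k_on, so that K_d keeps the units of a concentration and equals the DNT concentration at which half of the promoters are active.

```latex
\begin{align*}
0 &= k_{on}[P_R][\mathrm{DNT}]^n - k_{off}[P_A]
  \;\Rightarrow\;
  [P_A] = \frac{k_{on}}{k_{off}}[P_R][\mathrm{DNT}]^n \\
\frac{[P_A]}{[P_T]} &= \frac{[\mathrm{DNT}]^n}{K_d^n + [\mathrm{DNT}]^n},
  \qquad
  \frac{[P_R]}{[P_T]} = \frac{K_d^n}{K_d^n + [\mathrm{DNT}]^n},
  \qquad
  K_d^n \equiv \frac{k_{off}}{k_{on}} \\
0 &= \frac{[P_A]}{[P_T]}k_1 + \frac{[P_R]}{[P_T]}k_2 - d_P[R]_{eq}
  \;\Rightarrow\;
  [R]_{eq} = \frac{1}{d_P}\,\frac{k_1[\mathrm{DNT}]^n + k_2 K_d^n}{K_d^n + [\mathrm{DNT}]^n}
\end{align*}
```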

Fitting this equation (again with the fixed value of $4 \times 10^{-6}$ for the protein degradation rate) to the data in the Yagur-Kroll paper [1] yields the following parameter values:

Table 1: Parameters fitted with the model described in equation (17)

Parameter	ybiJ promoter	yqjFB2A1 promoter
K_d	2.08 mM	0.501 mM
n	3.19	2.36
k_1	1.31 pM	7.30 pM
k_2	0.13 fM	3.85 fM
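A fit of equation (17) can be sketched without MATLAB. The example below is not the nlinfit procedure used for Table 1. It swaps in a simpler technique: a grid search over K_d and n, combined with linear least squares for the two remaining coefficients a = k_1/d_P and b = k_2/d_P, which enter the model linearly. The data are synthetic, generated from the model itself with hypothetical parameters, and are not the measurements from [1].

```python
def hill_fraction(dnt, k_d, n):
    """Fraction of active promoters, [DNT]^n / (K_d^n + [DNT]^n)."""
    return dnt ** n / (k_d ** n + dnt ** n)

def fit_hill(dnt_values, r_values, kd_grid, n_grid):
    """Fit R = a*f + b*(1 - f) with a = k_1/d_P, b = k_2/d_P (equation 17).

    For each (K_d, n) on the grid, a and b follow from a 2x2 linear
    least-squares problem; the grid point with the smallest residual wins.
    Returns (sse, k_d, n, a, b).
    """
    best = None
    for k_d in kd_grid:
        for n in n_grid:
            f = [hill_fraction(d, k_d, n) for d in dnt_values]
            g = [1.0 - fi for fi in f]
            # Normal equations for the two linear coefficients a and b.
            sff = sum(fi * fi for fi in f)
            sfg = sum(fi * gi for fi, gi in zip(f, g))
            sgg = sum(gi * gi for gi in g)
            sfr = sum(fi * ri for fi, ri in zip(f, r_values))
            sgr = sum(gi * ri for gi, ri in zip(g, r_values))
            det = sff * sgg - sfg * sfg
            if abs(det) < 1e-12:
                continue  # f and g nearly collinear; skip this grid point
            a = (sfr * sgg - sgr * sfg) / det
            b = (sgr * sff - sfr * sfg) / det
            sse = sum((a * fi + b * gi - ri) ** 2
                      for fi, gi, ri in zip(f, g, r_values))
            if best is None or sse < best[0]:
                best = (sse, k_d, n, a, b)
    return best

# Synthetic, noise-free data from hypothetical "true" parameters.
TRUE_KD, TRUE_N, TRUE_A, TRUE_B = 2.0, 3.0, 20.0, 0.2
dnt_values = [0.25, 0.5, 1.0, 2.0, 4.0, 8.0, 16.0]
r_values = [TRUE_A * hill_fraction(d, TRUE_KD, TRUE_N)
            + TRUE_B * (1.0 - hill_fraction(d, TRUE_KD, TRUE_N))
            for d in dnt_values]

kd_grid = [i / 10 for i in range(5, 41)]   # K_d from 0.5 to 4.0
n_grid = [i / 10 for i in range(10, 51)]   # n from 1.0 to 5.0
sse, kd_fit, n_fit, a_fit, b_fit = fit_hill(dnt_values, r_values, kd_grid, n_grid)
print(f"K_d = {kd_fit}, n = {n_fit}, a = {a_fit:.3f}, b = {b_fit:.3f}")
```

With d_P fixed, k_1 and k_2 follow from the fitted coefficients as k_1 = a d_P and k_2 = b d_P. On real, noisy data the grid would need refining around the best point, or the grid result could serve as the starting guess for a nonlinear least-squares routine.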

Displaying these fits graphically yields the following figure:

[add picture]

From figure 3, it can be seen that the model described by equation (17) fits the experimental data remarkably well. It is therefore worthwhile to look into the fitted parameter values a bit more. The lower dissociation constant for the yqjFB2A1 promoter compared to the ybiJ promoter indicates that DNT binds better to the yqjFB2A1 promoter. This is not surprising, since the yqjFB2A1 promoter is the result of a directed evolution experiment in which its DNT response was enhanced with respect to the original yqjF promoter, which had characteristics similar to those of the ybiJ promoter [1]. This is also the most obvious explanation for the lower value of the Hill coefficient. A large increase in both k_1 and k_2 can be seen between the ybiJ promoter and the yqjFB2A1 promoter. This indicates that reporter protein production increases not only for the activated promoter, which is as expected, but also for the repressed promoter. This corresponds to the fact that in [1] a higher background luminescence was recorded for the yqjFB2A1 promoter than for the non-mutated promoters (see figure 4b in [1]).

Since we plan to use the fitted value of k_1 in other parts of the modeling, a quick back-of-the-envelope calculation was performed to check whether the order of magnitude of k_1 is realistic. In [1], an OD600 of 0.2 is recorded. This corresponds to a cell concentration of $1.6 \times 10^{8}$ per mL, or approximately $2.7 \times 10^{-13}$ M. With a value for k_1 of $7.3 \times 10^{-12}$ M, this corresponds to a protein production of approximately 27 proteins per cell per second, which seems fairly reasonable.
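The arithmetic behind this estimate can be reproduced in a few lines. The sketch assumes that the fitted k_1 is a production rate in M per second, which is how the result is interpreted above; the variable names are ours.

```python
AVOGADRO = 6.022e23       # particles per mole

cells_per_ml = 1.6e8      # cell density at OD600 = 0.2, as quoted in the text
# Convert cells per mL to cells per litre, then to mol/L.
cells_molar = cells_per_ml * 1000.0 / AVOGADRO

k1 = 7.3e-12              # fitted k_1 (yqjFB2A1), read as M per second (assumption)
proteins_per_cell_per_s = k1 / cells_molar

print(f"cell concentration: {cells_molar:.2e} M")
print(f"production: {proteins_per_cell_per_s:.1f} proteins per cell per second")
```

Dividing the molar production rate by the molar cell concentration gives the number of proteins produced per cell per second, which comes out at roughly 27.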

Although the model described by equation (17) matches the experimental data in [1] quite well, assuming that simple, direct cooperative binding of DNT to the promoter is the actual mechanism is too short-sighted. Yagur-Kroll et al. already hypothesize that ‘the induction of the yqjF and ybiJ promoters is probably not caused by our target compounds but rather by their metabolites or degradation products, possibly a quinol or quinol derivative.’ To obtain a more suitable model description of the landmine promoters, further research into their activation mechanism is needed. However, this is beyond the scope of iGEM, and we therefore settle for the cooperative binding model described by equation (17).

References

[1] S. Yagur-Kroll, S. Belkin et al. “Escherichia coli bioreporters for the detection of 2,4-dinitrotoluene and 2,4,6-trinitrotoluene.” Appl Microbiol Biotechnol, 98, 885-895, 2014.

[2] M.R. Maurizi. “Proteases and protein degradation in Escherichia coli.” Experientia, 48(2), 178-201, 1992 (p. 181, table 2).