Team:TU Delft-Leiden/Modeling/Techniques
From 2014.igem.org
Revision as of 20:02, 27 September 2014
Modeling Techniques
Deterministic Modeling Theory
Our iGEM project consists of three main systems: the landmine promoter, the formation of (conductive) curli, and the assembly of the Extracellular Electron Transport (EET) pathway. All three involve biological mechanisms that are not yet fully understood. To gain more insight into these mechanisms, we apply deterministic modeling. Unlike stochastic modeling, deterministic modeling involves no randomness: for a given set of parameters and initial conditions it always yields the same, “exact” solution. In this section, we outline the general strategy used to set up and analyze a deterministic model.
The first step in deterministic modeling (and in any kind of modeling) is to get a clear view of what you want to model. In all our cases, we want a model that predicts how much of a certain protein or protein complex has formed at a given time. Once this modeling goal is established, you need a detailed understanding of the system at hand. Which compounds are involved? How do they react with each other? Are localization and transport important parts of the system? Answering these questions (and many more) mostly requires extensive literature research.
Once you have a clear overview of the system, decide which processes are most important. Although it is tempting to describe your system in as much detail as possible, doing so makes the model cluttered and difficult to work with and analyze. Moreover, every reaction you include introduces at least one extra parameter, such as a rate constant. Finding exact values for these parameters is nearly impossible, and even making an educated guess is harder than it seems. A good strategy for deciding whether a process is important is to check whether any of its steps are described as rate-limiting. Rate-limiting steps are the slowest steps in a process, so they dominate the time behavior of your system. Another class of important processes are those in which new compounds are formed. However, not every step of such a process needs to be modeled as a separate reaction, since this quickly yields too many equations to handle conveniently. For example, the chain \(A \rightarrow B \rightarrow C \rightarrow D\) can be summarized directly as \(A \rightarrow D\). The rate constant of this summarized reaction can be estimated as the lowest rate constant in the chain (that of the rate-limiting step); this approximation works best when that step is much slower than all the others.
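The lumping argument can be checked numerically. The sketch below assumes every step is first order and uses hypothetical rate constants in which the middle step is 100 times slower than the other two. It integrates the full chain A → B → C → D and the lumped reaction A → D (with the smallest rate constant) using a fixed-step Runge-Kutta scheme written for this illustration, and compares the amount of D formed.

```python
# Fixed-step classical Runge-Kutta (RK4) integrator for dy/dt = f(y).
def rk4(f, y0, t_end, dt):
    y = list(y0)
    for _ in range(round(t_end / dt)):
        k1 = f(y)
        k2 = f([yi + 0.5 * dt * ki for yi, ki in zip(y, k1)])
        k3 = f([yi + 0.5 * dt * ki for yi, ki in zip(y, k2)])
        k4 = f([yi + dt * ki for yi, ki in zip(y, k3)])
        y = [yi + dt / 6 * (p + 2 * q + 2 * r + s)
             for yi, p, q, r, s in zip(y, k1, k2, k3, k4)]
    return y

# Hypothetical first-order rate constants; the B -> C step is rate-limiting.
K1, K2, K3 = 10.0, 0.1, 10.0

def full_chain(y):
    """A -> B -> C -> D, every step first order; y = [A, B, C, D]."""
    a, b, c, _ = y
    return [-K1 * a, K1 * a - K2 * b, K2 * b - K3 * c, K3 * c]

def lumped(y):
    """A -> D with the rate constant of the slowest step; y = [A, D]."""
    a, _ = y
    k = min(K1, K2, K3)
    return [-k * a, k * a]

for t in (5.0, 10.0, 30.0):
    d_full = rk4(full_chain, [1.0, 0.0, 0.0, 0.0], t, 2e-3)[3]
    d_lumped = rk4(lumped, [1.0, 0.0], t, 2e-3)[1]
    print(f"t = {t:4.0f}: [D] full chain = {d_full:.4f}, lumped = {d_lumped:.4f}")
```

The lumped model runs slightly ahead of the full chain because it ignores the short delay introduced by the two fast steps; making K2 comparable to K1 and K3 shows the approximation degrading.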
Once you have decided which processes are important enough to include in your model, write down a reaction equation for each of them. A generic example of a reaction equation is:
$$ A + B \ \xrightarrow{k} \ C \tag{1} $$
This equation states that compounds A and B react with rate constant k to form compound C. Translating the information from your literature research into such equations is the most important step of modeling: this is where you write down what you think is an accurate description of how the system works in reality.
After establishing a system of reaction equations, the next step is to convert it into a system of coupled ordinary differential equations (ODEs). ODEs are differential equations in which the unknown functions depend on a single independent variable and appear together with their derivatives. In deterministic modeling of biological systems, the independent variable is time and the unknown functions are the concentrations: for every compound, you write down how its concentration changes in time, i.e. its time derivative. To make this clearer, we will now derive the system of ODEs describing reaction (1) step by step.
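In general, this conversion follows a fixed recipe if we assume mass-action kinetics, i.e. that the rate of each reaction equals its rate constant times the concentrations of its reactants, each raised to the power of its stoichiometric coefficient. Writing \(\alpha_{ir}\) and \(\beta_{ir}\) (notation introduced here for illustration) for the number of molecules of compound \(X_i\) consumed and produced by reaction \(r\), with rate constant \(k_r\), every compound obeys
$$ \frac{d}{dt}[X_i] \ = \ \sum_{r} \left(\beta_{ir} - \alpha_{ir}\right) k_r \prod_{j} [X_j]^{\alpha_{jr}} $$
For reaction (1), the sum has a single term with rate \(k[A][B]\), and \(\beta_{ir} - \alpha_{ir}\) equals \(-1\) for A and B and \(+1\) for C, which gives exactly the three equations derived below.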
We first consider the concentration of compound A, [A]. A reacts with B to form C, so this reaction removes A from the system. For the reaction to occur, at least one molecule of A and one of B are needed. The rate at which the reaction proceeds is proportional to both [A] and [B], and to the rate constant k, a parameter that sets how fast or slow the reaction happens. Written as an ODE, this gives:
$$ \frac{d}{dt} [A] = \ -k[A][B] $$
Exactly the same happens to compound B, so its differential equation is identical:
$$ \frac{d}{dt} [B] = \ -k[A][B] $$
Finally, [C] increases at exactly the rate at which [A] (or [B]) decreases, so its ODE reads:
$$ \frac{d}{dt} [C] = \ k[A][B] $$
This system of ODEs is, of course, extremely simple. In realistic deterministic models of biological systems, many more compounds are usually involved, and they may react in different ways, yielding more and longer ODEs. It is important to keep track of all compounds and to make sure that the system is closed, i.e. that every compound appearing in a rate expression also has its own ODE.
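The step-by-step bookkeeping above can be automated. The sketch below assumes mass-action kinetics and stores each reaction as a rate constant plus reactant and product stoichiometries (a hypothetical dictionary format chosen for illustration; k = 0.5 is an arbitrary value). For reaction (1) it reproduces the three hand-derived equations and shows that d[A]/dt + d[C]/dt and d[B]/dt + d[C]/dt vanish, so [A] + [C] and [B] + [C] are conserved.

```python
from math import prod

# Reaction (1): A + B --k--> C, stored as (rate constant, reactants, products).
REACTIONS = [
    (0.5, {"A": 1, "B": 1}, {"C": 1}),   # k = 0.5 is an arbitrary example value
]
SPECIES = ["A", "B", "C"]

def mass_action_rhs(conc):
    """Return d[X]/dt for every species, given concentrations {name: value}."""
    deriv = {s: 0.0 for s in SPECIES}
    for k, reactants, products in REACTIONS:
        # Mass action: rate = k * product of [reactant]^stoichiometry
        rate = k * prod(conc[s] ** n for s, n in reactants.items())
        for s, n in reactants.items():
            deriv[s] -= n * rate          # consumed
        for s, n in products.items():
            deriv[s] += n * rate          # produced
    return deriv

d = mass_action_rhs({"A": 2.0, "B": 3.0, "C": 0.0})
print(d)                                  # {'A': -3.0, 'B': -3.0, 'C': 3.0}
print(d["A"] + d["C"], d["B"] + d["C"])   # 0.0 0.0
```

Adding further tuples to `REACTIONS` (and names to `SPECIES`) extends the same function to larger networks.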
Once you have written down a system of ODEs, you can pursue several strategies. Perhaps the easiest is to write a MATLAB script containing your system of ODEs and solve it with the function ode45. This integrates the system numerically, step by step, and gives the concentrations of all compounds as functions of time. Although this is easy, it can become computationally intensive, especially for large systems. A more elegant approach, feasible for small systems, is to solve the system by hand to obtain a closed-form solution. Depending on your system and your mathematical skills, this can be easy, hard, or impossible. A third option is to search for steady-state solutions. At steady state, the system no longer changes, i.e. all time derivatives are zero. Finding steady-state solutions is usually not very difficult; however, for many systems it is not possible to find a meaningful one. The generic system above, for example, has steady-state solutions, but the only thing they tell you is that nothing happens when [A] or [B] is zero.
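As a rough stand-in for an ode45 run (Python rather than MATLAB, and a fixed-step classical Runge-Kutta scheme rather than ode45's adaptive method), the sketch below integrates the ODEs for reaction (1) and compares them with the closed-form solution. Separating variables gives, for \([A]_0 \neq [B]_0\) and \([C]_0 = 0\), \([A](t) = [A]_0 \Delta / ([A]_0 - [B]_0 e^{-\Delta k t})\) with \(\Delta = [A]_0 - [B]_0\); then \([B] = [A] - \Delta\) and \([C] = [A]_0 - [A]\). The rate constant and initial concentrations are arbitrary example values.

```python
import math

K = 0.5               # rate constant k (arbitrary example value)
A0, B0 = 2.0, 1.0     # initial [A] and [B]; [C] starts at 0

def rhs(y):
    """Right-hand side of the ODEs for A + B -> C; y = [A, B, C]."""
    a, b, _ = y
    r = K * a * b
    return [-r, -r, r]

def rk4(f, y0, t_end, dt):
    """Fixed-step classical Runge-Kutta integration from t = 0 to t_end."""
    y = list(y0)
    for _ in range(round(t_end / dt)):
        k1 = f(y)
        k2 = f([yi + 0.5 * dt * ki for yi, ki in zip(y, k1)])
        k3 = f([yi + 0.5 * dt * ki for yi, ki in zip(y, k2)])
        k4 = f([yi + dt * ki for yi, ki in zip(y, k3)])
        y = [yi + dt / 6 * (p + 2 * q + 2 * s + u)
             for yi, p, q, s, u in zip(y, k1, k2, k3, k4)]
    return y

def a_exact(t):
    """Closed-form [A](t) for [A]0 != [B]0 and [C]0 = 0."""
    delta = A0 - B0
    return A0 * delta / (A0 - B0 * math.exp(-delta * K * t))

for t in (1.0, 5.0, 20.0):
    a_num = rk4(rhs, [A0, B0, 0.0], t, 0.01)[0]
    print(f"t = {t:4.1f}: [A] numerical = {a_num:.6f}, closed form = {a_exact(t):.6f}")
```

At long times the limiting reactant (here B) runs out, [A] approaches \([A]_0 - [B]_0\), and the system settles into the steady state \(k[A][B] = 0\) described above.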
Once you have found a solution to your system, you want it to reproduce (approximately) the results found in the lab or in the literature. To achieve this, you need to choose your unknown parameters such that your model output matches the data; this is called fitting. If you have found an analytical solution, you can use one of several MATLAB functions for this, such as nlinfit. If your system is more complex, or if you want to fit data that depend on a variable other than the independent variable (in most cases time), this approach usually does not work.
Although it sounds rather unscientific, the best approach in such a case is to guess your parameters and adjust them iteratively until the model matches your data by eye. With this approach, you will most likely not be able to determine the exact value of a parameter, but obtaining its order of magnitude, or the ratio between different parameters, will nevertheless give you valuable insight into the system.
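A minimal sketch of both fitting strategies for reaction (1), with Python standing in for MATLAB. The "data" are synthetic: generated from the closed-form solution with an assumed rate constant of 0.3 plus a small alternating perturbation, not measurements. A sweep over a logarithmic grid mimics the guess-and-adjust approach and pins down the order of magnitude of k; a golden-section search on the sum of squared residuals then refines it. This is a simple one-parameter least-squares fit, not the algorithm nlinfit uses internally, and it assumes the sum of squares has a single minimum inside the search bracket.

```python
import math

A0, B0 = 2.0, 1.0                      # initial [A] and [B]; [C] starts at 0

def a_model(t, k):
    """Closed-form [A](t) for A + B -> C with [A]0 != [B]0."""
    delta = A0 - B0
    return A0 * delta / (A0 - B0 * math.exp(-delta * k * t))

# Synthetic "measurements": model with k = 0.3 plus a fixed +/-0.01 perturbation.
TIMES = [0.5 * i for i in range(1, 21)]
DATA = [a_model(t, 0.3) + 0.01 * (-1) ** i for i, t in enumerate(TIMES)]

def sse(k):
    """Sum of squared residuals between model and data."""
    return sum((a_model(t, k) - y) ** 2 for t, y in zip(TIMES, DATA))

# Step 1: coarse sweep over 1e-3 ... 1e2 in quarter decades (order of magnitude).
grid = [10 ** (e / 4) for e in range(-12, 9)]
k_coarse = min(grid, key=sse)

# Step 2: golden-section search between the neighbouring grid points.
lo, hi = k_coarse / 10 ** 0.25, k_coarse * 10 ** 0.25
g = (math.sqrt(5) - 1) / 2
for _ in range(60):
    m1, m2 = hi - g * (hi - lo), lo + g * (hi - lo)
    if sse(m1) < sse(m2):
        hi = m2                        # minimum lies in [lo, m2]
    else:
        lo = m1                        # minimum lies in [m1, hi]
k_fit = (lo + hi) / 2
print(f"coarse k = {k_coarse:.3g}, refined k = {k_fit:.4f}")
```

With real data, `DATA` would hold the measured concentrations, and a model without a closed form could replace `a_model` with a numerical ODE solution.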
Flux Balance Analysis Theory
Flux Balance Analysis (FBA) is a method for calculating the fluxes of metabolites through a metabolic network. FBA requires a model of the metabolic network, which is constructed from experiments that identify the reactions occurring in a cell. The flux through each reaction can then be constrained. In practice, however, this is not done very often, because such constraints are difficult to obtain experimentally. We used two different models: the core model of E. coli, which consists of the essential reactions of its metabolism, and an extended model, iJO1366, which contains roughly 30 times more reactions and 25 times more metabolites [1].
Firstly, all metabolic reactions of the model are represented mathematically in an m by n matrix called the stoichiometric matrix (S), where m is the number of metabolites and n the number of reactions. For instance, if we analyse a metabolic network that consists of the following two reactions:
$$ A + 2 \ C \ \rightarrow \ D $$
$$ 3 \ B \ \rightarrow \ 4 \ A $$
We get the following stoichiometric matrix:
$$
\boldsymbol{S} \ =
\begin{bmatrix}
-1 & 4 \\
0 & -3 \\
-2 & 0 \\
1 & 0
\end{bmatrix}
$$
Each row represents one unique metabolite (here A, B, C and D, in that order) and each column one unique reaction. The entries of the stoichiometric matrix are called stoichiometric coefficients. They indicate which metabolites take part in a reaction and how many molecules of each are involved: the coefficient is negative when the metabolite is consumed, positive when it is produced, and zero when the metabolite does not take part in the reaction.
Secondly, a vector v contains the fluxes through all reactions of the model; since there are n reactions, it has length n.
Thirdly, a vector x contains the concentrations of all metabolites of the system, so it has length m.
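The construction of S can be sketched in a few lines. The example below uses a hypothetical list-of-dictionaries format for the reactions, rebuilds the matrix for the two example reactions (rows A, B, C, D; one column per reaction), and multiplies it by an arbitrary flux vector v to obtain the m rates of change dx/dt = Sv.

```python
# Reactions as {metabolite: stoichiometric coefficient}; negative = consumed.
REACTIONS = [
    {"A": -1, "C": -2, "D": 1},   # A + 2 C -> D
    {"B": -3, "A": 4},            # 3 B -> 4 A
]
METABOLITES = ["A", "B", "C", "D"]

def stoichiometric_matrix(metabolites, reactions):
    """Return S as an m x n list of lists (rows: metabolites, columns: reactions)."""
    return [[rxn.get(met, 0) for rxn in reactions] for met in metabolites]

def mat_vec(S, v):
    """Compute S v, i.e. dx/dt for the flux vector v."""
    return [sum(s * vj for s, vj in zip(row, v)) for row in S]

S = stoichiometric_matrix(METABOLITES, REACTIONS)
for met, row in zip(METABOLITES, S):
    print(met, row)
# A [-1, 4]
# B [0, -3]
# C [-2, 0]
# D [1, 0]

v = [1.0, 0.5]                    # arbitrary fluxes through reactions 1 and 2
print(mat_vec(S, v))              # [1.0, -1.5, -2.0, 1.0]
```

Because the two columns of this small S are linearly independent, the only flux vector satisfying Sv = 0 here is v = 0.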
The basic assumption of FBA is that the system is at steady state, so that \(\frac{d\boldsymbol{x}}{dt}=\ 0\). FBA therefore solves the following system of equations:
$$ 0 \ = \ \frac{d\boldsymbol{x}}{dt} = \ \boldsymbol{S}\boldsymbol{v} \tag{2}$$
Because realistic large-scale metabolic models contain more reactions than metabolites, this system has no unique solution: there are more unknowns than equations. The set of all solutions of system (2) is called the solution space. We can shrink the solution space by imposing constraints on the system, for instance a maximum or minimum allowable flux through a certain reaction. In general, however, the constrained solution space still contains many solutions.
To select a single solution, FBA maximizes or minimizes a user-defined objective function Z, which can be any linear combination of fluxes. The resulting problem, consisting of the steady-state equations, the constraints, and the objective function, is solved with a linear programming algorithm. Even then, several flux distributions may reach the same optimal value of Z; in that case, the linear programming algorithm simply returns one of them. These alternative optimal solutions can be explored with Flux Variability Analysis (FVA), as we will do for our project [1].
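To make the optimization step concrete, here is a self-contained sketch of FBA on a hypothetical three-metabolite toy network, using a small dense simplex solver (Bland's rule, exact fractions) written for this illustration rather than the solvers used by real FBA toolboxes. It assumes all reactions are irreversible (0 ≤ v ≤ upper bound), so Sv = 0 can be written as two inequalities with zero right-hand side and the all-slack starting point is feasible. The network has two routes from A to C (directly, or via B), so the maximal output flux is unique while the internal flux distribution is not.

```python
from fractions import Fraction

def simplex_max(c, A, b):
    """Maximize c.x subject to A x <= b and x >= 0, assuming every b_i >= 0.

    Dense tableau simplex with Bland's rule, which prevents cycling on the
    degenerate (zero right-hand side) rows. Uses exact Fraction arithmetic.
    """
    m, n = len(A), len(c)
    # Constraint rows [A | I | b] and objective row [-c | 0 | 0].
    T = [[Fraction(a) for a in A[i]] + [Fraction(int(i == j)) for j in range(m)]
         + [Fraction(b[i])] for i in range(m)]
    z = [Fraction(-cj) for cj in c] + [Fraction(0)] * (m + 1)
    basis = [n + i for i in range(m)]          # start from the all-slack basis
    while True:
        # Entering variable: lowest index with a negative reduced cost.
        col = next((j for j in range(n + m) if z[j] < 0), None)
        if col is None:
            break                              # optimal
        # Leaving row: minimum ratio, ties broken by lowest basic index.
        row = None
        for i in range(m):
            if T[i][col] > 0:
                ratio = T[i][-1] / T[i][col]
                if row is None or (ratio, basis[i]) < (best, basis[row]):
                    row, best = i, ratio
        if row is None:
            raise ValueError("LP is unbounded")
        piv = T[row][col]
        T[row] = [x / piv for x in T[row]]
        for i in range(m):
            if i != row and T[i][col] != 0:
                f = T[i][col]
                T[i] = [x - f * p for x, p in zip(T[i], T[row])]
        f = z[col]
        z = [x - f * p for x, p in zip(z, T[row])]
        basis[row] = col
    x = [Fraction(0)] * n
    for i, j in enumerate(basis):
        if j < n:
            x[j] = T[i][-1]
    return z[-1], x

def fba(S, ub, objective):
    """Maximize flux v[objective] subject to S v = 0 and 0 <= v <= ub."""
    n = len(ub)
    A, b = [], []
    for row in S:                     # S v = 0 as S v <= 0 and -S v <= 0
        A += [list(row), [-s for s in row]]
        b += [0, 0]
    for j in range(n):                # capacity constraints v_j <= ub_j
        A.append([int(k == j) for k in range(n)])
        b.append(ub[j])
    c = [int(j == objective) for j in range(n)]
    return simplex_max(c, A, b)

# Hypothetical toy network. Columns: R1 (-> A, uptake), R2 (A -> B),
# R3 (B -> C), R4 (A -> C), R5 (C ->, objective). Rows: A, B, C.
S = [[1, -1, 0, -1, 0],
     [0, 1, -1, 0, 0],
     [0, 0, 1, 1, -1]]
UB = [10, 1000, 1000, 1000, 1000]     # uptake capped at 10 (arbitrary units)

z_opt, v = fba(S, UB, objective=4)
print("optimal objective flux:", z_opt)          # 10
print("one optimal flux vector:", [str(x) for x in v])
```

FVA would then maximize and minimize each flux with the objective fixed at its optimum; for this toy network it would report that R2 and R4 can each carry anywhere between 0 and 10 units. Fixing the objective introduces a lower bound (a negative right-hand side), which this sketch does not handle, so in practice a full LP solver would be used.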
Because FBA assumes a steady state, it is not suitable for investigating how a system changes over time. It is, however, very useful for gaining insight into often very complex metabolic networks. For our project, we will apply FBA to the extracellular electron transport (EET) module [reference theory]. Our goal is to gain insight into the carbon metabolism that supplies the electrons for our EET module. The results of this analysis can be found on the Flux Balance Analysis page.
Percolation Theory
Graph Theory