Team:TU Delft-Leiden/Modeling/Techniques

From 2014.igem.org

Modeling Methods

There are a lot of different mathematical representations, techniques, theories and algorithms to solve all kinds of different real world problems, e.g. physics, engineering or biology problems. In this section we discuss the different methods we made use of in order to be able to model our three different modules: the landmine detection module, the Extracellular Electron Transport module and the conductive curli module.
We made use of deterministic modeling methods, where you translate reactions that occur in nature to a set of ordinary differential equations and solve this resulting system analytically or numerically. We used Flux Balance Analysis, a method that calculates the fluxes of metabolites through a metabolic network in steady state condition. We also used the percolation theory, which describes when connected clusters will make a connection between any two opposite boundaries of a domain. Lastly, we used graph theory to translate a biological problem to a graph representation and subsequently, we found an algorithm that solved our problem for that graph.

Modeling Methods

Deterministic Modeling Methods

The three main systems that form a part of our iGEM project, the landmine promoter, the formation of (conductive) curli, and the assembly of the Extracellular Electron transport (EET) pathway, all involve various biological mechanisms that are not fully understood yet. To gain more insight in these mechanisms, we will apply the approach of deterministic modeling. Deterministic modeling, as opposed to stochastic modeling, does not involve randomness, and therefore yields “exact” solutions. In this section, we present a general outline of the strategy used to set up and analyze a deterministic model.

The first step in deterministic modeling (and in every kind of modeling) should be to get a clear view of what you want to model. In all our cases, we want to obtain a model that predicts the amount of a certain protein or protein complex formed at a certain time. Once this modeling goal is established, you should get a very good and detailed understanding of the system at hand. What compounds are involved? How do the different compounds react with each other? Is localization and transport an important part of the system? Answering these questions (and many more) will mostly be achieved by doing extensive literature research.

Once a clear overview of the model is established, you should decide which processes are most important. Although it is tempting to describe your system in as much detail as possible, this will make your model cluttered and difficult to work with and analyze. Besides that, every reaction you include introduces at least one extra parameter, such as a reaction rate. To find exact values for those parameters is nigh-impossible, and even making an educated guess is not as easy as it seems. To decide whether a process is important or not, a good strategy is to see if certain steps are described as rate-limiting. Rate-limiting steps are steps that take a lot of time and therefore have a big influence on the time behavior of your system. Another class of important processes are processes in which new compounds are formed. However, not every reaction step in such a process should be described as a separate reaction, since this will result in too many equations to conveniently handle. For example if you have A turns to B, and B turns to C, and C turns to D ($A \rightarrow B \rightarrow C \rightarrow D$), this can be summarized as $A \rightarrow D$ immediately. The reaction rate of this summarized reaction can be estimated to be the lowest rate in the process (the rate-limiting step).

Once you have decided which processes are important enough to include in your model, you should write down reaction equations for the processes. An generic example of a reaction equation looks like this: $$ A + B \ \xrightarrow{k} \ C \tag{1} $$ This reaction equation tells us that compound A and B react at a reaction rate k to form compound C. Writing down such equations from the information you found during research is the most important step of modeling. This is where you write down what you think is an accurate description of how the system works in real life.

After establishing a system of reaction equations, the next step is to convert this system to a system of coupled Ordinary Differential Equations (ODEs). ODEs are a broad class of differential equations, which have in common that they contain a function of one dependent unknown variable and its derivatives. In the case of the deterministic modeling of biological systems, this means that for every compound (the dependent variable), you write down how its concentration changes in time (the time derivative of the dependent variable). To make this more clear, we will write down the system of ODEs describing reaction (1) step-by-step.

We will first consider the concentration of the compound A ([A]) as the dependent variable. A reacts with B to form C; this means A will be removed from the system due to this reaction. To have this reaction occurring, at least one A and one B is needed. The rate at which this reaction occurs is proportional to both the concentration of A and B, and of course the reaction rate k. The reaction rate is a parameter which defines how fast or slow a reaction happens. Written down as an ODE, you have this: $$ \frac{d}{dt} [A] = \ -k[A][B] $$ For compound B, exactly the same happens, so the differential equation is similar: $$ \frac{d}{dt} [B] = \ -k[A][B] $$ For the increase of [C], it can easily be seen that this equals the decrease of A (or B), and therefore the ODE will read: $$ \frac{d}{dt} [C] = \ k[A][B] $$ The system of ODEs we have arrived at is of course extremely simple. In real deterministic modeling of biological systems, usually a lot more compounds are involved which might react in different ways, yielding more and longer ODEs. It is important to keep track of all compounds and to make sure that you have a closed system.

When you have written down a system of ODEs, there are a couple of different strategies you can pursue. Perhaps the easiest is to make a Matlab fucntion containing your system of ODEs and using a script with the function ode45 . The Matlab code to do this for the example system considered here looks like this:

function dy = example(t,C)

%% Reactions
% A + B ->k-> C      % Example reaction

%% ODEs
% d/dt[A] = -k[A][B]
% d/dt[B] = -k[A][B]
% d/dt[C] =  k[A][B]

%% Species
% A = C(1)
% B = C(2)
% C = C(3)

%% Parameters
k  = 1; % example reaction rate

%% System of ODEs
dy=zeros(length(C),1);
dy(1) = -k*C(1)*C(2);
dy(2) = -k*C(1)*C(2);
dy(3) =  k*C(1)*C(2);

end

clear all
close all
clc

%% Initial conditions 
A_0 = 4;
B_0 = 3;
C_0 = 0;
y0= [A_0 B_0 C_0]; 

%% time parameters
tini=0; tend = 3; %Begin and end time
dt = 0.001;
tspan=[tini:dt:tend] ; 

%% ODEs
opt = odeset('AbsTol',1e-8);   %Set tolerance
[t,y]=ode45(@example, tspan, y0, opt);  %Call the solver

This solves the system in an iterative way, and will give you the concentration of all compounds as a function of time. For the example system we consider here, the results of this procedure are shown in figure 1. Although this is easy, it can be quite computationally intensive.

Figure 1: Results obtained for the example system using the Matlab scripts shown above.

A more elegant way, which can be applied to small systems, is to solve the system by hand, to obtain a closed form solution. Depending on your system and your math skill, this can be easy, hard, or impossible. A third option would be to search for steady state solutions. Steady state means that the system does not change anymore, i.e. all time derivatives are zero. Finding steady state solutions is usually not very problematic; however, it is not possible to find a meaningful steady-state solution for a lot of systems. The generic system described above has a steady state solution, but the only thing it will tell you is that nothing happens when [A] or [B] is zero.

Once you have found a solution to your system, you would like it to show (approximately) the same results as found during the lab work or in literature. To realize this, you need to choose your unknown parameters in such a way that your modeling results match the data. This is called fitting. When you have found an analytical solution to your system, you can use a range of Matlab functions to do this, such as nlinfit . If your system is more complex, or if you want to fit data depending on something else than the independent variable (time in most cases), this will usually not work.
Although this sounds quite non-scientific, the best approach in such a case is to guess your parameters and adjust them in a iterative fashion until you have found a fit that matches your data on eye. Following this approach, you will most probably not be able to determine the exact value of a parameter, but obtaining the order of magnitude of a parameter or the ratio between different parameters will nevertheless give you valuable insight in the system.

Flux Balance Analysis Method

Flux Balance Analysis (FBA) is a method that calculates the fluxes of metabolites through a metabolic network. In order to perform the FBA method, a model of the metabolic network is needed. These models are constructed based on experiments where reactions that occur in a cell are identified. Subsequently, the fluxes through a reaction can be given constraints. However, in practice this is not done very often, as it is difficult to obtain these constraints experimentally. We used two different models, the core model of E. coli, which consists of all the essential reactions of the metabolism and an extended model, the iJO1366 model, which contains 30 times more reactions and 25 times more metabolites [1].

Firstly, all the metabolic reactions of the model are mathematically represented in an m by n matrix, called the stoichiometric matrix (S). For instance, if we analyse a metabolic network that consists of the following two reactions: $$ A + 2 \ C \ \rightarrow \ D $$ $$ 3 \ B \ \rightarrow \ 4 \ A $$ We get the following stoichiometric matrix: $$ \boldsymbol{S} \ = \begin{bmatrix} -1 && 4 \\ 0 && -3 \\ -2 && 0 \\ 1 && 0 \end{bmatrix} $$ So, each row represents one unique compound and each column represents one unique reaction. The values in the stoichiometric matrix are called the stoichiometric coefficients. These coefficients indicate which metabolites are involved in a specific reaction, where the number represents how many molecules of the metabolite are involved in this specific reaction and it is negative when the metabolite is consumed and positive when the metabolite is produced. The stoichiometric coefficient is zero for every metabolite that is not involved in a particular reaction.
Secondly, a vector v is defined which gives the fluxes through all reactions part of the model. Thus, it has a length of n, as there are n reactions.
Thirdly, a vector x is defined which gives the concentrations of all the metabolites of the system, so it has length m.

The basic assumption of FBA is that the system is at steady-state, so that $\frac{d\boldsymbol{x}}{dt}=\ 0$. Subsequently, the following system of equations is solved: $$ 0 \ = \ \frac{d\boldsymbol{x}}{dt} = \ \boldsymbol{S}\boldsymbol{v} \tag{2}$$ Because there will be more reactions than metabolites in any realistic large-scale metabolic model, there is no unique solution to this system of equations (there are more unknown variables than equations). The set of possible solutions to the system of equations (2) is called the solution space. We can make the solution space smaller by imposing constraints on the system. For instance, we can constrain the maximum or minimum allowable flux of a certain reaction. However, generally the solution space consists of multiple solutions.
To obtain a solution, the FBA method maximizes or minimizes an objective function Z, which is defined by the user. The objective function can be any linear combination of fluxes. The resulting system of equations with constraints and an objective function is optimized by a linear programming algorithm, and a solution is obtained. Still, the solution space can consist of multiple solutions. In this case, the linear programming algorithm will choose one particular solution. The multiple solutions can be explored by using Flux Variability Analysis (FVA), as we will do for our project. [1]

The fact that the solution is at steady state means that the FBA method is not suitable for investigating the changing behaviour of a system over time. However, it is very useful for obtaining insight in often very complex metabolic networks. For our project, we will apply the FBA method to the extracellular electron transport (EET) module. Our goal is to gain insight in the carbon metabolism providing the electrons for our EET module. The results of this analysis can be found in Flux Balance Analysis.

Percolation Theory

Percolation theory is a mathematical theory that describes randomly placed connections in latices [2]. It finds many applications in real world problems. One example where percolation theory is applied is in the spread of forest fires. Imagine a field that contains a large grid, like a chess board. Each of the squares can contain a tree. The chance that a square has a tree is $ \ p $. Say that there is someone who throws away his burning cigarette and a tree catches fire. Say that this burning tree will also ignite its direct neighbors in all possible directions. At chance p, what is the probability that a path of burning trees exists that connects two opposite sides of the forest. Percolation theory says that when the field is infinitely big, ie. it has infinitely many squares, there will always be a path of burning trees that connects the two opposite sides when $ \ p > \ 0.5 $. Below this particular value of p, no such path will exist. $ \ p = \ 0.5\ $ is called the critical chance, denoted by $ \ p_c $. Above $ \ p_c$, there is an infinitely big cluster of connected trees that spans the entire field. Note: this does not mean that all trees are connected, but it does mean that there is an infinitely large cluster of burning trees. If the chessboard is not infinitely large, then percolation theory states there is not a critical chance $ \ p_c$ above which there will always be a path of burning trees that connects the two opposite sides of the forest and below $ \ p_c$ there never will be a path of burning trees that connects the two opposite sides of the forest. However, percolation theory does state that there will be a sharp transition from 0 to 1 in the chance that there will be a burning path connecting the two opposite sides of the forest [3].

Though this is a relatively easy example, percolation theory is mathematically very complex, especially if the problem is not infinite. Percolation theory is also not restricted to two dimensions, but one can also use it for problems in one, three or even an infinite number of dimensions (or any other number).
In the model we constructed for our conductive curli module, we have something similar. We randomly place cells on a surface of a two dimensional chip and want to know when there is a conductive path between the two electrodes that are on opposite sides of the chip. The cells are modelled by disks with radius $ r $. There is a conductive path between the two electrodes when there is a cluster of cells that are connected to each other and the two electrodes. Therefore, percolation theory can be used for our problem. However, we are not dealing with a large chessboard with squares that can be occupied or not, but we are dealing with continuous space. The theory that deals with these kind of problems is called continuum percolation theory [4].
However, to keep it easy, we discuss here a discrete problem of conductance between two electrodes [3]. Imagine a grid where each square can be made of copper with chance $ \ p $, or be empty with chance $ \ 1- \ p $. On opposite sides of the grid are two electrodes that are connected to a battery. We already discussed that the chance of percolation increases very sharply above a certain $ \ p $. However, we are not only interested in the chance of percolation but also in the conductance of the network. Unfortunately, this does not scale linearly with the chance of percolation, see figure 2.

Figure 2: Top: A resistor network where each square can be occupied (black squares) or empty (white squares). The grid is connected by two electrodes. Bottom: A qualitative figure where the dashed line represents the chance of percolation. The solid line represents the conductance between the two electrodes. [3]

We can intuitively understand the qualitative result from figure 2. The formula for the conductance $\ \sigma $ in $\ \Omega ^{-1} $ of a resistor of length $\ l $ in $\ m $ with cross section area $\ A $ in $\ m^2 $ and electrical resistivity $\ \rho $ in $\ \Omega m $ is $\ \sigma = \frac{A}{l \rho} $. When there is percolation just above that certain $\ p $ where the chance of percolation increases very sharply, chances are that the path between the two opposite electrodes is very long, thus the path has a low $\ \sigma $. When the chance $ \ p $ is increased further, more paths are created (larger $A $ ) and the paths will also be shorter (shorter $\ l $ ), hence the conductance increases drastically (exponentially according to [3]). We expect similar behavior for the conductance of our system in the conductive curli module.

Graph Theory

Graph theory is the study of graphs. A graph is made up of vertices and edges and is a representation of a set of objects where some objects are connected by links [5]. The set of objects are represented by the vertices and the links that connect some pairs of vertices are represented by the edges.
Edges may be directed or undirected, which will result in directed or undirected graphs, respectively. If an edge is undirected, it means that when it connects say vertex 1 to vertex 2, it also connects vertex 2 to vertex 1. However, if an edge is directed, it means that when it connects vertex 1 to vertex 2, it does not mean that vertex 2 is necessarily also connected to vertex 1.
Also, edges may have a certain weight, which results in a weighted graph. If all the edges have a weight of unity, the graph is unweighted.
A graph is called a simple graph when it has no loops, that means edges that start and end at the same vertex, and no more than one edge between any two vertices. Also, it has to be undirected and unweighted [6]. A graph is called connected when there is a path from any vertex to any other vertex in the graph. When this is not possible, the graph is called disconnected [7].

For instance, if we create a graph where vertices are cities and edges are the roads between those cities, we have a undirected graph. Also, as no roads go from a city to that same city, the graph is a simple graph and if we assume that it is possible to reach any city from any other city, the graph is connected. Now, as the edges represent the roads between the cities, we could give them weights, where the weights represent the length of the roads. In that case, we have a weighted graph. However, in that case the graph is no longer a simple graph, as simple graphs have to be unweighted.
As the example above shows, it is quite easy to make a graph representation from a real world phenomenon. This is done quite often in physics and engineering problems, as graph theory is a very elaborated framework in mathematics and a lot of theories and algorithms are available to solve problems for graphs.
So, a good method is to translate your physics or engineering problem to a graph representation and find an algorithm or theory that solves your problem for that graph. Then, implement that algorithm or construct an algorithm based on the theory, and solve your problem for the graph you created. Subsequently, translate your result back to your physics or engineering problem to obtain the solution to your problem. This is exactly what we did for the curli module.
An example of a graph is displayed in figure 3. This graph is undirected, unweighted, simple and connected.

Figure 3: A representation of a graph that is undirected, unweighted, simple and connected. [12]

A graph can be represented by an adjacency matrix A. Vertex i is adjacent to vertex j if there is an edge between them. The adjacency matrix of a graph with n vertices is an n by n matrix, where the coefficient $a_{i,j}$ of the matrix is equal to 1 when vertex i is adjacent to vertex j and equal to 0 when it is not adjacent to vertex j [8]. So, for an undirected graph the adjacency matrix is symmetric. For the graph displayed in figure 3, the adjacency matrix is equal to:

$$ \boldsymbol{A} \ = \begin{bmatrix} 0 && 1 && 0 && 0 && 1 && 0 \\ 1 && 0 && 1 && 0 && 1 && 0 \\ 0 && 1 && 0 && 1 && 0 && 0 \\ 0 && 0 && 1 && 0 && 1 && 1 \\ 1 && 1 && 0 && 1 && 0 && 0 \\ 0 && 0 && 0 && 1 && 0 && 0 \end{bmatrix} $$

From the adjacency matrix and the weights of the edges, the resistance matrix can be calculated for connected weighted graphs. For a connected weighted graph G with n vertices, labelled {1, 2, …, n} and the weight assigned to edge (i,j) equal to w(i,j), the resistance matrix R is an n by n matrix where the coefficient $r_{i,j}$ of the matrix is equal to the resistance distance between vertex i and vertex j [9]. The resistance distance between vertex i and vertex j is equal to the resistance between node i and node j in an electrical network, where the electrical network is constructed in such a way that it can be represented by G [9], [10]. This method also works for weighted graphs, where an edge (i,j) with weight w(i,j) can be represented by a resistor of w(i,j) Ω between node i and node j [9].
The following algorithm can be used to calculate the resistance matrix R with coefficients $r_{i,j}$ for a connected weighted graph G with n vertices, labelled {1, 2, …, n}, the weight assigned to edge (i,j) equal to w(i,j) and A the adjacency matrix of G with coefficients $a_{i,j}$ [9]:

Calculate the Laplacian matrix L of G, where L is an n by n matrix with coefficients $l_{i,j}$. $$ l_{i,j} = \ \begin{cases} -\frac{1}{w(i,j)} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ for \ \ \ i \neq j \ \ \ and \ \ \ a_{i,j} = 1\\ \\ \ \ 0 \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ for \ \ \ i \neq j \ \ \ and \ \ \ a_{i,j} = 0\\ \\ -\sum\limits_{\ j=1 \atop j \neq i}^n l_{i,j} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ for \ \ \ i = j \end{cases} $$
Calculate any generalized inverse of L and call it H wit coefficients $h_{i,j}$. Then: $$ r_{i,j}= \ h_{i,j} + \ h_{j,j} - \ h_{i,j} - \ h_{j,i} $$

For the Moore-Penrose inverse of L, $L^{+}$ with coefficients $l^{+}_{i,j}$, we have [11]:

$$ r_{i,j}= \ l^{+}_{i,j} + \ l^{+}_{j,j} - \ 2 \ l^{+}_{i,j} $$

Thus, in this way it is possible to calculate the resistance between any two nodes of an electrical system. First, we represent the electrical system by a connected weighted graph and subsequently, we calculate the resistance matrix with the algorithm described above.

References

[1] J.D. Orth, I. Thiele & B.Ø. Palsson, “What is Flux Balance Analysis?”, Nature Biotechnol. 28, 245-248, 2010.

[2] K. Christensen, "Percolation Theory", MIT 2002, 2002.

[3] D. Stauffer & A. Aharony, "Introduction to Percolation Theory Revised Second Edition", 1994.

[4] R. Meester, "Continuum Percolation", volume 119, Cambridge University Press, 1996.

[5] R.J. Trudeau, "Introduction to Graph Theory", Dover Publications, 1994.

[6] A. Gibbons, "Algorithmic Graph Theory", Cambridge University Press, 1985.

[7] G. Chartrand, "Introductory Graph Theory", Courier Dover Publications, pp. 41-45, 1985.

[8] D.B. West, "Introduction to Graph Theory Second Edition", Pearson, pp. 6-9, 2000.

[9] R.B. Bapat, "Resistance Matrix of a Weighted Graph", MATCDY 50, 73-82, 2004.

[10] D.J. Klein & M. Randić, "Resistance Distance", J. Math. Chem. 12, 81-95, 1993.

[11] A. Ben-Israel & T.N.E. Greville, "Generalized Inverses: Theory and Applications", Wiley, 1977.

[12] commons.wikimedia.org, (2014). Wikimedia Commons. [online]
Available at: http://commons.wikimedia.org/wiki/File:6n-graf.png [Accessed 14 Oct. 2014].