Team:TU Delft-Leiden/Modeling/Curli/Gene

From 2014.igem.org

Gene Level Modeling

We start with the modeling of the gene expression of proteins involved in the curli formation pathway at the gene level. Proteins that are part of the curli formation pathway are CsgA/B/D/E/F/G [1]. CsgA is the main building block of curli fibrils. When produced, this protein is secreted out of the cell by the CsgEFG complex. In the absence of CsgB, there is no curli formation, since the CsgA proteins remain unpolymerized. CsgB is the starting block of each curli fibril, like a pile under a house, and connects the cell membrane to the first CsgA protein in the curli fibril. Once CsgB is located on the outside of the cell surface, the CsgA can polymerize onto the starting curli fibril.
In the constructs we made in the wet lab, CsgA is continuously produced and the csgB gene is placed under the control of a landmine promoter, activated by either TNT or DNT, see the Landmine Detection Module. So, when the cells are induced by TNT or DNT, CsgB production starts while CsgA is already present in the system. We first modeled this system by constructing an extensive gene expression model of the curli formation pathway. Subsequently, we simplified this model, so that fewer parameters were needed.

Curli Module

Extensive Gene Level Modeling

A first attempt to model curli growth was to make a deterministic model of the gene expression of proteins involved in the curli formation pathway. We did this by including the following reactions in our model:

The transcription of the csgA and csgB genes with production rates $k_{m_A}$ and $k_{m_B}$ and degradation rates $d_{m_A}$ and $d_{m_B}$. (equations 1.1-2)
The csgA and csgB mRNAs are translated with rates $k_{t_A}$ and $k_{t_B}$. This creates unfolded $CsgA_{cell}$ and $CsgB_{cell}$ proteins with degradation rates $d_{p_A}$ and $d_{p_B}$. (equations 1.3-4)
The $CsgB_{cell}$ can attach to the cell membrane with the help of CsgEFG at rate $k_{B_{mem}}$ and form $CsgB_{mem}$. (equation 1.5)
The $CsgA_{cell}$ can be transported outside the cell to become $CsgA_{out}$ when it interacts with a CsgEFG complex on the membrane with rate $k_{A_{mem}}$. $CsgA_{out}$ has a degradation rate of $d_{A_{out}}$. (equation 1.6)
$CsgA_{out}$ can form curli (with degradation rate $d_{curli}$) proportional to $CsgB_{mem}$ with production rate $k_{curli}$. (equation 1.7)

Translating all these reactions into differential equations results in:

$$ \frac{d}{dt} [mRNA_{csgA}] = \ k_{m_A} [csgA_{gene}] - \ d_{m_A} [mRNA_{csgA}] \tag{1.1} $$
$$ \frac{d}{dt} [mRNA_{csgB}] = \ k_{m_B} [csgB_{gene}] - \ d_{m_B} [mRNA_{csgB}] \tag{1.2} $$
$$ \frac{d}{dt} [CsgA_{cell}] = \ k_{t_A} [mRNA_{csgA}] - \ k_{A_{mem}} [CsgA_{cell}] [CsgEFG] - \ d_{p_A} [CsgA_{cell}] \tag{1.3} $$
$$ \frac{d}{dt} [CsgB_{cell}] = \ k_{t_B} [mRNA_{csgB}] - \ k_{B_{mem}} [CsgB_{cell}] [CsgEFG] - \ d_{p_B} [CsgB_{cell}] \tag{1.4} $$
$$ \frac{d}{dt} [CsgB_{mem}] = \ k_{B_{mem}} [CsgB_{cell}] [CsgEFG] \tag{1.5} $$
$$ \frac{d}{dt} [CsgA_{out}] = \ k_{A_{mem}} [CsgA_{cell}] [CsgEFG] - \ k_{curli} [CsgB_{mem}][CsgA_{out}] - \ d_{A_{out}} [CsgA_{out}] \tag{1.6} $$
$$ \frac{d}{dt} [curli] = \ k_{curli} [CsgB_{mem}][CsgA_{out}] - \ d_{curli} [curli] \tag{1.7} $$
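As an illustration of how a system like equations 1.1-1.7 could be integrated numerically, the sketch below uses a classical fourth-order Runge-Kutta scheme in plain Python. It is not the code we used. Every number in `PARAMS` is a dimensionless placeholder chosen only so that the script runs, not a measured or estimated rate. The gene and CsgEFG concentrations are held constant, the protein degradation rates are set to zero, and the export term in equation 1.6 is taken as $k_{A_{mem}} [CsgA_{cell}] [CsgEFG]$, the same term that is removed from $CsgA_{cell}$ in equation 1.3. These are all assumptions of the sketch.

```python
# All numbers below are dimensionless placeholders (assumptions), not measured rates.
PARAMS = {
    "k_mA": 1.0, "k_mB": 1.0,        # transcription rates
    "d_mA": 1.0, "d_mB": 1.0,        # mRNA degradation rates
    "k_tA": 1.0, "k_tB": 1.0,        # translation rates
    "d_pA": 0.0, "d_pB": 0.0,        # protein degradation (assumed negligible)
    "k_Amem": 1.0, "k_Bmem": 1.0,    # export / membrane attachment via CsgEFG
    "k_curli": 1.0,                  # curli formation rate
    "d_Aout": 0.0, "d_curli": 0.0,   # degradation outside the cell (assumed negligible)
    "gene_A": 1.0, "gene_B": 1.0, "EFG": 1.0,  # held constant (assumption)
}


def rhs(s, p):
    """Right-hand sides of equations 1.1-1.7 for the state vector s."""
    m_a, m_b, a_cell, b_cell, b_mem, a_out, curli = s
    export_a = p["k_Amem"] * a_cell * p["EFG"]   # CsgA leaving the cell
    attach_b = p["k_Bmem"] * b_cell * p["EFG"]   # CsgB attaching to the membrane
    growth = p["k_curli"] * b_mem * a_out        # CsgA built into curli
    return [
        p["k_mA"] * p["gene_A"] - p["d_mA"] * m_a,
        p["k_mB"] * p["gene_B"] - p["d_mB"] * m_b,
        p["k_tA"] * m_a - export_a - p["d_pA"] * a_cell,
        p["k_tB"] * m_b - attach_b - p["d_pB"] * b_cell,
        attach_b,
        export_a - growth - p["d_Aout"] * a_out,
        growth - p["d_curli"] * curli,
    ]


def simulate(p, t_end, dt=0.01):
    """Integrate from an all-zero initial state with classical RK4."""
    s = [0.0] * 7
    for _ in range(int(round(t_end / dt))):
        k1 = rhs(s, p)
        k2 = rhs([x + 0.5 * dt * d for x, d in zip(s, k1)], p)
        k3 = rhs([x + 0.5 * dt * d for x, d in zip(s, k2)], p)
        k4 = rhs([x + dt * d for x, d in zip(s, k3)], p)
        s = [x + dt / 6.0 * (a + 2 * b + 2 * c + d)
             for x, a, b, c, d in zip(s, k1, k2, k3, k4)]
    return s


for t_end in (5.0, 10.0, 20.0):
    curli = simulate(PARAMS, t_end)[6]
    print(f"t = {t_end:5.1f}  curli = {curli:.3f}")
```

With these placeholder values the mRNA levels settle at $k_m [gene] / d_m$ and the curli amount ends up growing at a nearly constant rate, because nothing in the model limits the amount of $CsgB_{mem}$ or removes curli.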

This leaves us with an awful lot of constants. We first discuss the simplest: the degradation rates. The half-life of mRNA is on the order of 2 to 5 minutes [2]. Furthermore, the average protein half-life is about 23 hours (it ranges from 12 to 42 hours) [3]. However, for the curli proteins this value cannot be true, since that would mean that most of the curli proteins are already degraded during biofilm formation (which may take days). Therefore, we reasoned that the degradation rates of the proteins are negligible.

Now, there are also the translation and transcription rates of CsgA and CsgB. The truth is that these numbers are highly dependent on the strength of the promoter and the RBS. Furthermore, the copy number of the plasmid is also an important factor in the amount of protein being produced. Data about the production rates are either absent or presented in arbitrary units. We did find that the translation rate of a ribosome is on the order of 20 amino acids/s [4].
Furthermore, there is an equally long list of kinetic parameters that are specific for curli, like the secretion rates of CsgA and growth rate of curli.

We tried to include terms that would limit the amount of CsgB that could be in the membrane and enabled the curli to go back to unfolded CsgA using educated guesses for these respective parameters. This extensive model raises more questions, in the form of unknown parameters, than it can answer. What we do see is that the amount of curli increases indefinitely in a straight line after a short time. We have therefore decided that we do not further pursue the extensive gene model, but reduce our system to the bare essentials, as this reduces the number of unknown parameters.

Simplified Gene Level Modeling

Though the model described above, provided that all rates are known, gives a more accurate (though still simplified) representation of the curli assembly system, we have chosen to decrease the complexity further to the bare essentials, as most of the production rates cannot be found in literature. Measuring accurate rates in the wet lab is infeasible within the scope of this project. We therefore constructed a model that only includes the rate limiting step of the system, as this will mostly determine the dynamics of the system.
First of all, we investigated whether the diffusion of the CsgA and CsgB proteins to their final destination is the rate limiting step in curli formation. From the literature and the wet lab, we know that the system responds to induction by TNT or DNT on the order of hours [5]. If diffusion were the rate limiting step, CsgA and CsgB proteins would pile up inside and outside the cell, because it would take a long time for them to travel to their final destinations: the end of a growing curli fibril and the outer membrane, respectively. A quick calculation using equation 2 shows that after one second, the root-mean-square displacement of a spherical particle with radius $r = \ 10 \ nm$ due to Brownian motion in liquid water at room temperature is 6.6 μm; many times the bacterial radius! Hence, we conclude that diffusion is not rate limiting.

$$ \overline{x^2} = \ \frac{k_B T t}{3 \pi \eta r} \tag{2}$$
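The estimate can be checked with a few lines of Python. This is a minimal sketch: the temperature of 298 K and the viscosity of $1.0 \cdot 10^{-3}$ Pa s for water are assumed values for "liquid water at room temperature", and the function name `rms_displacement` is only chosen for this illustration.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K


def rms_displacement(t, radius, temperature=298.0, viscosity=1.0e-3):
    """Root-mean-square displacement in m, following equation 2.

    t in s, radius in m, temperature in K (assumed 298 K),
    viscosity in Pa*s (assumed 1.0e-3 for water).
    """
    mean_square = K_B * temperature * t / (3.0 * math.pi * viscosity * radius)
    return math.sqrt(mean_square)


x = rms_displacement(t=1.0, radius=10e-9)
print(f"RMS displacement after 1 s: {x * 1e6:.1f} um")
```

Under these assumptions the result is about 6.6 μm, in line with the value quoted above. Since the displacement scales with $\sqrt{T}$, the exact choice of room temperature hardly matters.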

What we do expect to be the rate limiting step for curli formation in the long term is the large amount of CsgA and CsgB proteins that have to be produced. Hence, we expect the production rate of one of these proteins to be the rate limiting step. Instead of including the intermediate steps, we have implemented the production of the CsgA and CsgB proteins with one reaction and associated production rate each. These rates have to be measured in the lab. We will use the following system of reactions:

$$ \emptyset \xrightarrow{p_{A}} \ CsgA_{free} \tag{3} $$
$$ \emptyset \xrightarrow{p_{B}} \ CsgB \tag{4} $$
$$ CsgA_{free} + \ CsgB \xrightarrow{k} \ CsgA_{curli} + \ CsgB \tag{5} $$

Equations 3 and 4 represent the production of CsgA and CsgB proteins, respectively. Equation 5 represents the growth of a curli fibril, where a curli fibril reacts with a free CsgA protein, which then becomes part of the curli. In reality, this reaction only happens at the end of the curli fibrils. In our model, we assume a homogeneous concentration of all the substances and we cannot discriminate between curli subunits. It is theoretically possible to model the system as an infinite number of possible reactions that each extend a curli fibril of length i to length i+1 at rate k [8]. However, we are merely interested in the growth rates of the curli, since the distribution of the curli length will follow from the model at the cell level. Therefore, we decided to model the growth of curli at the gene level as reaction 5. We assume that each CsgB protein is the start of a curli fibril, thus the concentration of CsgB equals the concentration of curli fibrils. We can do this because we showed that the diffusion of CsgA and CsgB proteins to their final destination is not the rate limiting step. Therefore, nearly all the CsgB proteins will be the beginning of a curli fibril in reality and our assumption is valid.
So, in reaction 5 a free CsgA protein reacts with a curli fibril and becomes a CsgA protein that is part of that fibril. The fibril itself is returned by the reaction, as it is immediately available again for the next reaction with a free CsgA protein to grow even more. Therefore, curli growth depends on the rate k and the concentrations of $CsgA_{free}$ and CsgB.

Writing reactions 3-5 as differential equations results in:

$$ \frac{d}{dt} [CsgA_{free}] = \ p_{A} - \ k [CsgA_{free}][CsgB] \tag{6.1} $$
$$ \frac{d}{dt} [CsgB] = \ p_{B} \tag{6.2} $$
$$ \frac{d}{dt} [CsgA_{curli}] = \ k [CsgA_{free}][CsgB] \tag{6.3} $$
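Equations 6.1-6.3 can also be integrated numerically, which gives an independent check on any closed-form solution. The sketch below uses a classical fourth-order Runge-Kutta scheme in plain Python; it is an illustration and not the code we used. The default numbers are the estimates $p_A = 1.0 \cdot 10^{-10}$, $p_B = 1.3 \cdot 10^{-13}$, $k = 1.4 \cdot 10^{6}$ and $A_0 = 6.0 \cdot 10^{-6}$, with $p_A$ read in M/s as equation 6.1 requires. The step size of 10 s and the function name `simulate` are assumptions of the sketch.

```python
def simulate(p_a=1.0e-10, p_b=1.3e-13, k=1.4e6, a0=6.0e-6,
             t_end=36000.0, dt=10.0):
    """Integrate equations 6.1-6.3 with classical RK4.

    Concentrations in M, time in s. Returns ([CsgA_free], [CsgB],
    [CsgA_curli]) at t_end, starting from (a0, 0, 0) at t = 0.
    """
    def rhs(state):
        a_free, b, _a_curli = state
        growth = k * a_free * b          # rate of reaction 5
        return (p_a - growth, p_b, growth)

    state = (a0, 0.0, 0.0)
    for _ in range(int(round(t_end / dt))):
        k1 = rhs(state)
        k2 = rhs(tuple(s + 0.5 * dt * d for s, d in zip(state, k1)))
        k3 = rhs(tuple(s + 0.5 * dt * d for s, d in zip(state, k2)))
        k4 = rhs(tuple(s + dt * d for s, d in zip(state, k3)))
        state = tuple(s + dt / 6.0 * (a + 2 * b + 2 * c + d)
                      for s, a, b, c, d in zip(state, k1, k2, k3, k4))
    return state


a_free, b, a_curli = simulate()
print(f"after 10 h: CsgA_free = {a_free:.3e} M, CsgB = {b:.3e} M, "
      f"CsgA_curli = {a_curli:.3e} M")
print(f"curli growth rate / p_A = {1.4e6 * a_free * b / 1.0e-10:.3f}")
```

Two properties are useful sanity checks: $[CsgB]$ grows linearly as $p_B t$, and $[CsgA_{free}] + [CsgA_{curli}]$ always equals $A_0 + p_A t$, because reaction 5 only converts free CsgA into curli-bound CsgA.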

Fortunately, this system can be solved analytically. To do this, we need the initial conditions. Say the CsgB promoter is activated at $t= \ 0$. At this time there are no curli present, so $[CsgB]|_{t=0} = \ [CsgA_{curli}]|_{t=0}= \ 0$. However, the CsgA promoter is continuously active, so we expect to have an initial concentration $A_0$ of free CsgA proteins at time $t= \ 0$.

The solution to equation 6.2 is trivial:

$$ [CsgB] = \ p_B t \tag{7}$$

Substituting this into equation 6.1 results in:

$$ \frac{d}{dt} [CsgA_{free}] = \ p_{A} - \ k p_B t [CsgA_{free}] \tag{8} $$

It can easily be proven that a first order differential equation of the form

$$ y'(t) + \ f(t)y(t) = \ g(t) $$

has a solution of the form

$$ y(t) = \ e^{-F(t)} \int{g(t) e^{F(t)} dt} + \ y_0 e^{-F(t)} $$

where $F(t)= \int{f(t) dt}$. In our case, $f(t) = \ k p_B t$ and $g(t) = \ p_A$. This yields equation 9.

$$ [CsgA_{free}] = \ p_A e^{\frac{-k \ p_B t^2}{2}} \int{e^{\frac{k \ p_B t^2}{2}} dt} + \ C_{1} e^{\frac{-k \ p_B t^2}{2}} = \ p_A e^{\frac{-k \ p_B t^2}{2}} \int_{0}^{t}{e^{\frac{k \ p_B \tau^2}{2}} d\tau} + \ C_{2} e^{\frac{-k \ p_B t^2}{2}} \tag{9} $$
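For completeness, the general solution quoted above follows from multiplying the differential equation by the integrating factor $e^{F(t)}$, with $F'(t) = f(t)$. The left-hand side then becomes a total derivative:

$$ \frac{d}{dt} \left( y(t) e^{F(t)} \right) = \ \left( y'(t) + f(t) y(t) \right) e^{F(t)} = \ g(t) e^{F(t)} $$

Integrating both sides and dividing by $e^{F(t)}$ gives

$$ y(t) = \ e^{-F(t)} \int{g(t) e^{F(t)} dt} + \ C e^{-F(t)} $$

where $C$ is an integration constant that is fixed by the initial condition. When the integral is taken from $0$ to $t$ and $F(0) = 0$, as is the case for $F(t) = \frac{k \ p_B t^2}{2}$, this constant equals $y(0)$.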

One with a keen eye may recognize the Dawson function (equation 10):

$$ D_+ (x) = \ e^{-x^2 } \int_{0}^x{e^{y^2} dy} \tag{10} $$

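In our case, $x = \ t\sqrt{\frac{k \ p_B}{2}}$ and $y = \ \tau\sqrt{\frac{k \ p_B}{2}}$, so that $x^2 = \ \frac{k \ p_B t^2}{2}$ and $y^2 = \ \frac{k \ p_B \tau^2}{2}$. Substituting these into equation 9 gives equation 11.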

$$ [CsgA_{free}] = \ \frac{p_A D_+ (t\sqrt{\frac{k \ p_B}{2}})}{\sqrt{\frac{k \ p_B}{2}}} + \ C_{2} e^{\frac{-k \ p_B t^2}{2}} \tag{11}$$

Using the initial condition $[CsgA_{free}]|_{t=0}= \ A_0$ and $D_+(0) = 0$, we find $C_2 = A_0$, so the expression for the concentration of free CsgA proteins becomes:

$$ [CsgA_{free}] = \ \frac{p_A D_+ (t\sqrt{\frac{k \ p_B}{2}})}{\sqrt{\frac{k \ p_B}{2}}} + \ A_0 e^{\frac{-k \ p_B t^2}{2}} \tag{12}$$

Now, we can substitute equations 12 and 7 into equation 6.3, which gives us equation 13.

$$ \frac{d}{dt} [CsgA_{curli}] = \ k p_B t \left( \frac{p_A D_+ (t\sqrt{\frac{k \ p_B}{2}})}{\sqrt{\frac{k \ p_B}{2}}} + \ A_0 e^{\frac{-k \ p_B t^2}{2}} \right) \tag{13} $$

Precise numbers for $p_{A}$, $p_{B}$, $k$ and $A_0$ have to be measured in the wet lab. For now we use estimates. We have seen in the landmine model that the csgB promoter produces 27 proteins per cell per second. However, this value is too high for our model: it would mean that either thousands of CsgA proteins have to be created per second, or that CsgB production has to halt very quickly, in order not to create extremely short curli fibrils. We estimated the length of a single curli subunit (folded CsgA) at 4 nm [6][7].

Table 1: Parameters used to obtain quantitative results from the analytical solution for the curli production.
Parameters	Value	Unit
$\boldsymbol{p_{A}}$	$1.0 \cdot 10^{-10}$	$\frac{M}{s}$
$\boldsymbol{p_{B}}$	$1.3 \cdot 10^{-13}$	$\frac{M}{s}$
$\boldsymbol{k}$	$1.4 \cdot 10^{6}$	$\frac{1}{Ms}$
$\boldsymbol{A_0}$	$6.0 \cdot 10^{-6}$	$M$

Plotting equation 13 with the parameter values in table 1 yields the graph shown in figure 1.

Figure 1: The production rates of curli (blue) and CsgB (green) in units per second as function of time.
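Equation 13 can be evaluated directly once the Dawson function is available. The sketch below is an illustration in plain Python, not the code behind the figure. It computes $D_+$ with Simpson's rule, a numerical choice made for this sketch (libraries such as SciPy provide a ready-made Dawson function), and uses the parameter values of table 1 with $p_A$ read in M/s, as equation 6.1 requires. The function names are chosen for this illustration only.

```python
import math

P_A = 1.0e-10   # M/s, units as required by equation 6.1
P_B = 1.3e-13   # M/s
K = 1.4e6       # 1/(M s)
A_0 = 6.0e-6    # M


def dawson(x):
    """Dawson function D_+(x) = exp(-x^2) * int_0^x exp(y^2) dy.

    Simpson's rule on the integrand exp(y^2 - x^2), which never exceeds 1,
    so the evaluation does not overflow for large x.
    """
    if x == 0.0:
        return 0.0
    n = 2 * (50 + int(20 * x * x))      # even number of intervals, finer for large x
    h = x / n
    total = math.exp(-x * x) + 1.0      # end points y = 0 and y = x
    for i in range(1, n):
        y = i * h
        total += (4 if i % 2 else 2) * math.exp(y * y - x * x)
    return total * h / 3.0


def csga_free(t, a0=A_0):
    """Concentration of free CsgA in M at time t in s (equation 12)."""
    a = math.sqrt(K * P_B / 2.0)
    return P_A * dawson(a * t) / a + a0 * math.exp(-(a * t) ** 2)


def curli_rate(t, a0=A_0):
    """Production rate of CsgA_curli in M/s at time t in s (equation 13)."""
    return K * P_B * t * csga_free(t, a0)


for hours in (0, 1, 2, 5, 10):
    t = hours * 3600.0
    print(f"t = {hours:2d} h  rate / p_A = {curli_rate(t) / P_A:8.3f}")
```

Printing the rate relative to $p_A$ makes it easy to see when the production of curli has settled at the production rate of CsgA. Calling `curli_rate(t, a0=0.0)` gives the corresponding curve without any initial free CsgA.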

Figure 1 shows a steady production of CsgB. The production rate of $CsgA_{curli}$ at $t= \ 0$ is zero as expected, since there is no CsgB at that point. In the next few hours, the production rate of $CsgA_{curli}$ peaks. We think that this is due to the high concentration of $CsgA_{free}$ that is present at $t= \ 0$. In figure 2, curli growth as function of time is plotted for different initial concentrations of $CsgA_{free}$.

Figure 2: The curli subunit growth in units per second for various initial concentrations $ A_0 $ of CsgA as function of time. Initial concentrations that equal 0, 5, 10 or 15 hours of CsgA production are shown.

We conclude the following from figure 2:

Firstly, as expected, curli growth stabilizes to a rate equal to $p_{A}$ after approximately 2 hours, independent of the initial concentration $A_0$ of $CsgA_{free}$. The width of the initial peak is determined by the product $ k p_B$.
Secondly, increasing $A_0$ increases the height of the peak. Even with zero initial $CsgA_{free}$ concentration, a small peak can be found at one hour. This is a consequence of $CsgA_{free}$ building up while the CsgB concentration is still very small.
Thirdly, during the first two hours, few CsgB proteins are present in the system. We therefore expect the curli fibrils that started in the first few hours to be much longer than the fibrils that started at later times.
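The first observation can be made explicit with the large-argument behaviour of the Dawson function, $D_+ (x) \approx \frac{1}{2x}$ for $x \gg 1$. With $x = \ t\sqrt{\frac{k \ p_B}{2}}$, the exponential term in equation 12 vanishes at long times and

$$ [CsgA_{free}] \approx \ \frac{p_A}{\sqrt{\frac{k \ p_B}{2}}} \cdot \frac{1}{2 t \sqrt{\frac{k \ p_B}{2}}} = \ \frac{p_A}{k \ p_B t} $$

so that equation 13 gives

$$ \frac{d}{dt} [CsgA_{curli}] \approx \ k p_B t \cdot \frac{p_A}{k \ p_B t} = \ p_A $$

The time scale on which this limit is approached is $\left( \frac{k \ p_B}{2} \right)^{-1/2}$, which for the values in table 1 is about $3.3 \cdot 10^{3}$ s, or roughly 0.9 hours. This is why the product $k p_B$ sets the width of the peak.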

Although our model gives good insights into what can be expected for the curli growth (figure 2), we acknowledge that our parameters were all educated guesses. These values should be compared to values obtained in the wet lab. If future experiments could give us more information, we could easily incorporate these in our existing model and get an even better idea of curli growth.

References

[1] M.L. Evans & M.R. Chapman, "Curli biogenesis: Order out of disorder", Biochim. Biophys. Acta 1843, 1551-1558, 2014.

[2] M. Pedersen, S. Pedersen et al., "The functional half-life of an mRNA depends on the ribosome spacing in an early coding region", J. Mol. Biol. 407, 1, 2011.

[3] T. Maier, L. Serrano et al., "Quantification of mRNA and protein and integration with protein turnover in a bacterium", Mol. Syst. Biol. 7, 511, 2011.

[4] H. Bremer, P.P. Dennis, "Modulation of chemical composition and other parameters of the cell by growth rate Second Edition", ASM Press, 1559, 1996.

[5] S. Yagur-Kroll, S. Belkin et al., "Escherichia coli bioreporters for the detection of 2,4-dinitrotoluene and 2,4,6-trinitrotoluene", Appl. Microbiol. Biotechnol. 98, 885-895, 2014.

[6] calctool.org, (2014). Calctool. [online]
Available at: www.calctool.org/CALC/prof/bio/protein_size [Accessed 16 Oct. 2014].

[7] Q. Shu, C. Frieden et al., "The E. coli CsgB nucleator of curli assembles to β-sheet oligomers that alter the CsgA fibrillization mechanism", Proc. Natl. Acad. Sci. 109, 6502-6507, 2012.

[8] S. Prigent, M. Doumic et al., "An Efficient Kinetic Model for Assemblies of Amyloid Fibrils and Its Application to Polyglutamine Aggregation", PLOS ONE 7, 11, 2012.
