Team:Waterloo/Math Book/CRISPRi

From 2014.igem.org

(Difference between revisions)
(creates page)

Revision as of 23:53, 17 October 2014

Math Book: CRISPR Interference

We decided to create a model of the CRISPR system for two main reasons:

  • Identifying the parts of the network that could be targeted by our lab team to improve repression efficiency
  • To approximate time-series mecA repression data for use in modelling the overall vulnerability of a S. aureus population

Model Formation

After a literature review we were able to construct the CRISPR interference system network. The targeted single guide RNA (sgRNA) associates with nuclease-deficient Cas9 protein (dCas9) to form a complex that binds with the DNA complementary to the sgRNA target . The bound complex prevents transcription elongation by RNA polymerase, repressing YFP mRNA expression . The chemical network is shown below:

CRISPR Network Diagram

Using standard mass-action kinetics, the network simplifies into the following set of differential equations:

Initial CRISPR DEs

We chose the model kinetics to be largely first-order; this decision was supported by the findings of several recent studies . To simplify the model, we assumed that the formation of the dCas9-sgRNA complex ($b$ in Figure xyz) is in made a quasi-steady-state. That is, we assume that the association/dissociation of dCas9 and sgRNA occurs on a faster timescale than the other reactions in the network (i.e. transcription, translation and the binding of the complex to the DNA), allowing us to assume that the complex is always at steady-state, relative to the other time-dependent species concentrations. This same assumption was made in previous modelling efforts, e.g. .

Under this quasi-steady state assumption, the differential expression for the complex is given by:

QSSA Assumption changes complex DE

Our model then simplifies to:

CRISPR DEs Updated with QSSA

This is the same assumption made by previous teams.

Modelling Incomplete Repression

A recent study by Bikard et al. found that maximal repression (on the order of 100 fold) was achieved when the promoter was targeted. However, targeting the promoter is not viable in this project since an essential promoter from elsewhere in the genome has been harnessed to produce the fluorescent promoter. Instead, we model the incomplete repression (ranging from 6-fold to 35-fold) observed when the off-promoter regions, specifically on the non-coding strand, are targeted.

There are two possible approaches for modelling the incomplete repression, each reflecting a different physical mechanism that allows leaky YFP expression. In the first mechanism, RNA polymerase is sometimes able to cleave the bound dCas9-sgRNA complex from the DNA. In the second mechanism, the complex binds inefficiently and is sometimes separated from the DNA, permitting transcription to continue.

We assumed that the incomplete repression is accounted for by the first mechanism. This assumption was based on several studies showing radically different repression rates if the complex targets the promoter, preventing transcription initiation, rather than targeting the DNA further downstream and impeding transcription elongation. The differences in the system behavior depending on whether or not RNA polymerase has the opportunity to bind suggest that the “cleavage” mechanism may more closely resemble the chemical reality.

Consequently, we modeled incomplete repression using a leaky expression term proportional to the expected YFP expression when the complex is saturated. The differential equation model was updated with a repression term dependent on the fold reduction FR and the initial concentration of YFP mRNA, Y0:

Updated YFP DE with Incomplete Repression

This equation was derived using two boundary conditions. Before repression, when the concentration of the complex is zero, YFP mRNA is produced at the rate expected from the sarA promoter, α. After repression has reached its steady state, the YFP mRNA production has been reduced by FR fold, to Y0/FR.

Parameter Finding

We turned to the literature to find parameters for our model, given in the Table below. We first looked for parameter values that had been measured in S. aureus. In cases where those could not be found, we next looked for ways to to estimate the parameters using other available data for S. aureus and finally searched for the parameters in other gram-positive bacteria. Aggregating parameters from many experiments across the literature is by nature a somewhat uncertain endeavor; those parameters about which we are very uncertain are marked with asterisks. An explanation for how we arrived at each parameter is given in the table, but details on the more circuitously estimated parameters are given after the table.

CRISPRi Parameters

Parameter Value Description Source/Rationale
αmy, αr 0.0011 nM • min-1 mRNA production from SarA P1 Promoter Determined based on linear fitting to the time-series fluorescence measurements from YFP/P2-P3-P1 fusion, as reported in and fluorescence per molecule from
αmc 0.0011 nM • min-1 mRNA production from Xylose Promoter Same as SarA rate since the addition of the Xylose-inducible promoter was to simplify labwork and thus for modelling we assume it is fully induced.
βc 0.0057-0.4797 protein • transcript-1 min-1 dCas9 protein synthesis rate from dCas9 mRNA Estimated from peptide elongation rates in Streptomyces coelicolor , the dCas9 BioBrick from and ribosome density from report log-phase mRNA half-lives in {S. aureus}. An approximate average value of 4 minutes leads to this degradation rate.
γc, γb -5.6408e-04 min-01 dCas9/complex degradation rate Based off half-life of SarA protein in S. aureus as reported in
Ka 0.28 nM Dissociation constant for complex and DNA (given by k2/k1) found this dissociation rate for dCas9 and a single-stranded DNA substrate.
n 2.5 Hill Constant for Repression UCSF iGEM 2013
k+, k- 0.01 to 1.0 nM Rate of dissociation of dCas9-sgRNA to form complex Range defined relative to other parameters, using the QSSA assumption that these dynamics are fast
Fold Reduction 6 to 35 Maximum percent repression achievable with CRISPRi system Based on the relative fluorescence measurements observed when the non-coding strand was targeted by dCas9 in

The only model parameters without some basis in the literature are the association rates for dCas9 and sgRNA. However, we have made a quasi-steady state assumption for that reaction, which requires that it reach equilibrium on a much faster time scale than the rest of the system. We thus defined a range for the possible values based on the other model parameters

Details on the more roundabout estimations are given below:

Production of dCas9 from dCas9 mRNA

We were unable to find a peptide chain elongation rate for S. aureus, so instead we used the values reported in BioNumber 107869 which gives a range of 0.59-3.17 amino acids per second per ribosome in Streptomyces coelicolor, another gram-positive bacteria. Freiburg's dCas9 part from last year is composed of 1372 amino acids. This translates to a range of 0.0258 to 0.1386 dCas9 molecules per minute per ribosome.

We were unable to find ribosome densities in S. aureus, but found two different estimates for ribsosome density in Bionumbers: 0.22 ribosomes per 100 codons (i.e. per 3 nt coding sequence) and 3.46 ribosomes per 100 codons . Using our assumption of 3 nt:1 amino acid, we then multiply to get the 0.0057-0.4797 range of dCas9 molecules per minute.

Degradation rate of dCas9

We were unable to find any specific data on dCas9 degradation, so instead we used a protein half-life of sarA measured in S. Aureus by Michelik et al. . We chose sarA rather than a protein more chemically similar to dCas9 because data on sarA was readily available and because dCas9 is transcribed using the sarA promoter, which allows us to at least capture sensitivity of the degradation rate to production.

mRNA production from the sarA promoter

We used the time-series data given by Cheung et al. to estimate the rate of production from the sarA P2-P3-P1 promoter in S. aureus. The figure from their paper is reproduced below. After diluting 1:100, the S. aureus strains were serially monitored for OD_650. We used data from the sarA+ strain, as that's more like a wild-type S. aureus strain.

Exponential fit to SarA Promotor production rates from A. L. Cheung et al.

Using the laboratory-conditions doubling time of 24 minutes given in given in , we found that the bacteria would re-enter stationary phase after 2.5 hours; for time-points after 3 hours, the number of number of sarA genes producing fluorescence could be assumed as constant. For this reason, we excluded time-points prior to 3 hours. We then converted from fluorescence units to number of fluorescent molecules using the quantization measurements provided by Wu & Pollard and, using our assumption of a fixed number of active sarA genes, considered the relative change in number of molecules to be representative of the per-promoter rate.

We were interested, however, in the changes of concentration rather than the changes in the raw number of molecules. As the name suggests, Staphylococcus aureus are spherical in shape. Assuming that all S. aureus are spheres, the volume of the cell can be determined. The diameter of a USA300 S. aureus cell was previously measured as 1.1 μ•m resulting in the overall cell volume to be calculated as 5.575•10-15 L. The number of molecules were thus converted to units of molar concentration in the cell, specifically nanomoles per litre (nM). The exponential fit used to find the rate constant is shown beside the figure from Cheung et al. above.

This resulted in a exponential model a•ebt with a b rate constant of 0.0011 nM/min.

Initial Model Results

Using these estimated parameters, we simulated our differential model of the CRISPR system. It was immediately clear that these parameters, scraped from assorted publications, did not provide an accurate system model once combined. Below are two simulations of the system dynamics using the parameters found in the literature. On the left the system is shown with CRISPR repression active and on the right is a simulation where:

mRNA Y production without CRISPR interference

or rather, the system simulated without any repression of YFP Transcription by CRISPR. Both plots use 6 as the expected final fold reduction.

CRISPR Model with parameters from the literature CRISPR Model with parameters from the literature and no CRISPR interference

The difference between the two plots is negligible. This unrealistic behavior emerges directly from the parameter values. The mRNA degradation dynamics prevent sgRNA from reaching any significant concentration, so there is never enough dCas9-sgRNA complex to influence YFP expression. However, in both plots the YFP mRNA levels plummet from the high initial concentration because the degradation rate is so much higher than the production rate.

These plots do not fit with the observations of CRISPR interference systems reported in the literature . To ensure that our parameters were at fault, rather than the fact that we were examining CRISPR in S. aureus instead of E. coli, we consulted fluorescence results from the laboratory, which showed that unrepressed YFP continued to fluoresce after several hours.

Parameter Adjustment

Fit results acheived with fmincon optimization

Confident that the initial model was inaccurate, we had to update some of the model parameters. We were least sure of our estimates of the mRNA production rates from the literature and manual fiddling showed that CRISPR interference in the model with adjustments to the production rates only. Accordingly, we used MATLAB fmincon to find updated mRNA production rates and fixed all other parameters.

In the MATLAB fmincon parameter search, we generated an error function by comparing the sum of squares error of the model YFP dynamics to time-series data derived Qi et al. , who measured repression with CRISPRi and found a short delay followed by exponential decay with a 35-minute half life. We assumed that the delay and rate of decay would be similar in our system, but fixed the final level of repression according to the two extremes reported by Bikard et al. (6-Fold and 35-Fold) . The simulated time-series data and the model output after the parameter search are compared below.

Satisfied with these fits, we simulated the system again using the fitted values for the mRNA production rates. The model output using the novel parameter sets are shown below for both the 6-fold and 35-fold simulations. The results with and without CRISPR interference are contrasted as before, though now a difference is observed when CRISPR inteacts with YFP

Simulated System Dynamics with 6-Fold Repression

Final dynamics of CRISPRi system with fit to 6-fold repression profile System dynamics with fit to 6-fold repression profile and no CRISPR interference

Simulated System Dynamics with 35-Fold Repression

Final dynamics of CRISPRi system with fit to 35-fold repression profile System dynamics with fit to 35-fold repression profile and no CRISPR interference

The system dynamics differ quite notably depending on the fit: in the 6-fold system, the equilibrium levels of sgRNA are quite low while the equilibrium levels of dCas9 mRNA are high, which is revered in the 35-fold system. More laboratory characterization would be needed to determine which model most closely replicates the actual system dynamics. However, the two sets of parameters provide a basis for sensitivity analysis and some idea of the expected time-series repression.

Sensitivity Analysis

Local Sensitivity Results

Parameter Sensitivity (%)
dCas9 mRNA Production -3.76
dCas9 mRNA Degradation 4.11
YFP mRNA Production 3.56
YFP mRNA Degradation 4.11
sgRNA Production -7.85
sgRNA Degradation 4.11
dCas9 Protein Production -3.77
dCas9 Protein Degradation 1.06
dCas9-sgRNA Complex Degradation 4.70
k- 4.12
k+ -3.76
ka 8.91
n -19.44

We performed sensitivity analysis to discover ways to improve the effect of CRISPRi repression and to more precisely estimate the parameters above. Sensitivity analysis was performed on a global and local level. Since we were interested in improving upon the best possible case of our system, analysis was performed using the 35-fold model. Local sensitivity analysis involves computing the relative change of the steady state with respect to a change in the parameter. Using Matlab and a finite-difference approximation of the derivative, sensitivities were calculated for 5% changes in the parameters. The larger the sensitivity of the parameters shown below (in magnitude), the more important it is for said parameter to be estimated precisely.

The data provided by the local sensitivity analysis provides insight into parameters that need further investigation, such as the Hill Coefficient. However, these parameters are structural and cannot be easily modified through external control. Therefore, it was determined that a more overarching analysis be performed that covers a broad range of the parameter space. Global sensitivity analysis provides a method of estimating parameters that have the greatest effect on the system over a large range. We chose to use an approach that accounted for observed data from Qi et al. , as used during the least squares fit. The approach used is equivalent to used by Chang and Delleur as well as Jia and Yue . The steps of this analysis are explained below:

  1. Formulate upper and lower bounds for the parameters. We chose to let parameters vary one magnitude above and below the least squares fit.<.li>
  2. Generate a set of sample parameter sets that fall within the upper and lower bounds using Latin Hypercube Sampling. Latin
  3. Hypercube sampling is a technique that effectively distributes random samples over a space of values. It ensures good coverage without needing to generate an absurdly large number of samples.
  4. Run a simulation for every parameter set and calculate their errors from the observed data set (in our case, Qi et. al)
  5. Calculate the average error, and set this as a threshold.
  6. Choose a parameter named Q.
  7. Partition parameter sets as acceptable and unacceptable.
  8. Calculate the Empirical Cumulative Distribution Function (ECDF) of parameter Q for acceptable sets, and for unacceptable sets.
  9. Find the maximum of the difference between the two distributions. This is the global sensitivity of the parameter Q.
  10. Repeat steps 5-8 for all parameters in the system.
global sensitivity analysis CRISPR parameters

The supporting argument for this approach derives from the meaning of the differences in ECDFs. Without loss of generality, consider the acceptable set ECDF to be always greater than that of the unacceptable set. If there is a large difference between the two ECDFs, then a change in the parameter - near the location of the difference - will cause a major change in error (past the threshold to unacceptable). This can be restated as a change in behaviour. However, a small difference in the ECDFs implies that the parameter did not have a major effect on the error and so the parameter did not significantly change behaviour compared to other parameters.

Using a Matlab script, the plots above were generated to show the ECDFs of the acceptable and unacceptable sets for each parameters. It was observed that there was a significant difference in the ECDF for mRNA YFP Degradation and production compared to most other parameters. More significantly, other parameters showed a relatively minor effect. The team concluded that in order to supplement CRISPRi repression, the mRNA needs to be the new target.