Team:UESTC-Software/Modeling.html

From 2014.igem.org

(Difference between revisions)
Line 35: Line 35:
</tr>
</tr>
<tr>
<tr>
-
<td class="pc1">d 0</td> <td class="pc2">Average distance for all the mismatch nucleotides to the PAM of any off-target site</td> <td class="pc3">0-19</td> <td class="pc4">nt</td> <td class="pc5"></td>
+
<td class="pc1"><span class="serif">d0</span></td> <td class="pc2">Average distance for all the mismatch nucleotides to the PAM of any off-target site</td> <td class="pc3">0-19</td> <td class="pc4">nt</td> <td class="pc5"></td>
</tr>
</tr>
<tr>
<tr>
-
<td class="pc1">i</td> <td class="pc2">Continuous variables for the number of mismatch nucleotide</td> <td class="pc3">1-Nmm</td> <td class="pc4">1</td> <td class="pc5"></td>
+
<td class="pc1"><span class="serif">i</span></td> <td class="pc2">Continuous variables for the number of mismatch nucleotide</td> <td class="pc3">1-<span class="serif">Nmm</span></td> <td class="pc4">1</td> <td class="pc5"></td>
</tr>
</tr>
<tr>
<tr>
-
<td class="pc1">j</td> <td class="pc2">Continuous variables for the total number of off-target sites exclude the perfect-hit off-target sites</td> <td class="pc3">1-(Nfg-Nph)</td> <td class="pc4">1</td> <td class="pc5"></td>
+
<td class="pc1"><span class="serif">j</span></td> <td class="pc2">Continuous variables for the total number of off-target sites exclude the perfect-hit off-target sites</td> <td class="pc3">1-(<span class="serif">Nfg</span>-<span class="serif">Nph</span>)</td> <td class="pc4">1</td> <td class="pc5"></td>
</tr>
</tr>
<tr>
<tr>
-
<td class="pc1">M</td> <td class="pc2">Weight matrix</td> <td class="pc3">[0,0,0.014,0,0,0.395,0.317,0,0.389,0.079,0.445,0.508,0.613,0.851,0.732,0.828,0.615,0.804,0.685,0.583]</td> <td class="pc4"></td> <td class="pc5">Reference: DNA targeting specificity of RNA-guided Cas9 nucleases, Hsu et al, 2013</td>
+
<td class="pc1"><span class="serif">M</span></td> <td class="pc2">Weight matrix</td> <td class="pc3">[0,0,0.014,0,0,0.395,0.317,0,0.389,0.079,0.445,0.508,0.613,0.851,0.732,0.828,0.615,0.804,0.685,0.583]</td> <td class="pc4"></td> <td class="pc5">Reference: DNA targeting specificity of RNA-guided Cas9 nucleases, Hsu et al, 2013</td>
</tr>
</tr>
<tr>
<tr>
Line 50: Line 50:
</tr>
</tr>
<tr>
<tr>
-
<td class="pc1">Nfg</td> <td class="pc2">The total number of off-target sites</td> <td class="pc3">≥0</td> <td class="pc4">1</td> <td class="pc5"></td>
+
<td class="pc1"><span class="serif">Nfg</span></td> <td class="pc2">The total number of off-target sites</td> <td class="pc3">≥0</td> <td class="pc4">1</td> <td class="pc5"></td>
</tr>
</tr>
<tr>
<tr>
-
<td class="pc1">Nmm</td> <td class="pc2">The number of mismatch nucleotide for the not perfect-hit off-target sites</td> <td class="pc3">1-4</td> <td class="pc4">1</td> <td class="pc5"></td>
+
<td class="pc1"><span class="serif">Nmm</span></td> <td class="pc2">The number of mismatch nucleotide for the not perfect-hit off-target sites</td> <td class="pc3">1-4</td> <td class="pc4">1</td> <td class="pc5"></td>
</tr>
</tr>
<tr>
<tr>
-
<td class="pc1">Nph</td> <td class="pc2">Perfect-hit off-target sites</td> <td class="pc3">≥0</td> <td class="pc4">1</td> <td class="pc5">In our scoring algorithm, we allow the maximum value of Nph is 4, when Nph≥4, Sguide=0</td>
+
<td class="pc1"><span class="serif">Nph</span></td> <td class="pc2">Perfect-hit off-target sites</td> <td class="pc3">≥0</td> <td class="pc4">1</td> <td class="pc5">In our scoring algorithm, we allow the maximum value of <span class="serif">Nph</span> is 4, when <span class="serif">Nph</span>≥4, <span class="serif">Sguide</span>=0</td>
</tr>
</tr>
<tr>
<tr>
-
<td class="pc1">r 1</td> <td class="pc2">The proportion of specificity score in the total score</td> <td class="pc3">0-1</td> <td class="pc4">1</td> <td class="pc5">In our scoring algorithm, it’s default value is 0.65</td>
+
<td class="pc1"><span class="serif">r1</span></td> <td class="pc2">The proportion of specificity score in the total score</td> <td class="pc3">0-1</td> <td class="pc4">1</td> <td class="pc5">In our scoring algorithm, it’s default value is 0.65</td>
</tr>
</tr>
<tr>
<tr>
-
<td class="pc1">r 2</td> <td class="pc2">The proportion of efficacy score in the total score</td> <td class="pc3">0-1</td> <td class="pc4">1</td> <td class="pc5">In our scoring algorithm, it’s default value is 0.35</td>
+
<td class="pc1"><span class="serif"><span class="serif">r2</span></span></td> <td class="pc2">The proportion of efficacy score in the total score</td> <td class="pc3">0-1</td> <td class="pc4">1</td> <td class="pc5">In our scoring algorithm, it’s default value is 0.35</td>
</tr>
</tr>
<tr>
<tr>
-
<td class="pc1">S1</td> <td class="pc2">The score of the first step</td> <td class="pc3">≥0</td> <td class="pc4">1</td> <td class="pc5"></td>
+
<td class="pc1"><span class="serif">S1</span></td> <td class="pc2">The score of the first step</td> <td class="pc3">≥0</td> <td class="pc4">1</td> <td class="pc5"></td>
</tr>
</tr>
<tr>
<tr>
-
<td class="pc1">S20</td> <td class="pc2">The subtracted score for the 20th nucleotide is not a guanine</td> <td class="pc3">=35</td> <td class="pc4">1</td> <td class="pc5"></td>
+
<td class="pc1"><span class="serif">S20</span></td> <td class="pc2">The subtracted score for the 20th nucleotide is not a guanine</td> <td class="pc3">=35</td> <td class="pc4">1</td> <td class="pc5"></td>
</tr>
</tr>
<tr>
<tr>
-
<td class="pc1">Seff</td> <td class="pc2">The efficacy score</td> <td class="pc3">0-100</td> <td class="pc4">1</td> <td class="pc5">Represent the level of efficacy for the sgRNA</td>
+
<td class="pc1"><span class="serif">Seff</span></td> <td class="pc2">The efficacy score</td> <td class="pc3">0-100</td> <td class="pc4">1</td> <td class="pc5">Represent the level of efficacy for the sgRNA</td>
</tr>
</tr>
<tr>
<tr>
-
<td class="pc1">Sgc</td> <td class="pc2">The subtracted score for different  GC ratio</td> <td class="pc3">0,35,65</td> <td class="pc4">1</td> <td class="pc5"></td>
+
<td class="pc1"><span class="serif">Sgc</span></td> <td class="pc2">The subtracted score for different  GC ratio</td> <td class="pc3">0,35,65</td> <td class="pc4">1</td> <td class="pc5"></td>
</tr>
</tr>
<tr>
<tr>
-
<td class="pc1">Sguide</td> <td class="pc2">The total score of the sgRNA</td> <td class="pc3">0-100</td> <td class="pc4">1</td> <td class="pc5">Composed of Seff and Sspe, marking the overall properties(specificity and efficacy) of the sgRNA</td>
+
<td class="pc1"><span class="serif">Sguide</span></td> <td class="pc2">The total score of the sgRNA</td> <td class="pc3">0-100</td> <td class="pc4">1</td> <td class="pc5">Composed of Seff and <span class="serif">Sspe</span>, marking the overall properties(specificity and efficacy) of the sgRNA</td>
</tr>
</tr>
<tr>
<tr>
-
<td class="pc1">Smm</td> <td class="pc2">The subtracted score of the mismatch nucleotide for the not perfect-hit off-target sites</td> <td class="pc3">≥0</td> <td class="pc4">1</td> <td class="pc5"></td>
+
<td class="pc1"><span class="serif">Smm</span></td> <td class="pc2">The subtracted score of the mismatch nucleotide for the not perfect-hit off-target sites</td> <td class="pc3">≥0</td> <td class="pc4">1</td> <td class="pc5"></td>
</tr>
</tr>
<tr>
<tr>
-
<td class="pc1">Sph</td> <td class="pc2">The subtracted score of the perfect-hit off-target sites</td> <td class="pc3">≥0</td> <td class="pc4">1</td> <td class="pc5"></td>
+
<td class="pc1"><span class="serif">Sph</span></td> <td class="pc2">The subtracted score of the perfect-hit off-target sites</td> <td class="pc3">≥0</td> <td class="pc4">1</td> <td class="pc5"></td>
</tr>
</tr>
<tr>
<tr>
-
<td class="pc1">Sspe</td> <td class="pc2">The specificity score</td> <td class="pc3">0-100</td> <td class="pc4">1</td> <td class="pc5">Represent the level of specificity for the sgRNA</td>
+
<td class="pc1"><span class="serif">Sspe</span></td> <td class="pc2">The specificity score</td> <td class="pc3">0-100</td> <td class="pc4">1</td> <td class="pc5">Represent the level of specificity for the sgRNA</td>
</tr>
</tr>
</tbody>
</tbody>
Line 94: Line 94:
<div class="question"  id="p3">3.Scoring algorithm</div>
<div class="question"  id="p3">3.Scoring algorithm</div>
<p><ul>Judged conditions:
<p><ul>Judged conditions:
-
<li><pre style="padding: 0;background: none;border: none;color: #777;line-height: inherit;">①Bad GC ratio (< 40% or > 80%) : Sgc = 65;
+
<li><pre style="padding: 0;background: none;border: none;color: #777;line-height: inherit;">①Bad GC ratio (< 40% or > 80%) : <span class="serif">Sgc</span> = 65;
-
Not so good GC ratio (40% - 50% or 70% - 80%): Sgc = 35;
+
Not so good GC ratio (40% - 50% or 70% - 80%): <span class="serif">Sgc</span> = 35;
-
Good GC ratio (51%-69%): Sgc = 0.[1]
+
Good GC ratio (51%-69%): <span class="serif">Sgc</span> = 0.[1]
</pre></li>
</pre></li>
-
<li>②The 20th nucleotide is not G: S20 = 35;[1]</li>
+
<li>②The 20th nucleotide is not G: <span class="serif">S20</span> = 35;[1]</li>
-
<li>③If the sgRNA designed perfectly hit another sites, the penalty Sph = 25;if perfectly hit more than or equal to 4 loci, the total score Sguide is 0.
+
<li>③If the sgRNA designed perfectly hit another sites, the penalty <span class="serif">Sph</span> = 25;if perfectly hit more than or equal to 4 loci, the total score <span class="serif">Sguide</span> is 0.
</li></ul></p>
</li></ul></p>
<p><ul style="width:100%;">Steps:
<p><ul style="width:100%;">Steps:
-
<li>(1) Firstly, find out the number of off – target sequence Nfg, if Nfg = 0, output Sspe= r1*100;
+
<li>(1) Firstly, find out the number of off – target sequence <span class="serif">Nfg</span>, if <span class="serif">Nfg</span> = 0, output <span class="serif">Sspe</span>= <span class="serif">r1</span>*100;
-
Otherwise, detect the third condition. If there is a sgRNA designed perfectly hit another site, regard the number of it as Nph, and then the score of the first step: S1 = Sph * Nph (Nph is 4 or less). When S1 is equal to or less than 75, perform step (2), otherwise the output Sguide = 0;<br/>
+
Otherwise, detect the third condition. If there is a sgRNA designed perfectly hit another site, regard the number of it as <span class="serif">Nph</span>, and then the score of the first step: <span class="serif">S1</span> = <span class="serif">Sph</span> * <span class="serif">Nph</span> (<span class="serif">Nph</span> is 4 or less). When <span class="serif">S1</span> is equal to or less than 75, perform step (2), otherwise the output <span class="serif">Sguide</span> = 0;<br/>
-
If there is no sgRAN designed perfectly hit other sites, the score of the first step S1 = 0. This illustrates that there is no nucleotide which are matched between sgRNA and the place missed. Then perform step(2)。
+
If there is no sgRAN designed perfectly hit other sites, the score of the first step <span class="serif">S1</span> = 0. This illustrates that there is no nucleotide which are matched between sgRNA and the place missed. Then perform step(2)。
</li>
</li>
-
<li>(2) When performing step (2), remove the Nph which is perfectly hit first.
+
<li>(2) When performing step (2), remove the <span class="serif">Nph</span> which is perfectly hit first.
-
For Nfg-Nph which does not perfectly hit, please combine the weight ratio which obtained in the literature:
+
For <span class="serif">Nfg</span>-<span class="serif">Nph</span> which does not perfectly hit, please combine the weight ratio which obtained in the literature:
-
M=[ 0, 0, 0.014, 0, 0, 0.395, 0.317, 0, 0.389, 0.079, 0.445, 0.508, 0.613, 0.851, 0.732, 0.828, 0.615, 0.804, 0.685, 0.583];[4]
+
<span class="serif">M</span>=[ 0, 0, 0.014, 0, 0, 0.395, 0.317, 0, 0.389, 0.079, 0.445, 0.508, 0.613, 0.851, 0.732, 0.828, 0.615, 0.804, 0.685, 0.583];[4]
Using the formula:<img src="https://static.igem.org/mediawiki/2014/c/c5/2014-UESTC-Software-M1.png" style="position: relative;top: 4px;"><br/>
Using the formula:<img src="https://static.igem.org/mediawiki/2014/c/c5/2014-UESTC-Software-M1.png" style="position: relative;top: 4px;"><br/>
</li>
</li>
</ul></p>
</ul></p>
-
<p>Assuming that specific score: efficacy score = r1: r2 (the default is r1:r2 = 0.65:0.35), and then use formula specificity scores Sspe =<img src="https://static.igem.org/mediawiki/2014/3/37/2014-UESTC-Software-M2.png">(when 100-<img src="https://static.igem.org/mediawiki/2014/7/7f/2014-UESTC-Software-M3.png">, Sspe=0), efficacy score Seff = r2 * (100 - (Sgc + S20)), the total score: <img src="https://static.igem.org/mediawiki/2014/c/c0/2014-UESTC-Software-M4.png">;  
+
<p>Assuming that specific score: efficacy score = <span class="serif">r1</span>: <span class="serif">r2</span> (the default is <span class="serif">r1</span>:<span class="serif">r2</span> = 0.65:0.35), and then use formula specificity scores <span class="serif">Sspe</span> =<img src="https://static.igem.org/mediawiki/2014/3/37/2014-UESTC-Software-M2.png">(when 100-<img src="https://static.igem.org/mediawiki/2014/7/7f/2014-UESTC-Software-M3.png">, <span class="serif">Sspe</span>=0), efficacy score <span class="serif">Seff</span> = r2 * (100 - (<span class="serif">Sgc</span> + <span class="serif">S20</span>)), the total score: <img src="https://static.igem.org/mediawiki/2014/c/c0/2014-UESTC-Software-M4.png">;  
-
Finally according to Sguide score, arranging the sgRNA from high to low, outputting sgRNA, total score Sguide, specificity scores Sspe, efficacy score Seff, the chromosome and its site connected to sgRNA, the GC ratio.
+
Finally according to <span class="serif">Sguide</span> score, arranging the sgRNA from high to low, outputting sgRNA, total score <span class="serif">Sguide</span>, specificity scores <span class="serif">Sspe</span>, efficacy score <span class="serif">Seff</span>, the chromosome and its site connected to sgRNA, the GC ratio.
</p>
</p>
</div>
</div>

Revision as of 18:41, 17 October 2014

UESTC-Software

Models and Algorithms

1.Overview

Modeling is a powerful tool in synthetic biology and engineering. In our project, we aim to design a bioinformatics tool, “CRISPR-X”, which is software for designing CRISPR sgRNAs with minimal off-target effects and a high cutting rate.

CRISPR-associated protein 9 (Cas9) can be programmed with a single guide RNA (sgRNA) to generate site-specific DNA breaks, but few rules are known to govern the on-target efficacy of this system [1,2]. Related reports suggest that gRNAs are most effective with a GC content between 40% and 80% [1]. In addition, a guanine at position 20 of the target site appears to improve the cutting rate [1]. We therefore use an efficacy score to characterize the activity of the sgRNA.

sgRNA sequences of 17-20 nt achieve similar levels of on-target gene editing, and target specificity improves by up to 10,000-fold when a truncated (17 or 18 nt) sgRNA is used [3]. We therefore allow the length of the sgRNA sequence to vary from 17 nt to 20 nt.

First, we find the protospacer-adjacent motifs (PAMs) in the user-specified gene region. Then we find the sgRNA corresponding to each PAM. Next, we check whether the sgRNA has potential off-target binding sites across the entire gene region, and evaluate the specificity and efficacy of the sgRNA. Finally, we provide the secondary structure and the restriction enzyme cutting sites for the sgRNA.
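
The first two steps of this workflow (locating PAMs, then taking the adjacent guide sequence) can be sketched as follows. This is only an illustration, not the CRISPR-X implementation: it assumes the common SpCas9 PAM pattern NGG, searches the forward strand only, and the function name `find_candidate_guides` is hypothetical.

```python
import re


def find_candidate_guides(region, guide_len=20):
    """Return (guide, pam, guide_start) for every NGG PAM on the forward strand.

    Assumptions (not stated on this page): the PAM is NGG, and the guide is
    the guide_len bases immediately upstream (5') of the PAM.
    """
    region = region.upper()
    candidates = []
    # A lookahead is used so that overlapping PAMs (e.g. inside "GGG") are all found.
    for match in re.finditer(r"(?=([ACGT]GG))", region):
        pam_start = match.start()
        guide_start = pam_start - guide_len
        if guide_start < 0:
            continue  # not enough sequence upstream of this PAM
        guide = region[guide_start:pam_start]
        candidates.append((guide, match.group(1), guide_start))
    return candidates


if __name__ == "__main__":
    for guide, pam, start in find_candidate_guides("ACGT" * 5 + "AGGG"):
        print(start, guide, pam)
```

Passing `guide_len=17` or `guide_len=18` yields the truncated guides mentioned above; a full tool would also scan the reverse strand.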

2.Parameters
| Parameter | Description | Range | Unit | Remark |
| d0 | Average distance from the mismatched nucleotides to the PAM, for any off-target site | 0-19 | nt | |
| i | Index over the mismatched nucleotides | 1-Nmm | 1 | |
| j | Index over the off-target sites, excluding the perfect-hit off-target sites | 1-(Nfg-Nph) | 1 | |
| M | Weight matrix | [0,0,0.014,0,0,0.395,0.317,0,0.389,0.079,0.445,0.508,0.613,0.851,0.732,0.828,0.615,0.804,0.685,0.583] | | Reference: DNA targeting specificity of RNA-guided Cas9 nucleases, Hsu et al., 2013 |
| n | Mismatch position | 1-20 | | |
| Nfg | Total number of off-target sites | ≥0 | 1 | |
| Nmm | Number of mismatched nucleotides in an off-target site that is not a perfect hit | 1-4 | 1 | |
| Nph | Number of perfect-hit off-target sites | ≥0 | 1 | In our scoring algorithm, the maximum allowed value of Nph is 4; when Nph ≥ 4, Sguide = 0 |
| r1 | Proportion of the specificity score in the total score | 0-1 | 1 | In our scoring algorithm, its default value is 0.65 |
| r2 | Proportion of the efficacy score in the total score | 0-1 | 1 | In our scoring algorithm, its default value is 0.35 |
| S1 | Score of the first step | ≥0 | 1 | |
| S20 | Score subtracted when the 20th nucleotide is not a guanine | 35 | 1 | |
| Seff | Efficacy score | 0-100 | 1 | Represents the level of efficacy of the sgRNA |
| Sgc | Score subtracted according to the GC ratio | 0, 35, 65 | 1 | |
| Sguide | Total score of the sgRNA | 0-100 | 1 | Composed of Seff and Sspe; reflects the overall properties (specificity and efficacy) of the sgRNA |
| Smm | Score subtracted for the mismatched nucleotides of the off-target sites that are not perfect hits | ≥0 | 1 | |
| Sph | Score subtracted for the perfect-hit off-target sites | ≥0 | 1 | |
| Sspe | Specificity score | 0-100 | 1 | Represents the level of specificity of the sgRNA |
3.Scoring algorithm

    Scoring conditions:
  • ① Bad GC ratio (< 40% or > 80%): Sgc = 65;
    Not so good GC ratio (40%-50% or 70%-80%): Sgc = 35;
    Good GC ratio (51%-69%): Sgc = 0. [1]
    
  • ② The 20th nucleotide is not G: S20 = 35. [1]
  • ③ If the designed sgRNA perfectly hits another site, the penalty is Sph = 25 per site; if it perfectly hits 4 or more loci, the total score Sguide is 0.
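
Conditions ① and ② can be sketched as two small penalty functions. This is an illustrative sketch under stated assumptions, not the team's code: the function names are hypothetical, GC ratios that fall strictly between the listed ranges (between 50% and 51%, or between 69% and 70%) are treated here as "good", and the 20th nucleotide is taken to be the last (PAM-adjacent) base of the guide.

```python
def gc_penalty(guide):
    """Sgc: score subtracted for the GC ratio of the guide (0, 35 or 65)."""
    guide = guide.upper()
    gc_percent = 100.0 * sum(base in "GC" for base in guide) / len(guide)
    if gc_percent < 40 or gc_percent > 80:
        return 65  # bad GC ratio
    if gc_percent <= 50 or gc_percent >= 70:
        return 35  # not so good GC ratio
    # Assumption: everything strictly between 50% and 70% counts as good,
    # which closes the gaps left by the 51%-69% range in the text.
    return 0


def position20_penalty(guide):
    """S20: score subtracted when the 20th nucleotide is not a guanine.

    Assumption: the 20th nucleotide is the last base of the guide,
    i.e. the base next to the PAM.
    """
    return 0 if guide[-1].upper() == "G" else 35


if __name__ == "__main__":
    example = "GCGCGCGCGCGCATATATAG"  # 12 of 20 bases are G or C
    print(gc_penalty(example), position20_penalty(example))
```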

    Steps:
  • (1) First, find the number of off-target sequences, Nfg. If Nfg = 0, output Sspe = r1 * 100. Otherwise, check the third condition. If the designed sgRNA perfectly hits other sites, take their number as Nph; the score of the first step is then S1 = Sph * Nph (Nph is 4 or less). If S1 is less than or equal to 75, perform step (2); otherwise output Sguide = 0.
    If the designed sgRNA does not perfectly hit any other site, the score of the first step is S1 = 0. This means that no off-target site matches the sgRNA at every nucleotide. Then perform step (2).
  • (2) In step (2), first set aside the Nph perfect-hit sites. For the remaining Nfg-Nph off-target sites, which are not perfect hits, apply the weights obtained from the literature: M = [0, 0, 0.014, 0, 0, 0.395, 0.317, 0, 0.389, 0.079, 0.445, 0.508, 0.613, 0.851, 0.732, 0.828, 0.615, 0.804, 0.685, 0.583] [4], using the formula: [formula image: 2014-UESTC-Software-M1.png]

Assume that specificity score : efficacy score = r1 : r2 (the default is r1 : r2 = 0.65 : 0.35). The specificity score is Sspe = [formula image: 2014-UESTC-Software-M2.png] (when 100 - [formula image: 2014-UESTC-Software-M3.png], Sspe = 0), the efficacy score is Seff = r2 * (100 - (Sgc + S20)), and the total score is [formula image: 2014-UESTC-Software-M4.png]. Finally, the sgRNAs are sorted by Sguide from high to low, and the output lists each sgRNA with its total score Sguide, specificity score Sspe, efficacy score Seff, the chromosome and position of its target site, and its GC ratio.
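
The way the pieces combine can be sketched as below. Only Seff = r2 * (100 - (Sgc + S20)), S1 = Sph * Nph, the Nfg = 0 case, and the Sguide = 0 cut-off come from the text; the formulas for Sspe and Sguide are images on the original page, so this sketch assumes Sspe = r1 * max(0, 100 - S1 - total Smm) and Sguide = Sspe + Seff. Both assumptions agree with the stated special case Sspe = r1 * 100 when Nfg = 0, but they are guesses. The function name `guide_score` and the argument `s_mm_total` (the summed mismatch penalty from step (2)) are hypothetical.

```python
def guide_score(s_gc, s_20, n_fg, n_ph, s_mm_total=0.0,
                r1=0.65, r2=0.35, s_ph=25):
    """Return (Sspe, Seff, Sguide) for one sgRNA.

    s_mm_total is the summed mismatch penalty of the non-perfect-hit
    off-target sites; computing it needs the formula from step (2),
    so it is taken as an input here.
    """
    # From the text: Seff = r2 * (100 - (Sgc + S20)).
    s_eff = r2 * (100 - (s_gc + s_20))

    if n_fg == 0:
        # From the text: with no off-target sites, Sspe = r1 * 100.
        s_spe = r1 * 100
    else:
        s1 = s_ph * n_ph  # from the text: S1 = Sph * Nph
        if s1 > 75:
            # From the text: four or more perfect hits give Sguide = 0.
            return 0.0, s_eff, 0.0
        # ASSUMPTION: the specificity formula is an image in the source;
        # this form is a guess consistent with the Nfg = 0 case.
        s_spe = r1 * max(0.0, 100 - s1 - s_mm_total)

    # ASSUMPTION: the total score is the sum of the two weighted parts.
    return s_spe, s_eff, s_spe + s_eff


if __name__ == "__main__":
    # Rank two hypothetical guides from high to low total score.
    guides = {
        "guide_a": guide_score(s_gc=0, s_20=0, n_fg=0, n_ph=0),
        "guide_b": guide_score(s_gc=35, s_20=35, n_fg=3, n_ph=1, s_mm_total=10),
    }
    for name, scores in sorted(guides.items(), key=lambda kv: kv[1][2], reverse=True):
        print(name, scores)
```

Keeping the mismatch penalty as an input keeps the sketch independent of the step (2) formula, which is not reproduced in the text.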

4.Algorithm illustration

In reference [4], the algorithm used to score a single off-target site is: [formula image]

This algorithm is adopted by CRISPR-P. Its shortcomings are: (a) even when off-target sites are present, the subtracted score is sometimes still 0, which seems unreasonable in some cases and makes such sgRNAs hard to tell apart from sgRNAs that have no off-target sites; (b) it uses the W function, which cannot be expressed in elementary functions, so the calculation takes additional time.

Our algorithm avoids both shortcomings. Our algorithm is: [formula image]

First, we replace the W function with a sum of exponentials. Second, whenever off-target sites exist, our result includes a subtracted score, and we use rounding to further guarantee this. The following table compares the scores.

| Off-target sequence | Mismatches | CRISPR-P score | Our software score |
| GTTTCTCCGTAATCGCGTCA | 4 | 0.8 | 0.989 |
| GTTCTTCCACAATTCCGTTA | 4 | 0 | 0.391 |
| TTTCTTCCAGAATCGTGACT | 4 | 0 | 0.426 |
| GAAAAATTCCTCTTATTTCA | 2 | 3.9 | 2.177 |
| GAACAACTCCTCTTATTACA | 2 | 2.4 | 1.187 |
| GAAGAACTACGCTTATGACA | 4 | 0 | 0.402 |

    References:
  • [1] Wang, T., Wei, J. J., Sabatini, D. M., & Lander, E. S. (2014). Genetic screens in human cells using the CRISPR-Cas9 system. Science, 343(6166), 80-84.
  • [2] Gagnon, J. A., Valen, E., Thyme, S. B., Huang, P., Akhmetova, L., Pauli, A., ... & Schier, A. F. (2014). Efficient mutagenesis by Cas9 protein-mediated oligonucleotide insertion and large-scale assessment of single-guide RNAs. PLoS ONE, 9(5), e98186.
  • [3] Fu, Y., Sander, J. D., Reyon, D., Cascio, V. M., & Joung, J. K. (2014). Improving CRISPR-Cas nuclease specificity using truncated guide RNAs. Nature Biotechnology, 32(3), 279-284.
  • [4] Hsu, P. D., Scott, D. A., Weinstein, J. A., Ran, F. A., Konermann, S., Agarwala, V., ... & Zhang, F. (2013). DNA targeting specificity of RNA-guided Cas9 nucleases. Nature Biotechnology, 31(9), 827-832.
