Team:HFUT CHINA/Examples.html
From 2014.igem.org
(5 intermediate revisions not shown) | |||
Line 4: | Line 4: | ||
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> | <meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> | ||
<meta name="viewport" content="width=device-width, initial-scale=1, maximum-scale=1"> | <meta name="viewport" content="width=device-width, initial-scale=1, maximum-scale=1"> | ||
- | + | ||
<link href="https://2014.igem.org/Team:HFUT_CHINA/style.css?action=raw&ctype=text/css" rel="stylesheet" type="text/css" /> | <link href="https://2014.igem.org/Team:HFUT_CHINA/style.css?action=raw&ctype=text/css" rel="stylesheet" type="text/css" /> | ||
<link href="https://2014.igem.org/Team:HFUT_CHINA/webwidget_menu_dropdown.css?action=raw&ctype=text/css" rel="stylesheet" type="text/css"> | <link href="https://2014.igem.org/Team:HFUT_CHINA/webwidget_menu_dropdown.css?action=raw&ctype=text/css" rel="stylesheet" type="text/css"> | ||
Line 121: | Line 121: | ||
The second one is to count the probabilities of the first biobrick of each device chain. This work could be done by scanning the first biobrick of each chain. Thus, the complexity of this algorithm is <strong>O(n)</strong>.</p> | The second one is to count the probabilities of the first biobrick of each device chain. This work could be done by scanning the first biobrick of each chain. Thus, the complexity of this algorithm is <strong>O(n)</strong>.</p> | ||
<p align="justify" class="STYLE12"><br> | <p align="justify" class="STYLE12"><br> | ||
- | The third one is to calculate the hamming distances of each paired biobrickes. Since there are n device chains and each device contains at most m biobrickes. Therefore, there are at most nm biobrickes, which means there are at most <strong>nm*nm</strong> paired biobrickes. The calculation of hamming distances between two sequences with length l could be done in <strong>O(l)</strong> time. Thus, the complexity of this algorithm is<strong> O(l*nm*nm | + | The third one is to calculate the hamming distances of each paired biobrickes. Since there are n device chains and each device contains at most m biobrickes. Therefore, there are at most nm biobrickes, which means there are at most <strong>nm*nm</strong> paired biobrickes. The calculation of hamming distances between two sequences with length l could be done in <strong>O(l)</strong> time. Thus, the complexity of this algorithm is<strong> O(l*nm*nm).</strong></p> |
<p align="justify" class="STYLE12"><br> | <p align="justify" class="STYLE12"><br> | ||
- | Hence, the time complexity of preprocessing part is <strong>O(nm)+O(n)+O( | + | Hence, the time complexity of preprocessing part is <strong>O(nm)+O(n)+O(l*nm*nm) =O(l*nm*nm)</strong>.</p> |
<p align="center"><br> | <p align="center"><br> | ||
<span class="STYLE9">Executing Part</span></p> | <span class="STYLE9">Executing Part</span></p> |
Latest revision as of 03:13, 18 October 2014
Time Complexity Analysis
The algorithms used in Biodesigner could be grouped into two categories: preprocessing part and executing part.
Preprocessing Part
The preprocessing part includes three algorithms. Here we assume there are n devices and each device contains at most m biobrickes.
The first one is to count the probabilities of two successive biobrickes, three successive biobrickes, four successive biobrickes and five successive biobrickes. Here successive biobrickes means the biobrickes are neighbors. Thus the number of two successive biobrickes, three successive biobrickes, four successive biobrickes and five successive biobrickes are n(m-1), n(m-2), n(m-3), n(m-4), respectively. Hash tables are used to store these successive biobrickes. Therefore, the probabilities could be obtained by scanning the device chains four times. Thus, the time complexity of this algorithm is O(2n(m-1)+3n(m-2)+4n(m-3)+5n(m-3))=O(nm).
The second one is to count the probabilities of the first biobrick of each device chain. This work could be done by scanning the first biobrick of each chain. Thus, the complexity of this algorithm is O(n).
The third one is to calculate the hamming distances of each paired biobrickes. Since there are n device chains and each device contains at most m biobrickes. Therefore, there are at most nm biobrickes, which means there are at most nm*nm paired biobrickes. The calculation of hamming distances between two sequences with length l could be done in O(l) time. Thus, the complexity of this algorithm is O(l*nm*nm).
Hence, the time complexity of preprocessing part is O(nm)+O(n)+O(l*nm*nm) =O(l*nm*nm).
Executing Part
The executing part contains two algorithms. Here we assume the given device chain contains at most m biobrickes.
The first one is to inquire the probabilities of two successive biobrickes, three successive biobrickes, four successive biobrickes and five successive biobrickes of a given device chain. Since all the data is stored in a hash table and the access of hash table needs O(1) time. Thus, the time complexity of this algorithm is O(m).
The second one is to store the given device chain. This work is done by store the two successive biobrickes, three successive biobrickes, four successive biobrickes and five successive biobrickes of the given device chain into the hash table. Thus, the time complexity of this algorithm is O(m).
Hence, the time complexity of executing part is O(m).
Note, we only need to execute the preprocessing part once. After the execution of preprocessing part, all the data is store in database. When using our software, users do not need to waste their time for the preprocessing part as it has been executed.
Experimental Results
In order to test the performance of our software, we collected about 3600 device chains from Registry of Standard Parts, and extracted the associate relationships among these chains. All these information are stored in our database.
Firstly, we tested the response time of different functions of our software. Table 1 shows the results of different query functions. The results are averaged from 10 independent runs. From the results, we can find that all the time is really short, less than 0.1 seconds.
Table 1 Response time for different functions
Function |
Response Time (sec.) |
Query of Biobrick information |
0.04 |
Query of Biobrick’s twins |
0.03 |
Ambiguous Query of Biobrick |
0.1 |
We also tested the response time of recommendation. Table 2 listed the results in terms of different number of given biobricks. Since our software recommend biobrick based on at most 4 biobricks’ information. Thus, the maximum number of given biobricks is 4. These times are really short, less than 0.4 seconds. Users could get the recommendation almost instantly. This could greatly reduce the time used for designing.
Table 2 Response time for recommending based on different number of biobricks
Number of Biobricks |
Response Time(sec.) |
0 |
0.3 |
1 |
0.19 |
2 |
0.14 |
3 |
0.27 |
4 |
0.35 |
Secondly, we tested the performance of recommendation. 10 device chains were randomly selected for this test. Table 3 shows the results, where the recommendation ratio is the ratio of the number of correctly recommended biobricks over the length of device chains, and the correctly recommended biobricks means these biobricks are in recommended lists. The average recommendation ratio is 75.4%. This ratio actually is really high, since only 24.6% of biobricks are not in the recommended lists.
Device Chain |
Correctly recommended biobricks |
Recommendation ratio |
BBa_R0040-BBa_B0034-BBa_J36835-BBa_J36837-BBa_K283009-BBa_B0010-BBa_B0012 |
BBa_B0034 BBa_J36837 |
71.4% |
BBa_R0010-BBa_B0030-BBa_K233307-BBa_I732005-BBa_E0040-BBa_B0010-BBa_B0012 |
BBa_B0030 BBa_K233307 BBa_I732005 BBa_E0040 BBa_B0010 |
85.7% |
BBa_J23118-BBa_B0030-BBa_K233306-BBa_I732005-BBa_E0040-BBa_B0010-BBa_B0012 |
BBa_K233306 BBa_I732005 BBa_E0040 BBa_B0010 |
71.4% |
BBa_R0040-BBa_B0034-BBa_K255000-BBa_B0010-BBa_B0012-BBa_R0040-BBa_B0034-BBa_E0030-BBa_B0010-BBa_B0012 |
BBa_B0034 BBa_B0010 BBa_B0012 BBa_R0040 |
80% |
BBa_J23119-BBa_B0034-BBa_C0040-BBa_B0034-BBa_E1010-BBa_B0010-BBa_B0012 |
BBa_B0034 BBa_C0040 |
85.7% |
BBa_J23106-BBa_B0034-BBa_C0012-BBa_B0034-BBa_C0062-BBa_B0012-BBa_B0011 |
BBa_C0012 BBa_B0034 |
71.4% |
BBa_K145150-BBa_B0034-BBa_K145151-BBa_B0034-BBa_E1010-BBa_B0012-BBa_B0011 |
BBa_K145151 BBa_B0034 BBa_E1010 BBa_B0012 BBa_B0011 |
71.4% |
BBa_R0040-BBa_B0034-BBa_C0179-BBa_B0010-BBa_B0012-BBa_R0079 |
BBa_B0034 BBa_B0010 |
66.7% |
BBa_R0040-BBa_K118012-BBa_I742151-BBa_J15001-BBa_K118002 |
BBa_I742151 BBa_J15001 |
60% |
BBa_I14032-BBa_B0034-BBa_C0062-BBa_B0010-BBa_B0012-BBa_R0062-BBa_J61127-BBa_K228000-BBa_B0010-BBa_B0012 |
BBa_B0034 BBa_C0062 |
90% |
We also use the new designed chains from BIT team to test the performance of our software. Table 4 shows the results. The averaged recommendation ratio is 66.9%. This ratio is smaller than the ratio tested above. The reason is that the new designed chains consist of some new biobricks that the database does not contain. These biobricks are hardly to recommend and effect the recommendation of next biobrick. However, this ratio still proves our software works well even when using new designed biobricks. We could believe that as more and more biobricks and device chains are added into our database, our software could recommend more and more efficiently and accurately.
Table 4 Performance of BioDesigner with BIT’s new designed device chains
Device Chain |
Correctly recommended biobricks |
Recommendation ratio |
BBa_B0034-BBa_C0071-BBa_B0010-BBa_B0012-BBa_R0071-BBa_B0030-BBa_E1010 |
BBa_B0010 BBa_B0012 |
57.1% |
BBa_J23100-BBa_B0034-BBa_C0179-BBa_B0010-BBa_B0012-BBa_R0079-BBa_B0030-BBa_E1010 |
BBa_B0034 BBa_B0010 |
62.5% |
BBa_R0079-BBa_B0034-BBa_E0040-BBa_B0010-BBa_B0012 |
BBa_B0034 BBa_E0040 |
80% |
BBa_R0079-BBa_B0030-BBa_E1010-BBa_B0010-BBa_B0012 |
BBa_E1010 BBa_B0010 |
60% |
BBa_J23100-BBa_B0034-BBa_C0179-BBa_B0010-BBa_B0012-BBa_R0079-BBa_B0034-BBa_E0040 |
BBa_B0034 BBa_B0010 |
75% |
BBa_B0034-BBa_C0171-BBa_B0010-BBa_B0012-BBa_R0071-BBa_B0030-BBa_E1010-BBa_B0010-BBa_B0012 |
BBa_B0010 BBa_B0012 |
66.7% |