Team:SJTU-Software/Database
From 2014.igem.org
Revision as of 17:50, 15 October 2014
3. Database
Here, we reconstruct a database of the existing biobricks. All the necessary information available on the Registry is collected and used to build our own database. This part introduces the data sources, the data collection method, the database structure and some facts about the biobricks in the database.
3.1.1 Data collection
Data are collected from 5 sources: four websites on the Registry and Google Scholar.
The data needed for the reconstructed biobrick database are available from 5 sources: four are websites of the Registry of Standard Biological Parts and one is Google Scholar. The addresses of those sources for BBa_B0034 are listed in Table 2.1.1; the part name can be replaced by that of any other biobrick.
Name of Websites | Address |
---|---|
XML format | http://parts.igem.org/cgi/xml/part.cgi?part=BBa_B0034 |
Hard information | http://parts.igem.org/cgi/partsdb/part_info.cgi?part_name=BBa_B0034 |
Get part | http://parts.igem.org/partsdb/get_part.cgi?part=BBa_B0034 |
Experience | http://parts.igem.org/Part:BBa_B0034:Experience |
Google scholar | http://scholar.google.com.cn/scholar?q=BBa_B0034 |
Table 2.1.1 Sources of the data in the biobrick database, using BBa_B0034 as an example
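The substitution described above can be sketched in a few lines. This is a Python illustration only, not the team's PERL scripts; the URL templates are copied from Table 2.1.1, while the names `SOURCE_TEMPLATES` and `build_source_urls` are hypothetical.

```python
from urllib.parse import quote

# URL templates copied from Table 2.1.1, with BBa_B0034 replaced by a placeholder.
# The dictionary and function names are illustrative assumptions.
SOURCE_TEMPLATES = {
    "XML format": "http://parts.igem.org/cgi/xml/part.cgi?part={part}",
    "Hard information": "http://parts.igem.org/cgi/partsdb/part_info.cgi?part_name={part}",
    "Get part": "http://parts.igem.org/partsdb/get_part.cgi?part={part}",
    "Experience": "http://parts.igem.org/Part:{part}:Experience",
    "Google scholar": "http://scholar.google.com.cn/scholar?q={part}",
}


def build_source_urls(part_name):
    """Return a dict mapping each source name to its URL for the given part."""
    # Percent-encode the name so unexpected characters cannot break the URL.
    safe_name = quote(part_name, safe="")
    return {name: tpl.format(part=safe_name) for name, tpl in SOURCE_TEMPLATES.items()}


urls = build_source_urls("BBa_B0034")
for source, url in urls.items():
    print(source, url)
```

Keeping the templates in one dictionary means a change of address on the Registry side only has to be made in one place.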
PERL scripts were written to connect to those links and to extract the information we need. To connect to those links, we need to understand how biobricks are named on the website. There are two main naming schemes, illustrated in Table 2.1.2 and Table 2.1.3. “BBa” is used for most biobricks, whose names consist of 3 parts: the first part is “BBa_”; the second part is a capital letter; the third part is a run of digits, the number of which depends on the preceding capital letter (the possible combinations are listed in Table 2.1.2). “pSB” is used specifically for plasmids, whose names consist of 4 parts: the first part is “pSB”; the second part is a single digit from 1 to 9; the third part is a single pattern or a combination of two of the patterns listed in Table 2.1.3; the fourth part is a number from 1 to 29.
The First Part of the Name | The Second Part of the Name | The Third Part of the Name (Number of Digits) | Examples |
---|---|---|---|
BBa_ | A | 6 | BBa_A340620 |
BBa_ | B | 4 | BBa_B0034 |
BBa_ | C | 4/5/6 | BBa_C0053; BBa_C10001 |
BBa_ | E | 4 | BBa_E5504 |
BBa_ | F | 4 | BBa_F2622 |
BBa_ | G | 4/5 | BBa_G0011; BBa_G00500 |
BBa_ | I | 4/5/6 | BBa_I10018 |
BBa_ | J | 4/5/6 | BBa_J52100; BBa_J540013 |
BBa_ | K | 6/7 | BBa_K374013; BBa_K1218016 |
BBa_ | M | 4/5 | BBa_M1904; BBa_M31000 |
BBa_ | P | 4 | BBa_P2007 |
BBa_ | Q | 5/6 | BBa_Q200514 |
BBa_ | R | 4 | BBa_R4037 |
BBa_ | S | 5 | BBa_S01297 |
BBa_ | T | 4 | BBa_T1009 |
BBa_ | V | 4 | BBa_V1022 |
BBa_ | Y | 5 | BBa_Y00100 |
BBa_ | Z | 4/5 | BBa_Z0506; BBa_Z52935 |
Table 2.1.2 Rules in naming biobricks starting with “BBa_”
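The rules in Table 2.1.2 lend themselves to a simple check. The following Python sketch is an illustration and not the team's PERL code: the digit counts are transcribed from the table, and the names `BBA_DIGIT_COUNTS` and `is_valid_bba_name` are hypothetical.

```python
import re

# Allowed digit counts per capital letter, transcribed from Table 2.1.2.
BBA_DIGIT_COUNTS = {
    "A": (6,), "B": (4,), "C": (4, 5, 6), "E": (4,), "F": (4,),
    "G": (4, 5), "I": (4, 5, 6), "J": (4, 5, 6), "K": (6, 7),
    "M": (4, 5), "P": (4,), "Q": (5, 6), "R": (4,), "S": (5,),
    "T": (4,), "V": (4,), "Y": (5,), "Z": (4, 5),
}

# "BBa_", then one capital letter, then one or more ASCII digits.
BBA_PATTERN = re.compile(r"BBa_([A-Z])([0-9]+)")


def is_valid_bba_name(name):
    """Return True if name follows the BBa_ rules of Table 2.1.2."""
    match = BBA_PATTERN.fullmatch(name)
    if match is None:
        return False
    letter, digits = match.groups()
    # Letters missing from the table have no allowed digit count.
    return len(digits) in BBA_DIGIT_COUNTS.get(letter, ())


print(is_valid_bba_name("BBa_B0034"))
print(is_valid_bba_name("BBa_B003"))
```

Such a check could be used to enumerate candidate names or to reject malformed ones before a request is sent.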
The First Part of the Name | The Second Part of the Name | The Third Part of the Name | The Fourth Part of the Name | Examples |
---|---|---|---|---|
pSB | 1~9 | A/C/E/G/K/N/Na/R/S/St/T/Tm/Z | 1~29 | pSB1K16 |
pSB | 1~9 | Combination of 2 letters above | 1~29 | pSB1K16 |
Table 2.1.3 Rules in naming biobricks starting with “pSB”
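The plasmid rules in Table 2.1.3 can be expressed as one regular expression. This is a Python sketch under stated assumptions, not the team's PERL code: it assumes the fourth part is written without leading zeros, and the names `PSB_PATTERN` and `is_valid_psb_name` are hypothetical.

```python
import re

# One pattern from Table 2.1.3; two-character patterns are listed first.
_MARKER = r"(?:Na|St|Tm|[ACEGKNRSTZ])"

# "pSB", one digit 1-9, one or two patterns, then a number from 1 to 29.
# Assumption: the final number carries no leading zero.
PSB_PATTERN = re.compile(r"pSB[1-9]" + _MARKER + r"{1,2}" + r"(?:[12][0-9]|[1-9])")


def is_valid_psb_name(name):
    """Return True if name follows the pSB rules of Table 2.1.3."""
    return PSB_PATTERN.fullmatch(name) is not None


print(is_valid_psb_name("pSB1K16"))   # example taken from Table 2.1.3
print(is_valid_psb_name("pSB1AK3"))   # illustrative two-pattern name
print(is_valid_psb_name("pSB1K30"))   # fourth part outside 1-29
```

Using `fullmatch` rather than `match` keeps trailing characters from slipping through.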
By substituting names that follow the rules in Table 2.1.2 and Table 2.1.3 for the part name (BBa_B0034) in the addresses listed in Table 2.1.1, we obtain the links to the information on each biobrick. The data collected by the PERL scripts from each website are listed in Table 2.1.4. A more detailed description of those data is available in Table 2.1.5. The PERL scripts used to collect the data are available on GitHub.
Name of Websites | Data Collected |
---|---|
XML format | part id; part name; part short name; part short description; part type; part status; sample status; part results; part nickname; part rating; part url; part entered; part author; Sequences; Samples; References; groups; deep subparts; specified subparts; specified subscars; features; parameters; categories; twins |
Hard information | DNA status; Group Favorite; Whether or not deleted; Used Times; Length of Documentation |
Get part | Confirmed Times; Not Confirmed Details |
Experience | Average Rating; Number of Comments |
Google scholar | Number of Publications; URL of the most relevant publication |
Table 2.1.4 Data collected from the 5 sources
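As an illustration of how fields such as those in the “XML format” row might be pulled out of a response, here is a minimal Python sketch. It does not reproduce the team's PERL scripts. The sample document, its element names (derived from the field names in Table 2.1.4 by replacing spaces with underscores) and its values are assumptions; the actual Registry response may be structured differently.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample response. Element names and values are placeholders,
# not a copy of what the Registry actually returns.
SAMPLE_XML = """
<response>
  <part_list>
    <part>
      <part_name>BBa_B0034</part_name>
      <part_type>placeholder-type</part_type>
      <part_status>placeholder-status</part_status>
    </part>
  </part_list>
</response>
"""

# Assumed element names for a few of the fields listed in Table 2.1.4.
FIELDS = ("part_name", "part_type", "part_status", "part_rating")


def extract_part_fields(xml_text, fields=FIELDS):
    """Return a dict of field -> text for the first <part> element found."""
    root = ET.fromstring(xml_text)
    part = root.find(".//part")
    if part is None:
        return {}
    # findtext returns None when the element is absent from the document.
    return {field: part.findtext(field) for field in fields}


record = extract_part_fields(SAMPLE_XML)
print(record)
```

Fields that are missing from a response come back as `None`, which lets the caller decide whether to store an empty value or skip the part.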
3.1.2 Database Structure
The structure of the database is presented in Figure 3.1. The primary key of table “Main” is “part name”, and all the other tables are linked to it by the same “part name”. The tables other than “Main” have no primary key, because a biobrick can belong to more than one category or have more than one twin. The part name can therefore appear several times in those tables, depending on the biobrick.
Figure 3.1 The structure of the reconstructed biobrick database
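The key layout described above can be sketched with SQLite. This is a minimal Python illustration and not the team's actual schema: only the table name “Main” and the “part name” key come from the text, while the other table and column names, the choice of SQLite and the stored values are assumptions.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with one keyed table and two unkeyed ones."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- part_name is unique in Main.
        CREATE TABLE Main (part_name TEXT PRIMARY KEY, part_type TEXT);
        -- No primary key: one part may appear in several rows.
        CREATE TABLE Categories (part_name TEXT, category TEXT);
        CREATE TABLE Twins (part_name TEXT, twin TEXT);
    """)
    conn.execute("INSERT INTO Main VALUES (?, ?)", ("BBa_B0034", "placeholder-type"))
    conn.executemany(
        "INSERT INTO Categories VALUES (?, ?)",
        [("BBa_B0034", "placeholder-category-1"),
         ("BBa_B0034", "placeholder-category-2")],
    )
    return conn


conn = build_demo_db()
# Join the unkeyed table back to Main through the shared part name.
rows = conn.execute(
    "SELECT m.part_name, c.category FROM Main AS m "
    "JOIN Categories AS c ON c.part_name = m.part_name "
    "ORDER BY c.category"
).fetchall()
print(rows)
```

Because “Main” holds one row per part, a second insert of the same part name there fails with `sqlite3.IntegrityError`, while the other tables accept repeated part names.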