Team:SJTU-Software/Database

From 2014.igem.org

Database

Here, we have reconstructed the database of existing biobricks. All the necessary information available on the Registry is collected and used to build our own database. This part introduces the data sources, the data collection method, the database structure, and some facts about the biobricks in the database.

Data collection

The data needed for the reconstructed biobrick database come from 5 sources: four are pages on the website of the Registry of Standard Biological Parts, and one is Google Scholar. The addresses of these sources for BBa_B0034 are listed in Table 1; the part name can be replaced by that of any other biobrick.

Name of Websites	Address
XML format	http://parts.igem.org/cgi/xml/part.cgi?part=BBa_B0034
Hard information	http://parts.igem.org/cgi/partsdb/part_info.cgi?part_name=BBa_B0034
Get part	http://parts.igem.org/partsdb/get_part.cgi?part=BBa_B0034
Experience	http://parts.igem.org/Part:BBa_B0034:Experience
Google scholar	http://scholar.google.com.cn/scholar?q=BBa_B0034

Table 1 Sources of the data in the biobrick database, using BBa_B0034 as an example
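
The substitution described above can be sketched in a few lines. This is a minimal Python illustration, not the team's PERL scripts; the URL templates are copied from Table 1, while the names SOURCE_TEMPLATES and source_urls are hypothetical.

```python
from urllib.parse import quote

# URL templates copied from Table 1; "{part}" marks where the part name goes.
SOURCE_TEMPLATES = {
    "XML format": "http://parts.igem.org/cgi/xml/part.cgi?part={part}",
    "Hard information": "http://parts.igem.org/cgi/partsdb/part_info.cgi?part_name={part}",
    "Get part": "http://parts.igem.org/partsdb/get_part.cgi?part={part}",
    "Experience": "http://parts.igem.org/Part:{part}:Experience",
    "Google scholar": "http://scholar.google.com.cn/scholar?q={part}",
}


def source_urls(part_name):
    """Return the five source addresses for one part name (hypothetical helper)."""
    safe = quote(part_name, safe="")  # escape anything unusual in the name
    return {
        source: template.format(part=safe)
        for source, template in SOURCE_TEMPLATES.items()
    }


urls = source_urls("BBa_K374013")
for source, url in urls.items():
    print(source, url, sep="\t")
```

Keeping the templates in one dictionary means a changed address only has to be corrected in one place.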

PERL scripts were written to connect to these links and extract the information we need. To connect to the links, we need to understand how biobricks are named on the website. There are two main naming schemes, illustrated in Table 2 and Table 3. The prefix “BBa” is used for most biobricks, whose names consist of 3 parts: the first part is “BBa_”; the second part is a capital letter; the third part is a run of digits whose length depends on the preceding capital letter (the possible combinations are listed in Table 2). The prefix “pSB” is used specifically for plasmids, whose names consist of 4 parts: the first part is “pSB”; the second part is one digit from 1 to 9; the third part is a single pattern or a combination of two of the patterns listed in Table 3; the fourth part is any number from 1 to 29.

The First Part of the Name	The Second Part of the Name	The Third Part of the Name (Number of Digits)	Examples
BBa_
	A	6	BBa_A340620
	B	4	BBa_B0034
	C	4/5/6	BBa_C0053; BBa_C10001
	E	4	BBa_E5504
	F	4	BBa_F2622
	G	4/5	BBa_G0011; BBa_G00500
	I	4/5/6	BBa_I10018
	J	4/5/6	BBa_J52100; BBa_J540013
	K	6/7	BBa_K374013; BBa_K1218016
	M	4/5	BBa_M1904; BBa_M31000
	P	4	BBa_P2007
	Q	5/6	BBa_Q200514
	R	4	BBa_R4037
	S	5	BBa_S01297
	T	4	BBa_T1009
	V	4	BBa_V1022
	Y	5	BBa_Y00100
	Z	4/5	BBa_Z0506; BBa_Z52935

Table 2 Rules in naming biobricks starting with “BBa_”
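
The rules in Table 2 can be checked mechanically. The following Python sketch is an illustration only, not the team's PERL code: the digit counts are transcribed from Table 2, and BBA_DIGITS and is_valid_bba_name are hypothetical names.

```python
import re

# Allowed digit counts per capital letter, transcribed from Table 2.
BBA_DIGITS = {
    "A": {6}, "B": {4}, "C": {4, 5, 6}, "E": {4}, "F": {4}, "G": {4, 5},
    "I": {4, 5, 6}, "J": {4, 5, 6}, "K": {6, 7}, "M": {4, 5}, "P": {4},
    "Q": {5, 6}, "R": {4}, "S": {5}, "T": {4}, "V": {4}, "Y": {5}, "Z": {4, 5},
}

# "BBa_", one capital letter, then one or more ASCII digits.
BBA_PATTERN = re.compile(r"BBa_([A-Z])([0-9]+)")


def is_valid_bba_name(name):
    """Return True if the name follows the Table 2 rules (hypothetical helper)."""
    match = BBA_PATTERN.fullmatch(name)
    if match is None:
        return False
    letter, digits = match.groups()
    # Letters that are absent from Table 2 have no allowed digit counts.
    return len(digits) in BBA_DIGITS.get(letter, set())


for candidate in ("BBa_B0034", "BBa_K1218016", "BBa_B00345", "BBa_X0034"):
    print(candidate, is_valid_bba_name(candidate))
```

The check only confirms that a name has a permitted shape; whether a part with that name exists still has to be determined by requesting its page.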

The First Part of the Name	The Second Part of the Name	The Third Part of the Name	The Fourth Part of the Name	Examples
pSB	1~9	A/C/E/G/K/N/Na/R/S/St/T/Tm/Z	1~29	pSB1K16
pSB	1~9	Combination of 2 patterns above	1~29	pSB1AK3

Table 3 Rules in naming biobricks starting with “pSB”
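
The four-part “pSB” rule can likewise be written as one regular expression. This is a hedged Python sketch of the Table 3 rules, not the team's implementation; PSB_PATTERN and parse_psb_name are hypothetical names.

```python
import re

# Third part: one or two of the patterns listed in Table 3.
# Two-character patterns (Na, St, Tm) are tried before the single letters.
MARKER = r"(?:Na|St|Tm|[ACEGKNRSTZ])"

PSB_PATTERN = re.compile(
    r"pSB"                    # first part
    r"([1-9])"                # second part: one digit from 1 to 9
    r"(" + MARKER + r"{1,2})" # third part: one pattern or a combination of two
    r"([12][0-9]|[1-9])"      # fourth part: a number from 1 to 29
)


def parse_psb_name(name):
    """Split a pSB name into its last three parts, or return None (hypothetical helper)."""
    match = PSB_PATTERN.fullmatch(name)
    if match is None:
        return None
    second, third, fourth = match.groups()
    return second, third, int(fourth)


for candidate in ("pSB1K16", "pSB1AK3", "pSB1K30", "pSB0K3"):
    print(candidate, parse_psb_name(candidate))
```

Using fullmatch rather than match means that trailing characters, such as the extra digit in pSB1K30, cause the name to be rejected.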

By substituting names that follow the rules in Table 2 and Table 3 for the part name (BBa_B0034) in the addresses listed in Table 1, we can access the pages holding information on each biobrick. The data collected by the PERL scripts from each website are listed in Table 4, and more detailed descriptions of those data are given in Table 5. The PERL scripts used to collect the data are available on GitHub.

Name of Websites	Data collected
XML format	part id; part name; part short name; part short description; part type; part status; sample status; part results; part nickname; part rating; part url; part entered; part author; Sequences; Samples; References; groups; deep subparts; specified subparts; specified subscars; features; parameters; categories; twins
Hard information	DNA status; Group Favorite; Whether or not deleted; Used Times; Length of Documentation
Get part	Confirmed Times; Not Confirmed Details
Experience	Average Rating; Number of Comments
Google scholar	Number of Publication; URL of the most related publication

Table 4 Data collected from the 5 sources
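
As an illustration of how fields such as those in the “XML format” row might be picked out, the Python sketch below parses a small inline document. It is not the team's PERL code. The element names (part_id, part_name, categories, and so on) are assumed from the attribute names in Table 4, and every value in SAMPLE_XML is a placeholder rather than real Registry data.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; tag names are assumptions and values are placeholders.
SAMPLE_XML = """<rsbpml>
  <part_list>
    <part>
      <part_id>0</part_id>
      <part_name>BBa_B0034</part_name>
      <part_short_name>B0034</part_short_name>
      <part_type>placeholder type</part_type>
      <categories>
        <category>placeholder/category/one</category>
        <category>placeholder/category/two</category>
      </categories>
    </part>
  </part_list>
</rsbpml>"""

SCALAR_FIELDS = ("part_id", "part_name", "part_short_name", "part_type")


def extract_part(xml_text):
    """Return a dict of selected fields for the first <part>, or None if absent."""
    root = ET.fromstring(xml_text)
    part = root.find(".//part")
    if part is None:
        return None
    # Single-valued fields: a missing element becomes an empty string.
    record = {name: (part.findtext(name) or "").strip() for name in SCALAR_FIELDS}
    # Multi-valued field: a part may belong to several categories.
    record["categories"] = [
        c.text.strip() for c in part.findall("categories/category") if c.text
    ]
    return record


print(extract_part(SAMPLE_XML))
```

Multi-valued fields such as categories are kept as lists here because one part can carry several values.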

Property	Description
part id	Numeric identifier of the part in the Registry
part name	Official name of the part, e.g. BBa_B0034
part short name	Part name without “BBa_”
part short description	The short description is usually a biological or technical descriptor of the part. It will show up on the part page
part type	This will state the part type/function, and will show up at the top of the part page, to the left of the DNA Status
part status	The status of the part based on the completeness of its documentation and characterization. This system is not currently in place
sample status	The status of the part's physical DNA (sample) in the Repository
part results	The experience status for a part, as documented by the part authors
part nickname	Nickname of the part
part rating	Whether there is a registry star for the part
part url	URL of main page for the part
part entered	Entered date of the part
part author	Author of the part
Sequences	Nucleotide sequence of the part
Samples	Not currently enabled
References	Not currently enabled
Groups	Not currently enabled
deep subparts	All subparts that compose the subparts in the part
specified subparts	All subparts that compose the part
specified subscars	All scars between subparts
Features	Features of the part
Parameters	The parameters are submitter-decided, allowing for a degree of technical specification for the part.
Categories	Categories allow a part to appear in automatically generated part tables, which is important in defining how the part is organized within the Registry, and specifically for the Catalog of Parts and Devices
Twins	Two or more parts are twins if they have the same sequence
DNA status	States the DNA status of your part: Deleted, Planning, Sent, Available, etc. These statuses are generated by the Registry, so the user cannot edit them.
Group Favorite	Whether the part has been marked as a favorite by its team/group
Whether or not deleted	Whether the part is deleted
Used Times	The number of times the part has been specified in composite parts in the Registry
Confirmed Times	The number of times the part sequence has been confirmed by users
Not Confirmed Details	Details of the cases in which the part sequence was not confirmed
Average Rating	The mean of the ratings given to the part by users
Number of Comments	The number of comments from users
Number of Publication	The number of results related to the part on Google Scholar
URL of the most related publication	The URL of the result most related to the part on Google Scholar

Table 5 Description of attributes in Easy BBK database

Database Structure

The structure of the database is presented in Figure 1. The primary key of the table “Main” is “part name”, and all the other tables are linked to it through an identical “part name” field. No table other than “Main” has a primary key, because a biobrick can belong to more than one category or have more than one twin. The part name can therefore appear several times in the other tables, depending on the biobrick.
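
The key layout described above can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions: the source does not say which database system is used, and apart from “Main” and “part name” the table and column names (categories, twins, part_type, category, twin) are hypothetical.

```python
import sqlite3


def build_database():
    """Create an in-memory database that mirrors the described key layout."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- "Main" holds one row per part; part_name is its primary key.
        CREATE TABLE main (
            part_name TEXT PRIMARY KEY,
            part_type TEXT
        );
        -- Secondary tables have no primary key, so part_name may repeat.
        CREATE TABLE categories (
            part_name TEXT REFERENCES main(part_name),
            category  TEXT
        );
        CREATE TABLE twins (
            part_name TEXT REFERENCES main(part_name),
            twin      TEXT
        );
        """
    )
    return conn


conn = build_database()
conn.execute("INSERT INTO main VALUES (?, ?)", ("BBa_B0034", "placeholder type"))
# One part, two categories: the same part_name appears twice.
conn.executemany(
    "INSERT INTO categories VALUES (?, ?)",
    [("BBa_B0034", "placeholder/one"), ("BBa_B0034", "placeholder/two")],
)
rows = conn.execute(
    "SELECT m.part_name, c.category FROM main m "
    "JOIN categories c ON c.part_name = m.part_name ORDER BY c.category"
).fetchall()
print(rows)
```

Joining on part_name returns one row per category, which is why the secondary tables cannot use the part name as a primary key.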