Team:HFUT CHINA/Database&Algorithm.html
From 2014.igem.org
Database&Algorithm
In order to recommend the relevant biobricks, some advanced algorithms and classical methods are employed in the software. These algorithms and methods include Collaborative Filtering and Association Rules.
1.Collaborative Filtering
Collaborative Filtering recommendation has become a very popular system in message filter and information system. Different from traditional filters that analyze the content directly, Collaborative Filtering analyzes the habits of users and find the similar users for a given user. By dealing with the information of similar users, Collaborative Filtering will predict the users’ interest and recommend the relevant information.
Compared with traditional text filtering, the advantages of Collaborative Filtering are as follows: First it can filter the information that the machine is hard to analysis based on the text. Second it based on the concept which hard to express.
Collaborative Filtering can be divided into two kinds, item-based CF and user-based CF, which will be introduced as follows.
1.1 item-based CF
The main idea of Item-based CF is that for a given biobrick, the algorithm finds similar biobricks to recommend according to the similarity between two biobricks.
In our software, we use Hamming distance to measure the similarity of two biobricks. Hamming distance is named by Richard Wesley Hamming, which was first used in basic thesis about error detection and correction code. Today, hamming distance analysis is widely used in many areas including information theory coding theory, cryptography, etc.
The Hamming distance of two equal-length character strings is the number of different character in their correspondence position. In other words, it is the number of change that changing one alphabetic string to another. For example, the Hamming distance of 1011101and 1001001 is two.
In our software, we calculate the hamming distance of each paired biobrick sequence and use it as the similarity of these sequences.
1.2 User-based CF
The main idea of User-based CF is to find similar users by analyzing the previous behavior of different users and recommend biobricks based on the behavior of similar user.
Pearson correlation coefficient is used to measure the similarity of different users. The definition of Pearson correlation coefficient is as follows:
-
2. Association rule
Association rule is a widely used and well researched method for discovering relations between variables. When using association rule for recommendation, there are mainly two separate steps:
First, minimum support is applied to find all frequent itemsets in a database.
Second, these frequent itemsets and the minimum confidence constraint are used to form rules.
For the first step, we first collect all the device chains from Registry of Standard Biological Parts and then calculate the frequency of two successive biobrickes, three successive biobrickes, four successive biobrickes and five successive biobrickes. Here successive biobrickes means the biobrickes are neighbors and the frequency of a set of successive biobricks is named support. After the calculation of the frequency, we find all successive biobrickes that have support greater than the minimum support.
For the second step, the large sets of successive biobricks are used to generate the desired rules that have confidence greater than the minimum confidence. Here confidence a measure of the reliability of an estimate. It consists of a range of values (interval) that act as good estimates of the unknown population parameter.
Reference:
(1) LINDEN G, SMITH B, YORK J. Amazon.com recommendations item-to-item collaborative filtering [J]. IEEE Computer Society, 2003(1-2):76-80.
(2) METEREN R V. M. v. S. using content-based filtering for recommendation[C]. In CML/MLNET Workshop on Machine Learning and the New Information Age, Baecelona, 2000:47-56
(3) Badrul Sarwar, G.K., KONSTAN J, et al. Item based collaborative filtering recommendation algorithms[C]. Proceedings of the 10th International World Wide Web Conference, 2001.
(4) EKSTRAND M D, RIDER J T, KONSTAN J A. Collaborative filtering recommender systems [M]. Hanover: New Publishers Inc, 2010:81-173
(5) CHAN G K, ASGARPOOR S. Optimum maintenance policy with Markov processes[J] .Electric Power Systems Research, 2006(76):452-456.
(6) ARABIAN H H , ORAEE H ,TAVNER P J. Wind turbine productivity considering electrical subassembly reliability[J]. Renewable Energy, 2010(35) :190-197.
(7) Ge Haifeng, TOMASEVICZ C L, ASGARPOOR S. Optimum maintenance policy with inspection by semi-markov decision processes[C].2007 North American Power Symposium (NAPS):541-546.
(8) MCMILLAN D, AULT G W. Quantification of condition monitoring benefit for offshore wind turbines [J]. Wind Engineering, 2007, 4(31):267-28