Team:UCSD Software/Project

From 2014.igem.org

Revision as of 02:56, 18 October 2014 by Kspuri (Talk | contribs)


Description

Human Practices

Validation

We first validated the SBiDer application by searching for circuits described in the literature sources we used to construct the SBiDer database. As expected we were able to successfully retrieve all the circuits used to for the database. We also used this set of devices during the development of our web application; minimally our application should be able to find known circuits used to construct our database. The full list is below:
Inputs Outputs
  LacI,IPTG                gfp
  OHHL,luxR                gfp
  araC,Lara                yfp
  na                araC
  tetR,aTc                yfp
  na                tetR
  lasR,PAI-1                yfp
  na                lasR
  aTc,tetR                yfp
  na                araC,tetR
  na                araC,lasR
  lasR,                PAI-1
  na                tetR,lasR
  aTc,tetR                cI
  araC,Lara                cI
  lasR,PAI-1                cI
  tetR,aTc                cI
  tetR,aTc                gfp
  cI                LacI
  LacI,IPTG                cI
  LacI,IPTG                tetR
  LacI,IPTG                cI,gfp
  tetR,aTc                LacI
  LacI,IPTG                tetR,gfp
  LacIq,IPTG                gfp
  na                lacIq
  araC,Lara                gfp
  na                luxR
  LuxR,AHL                gfp
  LacIq,IPTG                hrpR
  LuxR,AHL                hrpR
  LacIq,IPTG                cI
  hrpR,hrpS                gfp
  hrpR,hrpS                cI
  Sal                supD
  na                nahR
  tetR                gfp
  araC,Lara                T7ptag
  T7ptag,supD                gfp
  AHL,luxR                lacI
  AHL,lacI                rfp
  IPTG,lacI                ecfp
  na                lacI
  IPTG,lacI                rfp
  IPTG,LacI                gfp
  IPTG,LacI                lacI
  araC,Lara                lacI
  tetR,aTc                luxR
  LacI,IPTG                AHL_luxR
  araC,Lara                AHL_luxR
  AHL,luxR                lacZalpha-ccdB
  aTc,                tetR
  LuxR,                PAI-2
  lexA                gfp
  lacI,IPTG                cI
  lacI,IPTG                yfp
  araC,Lara                rfp
  lacI,IPTG                rfp
  luxR,AHL                rfp
  tetR,aTc                rfp
  tetR,aTc                invF
  araC,Lara                sicA
  tetR,aTc                mxiE
  araC,Lara                ipgC
  ipgC,mxiE                rfp
  tetR,aTc                exsDA
  luxR,AHL                exsC
  exsC,exsDA                rfp
  exsC,exsDA                invF
  ipgC,mxiE                sicA
  LacI,IPTG                mxiE
  LacI,IPTG                luxR,luxI
  luxR,AHL                lacZalpha-ccdB
While many circuits are contained within the SBiDer database, there were new circuits published in the literature that we could not include in our database due to time constraints. However, supposing that circuits contained within the SBiDer database represent the majority of mechanisms by which synthetic circuits can be constructed (which they do), then SBiDer can predict novel circuits that may eventually be constructed and published. Validating predictions of novel circuits described by our database constituted a search of relevant publications. The list of relevant publications that may have explored and validated at least one of our predicted circuits is described by the set of all papers citing more than 1 paper cited by our database. SBiDer gathered circuitry from n=17 publications. The most recent circuit contained in our database was published in 2012. Interestingly, we found a SoxR based circuit recently validated and published by the Siedler et al in 2014 in our database. We are excited to find that we have predicted novel circuits that are relevant to active research projects that have been published and submit this as justification of the validity of SBiDer’s capacity to predict novel genetic circuitry.

Supported Standards

Synthetic Biology Open Language

Synthetic Biology Open Language (SBOL) Visua; is an open source graphical notation that allows the user to communicate sequences of genetic parts visually using a set of standard symbols which represent common features of operons. The specific symbols used in our application are the promoter, coding sequence, and terminator. A promoter is a DNA segment that regulates the transcription of a gene. There are two types of promoters that appear in the database: regulated and constitutive. A regulated promoter must be triggered, usually by a small molecule. A constitutive promoter is always activated. In this case, both regulated and constitutive promoters are represented the same way: with a bent, right-facing arrow. Coding sequences are DNA regions that produce some protein when transcribed and are regulated by promoters, and are represented by elongated pentagonal arrowheads. Lastly, there are terminators, which are DNA segments that signal for transcription to cease, and are represented with capital T. Promoter, coding sequence, and terminator symbols which appear on the same line are all part of the same operon. To indicate sequences that are oriented in the 3’ to 5’ direction, arrows point in the opposite direction. The symbols are generated using a language called Pigeon[link], which is specific to SBOL Visual.

Using SBOL Visual, it is possible for us to communicate the exact structure of the design generated by the SBiDer application, thereby enhancing user understanding. This way, the user can design and visualize their desired output in silico, prior to wet lab production.

Systems Biology Markup Language

Systems Biology Markup Language (SBML) captures qualitative models using standardized syntax and semantics established by the SBML community and developers. By specifying the "Level”, "Qualitative Species”, and "Transitions" of a qualitative model, practitioners are capable of conveying the entirety of their model to a large community without sacrificing any nuance in their modeling of a biological network. Level is an integer value that refers to the current state of the network entities that are being model [ref]. In our Petri-Boolean model, network entities, chemical species and operons, have a basal level of 0 with a maximum capacity of 1. Qualitative Species refer to entities that are compartmentalized and characterized by the qualitative influence of their components; compartments allow one to pool entities of similar composition [ref]. In our Petri-Boolean model, chemical species and operons are referred to as qualitative species with species and plasmid as their respective compartments. Transitions alter the level of entities in a network at given conditions by a fixed amount such conditions involve the fluctuation between levels falling below or above an established threshold [ref]. Transitions in our network govern the binary state of its entities thus a network entity can assume a level of 0 or 1 depending on current state conditions. By converting every network entity, qualitative species or transition, in our database to its corresponding SBML description we not only make our qualitative model accessible to the public but also our entire database.

Network Analysis

Gene circuit design often involves the laborious extension of existing circuitry through the development of new devices. Device development is a taxing biological endeavor to which SBiDer offers an alternative. SBIDer facilitates circuit design by integrating knowledge across existing genetic modules. We provide evidence for an increase in functional motifs relative to the circuitry input into our database by examining common network attributes and executing more sophisticated network analysis.

In analyzing our database, we demonstrated and quantified the increased number of genetic circuits relative to the input circuits. Connectivity served as a preliminary indicator of the increase in potential circuits. We compared the connectivity, measured by “in-degree” and “out-degree,” of the input circuitry to the degree measure of the integrated database. The increase in degree indicates a combinatorial increase in potential circuits.




This increase is observed as the rightward shift of the In-Degree(v) and Out-Degree(v) distributions. The degree comparison between the input circuitry and the database demonstrated an increase in modulators and modulations in the database. Theoretically, every in-degree increase represents a novel modulation of that node, every out-degree increase represents a novel opportunity for that node to modulate another. The increase in modulators and modulations is important because every combination of these novel inputs and outputs constitutes a new circuit. Further, each ancestor and descendant of a node increase the count of novel circuits exponentially, as a function of the degree. In addition to analyzing the connectivity of our network, we implemented network motif analysis techniques in attempt to identify topological motifs.

Topological motifs can map to functional motifs which are the subunits of genetic circuitry [Alon, 2007]. We considered this correspondence and applied CytoKavosh [Masoudi-Nejad et.al., 2012] to identify topological motifs that occurred significantly relative to a random graph. CytoKavosh is a visual implementation of the Kavosh motif finding algorithm [Kashani, et.al, 2009] constructed on the Cytoscape platform. We report an increase in the number of topological motifs after analysis of the input circuitry. The bi-fan network motif 1 (Kavosh motif 204) showed a 1.98 fold increase from the input circuitry to the integrated database. In our most extreme cases, motifs 540 and 16920 (not included in the graph) had a 7.89 and 6.75 fold increase, respectively. The cumulative increase in fold change across discussed motifs is 24.57. Increases in the counts of selected motifs are shown below:

A display of counts of selected motifs, grouped by motif IDs generated by CytoKavosh

A comparison of CytoKavosh analyzed motifs between the input circuitry and the databases shows a clear increase in the number of known motifs. The observed increase in network motifs validates our expectation that integrating related circuits would increase their potential.

CytoKavosh ID: Motif 6

CytoKavosh ID: Motif 28

CytoKavosh ID: Motif 30

CytoKavosh ID: Motif 34

CytoKavosh ID: Motif 36

CytoKavosh ID: Motif 74

CytoKavosh ID: Motif 154

CytoKavosh ID: Motif 204

CytoKavosh ID: Motif 402

CytoKavosh ID: Motif 540

CytoKavosh ID: Motif 648

CytoKavosh ID: Motif 1080

CytoKavosh ID: Motif 16920

CytoKavosh ID: Motif 526864

CytoKavosh ID: Motif 540940

Genetic circuits are often difficult to engineer, requiring months to design, build, and test each individual genetic device involved in the circuit. SBiDer, a web tool developed by the UCSD Software iGEM team, will leverage existing devices to construct a database with consideration for the function of each device interpreted as boolean logic. The data can be queried by the user through SBiDer's visual interface to explore circuit designs. The displayed circuit's literature reference, characterization data, and images of included devices can be viewed through the built-in table. Basic validation of the circuit performance is also provided within in the interface. SBiDer's web of information can be expanded through user-generated additions to the database to improve the efficiency of the application and the accuracy of the models.

List of definitions for terms used in various sections of this page:

  • Nodes by context
    • Search Algorithm
      • Operon node: A combination of the input transition, operon, and output transition nodes as described in the Petri Net model.
    • SBiDer Petri Net Model
      • Operon node: A reciever and producer of species through input and output transitions.
    • Web Application
      • Operon node: A set of genes that are regulated by DNA sequences known as cis or trans regulating elements. Under the sbider Petri net adaptation operons are places
      • Species node: A node that contains only the species information which is similar to its usage in the algorithm.
      • Transition node (see Input Transition and Output Transition): Graphical representation of the Petri Net Transitions.
  • Input (for search)
    • A set of desired initial species to start the search algorithm.
  • Output (for search)
    • A set of desired final species produced by the search algorithm.
  • Paths:
    • Operon paths:
      • composed of operon nodes as defined within the algorithm representation of the network and species nodes. The network condenses to a directed tree graph from a Petri Net representation.
    • Petri Net paths:
      • The cyclic movement from a place (species) to an input transition to another place (operon) to an output transition to an ultimate place(species).
  • Petri net:
    • Consists of operon nodes, species nodes, and transitions. All logic required to activate an operon is encompassed within the transition. Species nodes always point to an input transition, which points to one operon, which points to a singular output transition, which points to a species node.
  • Places
    • Conditions to be met in order to enable a transition in the Petri Net model.
  • Transitions
    • Activated events that allow movement from Place to Place in a Petri Net model.
  • SBML
    • a free and open interchange format for computer models of biological processes, which is based on the XML standard. SBML is useful for models of metabolism, cell signaling, and more. It continues to be evolved and expanded by an international community.
  • SBOL
    • Synthetic Biology Open Language Visual is an open source graphical notation that allows the user to communicate sequences of genetic parts visually using a set of standard symbols which represent common features of operons.
  • Plasmids
    • The structure of bacterial DNA. Plasmids are circular, closed loops of DNA that contain functioning genes used by the organism.
  • Species
    • Any chemicals and stimuli affecting gene production. Under the SBiDer Petri net adaptation species are places.
  • Input transitions
    • Species-regulated regions of the DNA which are formalized into boolean logic operations under the sbider Petri net adaptation.
  • Operons
    • are a set of genes that are regulated by DNA sequences known as cis or trans regulating elements.
  • Output transitions
    • Describe the events and species produced after operon activation namely, RNA and protein production.

Future Directions

Database - Integrating NDEx

NDEx is a breeding ground for network sharing, discussion, analysis and storage. By acting like the Facebook of networks, NDEx serves a medium for the discovery, sharing and discussion of networks. Networks are commonly used for their ability to capture the biology as a system of particular molecular interactions with interaction types varying between the type of networks built by scientist. The discussion of networks offered by NDEx provides a means for researchers to share their ideas behind network construction and analysis as well as sharing a copy of the network. NDEx also provides what is know as provenance history which means that history of the network construction is saved and the development of the network can be seen. As a means of further connection the biological network community NDEx offers an API and has plans for further integration with the network visualizer, Cytoscape (www.ndexbio.org).

SBiDer was developed with the intention to involve the biology community with the SBiDer Network. Discussion of genetic circuits is imperative for the proper use of a circuit as well as to give the proper context of the circuit. NDex would give a discussion board to the SBiDer network where biologist can discuss topics regarding optimal genetic circuits, the potential to experimentally create new circuits from the existing network or a range of many other topics. As said before, genetic device building is not an easy task and provenance history provides useful backtracking to understand the development of genetic circuit networks and due to the ability of network sharing it becomes possible for the SBiDer network to exist in different versions with the purpose of displaying genetic circuit networks with specific functionalities. NDEx API functionality then provides the means to create tools with which SBiDer networks can be better analyzed.

Traversal

SBiDer’s three main pillars - database, search algorithm, and web application - intertwine effectively and make SBiDer a robust tool for the synthetic biology community. Furthermore, the database, search algorithm, and web application are modularly developed so that each component can be easily updated, extended, and optimized by the global synthetic biology community. In the development, there was a focus on minimizing the barrier of collaboration by using an open source platform and generating a comprehensive usage and implementation documents. SBiDer was developed on the consideration to encourage the synthetic biology community to collaborate and drive the field forward. In the course of the development, various tools that could further develop SBiDer were discovered. In future iterations of SBiDer, the following functionalities should be considered for extending SBiDer:

Web Ontology Language (OWL)

What OWL is

The Web Ontology Language (OWL) is a family of knowledge representation languages or ontology languages for authoring ontologies or knowledge bases (W3C). The data described by an ontology in the OWL family is interpreted as a set of "individuals" and a set of "property assertions" which relate these individuals to each other. An ontology consists of a set of axioms which place constraints on sets of individuals (called "classes") and the types of relationships permitted between them. These axioms provide semantics by allowing systems to infer additional information based on the data explicitly provided.

Integration of OWL into SBiDer

SBiDer can describe all of the components in the database using OWL by creating a unique ontological hierarchy that suits the field of synthetic biology. Using OWL relationships, SBiDer can better connect its components through more complex fashion that better captures the biological reality underlying synthetic biology. For example, an ontology describing components used by SBiDer, or individuals in the semantics of ontology, might include axioms stating that a "activatePromotor" property is only present between two individuals when "isBiochecmicalSpecies" is also present, and individuals of class "activatePromotor" are never related via "isBiochemicalSpecies" to of the "transcribeRNA" class. If it is stated that the individual green fluorescent protein (GFP) is related via "produceLight" to the individual yellow fluorescent protein (YFP), and that GFP is a member of the "isProtein" class, then it can be inferred that YFP is not a member of "isPromotor" class. In this fashion, OWL allows the use of more sophisticated relationships amongst database components. Furthermore, there already exists various ontological hierarchy optimized for biology such as genetic interaction ontology, protein interaction ontology, and cancer genomics ontology. And OWL allows potential interaction of multiple ontologies. So, the integration of OWL into SBiDer allows SBiDer to potentially interact with many other aspect of biology such as genetic interaction, protein interaction, and RNA processing. As a result of OWL integration, SBiDer can represent the biological reality of synthetic biology in a much more complex, sophisticated, and realistic manner.

Factor Graph

What Factor Graph is

In probability theory and its applications, a factor graph is a particular type of graphical model, with applications in Bayesian inference, that enables efficient computation of marginal distributions through the sum-product algorithm (H. Loeliger). Factor graph can capture a system as Markov networks or Bayesian networks, and represent mathematical relationships of elements of a system.

Integration of Factor Graph into SBiDer

SBiDer currently implements its network representation of the database as a Petri-Net, which allows modeling of our complex biological system. SBiDer can also capture the network using factor graph, which provides additional statistical tools for analyzing, interpreting, and prediction of the behaviour of our network. Integration of the factor graph can also provide us with machine learning tools to further advance our tools quantitatively and qualitatively.

Web App

The current SBiDer web tool has been constructed with consideration for expansion in all its components from the structure of the source code to the allowance of users to contribute to the database. It is the latter improvement that allows SBiDer to become more efficient as time progresses. As more information becomes available, the SBiDer application can begin to transition from a qualitative description of how its proposed genetic circuits behave to a more quantitative model.

Web Application

Framework of the SBiDer Interface

Structure Overview

SBiDer was developed off of CyNetShare, a HTML5 application designed to make utilizing and visualizing CytoscapeJS graphs simple and intuitive through use of the javascript library AngularJS. The object of the SBiDer web tool is to visually present genetic circuits and possible interactions between them in addition to providing relative information about each device’s construction and function through reference material. Images of devices and image data, in Systems Biology Markup Language (SBML), are provided by a Synthetic Biology Open Language (SBOL) visual rendered through use of the popular PigeonCad for intuitive understanding and universal sharing. Users are also provided with the PubMed ID’s of all genetic devices so that they may be able to access and verify entries information in the PubMed journal service.

Visualizing the Network

CytoscapeJS is a widely used visualization library written in Javascript that presents complex relationships in more easily understood figures of nodes connected by edges. The library’s popularity and supportive community make it an ideal framework from which the SBiDer team can reach a large audience of synthetic biologists. Characterizing information about the genetic devices can be easily stored inside CytoscapeJS’s nodes as data. Unfortunately, manipulating, editing, and updating this data autonomously without further means poses a difficult challenge that is resolved with the use of AngularJS and CyNetShare.

Dynamic Interface and Data Handling

The large portion of the SBiDer framework relies on the data manipulation and two-way databinding provided by the AngularJS javascript library which provides the interactivity to the web application. Key to this is the two-way binding that allows information to be changed through the User Interface (UI) and updated between the scripting code and html without need to constantly resync data with the server, allowing for quicker response times for users and cleaner source code for further development. The ability to pass data from the UI to the SBiDer application also simplified the procedure for User-generated entries to the SBiDer database. The major alternative library was JQuery, an older and popular method of achieving a dynamic interface that doesn’t have the crucial feature of two-way databinding. Data is passed from server to client through a JSON string which is parsed by AngularJS for the SBiDer application to visualize the network using CytoscapeJS as well as updating the web page without need to reload the window unlike when employing JQuery. Additionally, AngularJS is compatible with many JQuery functionalities which allows for a combination of use for both libraries with negligible drawbacks. Effectively, AngularJS allows SBiDer to effectively and dynamically communicate with CytoscapeJS and other components of the application without the clutter of JQuery’s syntax.

Modular Interface

CyNetShare is a new innovation from the creators of CytoscapeJS which is used to share visualizations made with CytoscapeJS without the need to manually exchange files, instead rendering the files through encoded url’s. CyNetShare provides the basic framework for SBiDer as it has much of the core foundation in its source code including the CytoscapeJS visualization, table of node information, and a basic user interface layout and styling. SBiDer developed upon this framework to include the ability to retrieve results from the SBiDer database, display images of genetic devices, and easily access and download data.

Other Utilized Technologies

Apart from the aforementioned technologies that were used to create SBiDer itself, there were two other major technologies/languages involved, namely Boostrap, to design the website & the wiki, and Java, for the server.
Bootstrap is a front-end framework that contains several HTML & CSS-based design templates for various aspects of web pages. It's an easy to use framework that's used widely all over the world since it makes websites look more modern, cleaner and aesthetically pleasing.
The language chosen for the server was Java since it's easier for other developers to use, is extremely common and is compatible with a lot of languages including but not limited to Python, JavaScript and SQL. In essence, all of the languages that we made use of are very well documented, modern and easy to use languages.

The Search Algorithm

Introduction to the SBiDer Search Algorithm

Purpose of the Algorithm

SBiDer is fundamentally constructed on a manually curated database of existing genetic devices, or operons. This SBiDer database stores operons, species, and most importantly the biochemical interactions of these elements. Also, the database can be represented as an operon interaction network via species. Using this network representation of the database, SBiDer’s search algorithm searches for feasible operon paths connecting a set of species to another set of species. As a result, given a set of input species and a set of output species, the search algorithm returns a subnetwork of operon paths connecting the input species to the output species. Each path within the subnetwork is an operon path that can be used to produce the output species from the input species. In other words, placing operons from a returned operon path as well as the input species into a system can produce the output species.

Biology Captured by the Algorithm

The search algorithm conducts a complex, effective, and robust operon path search over the network representation of the database, ultimately generating single or multiple operon paths connecting a set of input species to a set of output species. A resulting path represents a system that can produce the output species given the input species. Also, each path attempts to capture biological reality by considering two major mechanisms of operon activation (detailed description below).

Robustness and Modularity of the Algorithm

Due to the current number of operons in the database, the search algorithm should scale with the increasing size of the database, which will expand as the synthetic biology community adds more species, operons, and plasmids to the database (see complexity analysis of the algorithm is located below). In addition to this robustness, the search algorithm is easily modifiable since it has been implemented independently from the database and the SBiDer web application. Therefore, the search algorithm can be applied to a broad range of networks. Furthermore, the search algorithm was developed on an open source platform to encourage global collaboration in further optimizing the search algorithm and minimizing the barrier for algorithm improvement. As a result, the database as well as the search algorithm will be easily, effectively, and accurately expanded, extended, and optimized by the global synthetic biology community - together.

Algorithm

Overview of the Algorithm

SBiDer's search algorithm is fundamentally based on multiple breadth first searches (BFS) on the SBiDer database. From a set of input species, the algorithm searches for any operon that can be activated by the input species (see Tutorial for how to format the input query). If such an operon is found, the output species of the activated operon and/or other existing species in the system (details below) is used to search for the next operon. In the process, the search algorithm constructs an operon paths by successively adding activated operons to the path. There are two ways of activating an operon: linear successional activation and nonlinear successional activation (more details below), and the search algorithm can create an operon path using either or both of the activation methods. This search algorithm continues until either: the search algorithm concludes that there is no operon path that can lead to the production of the output species; or an operon path producing the output species is found.

Linear Successional Activation Search (LSAS)

The search algorithm uses linear successional activation search (LSAS) to generate an operon path in which the output species of a preceding operon produces all of the required input species for the succeeding operon. LSAS extends an operon path by adding only operons that can be activated by the species produced by the last operon in the path. Thus, for each potential activated operon by the input species, LSAS uses a tree data structure in searching an operon path. From an input species (or multiple input species with an AND relationship), LSAS builds a tree, and if an output species is found in the process of growing the tree, the path from the root operon to the leaf operon that has produced the output species is returned as a resulting operon path. Finally, this resulting set operon paths is used to construct a Petri net path including all of the input species, input transitions, input transition logic, operons, output transitions, output transition logic, and output species.

Nonlinear Successional Activation Search (NLSAS)

An operon may not produce all of the input species required for an activation of another operon. Yet, multiple operons may together produce all of the required input species for the other operon. Here, each of the activating operons produces some, but not all, of the required species for the activated operon. The search algorithm captures such operon activation mechanism using nonlinear successional search (NLSAS) in building an operon path. As LSAS, NLSAS also uses a tree data structure, but appends an operon whose input species, or activation requirements, are produced by any of the preceding operons in the path. While NLSAS assumes no degradation of the species in the system, it allows for a wider range of operon sets that could produce our desired output from our given input. As a result, each resulting operon path can be linear or nonlinear. Since a NLSAS result captures operon activation in which a single or multiple preceding operons produce species required for the activation of a succeeding operon, LSAS result is a subset of a corresponding NLSAS result.

Overview of LSAS and NLSAS

Overall, LSAS generates a single or multiple operon paths in which a preceding operon directly activates a succeeding operon, and NLSAS paths in which any of the preceding operons can activate a succeeding operon. In general, a resulting LSAS operon path includes operons mined from same or similarly conducted experimental protocols, where the operons are optimized to coexist in a system. As a result, LSAS can return well optimized operon paths and interactions, which may be used in biochemical reactions with higher confidence. The SBiDer web application can further facilitate the use of such operons paths by providing literature and other published resources. NLSAS, on the other hand, can generate an operon path by including operons from dissimilar experimental protocols. So, using NLSAS can potentially lead to the discovery of novel operon paths and thus new biochemical systems. By providing LSAS and NLSAS variations of the search algorithm, SBiDer assists a user in capturing the biological reality underlying an operon activation as accurately as possible, and provides a means to evaluate, analyze, and apply the results.

Complexity Analysis

The SBiDer network is based upon the Petri Net model, which allows for the capturing of boolean logic. The linearity of the network representation of the database is such that one input transition leads to one operon, which leads to one output transition, and one list containing all species produced by that operon, as shown in Figure 1.

Figure 1 - Petri Net representation of an input species node, an input transition, an operon node, an output transition, and a species node consisting of only the species produced by the current operon respectively.This is the representation used for the construction of the SBiDer database since they allow for the incorporation of boolean logic.

This entire structure can be collapsed into one node since there is no branching, and allows us to represent the network as a directed graph, as shown in Figure 2. By condensing the network in this fashion, complex searches can be conducted in a more efficient manner.

Figure 2 - Operon representation of the network where the input and output transitions have been collapsed into one operon node to decrease the complexity of the search algorithm. This is possible because an input transition always points to one operon and one output transition always points to one species node.

LSAS

Assume the network has m transitions, with depth d and breadth b from a specified root, and V vertices which represent the aforementioned condensed nodes (centered on operons), and E directed edges connecting said vertices.

The SBiDer search algorithm is a modified version of breadth first search where at each transition m that is activated by the input species set a breadth first search for the output species set is performed. For a standard breadth first search, the complexity is

BFS Complexity = O(|b|^|d|)

This is another way of saying that in the worst case, every vertex and every edge is explored, which can be rewritten in terms of the number of vertices and edges. Since all edges have weights of one, V and E will always be positive.

BFS Complexity = O(|V| + |E|) => O(V + E).

For this standard BFS, assuming a large network where every node is connected to every other node, for V significantly large, V ≈ V - 1, meaning the number of edges searched after the selection of each successive node is approximately equal to V, meaning the V nodes are searched V times. Additionally, consider that the total number of edges (E) in a completely connected graph is bound by V^2. In a completely connected graph (with no self or duplicate edges) with V nodes, the total number of edges is (V-1) + (V-2) + (V-3) + … + 2 + 1. It has been shown that the (V-1)th partial sum of this divergent series sums to [V*(V-1)]/2, which is bound by V^2. Therefore the upper bound for the time complexity for a single breadth first search from a single activated transition from an input species is O(V + E) ≤ O(V+V^2) => O(V^2).

For our network, a breadth first search of time complexity O(V^2) happens at every activated transition m.
Therefore, the LSAS complexity = O(mV^2)

Paths generated using the LSAS search would resemble Figure 2 where a species node directly activates an operon, which produces a singular species node. The LSAS algorithm would not be able to capture the logic encompassed within Figure 3 where multiple species nodes are required to activate the middle input transition.

NLSAS

For the NLSAS, a similar tree from LSAS is created for each activated input transition by the input species. However, the difference here lies in the construction of the path from the input species to the output species. A special traceback algorithm is used to tie the output species to the input species creating a path that may resemble a subnetwork rather than a traditional linear path. This traceback algorithm is where the true complexity of this algorithm arises from.
Similar to LSAS, this algorithm also begins by constructing a subnetwork of paths that can lead to the output of interest from a specific input. This generates a list of edges to be used in the traceback algorithm. Assuming every edge could be explored, the algorithm runs in O(E) time. As shown for the LSAS algorithm, O(E) <= O(V^2), and also returns a list that will be explored also in O(E) size or O(V^2). Again, a new list is constructed for each potentially activated transition, thus giving this entire first step a run time complexity of O(m*V^2).

Now, for each of the O(m) list of edges constructed from the last step, a traceback algorithm is run for this path from the output species set to return a subnetwork of operons that could form a system that leads to the production of the desired output. From the desired output species, the traceback algorithm searches through the list of edges E to find the transitions for each operon contained in the path that produces the output species of interest. Since every transition in the set E must be checked, the algorithm runs in O(E) or O(V^2) time. At worst, every operon must be checked, resulting in a complexity of O(m*V^2) for this step.

Now, in order to determine which species were required to activate the operons now connected to the output species, the database of transitions is searched to identify which transitions will have to be activated. This search happens across an O(m) space for each of the O(V) operons that could be activating the output. Therefore, this next time step happens in O(mV) time. The first two steps together now will happen as O(m*(V2+mV)). At the end of these steps, the path now has O(m*V) species.

However, it is important to note here that at the step when possible operons are being explored, the search will have a time complexity of O(E), and will never happen more than O(m*V) times, which is the bound for the number of unique species in the system. For the following step, the search for species activating all operons on this increment will have a time complexity of O(m), but it will never happen more than O(V) times, which is the bound for the number of unique operons in the system. Therefore, for a full iteration of these two steps in the traceback algorithm, it will be bounded by the time complexity of O(V^2*mV + m*V). This traceback step will happen until the depth of the entire path generated from the LSAS step has been explored, which is bound by O(V). Thus, the time complexity for the traceback of a single path is O(mV^4+mV^2). Then complexity of the path generation is added for a complexity of O(mV^4+mV2+V^2). Now consider that this path generation and traceback happens for every single activated transition by the input species, which gives O(m*(mV^4+mV^2+V^2))=>O(m^2*V^4).

Figure 3 - Operon representation of a simple circuit and a potential output from NLSAS search algorithm where species from multiple operons are required to activate the next operon. This path could not be captured by the LSAS algorithm because the incorporation of multiple species to activate one operon is not possible.

Figure 4 - Petri Net representation Figure 3 where multiple species nodes are required to activate one input transition. First two species nodes activate each of the three initial input transitions, which in turn activate three separate operons. The output of each operon is an output transition which points to a species node that contains all output species produced by that operon. The three species nodes in the center of the graph trigger the final input transition, triggering the final operon, and producing the final species node. This expansion would add an additional layer of complexity to the NLSAS algorithm that is unnecessary given the previously explained linearity of the operon nodes.

References

[1]Carl Adam Petri and Wolfgang Reisig (2008) Petri net. Scholarpedia, 3(4):6477

[2]"OWL 2 Web Ontology Language Document Overview". W3C. 2009-10-27.

[3]An Introduction to Factor Graphs by Hans-Andrea Loeliger, IEEE Signal Processing Magazine,January 2004, pp. 28–41.

Database

Data Collection

The database was built to meet the needs of synthetic biology and genetic circuit building by offering literature references, storage, maintenance and upload of genetic devices as well as SBML text files for network data exchange.1 Genetic circuit information is not easily extracted from papers making it difficult to begin any genetic device construction. SBiDer provides a centralized library of relevant genetic circuit references in the form of pubmed id’s so that synthetic biologists can quickly and easily access experimentally backed information. Outsourcing the literature search to SBiDer gives synthetic biologists an interface with reliably relevant information. As well as providing literature references, SBiDer extracts the necessary details of each genetic device so that synthetic biologists can build devices using essential genetic building blocks. As a means of further involving the biology community, SBML data exchange files for our genetic parts and SBiDer network are accessible by using the SBiDer API. Using SBML provides further data analysis of the SBiDer network by taking advantage of approximately 200 SBML supported softwares such as Cytoscape, Systems Biology Workbench and more (www.cytoscape.org) (http://sbw.kgi.edu/research/sbwintro.htm) (www.sbml.org).

Figure 1. Plasmids used to create a genetic circuit know as μBE (Hoon Hong, 2012).

Petri Net and Schema Design

This project integrated the Petri net model by describing interactions between species and operons. Species act as input for operon activation and output. In the context of biology, a simple genetic circuit is broken down into operons, and operons can be broken down into their components: promoters, regulating elements, and coding sequences. Combining the biology and Petri net formalism, genetic device information is captured and makes for a database schema that innately captures a network of genetic device interactions. The Petri net model inspired schema works by atomizing the necessary genetic circuit information so the parts necessary for genetic circuit design are represented. Operons and species are represented as equivalents of Petri net chemical species while the interaction of species with operons and the production of species by operons is represent by input and output transitions namely. Genetic circuit design is now formalized for the biology community under the Petri net model and stored inside the SBiDer database.

Community Based Support

A single genetic circuit has a set of responses based on the chemical or environmental factors present in a biological system. By introducing new genetic circuits into the SBiDer network the biology community benefits from new devices with new functionalities or improved functionality, and when introduced into the SBiDer network creates a multitude of new genetic device combinations. The SBiDer website includes a device upload page to encourage the collaboration of the synthetic biology community to keep expanding the database and network for improved and up to date genetic devices. The database also maintains updated SBML forms of all genetic device parts and the SBiDer network so the biology community can continue working on the most up to date information.

Technologies

The database was implemented using SQLite, a widely used database management system, and the SQLite3 Python module. SQLite is developed and maintained by multiple companies including Oracle and Adobe (www.sqlite.com). With it’s widespread use and fast queries, SQLite is a powerful tool to use for collaboration with the biological community. Python’s success within the biology community, it’s ease of use and compatibility with major operating systems also makes for an effective combination with SQLite and computational work in biology (“Programming for Biologists”, 2014). Using these two technologies, a Python based API was create that encourages the synthetic biology community to participate in the database expansion and operation as well as SBML file access.

Database Schema in Depth

The plasmid table stores the name and pubmed literature references allowing users to easily find out more about their given device. The plasmid is then broken down into operons where multiple operons can be mapped to the same plasmid. In order to capture this relationship, a table of plasmid id and operon id allows for multiple operons relating to a single plasmid. A direction column in the plasmid-operon relationship table also allows users to specify the directionality of the operon inside of the given plasmid as multiple operon sequences can can be found with presence in either strands of the DNA. Operons are also mapped exclusively to their plasmid, meaning that identical operons from different plasmids require insertion into the database with a mapping to the corresponding “parent” plasmid. The operon table itself contains information on the operon sbol and sbml files representations.

The input transition table explains the type of logic that is being used for operon activation as either AND or OR logic. AND logic transitions are transitions which must take in all corresponding inputs species for the activation of the operon. OR logic transitions are transitions which require a single input species or multiple input species but do not require all. More complex relations are made between the input species and the input transitions table.

Input transition and species interactions are captured using a separate relationship table. Using this separate input transition species relationship table allows for more flexibility when describing the boolean logic at a given input transition. The same species may interact as a repressor or as activator depending on what input transition it encounters. Therefore, the table attribute repression gives a true or false statement under the repression column describing the role of that species in a given operon activation. All complex logic types, NAND; NOR; XOR; and XNOR, are captured solely by using the atomic logic types, AND; OR; and NOT. So a NAND input transition would be represented as an AND inside the input transition table with repressor molecules as the interacting species and TRUE values in the repression columns of these species.

Operons also have associated output transitions which describe the events and species produced after operon activation. These transitions unlike input transitions have very simple logic, namely IF. The reason for this is, that once an operon is activated then the production of the protein will inevitably occur. Just like input transitions and species, output transitions and species also have a relationship table with multiple species mapping to a single output transition when applicable. This illustrates the idea that operons are a sequence of genes that when activated, transcribe mRNA and protein species.




Figure 2. An entity relational diagram displaying the sbider database. Entities and their corresponding attributes are like-colored. Relationships represented by the grey diamonds may also store attributes tied to entities to illustrate that certain information is the result of a relationship.

References

[1]Hucka, M., 25 Oct. 2002., "The Systems Biology Markup Language (SBML): A Medium for Representation and Exchange of Biochemical Network Models". Oxford Journals, www.http://bioinformatics.oxfordjournals.org (October 15, 2014)

[2]"Cytoscape Core Features." What Is Cytoscape?, www.cytoscape.org (October 15, 2014)

[3]"Download SBW File A Bug Browse Forum." http://sbw.kgi.edu/research/sbwintro.htm (October 15, 2014)

[4]"SBML Software Showcase." 10 June 2014., The System Biology Markup Language, www.sbml.org (October 15, 2014)

[5]Baez, John C. "Petri Nets." www.johncarlosbaez.wordpress.com (October 15, 2014)

[6]"SQLite Home Page." SQLite Home Page. www.sqlite.org (October 15, 2014)

[7]"Why Python?" www.programmingforbiologists.org (15 October 2014)

[8]"NDEx About." www.ndexbio.org (October 15, 2014)

[9]Hoon Hong, Seok. 03 Jan. 2012, "Synthetic Quorum-sensing Circuit to Control Consortial Biogilm Formation and Dispersal in a Microfluidic Device." Nature.com. Nature Publishing Group, (October 17, 2014)

Modeling

  1. What is this program?
  2. Synthetic biology has been developed for a decade and is being expected to evolve the academic society because of its standardization and convenience.1, 2 This novel technology has already yielded bunches of applications on the field of pharmacy1, 2 and therapeutics3 while also showing its potential on energy industry.5 However, a standardized model is eagerly required to be established because it will facilitate the exploration of synthetic biology. Therefore, we developed the software – AutoModel, which integrates dozens of models of simple devices and stipulates the I/O interface then provides user standardized modeling path with high degree of freedom.

    This python script is being expected to help noviciate to construct their model and decide how much experimental data should be collected. In the coming section, we will demonstrate how this software works and how it help wet-lab experiments. Meanwhile, some instance are listed for illumination.

  3. How does it work?
  4. During the synthetic biological procedure, substantial biochemical reaction happens in vivo or in vitro. Generally speaking, those gene expression molecular level events could be represented by mass equation:4

    k1[A]aq + k2[B]aq k3[A B]aq

    where k1, k2 and k3 are reaction coefficients. One of best mathematical representation for this reaction formula is differential equation. The differential equation is a mathematical equation about some continuously varying quantities and their rates of change in time that relates some function of one or more variables with its derivatives. Therefore, differential model is widely employed on the chemical modeling. However, constructing a precise model for any specific reaction is always complex because dozens of material and formulae has to be considered.

    But now, a novel technique has been developed to help you figure out this problem. Inspired by the primary idea of synthetic biology, the AutoModel separates existed simple device as individual parts and integrates them into the device file which is a plug-in for main program. There are hundreds of mathematical expressions in the device file that represents those relationship between input(reactant) and output(resultant). Customers could add, adapt or ampu- tate any functions, parameters and terms to optimally fitting their experimental data and theoretical model. Besides, preseted device functions could be called for predict your experimental result.

  5. How to use it?

  6. There are 3 kinds of documents included in the program package.

    3.1 Device

    All of differential equations are stored in the ’DEVICE.py’ file which is a list of import subfunction of python. Those subfunction are defined in the same format that allowing user to recall them by the same method. The format is

    def device’s name (od , input_1, input_2, output, dt = 0.1) :
         parameters
         input1 = input_1 + (differential equation of input_1)* od* d t
         input2 = input_2 + (differential equation of input_2)* od* d t
         output1 = output + (differential equation of output)* od* d t
         return (input1 , input2, output)
    where od is optical density and dt is integral time step.


    3.2 Network Document

    The ’network.txt’ is the file of network which describes the connection among simulated model. This text file is a table. The first column is device’s name which corresponds to the name in ’DEVICE.py’. The next 2 columns are input_1 and its initial value. It could be either inducer’s name or device’s name from first column. The 4th and 5th represent input_2. Last 2 columns are output’s name and its value. If the word in any row is output, it means that the program will return the output value of this device.

    3.3 Main Program

    The main code for simulation is written in ’demo python.py’ which contains calculating function for growing curve, integrating function for ODE and a plotting part. Here we use an instance for illustration how to use this software.
    First, there is a default growing curve provided by demo. The initial cell density(OD) is 0.2 while growing rate is
    0.3 per 6 mins. You can change the maximum value of density by change the parameter k1 in the function grow. Second, function ’Calculate’ is an integral function that calculating the output value while reading the network from network file from top to bottom and then loop this process until the end of simulation time. In this case, the demo network is invested by David L. Shis and Matthew R. Bennett.6

    Final, ’Calculate’ will return a matrix of time scalar, optical density and output to our plotting script.

  7. Demonstration
  8. It is reasonable to assume that our host is incubated in ideal environment that the curve of optical density (OD) perfectly matches the ’S’ shape model. In other word, the growing curve is integrated by logistic model



    where kgrow is producing rate of cells and kmax is maximun value of density of cells, i.e. 1=(ODmax).

    The AND gate with two devices achieved by Shis, David L., and Matthew R. Bennett6 has been modeled for two separated parts that each one could be recalled as an individual function to represent the corresponding device. The ODEs for these 2 devices are demonstrated below.

    When pTara is induced by arabinose, this circuit produces LacI then stimulates its downstream device pET28. As result, the output of pTara is consumable inducer for circuit pET28. Therefore, we have 2 equations for pTara, which are



    and



    where OD is optical density and x1 is the input of device pTara, i.e., arabinose. Meanwhile, y is the output of device p- Tara and input of device pTara. The first equation represents the consumption of inducer that will give us an exponatial decay of the value of its input. The second equation is modeled for the output of pTara. The positive term corresponds to the producing rate of LacI whereas the negative term is of consuming rate.




    As principle employed on modeling pTara, we also provide a differential equation for pET28 which is



    where z is the output of device pET28, i.e., the value of GFP. Again the positive term is the producing rate of GFP but the negative term is of decomposition rate.

    Finally, we link these 2 device to finalize the model of AND gate. The plot of the simulation result shows that our model approximately fittings the experimental data.
    In this work, we successfully construct a new approaching of modeling for synthetic biology by applying the idea of this field which is about modulization. The modulized modeling functions which are using as bio-brick provide a novel method to academic society that allows circuit consturctors predict their experimental data. Furthermore, it gives us chance to develop the virtual laboratory so that dramatically reduces the laborious work in lab.

References

[1]Ellis, Tom, Xiao Wang, and James J. Collins. ”Diversity-based, model-guided construction of synthetic gene networks with predicted functions.” Nature biotechnology 27.5 (2009): 465-471.

[2]Andrianantoandro, Ernesto, et al. ”Synthetic biology: new engineering rules for an emerging discipline.” Molecular systems biology 2.1 (2006).

[3]Lu, Timothy K., and James J. Collins. ”Engineered bacteriophage targeting gene networks as adjuvants for an- tibiotic therapy.” Proceedings of the National Academy of Sciences 106.12 (2009): 4629-4634.

[4]Kaznessis, Yiannis N. ”Models for synthetic biology.” BMC systems biology 1.1 (2007): 47.

[5]Lee, Sung Kuk, et al. ”Metabolic engineering of microorganisms for biofuels production: from bugs to synthetic biology to fuels.” Current opinion in biotechnology 19.6 (2008): 556-563.

[6]Shis, David L., and Matthew R. Bennett. ”Library of synthetic transcriptional AND gates built with split T7 RNA polymerase mutants.” Proceedings of the National Academy of Sciences 110.13 (2013): 5028-5033.