Team:HUST-Innovators

From 2014.igem.org

Revision as of 17:03, 17 October 2014

Home

Synthetic Biology starts from sequence data held in databases. But if the genetic information of a new species is unknown, how can the synthesis proceed? We therefore decided to expand the definition of Synthetic Biology and add new modules to it.

Evolution of Synthetic Biology

Overture (First GS and Bioinformatics)

In 1977 Sanger and colleagues introduced the "dideoxy" chain-termination method for sequencing DNA molecules, also known as the "Sanger method". This was a major breakthrough and allowed long stretches of DNA to be rapidly and accurately sequenced. It earned him his second Nobel Prize in Chemistry in 1980, which he shared with Walter Gilbert and Paul Berg. The new method was used by Sanger and colleagues to sequence human mitochondrial DNA (16,569 base pairs) and bacteriophage λ (48,502 base pairs). The dideoxy method was eventually used to sequence the entire human genome.

As sequence data accumulated, databases were formed, marking the birth of bioinformatics.

In short, bioinformatics was the harbinger of Synthetic Biology's emergence.

Sonata (NGS and Synthetic Biology)

In short, first-generation sequencing (First GS) has serious drawbacks such as high cost and low throughput. This is why it was replaced by NGS platforms such as 454, Solexa and HiSeq. NGS greatly reduces cost and time while keeping accuracy high. For instance, NGS can sequence a human genome in about one week, whereas the Human Genome Project, which relied on First GS, took about 13 years.

As more and more species were sequenced, the volume of genomic data grew. Since the sequence of a species is the major prerequisite of Synthetic Biology (SB), we conclude that the emergence of NGS heralded the birth of SB.

Cadenza (Third GS and ???)

That is why we started our project this year.

Docosahexaenoic Acid

Docosahexaenoic acid (DHA) is an omega-3 fatty acid that is a primary structural component of the human brain, cerebral cortex, skin, sperm, testicles and retina. It can be synthesized from alpha-linolenic acid or obtained directly from maternal milk or fish oil. DHA is a carboxylic acid with a 22-carbon chain and six cis double bonds, the first double bond being located at the third carbon from the omega end. Its trivial name is cervonic acid, its systematic name is all-cis-docosa-4,7,10,13,16,19-hexa-enoic acid, and its shorthand name is 22:6(n-3) in the nomenclature of fatty acids.
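
The shorthand 22:6(n-3) and the systematic name carry the same information. As a small worked sketch, the double-bond positions in the systematic name can be derived from the shorthand. The function name is our own, and the code assumes methylene-interrupted double bonds (one double bond every three carbons), which holds for DHA but not for every fatty acid.

```python
from typing import List


def double_bond_positions(carbons: int, double_bonds: int, omega: int) -> List[int]:
    """Return double-bond positions counted from the carboxyl end.

    Assumption: the double bonds are methylene-interrupted, i.e. spaced
    three carbons apart along the chain.
    """
    # The double bond nearest the omega (methyl) end starts at carbon `omega`
    # counted from that end, i.e. at carbon (carbons - omega) from the
    # carboxyl end.
    last = carbons - omega
    positions = [last - 3 * i for i in range(double_bonds)]
    if positions[-1] < 2:
        raise ValueError("shorthand does not fit a methylene-interrupted chain")
    return sorted(positions)


# DHA is 22:6(n-3)
print(double_bond_positions(22, 6, 3))  # [4, 7, 10, 13, 16, 19]
```

The result matches the 4,7,10,13,16,19 locants in the systematic name given above.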

Cold-water oceanic fish oils are rich in DHA. Most of the DHA in fish and multi-cellular organisms with access to cold-water oceanic foods originates from photosynthetic and heterotrophic microalgae, and becomes increasingly concentrated in organisms the further they are up the food chain. DHA is also commercially manufactured from microalgae: Crypthecodinium cohnii and another of the genus Schizochytrium. DHA manufactured using microalgae is vegetarian.

Thraustochytrids

Thraustochytrids are abundant in oceanic environments and can produce large amounts of DHA. However, our strain T-roseum ATCC28210 had not yet been sequenced, which means our project would have been blocked had we used only traditional Synthetic Biology methods. To achieve our goal, we needed to add a step before the traditional process.

What is SMRT

Single molecule real time sequencing (also known as SMRT) is a parallelized single molecule DNA sequencing by synthesis technology developed by Pacific Biosciences. Single molecule real time sequencing utilizes the zero-mode waveguide (ZMW), developed in the laboratories of Harold G. Craighead and Watt W. Webb[1] at Cornell University. A single DNA polymerase enzyme is affixed at the bottom of a ZMW with a single molecule of DNA as a template. The ZMW is a structure that creates an illuminated observation volume that is small enough to observe only a single nucleotide of DNA (also known as a base) being incorporated by DNA polymerase. Each of the four DNA bases is attached to one of four different fluorescent dyes. When a nucleotide is incorporated by the DNA polymerase, the fluorescent tag is cleaved off and diffuses out of the observation area of the ZMW where its fluorescence is no longer observable. A detector detects the fluorescent signal of the nucleotide incorporation, and the base call is made according to the corresponding fluorescence of the dye.

The DNA sequencing is done on a chip that contains many ZMWs. Inside each ZMW, a single active DNA polymerase with a single molecule of single stranded DNA template is immobilized to the bottom through which light can penetrate and create a visualization chamber that allows monitoring of the activity of the DNA polymerase at a single molecule level. The signal from a phospho-linked nucleotide incorporated by the DNA polymerase is detected as the DNA synthesis proceeds which results in the DNA sequencing in real time.

Phospholinked nucleotide

Each of the four nucleotide bases has a corresponding fluorescent dye molecule that enables the detector to identify the base being incorporated by the DNA polymerase as it performs the DNA synthesis. The fluorescent dye molecule is attached to the phosphate chain of the nucleotide. When the nucleotide is incorporated by the DNA polymerase, the fluorescent dye is cleaved off with the phosphate chain as part of the natural DNA synthesis process, during which a phosphodiester bond is created to elongate the DNA chain. The cleaved fluorescent dye molecule then diffuses out of the detection volume so that the fluorescent signal is no longer detected.

Zero-mode waveguide

The zero-mode waveguide (ZMW) is a nanophotonic confinement structure that consists of a circular hole in an aluminum cladding film deposited on a clear silica substrate.[3] The ZMW holes are ~70 nm in diameter and ~100 nm in depth. Due to the behavior of light when it travels through a small aperture, the optical field decays exponentially inside the chamber.[4] The observation volume within an illuminated ZMW is ~20 zeptoliters (20 × 10⁻²¹ liters). Within this volume, the activity of DNA polymerase incorporating a single nucleotide can be readily detected.
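
To put the figures above in proportion, the following sketch compares the geometric volume of the hole with the quoted observation volume. It treats the ZMW as an ideal cylinder, which is our simplifying assumption. The function and constant names are ours.

```python
import math

NM3_TO_LITERS = 1e-24   # 1 nm^3 = 1e-27 m^3 = 1e-24 L
ZEPTO = 1e-21           # 1 zeptoliter = 1e-21 L


def cylinder_volume_zl(diameter_nm: float, depth_nm: float) -> float:
    """Volume of an ideal cylinder, in zeptoliters."""
    radius_nm = diameter_nm / 2
    volume_nm3 = math.pi * radius_nm ** 2 * depth_nm
    return volume_nm3 * NM3_TO_LITERS / ZEPTO


geometric_zl = cylinder_volume_zl(70, 100)   # dimensions quoted in the text
observed_zl = 20                             # observation volume quoted in the text

print(f"geometric volume: {geometric_zl:.0f} zL")
print(f"observed fraction: {observed_zl / geometric_zl:.1%}")
```

Under this assumption the hole holds roughly 385 zL, so the ~20 zL observation volume is only a small fraction of it. This is consistent with the optical field decaying exponentially and illuminating only the bottom of the chamber.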

Why use SMRT (its merits compared with NGS)

Related Terminology

Contigs

A sequence contig is a contiguous, overlapping sequence read resulting from the reassembly of the small DNA fragments generated by bottom-up sequencing strategies. This meaning of contig is consistent with the original definition by Rodger Staden (1979). The bottom-up DNA sequencing strategy involves shearing genomic DNA into many small fragments ("bottom"), sequencing these fragments, reassembling them back into contigs and eventually the entire genome ("up"). Because current technology allows for the direct sequencing of only relatively short DNA fragments (300–1000 nucleotides), genomic DNA must be fragmented into small pieces prior to sequencing. In bottom-up sequencing projects, amplified DNA is sheared randomly into fragments appropriately sized for sequencing. The subsequent sequence reads, which are the data that contains the sequence of each fragment, are assembled into contigs, which are finally connected by sequencing the gaps between them resulting in a sequenced genome. The ability to assemble contigs depends on the overlap of reads. Because shearing is random and performed on multiple copies of DNA, each portion of the genome should be represented multiple times in different fragment frames. In other words, the sequences of the fragments (and thus the reads) should overlap. After sequencing, the overlapping reads are assembled into contigs by assembly software.
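
The idea that overlapping reads are merged into contigs can be sketched with a toy greedy assembler. This is only an illustration under strong assumptions: the reads are error-free, all come from the same strand, and the function names are our own. Real assemblers use overlap or de Bruijn graphs and tolerate sequencing errors.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the longest overlap."""
    seqs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best = (0, None, None)
        for i, a in enumerate(seqs):
            for j, b in enumerate(seqs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            # No overlaps left: every remaining sequence is a contig.
            return seqs
        merged = seqs[i] + seqs[j][n:]
        seqs = [s for k, s in enumerate(seqs) if k not in (i, j)] + [merged]


reads = ["ATGCGTAC", "GTACGTTA", "GTTAGC"]
print(greedy_assemble(reads))  # ['ATGCGTACGTTAGC']
```

If two reads share no overlap of at least `min_len` bases, they stay in separate contigs. That is the situation where gaps have to be closed later.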

Scaffold

Scaffolds consist of overlapping contigs separated by gaps of known length. The new constraints placed on the orientation of the contigs allow for the placement of highly repeated sequences in the genome. If one end read has a repetitive sequence, as long as its mate pair is located within a contig, its placement is known. The remaining gaps between the contigs in the scaffolds can then be sequenced by a variety of methods, including PCR amplification followed by sequencing (for smaller gaps) and BAC cloning methods followed by sequencing (for larger gaps).
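
A scaffold as described above can be written down as contigs joined by runs of N whose length equals the known gap size. The helper below is a hypothetical sketch and assumes the contigs are already ordered and oriented.

```python
def build_scaffold(contigs, gaps):
    """Join ordered, oriented contigs, filling each known gap with N."""
    if len(gaps) != len(contigs) - 1:
        raise ValueError("need exactly one gap length between each pair of contigs")
    parts = [contigs[0]]
    for gap, contig in zip(gaps, contigs[1:]):
        parts.append("N" * gap)   # unknown bases, known length
        parts.append(contig)
    return "".join(parts)


scaffold = build_scaffold(["ATGCGT", "TTAGC", "GGA"], [4, 2])
print(scaffold)  # ATGCGTNNNNTTAGCNNGGA
```

Gap closing then amounts to replacing each run of N with real sequence.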

Advances over NGS

Extremely Long Reads

NGS technology requires prohibitive storage space and run time for the sake of relatively short digital crumbs (in fact, the average NGS read is far shorter than a Sanger, i.e. First GS, read). Clearly, the weakness of NGS lies in assembly, which is like a jigsaw puzzle: the shorter the reads, the more obstacles we meet when joining them. Unfortunately, this problem is a fundamental property of the data and remains even with better models and more efficient algorithms. So far, no way has been found to compensate for this drawback.

By comparison, SMRT sequencing on the PacBio RS II offers the following advantages:

1. Very long read length: the average read length is between 5,000 and 8,000 bases per read (the average NGS read length ranges from 150 to 400 bp). The maximum read length can reach 20,000 bp.
For instance, the USDA set out to sequence microbes found in goats. NGS produced at least 18 contigs, leaving the assembly incomplete, whereas SMRT produced a single final contig, i.e. a complete sequence.

2. High accuracy: the sequencing accuracy can reach 99.999%.

3. GC problem: with NGS, regions of high GC content get low coverage, so too little information is available for assembly. This is why many gaps appear between the assembled DNA sequences.
With SMRT, coverage does not fluctuate as the GC content varies, so the problem is avoided.
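
To make the GC problem in point 3 concrete, the sketch below computes GC content in fixed windows along a sequence, which is one simple way to locate the GC-rich regions where NGS coverage tends to drop. The function names, window size and toy sequence are our own choices, not part of any tool mentioned on this page.

```python
def gc_content(seq):
    """Fraction of G and C bases in `seq` (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def windowed_gc(seq, window=10, step=10):
    """Return (start, GC fraction) for each full window along `seq`."""
    return [
        (start, gc_content(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


toy = "ATATATATAT" + "GCGCGCGCGC" + "ATGCATGCAT"
for start, gc in windowed_gc(toy):
    print(start, gc)
```

Plotting such window values against per-window read coverage would show whether coverage depends on GC content for a given data set.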

Our Project

Sequencing

First, we decided to sequence the previously unsequenced Thraustochytrium roseum. After reading the papers on Velvet, SSPACE, GRASS and other tools, we decided to use the structure of GRASS for the scaffolding step once the contigs had been obtained. Because the data come from third-generation sequencing, we changed some lines of the code to fit the characteristics of the data.
GRASS has several processing steps: Sort, Bundle, Extract, RemoveAmbiguous, Erode and IsolateContigs. These are not needed when working with SMRT sequencing data, so we deleted the code for them.
In NGS assembly, nearly all software uses paired-end reads to link the contigs during scaffolding. Because the reads from PacBio machines are extremely long, after self-overlapping we let the PacBio sequences take the place of the contigs in GRASS for the scaffolding.
Here is part of the result:

Then, with the help of Nextomics, we finished the annotation and found that the scaffold shown above is the required one.
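
The general idea of using a long read to link sequences during scaffolding can be illustrated as follows. This is a hypothetical sketch, not the modified GRASS code: it uses exact string matching on one strand, whereas real long reads contain errors and would require an aligner. All names are our own.

```python
def link_contigs_with_long_read(long_read, contigs):
    """Order contigs by where they occur in one long read and measure gaps.

    `contigs` maps a contig name to its sequence. Contigs that are not found
    in the long read are ignored. A negative gap means the contigs overlap.
    """
    hits = []
    for name, seq in contigs.items():
        pos = long_read.find(seq)          # exact match only (assumption)
        if pos != -1:
            hits.append((pos, pos + len(seq), name))
    hits.sort()
    order = [name for _, _, name in hits]
    gaps = [nxt[0] - cur[1] for cur, nxt in zip(hits, hits[1:])]
    return order, gaps


long_read = "ATGCGTACTTTTGGCATGCA"
contigs = {"B": "GGCATGCA", "A": "ATGCGTAC"}
print(link_contigs_with_long_read(long_read, contigs))  # (['A', 'B'], [4])
```

A single long read that spans two sequences gives both their order and the gap length between them, which is the information paired-end reads supply in NGS scaffolding.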

DHA engineering yeast

Abstract

Unsaturated fatty acids
Unsaturated fatty acids have one or more double bonds between carbon atoms. (Pairs of carbon atoms connected by double bonds can be saturated by adding hydrogen atoms to them, converting the double bonds to single bonds. Therefore, the double bonds are called unsaturated.)
The two carbon atoms in the chain that are bound next to either side of the double bond can occur in a cis or trans configuration.
Traditional sources

A. Deep-sea fish oil

This is the major source of polyunsaturated fatty acids (PUFAs). The disadvantages include complex purification processes and limited resources.

B. Marine microalgae

This is the original source of PUFAs. The disadvantages are low productivity and unstable output.
The synthesis pathway of PUFAs

In our project, we use EPA and ARA as substrates to produce DHA and DPA.

Technical route

Thraustochytrids and Isochrysis galbana are used as source materials for producing DHA and DPA in yeast, in order to provide research materials and scientific data.
Results:

New Model

The Innovation of Synthetic Biology's Modules

BioBricks now become the transition between two submodules.

How to find the BioBricks

Traditional Method:

The source is the database. However, for a strain such as our T-roseum ATCC28210, which had not yet been sequenced, we could not obtain the BioBricks this way.
Novel Definition:

After adding a new module, we successfully finished our project.

DHA Production

Symbol | Definition | Unit
- | Methanol concentration | -
nARA | ARA concentration | -
nEPA | EPA concentration | -
- | Concentration of transcribed mRNAs of PAOX1 | -
- | Concentration of transcribed mRNAs of PAOX5' | -
- | Max transcription rate of PAOX1 | nM/s
- | Max transcription rate of PAOX5 | nM/s
- | Max transcription rate of PAOX5' | nM/s
δ1 | Degradation rate of EPA | 1/h
δ2 | Degradation rate of DHA | 1/h
δ3 | Degradation rate of DPA | 1/h
- | Translation rate | M/s
- | Degradation rate of mRNA | M/s

Software

The software is designed for those who are not familiar with bioinformatics and computer science. Users only need to enter the location of the required files, and the final sequencing result will then be produced. The software works on Unix systems only.
Download: click here
PS: Perl and the Comprehensive Perl Archive Network (CPAN) are needed.

Parts Submitted

Team Introduction

iGEM HUST-INNOVATORS 2014 is an energetic family of 10 undergraduate students and 2 advisors. Our members come from a variety of departments: some major in non-biological subjects such as Energy & Power Engineering and Computer Science, and the other 5 come from the biology department. The detailed work distribution is as follows:

Leaders: He Yu, Zhang Yue
Advisors: Chen Gang, Wang Depeng, Gong Yangming
Team Members: Tang Yu, Dong Xiaolei, Wang Yiqiao, Hou Yuhan

Attribution

Sequencing Procedure Group:
Proposal: Zhang Yue
Program designer: Tang Yu, Dong Xiaolei
Documentation: Zhang Yue, Hou Yuhan

Experiments Group:
Proposal: He Yu
Program designer: Liang Qihua, Zhang Zihe, Li Xiaotong, Yang Kairan

Modeling:
Main Designer: Hou Yuhan
Modeling: Hou Yuhan, Wang Yiqiao

Wiki:
Main designer: Mo Bufei
Content: Tang Yu, Dong Xiaolei, Wang Yiqiao, Hou Yuhan, Liang Qihua, Zhang Zihe, Li Xiaotong, Yang Kairan

Acknowledgment

We are truly grateful to the following people and organizations for their kind support with funding, materials, facilities, and professional advice:

Prof. Gong Yangming, researcher at the OILCROP RESEARCH INSTITUTE, for providing us with the Thraustochytrid strain and the experimental platform.

Mr. Wang Depeng, the head of Nextomics, for helping us modify parts of the code of some assembly tools and sequence the Thraustochytrid we needed.

Mr. Liang Fan, a proposal of Nextomics, for helping us construct our tools on computers.

Mr. Liu Zhenhua, a proposal of Nextomics, for teaching us knowledge about sequencing history and tools.

Prof. Chen Gang, the Chief of the School of Energy and Power Engineering, HUST, for guiding our work, funding our team and covering our transportation costs.

Mr. Mo Bufei, for helping us construct our wiki before wiki freeze. Team WHU, for helping us construct the biobrick.

Special Thanks

We sincerely thank all the members of Team WHU-China. As our team is not from the College of Life Science and Technology at HUST, we ran into difficulties while carrying out our experiments. When we could not find anyone to help us right away, Team WHU-China gave us a hand. They provided us with a lab, guided us during the experiments, and finally helped us construct our BioBrick (BBa_K1551000) successfully.

What we did

Safety

Would any of your project ideas raise safety issues in terms of: researcher safety, public safety, or environmental safety?

- All researchers were trained in lab safety protocols and conducted the experiments in strict accordance with them. No one was injured or infected while working on our project. The microorganism used in our project, T-roseum ATCC28210, is neither toxic nor pathogenic. No lab waste was released into the environment without prior sterilization.

Do any of the new BioBrick parts (or devices) that you made this year raise any safety issues?

- No. None of the BioBrick parts we made pose any safety problems we can foresee. None of the parts we made produce toxic or pathogenic protein, and all parts are found in common bacterial species.

Is there a local biosafety group, committee, or review board at your institution?

- The OILCROP RESEARCH INSTITUTE, CHINESE ACADEMY OF AGRICULTURAL SCIENCES makes sure that every project is evaluated for biosafety issues before it starts. The head of the Institute and our advisors confirmed that our project would not raise safety problems.

Do you have any other ideas how to deal with safety issues that could be useful for future iGEM competitions? How could parts, devices and systems be made even safer through biosafety engineering?

- iGEM could make every team sign a safety form when applying, so that all teams follow the same set of safety rules.

Total Innovation for Synthetic Biology

Communication

Cooperation with Nextomics

During our project, we cooperated with Nextomics. The company is good at handling SMRT data, which helped us construct the BioBricks.

“What are the benefits of using SMRT data in the construction of BioBricks?”

“Faster, cheaper and able to be used by individuals.”

Retrieved from "http://2014.igem.org/Team:HUST-Innovators"