Revision as of 16:48, 17 October 2014

Cambridge iGEM 2014

Codon Optimisation

A parts table for Marchantia polymorpha

Overview

Method

Our dataset consisted of large-gap read mapping transcripts obtained by mRNA sequencing, conducted by the Haseloff Lab on the M. polymorpha Cam strain. Open Reading Frame (ORF) and Coding Sequence (CDS) predictions were made using the CLC bio Transcript Discovery plugin[1]. We ran blastx on our data2 to verify the validity of the predicted ORFs and to compare the sequences with the proteins present in Arabidopsis3. A list of candidate genes was compiled from the results, and this list was used to calculate a frequency table of codon usage for M. polymorpha.
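The last step, turning a list of candidate coding sequences into a codon frequency table, can be sketched as follows. This is a minimal illustration only: it assumes the CDSs are available as in-frame nucleotide strings, and the function name `codon_usage`, the example sequences, and the per-thousand normalisation are hypothetical choices, not necessarily those used to produce the table on this page.

```python
from collections import Counter


def codon_usage(cds_list):
    """Count codons over all sequences and return frequencies per thousand codons."""
    counts = Counter()
    for cds in cds_list:
        seq = cds.upper().replace("U", "T")
        # Assumption: a sequence whose length is not a multiple of 3 is skipped.
        if len(seq) % 3 != 0:
            continue
        for i in range(0, len(seq), 3):
            codon = seq[i:i + 3]
            # Ignore codons containing ambiguous bases such as N.
            if set(codon) <= set("ACGT"):
                counts[codon] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {codon: 1000.0 * n / total for codon, n in counts.items()}


# Toy input (made-up sequences, not real M. polymorpha data).
example = ["ATGGCCGCCTAA", "ATGGCCTGA"]
table = codon_usage(example)
```

Frequencies per thousand are a common convention for codon tables; relative usage within each amino acid would need an additional grouping step by genetic code.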

Results and Discussion

Figure 1: Our codon table for *M. polymorpha*

The number of Open Reading Frames (ORFs) resulting from our initial predictions totalled 99 000, which seemed too large to be realistic, and half of these were only 100 amino acids (aa) long. By filtering the dataset using a threshold of 300 aa for candidate genes, we obtained a normal distribution of lengths, which seemed reasonable. [INSERT FIGURE plot of the distribution of lengths] In the blastx output, the longest complete sequence match was 40%. This is low, as expected given the phylogenetic distance between M. polymorpha and A. thaliana [2][3]. Some of the predicted ORF sequences showed isolated, local matches rather than a series of matches along the sequence length. As we would have expected evolutionarily conserved motifs to be distributed along the length, this indicated that some of our predicted ORFs were unreliable. We decided to retain these sequences in producing our final table because…? We decided to discard these motifs? The codon usage table we obtained for M. polymorpha is not strikingly similar to that of A. thaliana, as expected from the 400 million years of evolutionary divergence between them [4]. However, the two tables share a slight preference for C over other bases at the end of codons, and a preference for G-p-C sites, as can be seen in the codon table.
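Two of the steps described above, the 300 aa length filter and the check for a third-position base preference, can be sketched as follows. This is an illustration under stated assumptions: the function names, the toy inputs, and the choice of "at least 300 aa" (rather than strictly more than 300) are hypothetical and not taken from the original analysis.

```python
def filter_by_length(proteins, min_aa=300):
    """Keep predicted protein sequences with at least min_aa residues."""
    return {name: seq for name, seq in proteins.items() if len(seq) >= min_aa}


def third_base_fractions(codon_counts):
    """Fraction of codons ending in each base, from a codon -> count mapping."""
    totals = {"A": 0, "C": 0, "G": 0, "T": 0}
    for codon, n in codon_counts.items():
        totals[codon[2]] += n
    grand = sum(totals.values())
    if grand == 0:
        return {base: 0.0 for base in totals}
    return {base: n / grand for base, n in totals.items()}


# Toy inputs (made up for illustration, not real data).
proteins = {"orf_short": "M" * 100, "orf_long": "M" * 350}
kept = filter_by_length(proteins)

fractions = third_base_fractions({"GCC": 3, "ATG": 2, "TAA": 1, "TGA": 1})
```

A third-position C fraction noticeably above that of the other bases in the real table would correspond to the preference noted in the text.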


@@ Line 2: / Line 2: @@
 <html>
-<head>
+     <div class="intro-header-proj">
-<style>
-table.reference,th.reference,td.reference {
-border:1px solid black;
-border-collapse:collapse;
-}
-</style>
-</head>
-<body>
+        <div class="container">
-<div align="center"><a href="https://2014.igem.org/wiki/index.php?title=Team:Cambridge-JIC/Informatics&action=edit">Edit this page</a></div>
+            <div class="row">
-<h2>Informatics</h2>
+                <div class="col-lg-9">
+                    <div class="intro-message">
+                        <h1>Codon Optimisation</h1>
+                        <font color="black" style="BACKGROUND-COLOR: #E6E6E6">A parts table for <Em>Marchantia polymorpha</Em>
+</font>
+                    </div>
+                </div>
+            </div>
-To introduce and facilitate future use of the novel chassis <Em>Marchantia polymorpha</Em>, we computationally analysed its genome to:
+        </div>
-<ul>
+        <!-- /.container -->
-<li>find out the most efficient codon usage in order to optimise our Marchantia specific registry parts and facilitate all future synthetic biology work on Marchantia;</li>
-<li>submit a small library of Marchantia promoters to the iGEM registry, in particular looking for those which are strong, inducible, tissue-specific or expressed in an early development stage. </li>
-</ul>
-<h3 id="Codon-optimisation">Codon usage optimisation </h3>
+    </div>
+    <!-- /.intro-header -->
+	    <div class="content-section-a" id="Overview">
+	        <div class="container">
+	            <div class="row">
+	                <div class="col-lg-9 col-sm-push-0.75  col-sm-6">
+	                    <hr class="section-heading-spacer">
+	                    <div class="clearfix"></div>
+	                    <h2 class="section-heading">Overview</h2>
+	                    <div>
+<h4>Method</h4>
 <p>
-Our start point was the Marchantia genome and the mRNA transcriptome predicted with Geneious software (<a href = "http://www.geneious.com/">http://www.geneious.com/</a>). The data was given to us from Jim's lab by Bernardo. 99 000 ORFs were predicted, which seems too large to be a realistic number of genes expressed by this small liverwort. Half of these were only 100 amino acids long. We set the threshold for candidate genes among these at 300 amino acids, obtaining the expected normal distribution of lengths.
+Our dataset consisted of large-gap read mapping transcripts obtained by mRNA sequencing, conducted by the Haseloff Lab on the <i>M. polymorpha</i> Cam strain. Open Reading Frame (ORF) and Coding Sequence (CDS) predictions were made using the CLC bio Transcript Discovery plugin<a href="#Footnote1">[1]</a>.
+We ran blastx on our data2 to verify the validity of the predicted ORFs and to compare the sequences with the proteins present in Arabidopsis3. A list of candidate genes was compiled from the results, and this list was used to calculate a frequency table of codon usage for <i>M. polymorpha</i>.
 </p>
+                            </div>
+	                </div>
+	            </div>
+	        </div>
+	        <!-- /.container -->
+	    </div>
+	    <!-- /.content-section-a -->
-<p>
-Using BLAST software (<a href="http://blast.ncbi.nlm.nih.gov/Blast.cgi">http://blast.ncbi.nlm.nih.gov/Blast.cgi</a>), we compared the proteins coded by these candidate genes to the proteins present in <Em>Arabidopsis thaliana</Em>, given on Araport (<a href="www.araport.org">www.araport.org</a>). Some of the sequences showed incomplete matches, indicated that our predicted ORFs should be regarded with some vigilance. The longest sequence showed a 40% match, a small number as expected.
+	    <div class="content-section-b">
+	        <div class="container">
+	            <div class="row">
+	                <div class="col-lg-9 col-sm-push-0.75  col-sm-6">
+	                    <hr class="section-heading-spacer">
+	                    <div class="clearfix"></div>
+	                    <div>
+<h4>Results and Discussion</h4>
+<figure>
+<img src="https://static.igem.org/mediawiki/2014/a/a9/Cambridge-JIC_Codon_table.png" width = "550px">
+<figcaption>Figure 1: Our codon table for <i>M. polymorpha</i></figcaption>
+</figure>
+<p>The number of Open Reading Frames (ORFs) resulting from our initial predictions totalled 99 000, which seemed too large to be realistic, and half of these were only 100 amino acids (aa) long. By filtering the dataset using a threshold of 300 aa for candidate genes, we obtained a normal distribution of lengths, which seemed reasonable.
+[INSERT FIGURE plot of the distribution of lengths]
+In the blastx output, the longest complete sequence match was 40%. This is low, as expected given the phylogenetic distance between <i>M. polymorpha</i> and <i>A. thaliana</i> <a href="#Footnote2">[2]</a><a href="#Footnote3">[3]</a>.
+Some of the predicted ORF sequences showed isolated, local matches rather than a series of matches along the sequence length. As we would have expected evolutionarily conserved motifs to be distributed along the length, this indicated that some of our predicted ORFs were unreliable. We decided to retain these sequences in producing our final table because…? We decided to discard these motifs?
+The codon usage table we obtained for <i>M. polymorpha</i> is not strikingly similar to that of <i>A. thaliana</i>, as expected from the 400 million years of evolutionary divergence between them <a href="#Footnote4">[4]</a>.
+However, the two tables share a slight preference for C over other bases at the end of codons, and a preference for G-p-C sites, as can be seen in the codon table.
 </p>
+                </div>
+	                </div>
+	            </div>
+	        </div>
+	        <!-- /.container -->
+	    </div>
+	    <!-- /.content-section-b -->
+	    <div class="content-section-a">
+	        <div class="container">
+	            <div class="row">
+	                <div class="col-lg-9 col-sm-push-0.75  col-sm-6">
+	                    <hr class="section-heading-spacer">
+	                    <div class="clearfix"></div>
+	                    <div>
+<h4>I'm also a title</h4>
 <p>
-The relevant DNA sequences from the candidate mRNAs using the BLAST output. Then the different codons were counted in these genes, in order to obtain a codon table for Marchantia. While the table is not strikingly similar to Arabidopsis', we can note a similarity in the slight preference for C over other bases at the end of codons and that for G-p-C sites.
+Third part here
 </p>
+                            </div>
-<img src="https://static.igem.org/mediawiki/2014/a/a9/Cambridge-JIC_Codon_table.png" width = 550> </img>
+	                </div>
+	            </div>
-<p>A .xls version of this table with relevant filters can be found <a href="https://static.igem.org/mediawiki/2014/4/42/Marchantia_codon_table_analysis.xls">here</a>.</p>
+	        </div>
+	        <!-- /.container -->
+	    </div>
 </html>

Team:Cambridge-JIC/Marchantia/Codon

From 2014.igem.org