Latest revision as of 01:23, 18 October 2014

Cambridge iGEM 2014

Codon Optimisation

A codon table for Marchantia polymorpha

Method

Our dataset consisted of large gap read mapping transcripts obtained by mRNA sequencing conducted by the Haseloff Lab on the M. polymorpha Cam strain. Open Reading Frame (ORF) and Coding Sequence (CDS) predictions were made using the CLC bio Transcript Discovery plugin [1].

We ran blastx on our data [2] to verify the validity of the predicted ORFs and to compare the sequences with the proteins present in Arabidopsis. A list of candidate genes was compiled from the results, and this list was used to calculate a frequency table of codon usage for M. polymorpha.
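As an illustration of the last step, here is a minimal sketch of how a codon usage frequency table could be computed from a list of coding sequences. It is not the script we used: the function name `codon_usage`, the per-thousand normalisation and the example sequence are assumptions made for this sketch.

```python
from collections import Counter
from itertools import product


def codon_usage(cds_list):
    """Return {codon: occurrences per thousand codons}, pooled over all sequences.

    Assumes each sequence is an in-frame coding sequence (hypothetical input).
    """
    counts = Counter()
    for seq in cds_list:
        seq = seq.upper().replace("U", "T")
        usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
        for i in range(0, usable, 3):
            codon = seq[i:i + 3]
            if set(codon) <= set("ACGT"):  # skip codons with ambiguous bases
                counts[codon] += 1
    total = sum(counts.values())
    all_codons = ("".join(bases) for bases in product("ACGT", repeat=3))
    return {c: (1000.0 * counts[c] / total if total else 0.0) for c in all_codons}


# Made-up sequence, for illustration only: ATG GCC GCC TAA
example = codon_usage(["ATGGCCGCCTAA"])
print(example["GCC"])  # 500.0
```

Pooling counts over all candidate genes, rather than averaging per-gene frequencies, gives long genes more weight; either choice is defensible and the text does not say which was used.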

Results and Discussion

Figure 1: Our codon table for *M. polymorpha*

Our initial predictions yielded 99 000 Open Reading Frames (ORFs), which seemed too many to be realistic, and half of these were only 100 amino acids (aa) long. By filtering the dataset using a threshold of 300 aa for candidate genes, we obtained a distribution of lengths that seemed reasonable.

Figure 2: A histogram of the lengths of predicted genes in *M. polymorpha*
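The length filter and the binning behind a histogram like this one can be sketched as follows. This is a sketch under stated assumptions, not our actual pipeline: the function names, the inclusive treatment of the 300 aa threshold, the 100 aa bin width and the handling of a trailing `*` stop symbol are all choices made for illustration.

```python
from collections import Counter

MIN_LENGTH_AA = 300  # threshold quoted in the text; treated as inclusive (assumption)


def protein_length(seq):
    """Length in amino acids, ignoring a trailing '*' stop symbol if present."""
    return len(seq.rstrip("*"))


def filter_by_length(proteins, min_length=MIN_LENGTH_AA):
    """Keep predicted proteins that are at least min_length aa long."""
    return {name: seq for name, seq in proteins.items()
            if protein_length(seq) >= min_length}


def length_histogram(proteins, bin_width=100):
    """Count sequences per length bin; each key is the lower edge of a bin."""
    bins = Counter((protein_length(seq) // bin_width) * bin_width
                   for seq in proteins.values())
    return dict(sorted(bins.items()))
```

Given a dictionary mapping hypothetical ORF names to translated sequences, `length_histogram(filter_by_length(proteins))` returns the counts that a plotting library could then draw as bars.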

In the blastx output, the longest complete sequence match was 40%. This is low, as expected given the phylogenetic distance between M. polymorpha and A. thaliana [3][4].
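For readers who want to inspect blastx hits programmatically, the sketch below picks the best-scoring hit per query from BLAST tabular output with the twelve standard columns. The text does not state which output format we worked from, so the format, the function name `best_hits` and the sequence identifiers in the usage note are assumptions.

```python
import csv
import io

# The twelve standard columns of BLAST tabular output.
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(tabular_text):
    """Return {query id: hit} keeping, for each query, the hit with the highest bit score."""
    best = {}
    reader = csv.reader(io.StringIO(tabular_text), delimiter="\t")
    for row in reader:
        if not row or row[0].startswith("#"):  # skip blank and comment lines
            continue
        hit = dict(zip(FIELDS, row))
        hit["pident"] = float(hit["pident"])      # percent identity
        hit["bitscore"] = float(hit["bitscore"])
        current = best.get(hit["qseqid"])
        if current is None or hit["bitscore"] > current["bitscore"]:
            best[hit["qseqid"]] = hit
    return best
```

Calling `best_hits` on the contents of a tab-separated results file returns one record per predicted ORF, from which the percent identity (`pident`) of its best Arabidopsis match can be read.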

The codon usage table we obtained for M. polymorpha is not strikingly similar to that of A. thaliana, as expected given the 400 million years of evolutionary divergence between them [5].

However, the two tables are similar in showing a slight preference for C over other bases at the end of codons and a preference for G-p-C sites, as can be seen in the codon table.
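Both observations can be checked directly on a set of coding sequences. The following is a minimal sketch, assuming in-frame sequences as input; the function names and the example sequence are made up for illustration and are not taken from our analysis.

```python
from collections import Counter


def third_position_composition(cds_list):
    """Fraction of each base at the third (last) position of codons."""
    counts = Counter()
    for seq in cds_list:
        seq = seq.upper()
        usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
        for i in range(2, usable, 3):
            if seq[i] in "ACGT":
                counts[seq[i]] += 1
    total = sum(counts.values())
    return {b: (counts[b] / total if total else 0.0) for b in "ACGT"}


def dinucleotide_fraction(cds_list, dinuc="GC"):
    """Fraction of adjacent base pairs that equal dinuc (5'-G-p-C-3' by default)."""
    hits = total = 0
    for seq in cds_list:
        seq = seq.upper()
        for i in range(len(seq) - 1):
            total += 1
            hits += seq[i:i + 2] == dinuc
    return hits / total if total else 0.0


# Made-up sequence, for illustration only: ATG GCC GCC TAA
print(third_position_composition(["ATGGCCGCCTAA"])["C"])  # 0.5
```

Comparing these fractions between the M. polymorpha and A. thaliana sequence sets would put a number on the "slight preference" described above.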

References

1. Qiagen, CLC bio Transcript Discovery ®, http://www.clcbio.com/clc-plugin/transcript-discovery/#description

2. NCBI Blast ®, http://blast.ncbi.nlm.nih.gov/Blast.cgi

3. Turmel M, Otis C and Lemieux C. 2003. The Mitochondrial Genome of Chara vulgaris: Insights into the Mitochondrial DNA Architecture of the Last Common Ancestor of Green Algae and Land Plants. The Plant Cell 15(8), pp. 1888-1903.

4. Cantarel BL, Morrison HG, Pearson W. 2006. Exploring the Relationship between Sequence Similarity and Accurate Phylogenetic Trees. Molecular Biology and Evolution 23(11), pp. 2090-2100.

5. Wellman CH, Osterloff PL, Mohiuddin U. 2003. Fragments of the earliest land plants. Nature 425, pp. 282-285.

@@ Line 9: / Line 9: @@
                      <div class="intro-message">
                          <h1>Codon Optimisation</h1>
-                         <font color="black" style="BACKGROUND-COLOR: #E6E6E6">A parts table for <Em>Marchantia Polymorpha</Em>
+                         <font color="black" style="BACKGROUND-COLOR: #E6E6E6">A codon table for <Em>Marchantia polymorpha</Em>
 </font>
                      </div>
@@ Line 25: / Line 25: @@
 	            <div class="row">
 	                <div class="col-lg-9 col-sm-push-0.75  col-sm-6">
-	                    <hr class="section-heading-spacer">
 	                    <div class="clearfix"></div>
 	                    <h2 class="section-heading">Method</h2>
 	                    <div>
 <p>
-Our dataset was made of large gap read mapping transcripts obtained by mRNA sequencing conducted by the Haseloff Lab on the m. polymorpha Cam strain.  Open Reading Frame (ORF) and Coding Sequence (CDS) predictions were made using the CLC bio Transcript Discovery plugin<a href="#Footnote1">[1]</a>.
+Our dataset was made of large gap read mapping transcripts obtained by mRNA sequencing conducted by the Haseloff Lab on the <em>M. polymorpha</em> Cam strain.  Open Reading Frame (ORF) and Coding Sequence (CDS) predictions were made using the CLC bio Transcript Discovery plugin<a href="#Footnote1">[1]</a>.
+</p>
-We used ran blastx on our data2 to verify the validity of the predicted ORFs and to compare the sequences with the proteins present in Arabidopsis3. A list of candidate genes was compiled from the results and this list was used to calculate a frequency table of codon usage for <i>m.polymorpha</i>.
+<p>
+We ran blastx on our data <a href="#Footnote2">[2]</a> to verify the validity of the predicted ORFs and to compare the sequences with the proteins present in <em>Arabidopsis</em>. A list of candidate genes was compiled from the results, and this list was used to calculate a frequency table of codon usage for <em>M. polymorpha</em>.
 </p>
                              </div>
@@ Line 47: / Line 48: @@
 	            <div class="row">
 	                <div class="col-lg-9 col-sm-push-0.75  col-sm-6">
-	                    <hr class="section-heading-spacer">
-	                    <h2 class="section-heading">Results and Discussion</h2>
 	                    <div class="clearfix"></div>
+	                    <h2 class="section-heading">Results and Discussion</h2>
 	                    <div>
@@ Line 57: / Line 58: @@
 </figure>
 <br>
-<p>The number of Open Reading Frames (ORFs) resulting from our initial predictions totalled 99 000, which seemed too large to be realistic, and half of these were only 100 amino acids (100 aa) long. By filtering the dataset using a threshold of 300 aa for candidate genes we obtained a normal distribution of lengths, which seemed reasonable.
+<p>The number of Open Reading Frames (ORFs) resulting from our initial predictions totalled 99 000, which seemed too large to be realistic, and half of these were only 100 amino acids (100 aa) long. By filtering the dataset using a threshold of 300 aa for candidate genes we obtained a distribution of lengths that seemed reasonable.
-[INSERT FIGURE plot of the distribution of lengths]
+<figure>
+<img src="https://static.igem.org/mediawiki/2014/9/93/Cambridge-JIC_Gene_Lengths.png" width = "550px">
-In the blastx output, the longest complete sequence match was 40%. This is small as expected given the phylogenetic distance between </i>m.polymorpha</i> and <i>a. thaliana</i> <a href="#Footnote2">[2]</a><a href="#Footnote3">[3]</a>.
+<figcaption>Figure 2: A histogram of the lengths of predicted genes in <i>M. polymorpha</i></figcaption>
+</figure>
-Some of the predicted ORF sequences showed isolated, local matches rather than a series of matches along the sequence length. As we would have expected evolutionarily conserved motifs to be distributed along the length, this indicated that some of our predicted ORFs were unreliable. We decided to retain these sequences in producing our final table because…? We decided to discard these motifs?
+<br>
+</p>
-The Codon Usage table we obtained for <i> m. polymorpha</i> is not strikingly similar to that of <i>a. thaliana</i>, as expected from the 400 million years of evolutionary divergence between them <a href="#Footnote4">[4]</a>.
+<p>
+In the blastx output, the longest complete sequence match was 40%. This is low, as expected given the phylogenetic distance between <i>M. polymorpha</i> and <i>A. thaliana</i> <a href="#Footnote3">[3]</a><a href="#Footnote4">[4]</a>.
+</p>
+<p>
+The codon usage table we obtained for <i>M. polymorpha</i> is not strikingly similar to that of <i>A. thaliana</i>, as expected given the 400 million years of evolutionary divergence between them <a href="#Footnote5">[5]</a>.
+</p>
+<p>
 However, there is a similarity in the slight preference for C over other bases at the end of codons and that for G-p-C sites, as can be seen in the codon table.
 </p>
@@ Line 80: / Line 86: @@
 	            <div class="row">
 	                <div class="col-lg-9 col-sm-push-0.75  col-sm-6">
-	                    <hr class="section-heading-spacer">
 	                    <div class="clearfix"></div>
 	                    <div>
-<h4>I'm also a title</h4>
+<h4>References</h4>
-<p>
+<p id="Footnote1">1.	Qiagen, CLC bio Transcript Discovery ®, <a href="http://www.clcbio.com/clc-plugin/transcript-discovery/#description">http://www.clcbio.com/clc-plugin/transcript-discovery/#description</a> <a href="#">back to top</a></p>
-Third part here
+<p id="Footnote2">2.	NCBI Blast ®, http://blast.ncbi.nlm.nih.gov/Blast.cgi <a href="#">back to top</a></p>
-</p>
+<p id="Footnote3">3.	Turmel M, Otis C and Lemieux C. 2003. <i>The Mitochondrial Genome of Chara vulgaris: Insights into the Mitochondrial DNA Architecture of the Last Common Ancestor of Green Algae and Land Plants.</i> The Plant Cell 15(8), pp. 1888-1903. <a href="#">back to top</a></p>
+<p id="Footnote4">4.	Cantarel BL, Morrison HG, Pearson W. 2006. Exploring the Relationship between Sequence Similarity and Accurate Phylogenetic Trees. Molecular Biology and Evolution 23(11), pp. 2090-2100. <a href="#">back to top</a></p>
+<p id="Footnote5">5.	Wellman CH, Osterloff PL, Mohiuddin U. 2003. <i>Fragments of the earliest land plants.</i> Nature 425, pp. 282-285. <a href="#">back to top</a></p>
                              </div>
 	                </div>

Team:Cambridge-JIC/Marchantia/Codon

From 2014.igem.org
