Team:Cambridge-JIC/Marchantia/Codon
From 2014.igem.org
Revision as of 16:52, 17 October 2014
Method
Our dataset consisted of transcripts obtained by large gap read mapping of mRNA sequencing reads, generated by the Haseloff Lab from the Cam strain of M. polymorpha. Open Reading Frame (ORF) and Coding Sequence (CDS) predictions were made using the CLC bio Transcript Discovery plugin[1].
We ran blastx on our data [2] to check the validity of the predicted ORFs and to compare the sequences with the proteins present in Arabidopsis [3]. From the results we compiled a list of candidate genes, which we used to calculate a codon usage frequency table for M. polymorpha.
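The codon usage calculation described above can be sketched in Python as follows. This is a minimal illustration, not necessarily how our table was produced: it assumes the candidate genes are available as in-frame CDS strings starting at the first base of the first codon, uses the standard genetic code, and the names `count_codons` and `codon_usage_table` are hypothetical. For each codon it reports the amino acid, the raw count, the frequency per thousand codons and the fraction among synonymous codons.

```python
from collections import Counter
from itertools import product

# Standard genetic code, built in TCAG order: TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}  # '*' marks stop codons


def count_codons(cds_list):
    """Count in-frame codons across all CDS sequences (hypothetical helper)."""
    counts = Counter()
    for seq in cds_list:
        seq = seq.upper().replace("U", "T")
        # Step through complete codons only; a trailing partial codon is ignored.
        for i in range(0, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if codon in GENETIC_CODE:  # skips codons containing N or other ambiguity codes
                counts[codon] += 1
    return counts


def codon_usage_table(cds_list):
    """Return {codon: (amino_acid, count, per_thousand, synonymous_fraction)}."""
    counts = count_codons(cds_list)
    total = sum(counts.values())
    aa_totals = Counter()
    for codon, n in counts.items():
        aa_totals[GENETIC_CODE[codon]] += n

    table = {}
    for codon, aa in GENETIC_CODE.items():
        n = counts[codon]
        per_thousand = 1000 * n / total if total else 0.0
        fraction = n / aa_totals[aa] if aa_totals[aa] else 0.0
        table[codon] = (aa, n, per_thousand, fraction)
    return table


if __name__ == "__main__":
    toy_cds = ["ATGGCCGCATAA", "ATGGCCTAG"]  # toy sequences, not real data
    for codon, row in codon_usage_table(toy_cds).items():
        if row[1]:
            print(codon, row)
```

The per-thousand frequency depends on amino acid composition as well as codon choice, whereas the synonymous fraction isolates codon choice for each amino acid, so it can help to report both.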
Results and Discussion
Our initial predictions yielded 99 000 ORFs, which seemed too many to be realistic, and half of these were only 100 amino acids (aa) long. Filtering the dataset with a minimum length of 300 aa for candidate genes gave a normal distribution of lengths, which seemed reasonable.
[INSERT FIGURE: plot of the distribution of lengths]
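A minimal Python sketch of the length filter follows. It assumes the predicted ORFs are held in a dictionary mapping a hypothetical ORF name to its nucleotide CDS, and that protein length is counted in codons excluding a terminal stop codon; the helper names are illustrative, and the 300 aa threshold is the only parameter taken from the text. Binning the lengths before and after filtering gives the data for a length-distribution plot.

```python
from collections import Counter
from statistics import median

MIN_LENGTH_AA = 300  # candidate-gene threshold used in the text
STOP_CODONS = {"TAA", "TAG", "TGA"}


def protein_length(cds):
    """Number of amino acids encoded by a CDS, excluding a terminal stop codon."""
    cds = cds.upper()
    n_codons = len(cds) // 3
    # Look at the last complete codon, ignoring any trailing partial codon.
    if n_codons and cds[3 * n_codons - 3:3 * n_codons] in STOP_CODONS:
        n_codons -= 1
    return n_codons


def filter_by_length(orfs, min_aa=MIN_LENGTH_AA):
    """Keep ORFs (name -> CDS string) whose protein is at least min_aa long."""
    return {name: seq for name, seq in orfs.items() if protein_length(seq) >= min_aa}


def length_histogram(lengths, bin_width=100):
    """Bin protein lengths into intervals of bin_width aa: {bin_start: count}."""
    bins = Counter((length // bin_width) * bin_width for length in lengths)
    return dict(sorted(bins.items()))


def summarise(orfs):
    """Count, median length and length histogram for a set of ORFs."""
    lengths = [protein_length(seq) for seq in orfs.values()]
    return {
        "n": len(lengths),
        "median_aa": median(lengths) if lengths else 0,
        "histogram": length_histogram(lengths),
    }
```

Calling `summarise` on the full prediction set and on `filter_by_length(...)` output makes the effect of the threshold on the length distribution easy to compare.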
In the blastx output, the longest complete sequence match was only 40%. This low value is expected given the phylogenetic distance between M. polymorpha and A. thaliana [3].
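If blastx is run with BLAST+ tabular output (`-outfmt 6`, default columns), the best hit for each predicted ORF can be extracted with a few lines of Python. This is a sketch under that assumption, not a description of how our results were parsed; ranking hits by bitscore is a common convention assumed here, and `best_hits` and `Hit` are hypothetical names. The percent identity and alignment length of each best hit are the quantities one would inspect when judging how closely M. polymorpha proteins match their Arabidopsis counterparts.

```python
import csv
import io
from typing import NamedTuple

# Column order of BLAST+ tabular output (-outfmt 6) with its default fields.
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


class Hit(NamedTuple):
    subject: str
    pident: float   # percentage of identical positions in the alignment
    length: int     # alignment length (amino acids for blastx)
    evalue: float
    bitscore: float


def best_hits(tabular_text):
    """Return {query_id: Hit}, keeping the highest-bitscore hit per query."""
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        rec = dict(zip(FIELDS, row))
        hit = Hit(rec["sseqid"], float(rec["pident"]), int(rec["length"]),
                  float(rec["evalue"]), float(rec["bitscore"]))
        query = rec["qseqid"]
        if query not in best or hit.bitscore > best[query].bitscore:
            best[query] = hit
    return best
```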
The codon usage table we obtained for M. polymorpha is not strikingly similar to that of A. thaliana, as expected from the 400 million years of evolutionary divergence between them [5].
However, both species show a slight preference for C over other bases at the third codon position, and a preference for GpC sites, as can be seen in the codon table.
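The comparison in the last two paragraphs can be made quantitative in a few ways. The Python sketch below is illustrative and not necessarily how our comparison was made: it assumes codon usage tables stored as dictionaries mapping codon to frequency and coding sequences as in-frame strings, and all function names are hypothetical. It computes a Pearson correlation between two tables as one possible similarity score, the base composition at the third codon position, and a GpC observed/expected ratio using the common (pair count × length) / (G count × C count) formula, where values above 1 indicate more GpC dinucleotides than base composition alone predicts.

```python
from collections import Counter
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_x * sd_y)


def compare_usage(table_a, table_b):
    """Correlate two {codon: frequency} tables over the codons they share."""
    shared = sorted(set(table_a) & set(table_b))
    return pearson([table_a[c] for c in shared], [table_b[c] for c in shared])


def third_position_composition(cds_list):
    """Fraction of A, C, G, T at the third position of complete in-frame codons."""
    counts = Counter()
    for seq in cds_list:
        seq = seq.upper()
        counts.update(seq[i + 2] for i in range(0, len(seq) - 2, 3))
    total = sum(counts[b] for b in "ACGT")
    return {b: counts[b] / total for b in "ACGT"} if total else {}


def gpc_obs_exp(cds_list):
    """GpC observed/expected ratio: (GC pairs * length) / (G count * C count)."""
    pairs = g = c = length = 0
    for seq in cds_list:
        seq = seq.upper()
        # Pairs are counted within each sequence, never across sequence ends.
        pairs += sum(1 for i in range(len(seq) - 1) if seq[i:i + 2] == "GC")
        g += seq.count("G")
        c += seq.count("C")
        length += len(seq)
    return pairs * length / (g * c) if g and c else 0.0
```

Correlating per-thousand frequencies partly reflects differences in amino acid composition between the two gene sets; correlating synonymous-codon fractions instead focuses on codon choice.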