Team:Cambridge-JIC/Informatics

From 2014.igem.org


Revision as of 11:06, 24 July 2014


Informatics

To introduce a novel chassis and take full advantage of its properties, we want to analyse its genome computationally in order to:

  • optimise codon usage in our registry parts and facilitate future synthetic biology work on Marchantia;
  • identify and characterise promoters, in particular looking for strong, inducible, tissue-specific or early development stage promoters.


Codon usage optimisation

Our starting point was the Marchantia genome and the mRNA transcriptome predicted with the Geneious software (http://www.geneious.com/). The data were given to us by Bernardo, from Jim's lab. Geneious predicted 99 000 ORFs, which seems too many to be realistic, and half of these were only 100 aa long. We therefore set a length threshold of 300 aa for candidate genes, which gave the expected normal distribution of lengths.
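
The length filter described above can be sketched in a few lines of Python. This is only an illustration, not the script we ran: the function name, the ORF names and the toy sequences are hypothetical, and we assume the threshold is inclusive (300 aa or longer) and that a trailing stop symbol "*" does not count towards the length.

```python
def filter_orfs_by_length(orfs, min_aa=300):
    """Keep predicted ORFs whose protein is at least min_aa residues long.

    orfs maps an ORF name to its translated protein sequence.
    A trailing '*' (stop symbol) is ignored when measuring length.
    """
    return {
        name: protein
        for name, protein in orfs.items()
        if len(protein.rstrip("*")) >= min_aa
    }


# Toy data (hypothetical names and sequences, for illustration only).
orfs = {
    "orf_short": "M" + "A" * 99,        # 100 aa, rejected
    "orf_long": "M" + "A" * 349,        # 350 aa, kept
    "orf_edge": "M" + "A" * 299 + "*",  # exactly 300 aa plus stop, kept
}

candidates = filter_orfs_by_length(orfs)
print(sorted(candidates))
```

Raising or lowering `min_aa` and plotting the lengths of the surviving ORFs is a quick way to check how sensitive the length distribution is to the chosen threshold.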

Using the BLAST software (http://blast.ncbi.nlm.nih.gov/Blast.cgi), we compared the proteins encoded by these candidate genes with the Arabidopsis proteins available on Araport (www.araport.org). Some of the sequences showed incomplete matches, indicating that our predicted ORFs should be treated with caution. The longest sequence showed a 40% match, a low figure, as expected.
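
One way to spot incomplete matches is to summarise the best hit for each query. The sketch below assumes the hits were exported in BLAST's standard 12-column tabular format; how the search was actually run is not recorded here, so this is an assumption. The query names, subject identifiers, query lengths and numbers are all made up for illustration.

```python
# Column order of BLAST's standard 12-column tabular output.
BLAST_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def summarise_hits(tabular_text, query_lengths):
    """Return identity and query coverage of the best-scoring hit per query."""
    best = {}
    for line in tabular_text.strip().splitlines():
        hit = dict(zip(BLAST_COLUMNS, line.split("\t")))
        query = hit["qseqid"]
        # Keep only the hit with the highest bit score for each query.
        if query not in best or float(hit["bitscore"]) > float(best[query]["bitscore"]):
            best[query] = hit

    summary = {}
    for query, hit in best.items():
        aligned = int(hit["qend"]) - int(hit["qstart"]) + 1
        summary[query] = {
            "subject": hit["sseqid"],
            "identity": float(hit["pident"]),
            # Fraction of the query covered by the alignment.
            "coverage": aligned / query_lengths[query],
        }
    return summary


# Toy input (hypothetical identifiers and values).
hits = "\n".join([
    "orf_1\tAT1G01010.1\t40.0\t300\t180\t0\t1\t300\t5\t304\t1e-50\t200",
    "orf_1\tAT2G02020.1\t35.0\t100\t65\t0\t10\t109\t1\t100\t1e-10\t60",
    "orf_2\tAT3G03030.1\t55.0\t150\t67\t1\t1\t150\t1\t150\t1e-30\t120",
])
query_lengths = {"orf_1": 400, "orf_2": 300}

summary = summarise_hits(hits, query_lengths)
for query, info in sorted(summary.items()):
    print(query, info["subject"], info["identity"], round(info["coverage"], 2))
```

A low coverage value flags a query where only part of the predicted ORF aligns, which is the kind of incomplete match mentioned above.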

The relevant DNA sequences were extracted from the candidate mRNAs using the BLAST output. The codons in these genes were then counted to obtain a codon table for Marchantia. While the table is not strikingly similar to that of Arabidopsis, we note a shared slight preference for C over other bases at the end of codons, and for G-p-C sites.

codon   amino acid   per thousand   frequency
aaa     K            15.5903        0.386139
aag     K            24.7846        0.613861
aac     N            12.1702        0.485816
aat     N            12.8809        0.514184
aga     R            21.4977        0.204651
agg     R            21.7198        0.206765
agc     S            22.2528        0.211839
agt     S            11.6816        0.111205
aca     T            16.2121        0.249317
acg     T            17.4114        0.26776
acc     T            12.4811        0.19194
act     T            10.5268        0.161885
ata     I            8.83894        0.253827
atg     M            20.7426        1
atc     I            13.858         0.304094
att     I            12.1258        0.284969
gaa     E            16.4342        0.4474
gag     E            20.2985        0.5526
gac     D            13.4139        0.521589
gat     D            12.3035        0.478411
gga     G            15.4126        0.282343
ggg     G            13.325         0.244101
ggc     G            15.768         0.288853
ggt     G            10.0826        0.184703
gca     A            18.5218        0.287388
gcg     A            17.8111        0.276361
gcc     A            13.4139        0.208132
gct     A            14.702         0.228119
gta     V            8.17269        0.173749
gtg     V            15.457         0.328612
gtc     V            13.3695        0.28423
gtt     V            10.0382        0.213409
caa     Q            16.4786        0.39135
cag     Q            25.6285        0.60865
cac     H            13.7248        0.463268
cat     H            15.9012        0.536732
cga     R            17.7223        0.16871
cgg     R            15.7235        0.149683
cgc     R            16.0789        0.153066
cgt     R            12.3035        0.117125
cca     P            21.5422        0.319921
ccg     P            18.9216        0.281003
ccc     P            10.5712        0.156992
cct     P            16.301         0.242084
cta     L            6.88461        0.0693202
ctg     L            24.4292        0.245975
ctc     L            20.565         0.207066
ctt     L            17.3226        0.174419
taa     S            17.3226        0.174419
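
The two numeric columns can be computed as in the minimal sketch below. It is not the script used to build the table. It assumes that "per thousand" is the count of a codon per 1000 codons counted, and that "frequency" is its share among the codons for the same amino acid (consistent with the aaa and aag values summing to 1). The standard genetic code is used, and the function name and toy sequence are hypothetical.

```python
from collections import Counter
from itertools import product

# Standard genetic code, bases ordered t, c, a, g ('*' marks a stop codon).
BASES = "tcag"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codon_usage(coding_sequences):
    """Return {codon: (amino_acid, per_thousand, frequency)} for in-frame codons."""
    counts = Counter()
    for cds in coding_sequences:
        cds = cds.lower()
        # Read complete codons only; skip any containing ambiguous bases.
        for i in range(0, len(cds) - len(cds) % 3, 3):
            codon = cds[i:i + 3]
            if codon in CODON_TABLE:
                counts[codon] += 1

    total = sum(counts.values())
    per_amino_acid = Counter()
    for codon, n in counts.items():
        per_amino_acid[CODON_TABLE[codon]] += n

    return {
        codon: (
            CODON_TABLE[codon],
            1000 * n / total,                        # per thousand codons
            n / per_amino_acid[CODON_TABLE[codon]],  # share among synonymous codons
        )
        for codon, n in counts.items()
    }


# Toy sequence (hypothetical): atg aaa aag aag taa
usage = codon_usage(["ATGAAAAAGAAGTAA"])
for codon in sorted(usage):
    print(codon, *usage[codon])
```

In the toy sequence, lysine is encoded once by aaa and twice by aag, so their frequencies are 1/3 and 2/3, while each codon's per-thousand value is its count divided by the five codons read.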