Team:Cambridge-JIC/Informatics
From 2014.igem.org
Revision as of 12:00, 24 July 2014
Informatics
To introduce a novel chassis and take full advantage of its properties, we want to computationally analyse its genome in order to:
- optimise codon usage in our registry parts and facilitate future synthetic biology work on Marchantia;
- identify and characterise promoters, in particular looking for strong, inducible, tissue-specific or early development stage promoters.
Codon usage optimisation
Our starting point was the Marchantia genome and the mRNA transcriptome predicted with the Geneious software (http://www.geneious.com/). The data were given to us by Bernardo from Jim's lab. Geneious predicted 99 000 ORFs, which seems too many to be realistic, and half of these were only 100 aa long. We therefore set the threshold for candidate genes at 300 aa, which gave the expected normal distribution of lengths.
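The length threshold can be illustrated with a minimal Python sketch. It assumes the predicted ORFs are already loaded as a dictionary of protein sequences; the function names, the `min_length_aa` parameter and the example sequences are hypothetical and are not taken from our actual scripts.

```python
from collections import Counter


def filter_orfs(orfs, min_length_aa=300):
    """Keep predicted ORFs whose protein has at least min_length_aa residues."""
    kept = {}
    for name, protein in orfs.items():
        # A trailing '*' (stop symbol) is not counted as a residue.
        length = len(protein.rstrip("*"))
        if length >= min_length_aa:
            kept[name] = protein
    return kept


def length_histogram(orfs, bin_width=100):
    """Count ORFs per length bin, to inspect the shape of the distribution."""
    bins = Counter()
    for protein in orfs.values():
        bins[(len(protein.rstrip("*")) // bin_width) * bin_width] += 1
    return dict(sorted(bins.items()))


# Made-up sequences, for illustration only.
example = {
    "orf_short": "M" + "A" * 99,         # 100 aa, below the threshold
    "orf_long": "M" + "A" * 349,         # 350 aa
    "orf_edge": "M" + "A" * 299 + "*",   # exactly 300 aa plus a stop symbol
}
candidates = filter_orfs(example)
```

Here `candidates` keeps `orf_long` and `orf_edge` and drops `orf_short`; `length_histogram` can be run before and after filtering to compare the length distributions.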
Using the BLAST software (http://blast.ncbi.nlm.nih.gov/Blast.cgi), we compared the proteins encoded by these candidate genes with the Arabidopsis proteins available on Araport (www.araport.org). Some of the sequences showed incomplete matches, indicating that our predicted ORFs should be treated with some caution. The longest sequence showed a 40% match, a small number, as expected.
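As a rough illustration of how such BLAST results can be inspected, the sketch below keeps the best hit per query and computes how much of the query is covered by the alignment. It assumes the search was saved in BLAST's 12-column tabular format (`-outfmt 6`), which is an assumption on our part here; the function names and the sample rows are hypothetical.

```python
import csv
import io

# Default column order of BLAST tabular output (-outfmt 6).
BLAST_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def best_hits(handle):
    """Return the highest-bitscore hit for each query sequence."""
    best = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        hit = dict(zip(BLAST_COLUMNS, row))
        hit["pident"] = float(hit["pident"])
        hit["bitscore"] = float(hit["bitscore"])
        for key in ("length", "qstart", "qend", "sstart", "send"):
            hit[key] = int(hit[key])
        current = best.get(hit["qseqid"])
        if current is None or hit["bitscore"] > current["bitscore"]:
            best[hit["qseqid"]] = hit
    return best


def query_coverage(hit, query_length):
    """Fraction of the query covered by the alignment (1.0 = complete match)."""
    return (hit["qend"] - hit["qstart"] + 1) / query_length


# Made-up rows, for illustration only.
sample = (
    "orf_1\tath_protein_A\t40.0\t250\t150\t0\t10\t259\t5\t254\t1e-50\t180\n"
    "orf_1\tath_protein_B\t35.0\t100\t65\t0\t1\t100\t1\t100\t1e-10\t60.5\n"
)
hits = best_hits(io.StringIO(sample))
coverage = query_coverage(hits["orf_1"], query_length=300)
```

A low coverage value flags an incomplete match of the kind mentioned above, so it can serve as a simple filter before the sequences are used further.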
We then extracted the relevant DNA sequences from the candidate mRNAs using the BLAST output and counted the codons in these genes to obtain a codon table for Marchantia. While the table is not strikingly similar to that of Arabidopsis, we can note a similarity in the slight preference for C over other bases at the end of codons and in the preference for G-p-C sites.
codon | amino acid | per thousand | frequency |
---|---|---|---|
aaa | K | 15.5903 | 0.386139 |
aag | K | 24.7846 | 0.613861 |
aac | N | 12.1702 | 0.485816 |
aat | N | 12.8809 | 0.514184 |
aga | R | 21.4977 | 0.204651 |
agg | R | 21.7198 | 0.206765 |
agc | S | 22.2528 | 0.211839 |
agt | S | 11.6816 | 0.111205 |
aca | T | 16.2121 | 0.249317 |
acg | T | 17.4114 | 0.26776 |
acc | T | 12.4811 | 0.19194 |
act | T | 10.5268 | 0.161885 |
ata | I | 8.83894 | 0.253827 |
atg | M | 20.7426 | 1 |
atc | I | 13.858 | 0.304094 |
att | I | 12.1258 | 0.284969 |
gaa | E | 16.4342 | 0.4474 |
gag | E | 20.2985 | 0.5526 |
gac | D | 13.4139 | 0.521589 |
gat | D | 12.3035 | 0.478411 |
gga | G | 15.4126 | 0.282343 |
ggg | G | 13.325 | 0.244101 |
ggc | G | 15.768 | 0.288853 |
ggt | G | 10.0826 | 0.184703 |
gca | A | 18.5218 | 0.287388 |
gcg | A | 17.8111 | 0.276361 |
gcc | A | 13.4139 | 0.208132 |
gct | A | 14.702 | 0.228119 |
gta | V | 8.17269 | 0.173749 |
gtg | V | 15.457 | 0.328612 |
gtc | V | 13.3695 | 0.28423 |
gtt | V | 10.0382 | 0.213409 |
caa | Q | 16.4786 | 0.39135 |
cag | Q | 25.6285 | 0.60865 |
cac | H | 13.7248 | 0.463268 |
cat | H | 15.9012 | 0.536732 |
cga | R | 17.7223 | 0.16871 |
cgg | R | 15.7235 | 0.149683 |
cgc | R | 16.0789 | 0.153066 |
cgt | R | 12.3035 | 0.117125 |
cca | P | 21.5422 | 0.319921 |
ccg | P | 18.9216 | 0.281003 |
ccc | P | 10.5712 | 0.156992 |
cct | P | 16.301 | 0.242084 |
cta | L | 6.88461 | 0.0693202 |
ctg | L | 24.4292 | 0.245975 |
ctc | L | 20.565 | 0.207066 |
ctt | L | 17.3226 | 0.174419 |
taa | S | 17.3226 | 0.174419 |
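The two numeric columns can be reproduced with a short Python sketch. It assumes that "per thousand" is the codon count per 1000 counted codons and that "frequency" is the codon's share among codons for the same amino acid; this reading is consistent with the lysine rows above, since 15.5903 / (15.5903 + 24.7846) ≈ 0.386. The function name and the example sequence are hypothetical, and the standard genetic code is assumed.

```python
from collections import Counter, defaultdict

# Standard genetic code, with codons enumerated in t, c, a, g order.
BASES = "tcag"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def codon_table(coding_sequences):
    """Map each observed codon to (amino acid, per thousand, frequency)."""
    counts = Counter()
    for cds in coding_sequences:
        cds = cds.lower()
        # Read in frame from the first base; ignore a trailing partial codon.
        for i in range(0, len(cds) - len(cds) % 3, 3):
            codon = cds[i:i + 3]
            if codon in CODON_TO_AA:  # skips codons with ambiguous bases
                counts[codon] += 1
    total = sum(counts.values())
    per_amino_acid = defaultdict(int)
    for codon, n in counts.items():
        per_amino_acid[CODON_TO_AA[codon]] += n
    table = {}
    for codon, n in counts.items():
        aa = CODON_TO_AA[codon]
        table[codon] = (aa, 1000 * n / total, n / per_amino_acid[aa])
    return table


# Made-up coding sequence: atg aaa aag aag taa
table = codon_table(["atgaaaaagaagtaa"])
```

In this toy example `aag` accounts for two of the three lysine codons, so its frequency is 2/3, while stop codons are reported under the symbol `*`.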