Team:Cambridge-JIC/Marchantia/Codon
From 2014.igem.org
Informatics
To introduce the novel chassis Marchantia polymorpha and facilitate its future use, we computationally analysed its genome to:
- determine its codon usage, so that we can optimise our Marchantia-specific registry parts and support all future synthetic biology work in Marchantia;
- submit a small library of Marchantia promoters to the iGEM registry, in particular looking for promoters that are strong, inducible, tissue-specific or expressed at an early developmental stage.
Codon usage optimisation
Our starting point was the Marchantia genome and the mRNA transcriptome predicted with Geneious software (http://www.geneious.com/). The data were provided to us by Bernardo from Jim's lab. 99,000 ORFs were predicted, which seems far too many to be a realistic number of genes expressed by this small liverwort, and half of them were only 100 amino acids long. We therefore set a threshold of 300 amino acids for candidate genes, which gave the expected normal distribution of lengths.
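The length filter can be sketched in Python as follows. This is a minimal illustration, not the pipeline actually used (the ORFs came from Geneious): it assumes each predicted ORF is available as an in-frame nucleotide string, and the names `protein_length` and `filter_candidates` are hypothetical.

```python
# Minimal sketch (assumption: ORFs are in-frame nucleotide strings that may or
# may not end with a stop codon). Function names are hypothetical.

STOP_CODONS = {"TAA", "TAG", "TGA"}
MIN_AA = 300  # candidate-gene threshold used in the text


def protein_length(orf: str) -> int:
    """Number of amino acids encoded by an in-frame ORF, excluding a final stop codon."""
    seq = orf.upper()
    n_codons = len(seq) // 3  # a trailing partial codon is ignored
    if n_codons and seq[3 * (n_codons - 1):3 * n_codons] in STOP_CODONS:
        n_codons -= 1
    return n_codons


def filter_candidates(orfs: dict[str, str], min_aa: int = MIN_AA) -> dict[str, str]:
    """Keep only the ORFs long enough to be treated as candidate genes."""
    return {name: seq for name, seq in orfs.items() if protein_length(seq) >= min_aa}
```

For example, an ORF of `"ATG" + "GCT" * 98 + "TAA"` encodes 99 amino acids and is discarded, while one encoding 300 amino acids is kept.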
Using BLAST (http://blast.ncbi.nlm.nih.gov/Blast.cgi), we compared the proteins encoded by these candidate genes with the Arabidopsis thaliana proteins available on Araport (www.araport.org). Some of the sequences showed only partial matches, indicating that our predicted ORFs should be treated with some caution. The longest sequence showed a 40% match; this is low, but expected given how distantly related a liverwort and a flowering plant are.
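One way to inspect such results is to parse BLAST+ tabular output. The sketch below assumes the search was run with `-outfmt 6` (the 12 default columns), which the text does not state; it keeps the highest-scoring Arabidopsis hit per Marchantia query and measures how much of the query the alignment covers, a simple way to flag the "partial matches" mentioned above. `best_hits` and `query_coverage` are hypothetical names.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6)
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]
FLOAT_COLS = ("pident", "evalue", "bitscore")
INT_COLS = ("length", "mismatch", "gapopen", "qstart", "qend", "sstart", "send")


def best_hits(tabular: str) -> dict[str, dict]:
    """Keep the highest-bitscore subject hit for each query sequence."""
    best: dict[str, dict] = {}
    for row in csv.reader(io.StringIO(tabular), delimiter="\t"):
        if not row or row[0].startswith("#"):  # skip blank and comment lines
            continue
        hit = dict(zip(COLUMNS, row))
        for key in FLOAT_COLS:
            hit[key] = float(hit[key])
        for key in INT_COLS:
            hit[key] = int(hit[key])
        query = hit["qseqid"]
        if query not in best or hit["bitscore"] > best[query]["bitscore"]:
            best[query] = hit
    return best


def query_coverage(hit: dict, query_length: int) -> float:
    """Fraction of the query protein spanned by the alignment."""
    return (abs(hit["qend"] - hit["qstart"]) + 1) / query_length
```

A hit with a high `pident` but low coverage would be one of the incomplete matches that call for caution.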
We then extracted the relevant DNA sequences from the candidate mRNAs using the BLAST output, and counted the codons in these genes to obtain a codon usage table for Marchantia. Although the table is not strikingly similar to that of Arabidopsis, both show a slight preference for C over other bases at the third codon position, as well as a preference for GpC sites.
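The counting step can be sketched as below. This is an illustration under assumptions, not the team's script: it assumes the extracted coding sequences are in frame starting at their first base, counts all 64 codons (stops included), reports frequencies per thousand codons, and summarises the base at the third codon position, which is where the C preference described above would show up. Function names are hypothetical.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# All 64 codons in the conventional TCAG order
CODONS = ["".join(c) for c in product(BASES, repeat=3)]


def count_codons(cds_list: list[str]) -> Counter:
    """Count codons across in-frame coding sequences (stop codons included)."""
    counts = Counter({codon: 0 for codon in CODONS})
    for cds in cds_list:
        seq = cds.upper()
        # Step through complete codons only; a trailing partial codon is ignored
        for i in range(0, len(seq) - len(seq) % 3, 3):
            codon = seq[i:i + 3]
            if codon in counts:  # skips codons containing N or other ambiguity codes
                counts[codon] += 1
    return counts


def per_thousand(counts: Counter) -> dict[str, float]:
    """Express each codon count as a frequency per 1000 codons."""
    total = sum(counts.values())
    return {codon: 1000 * n / total for codon, n in counts.items()} if total else {}


def third_position_composition(counts: Counter) -> dict[str, float]:
    """Fraction of codons ending in each base."""
    total = sum(counts.values())
    third = Counter()
    for codon, n in counts.items():
        third[codon[2]] += n
    return {base: (third[base] / total if total else 0.0) for base in BASES}
```

Comparing `third_position_composition` for Marchantia and Arabidopsis gene sets is one simple way to quantify the shared C bias.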
A .xls version of this table with relevant filters can be found here.
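Since the point of the table is to optimise Marchantia parts, the sketch below shows the simplest way such a table could be applied: replacing each residue with its most frequently used codon ("one amino acid, one codon"). The page does not say which optimisation strategy was used; other approaches sample codons in proportion to their frequency or also avoid restriction sites. `counts` is assumed to map codons to counts, as in the codon table described on this page, and the function names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code with codons in TCAG order; '*' marks stop codons
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def preferred_codons(counts: dict[str, int]) -> dict[str, str]:
    """For each amino acid (and stop), pick its most frequently used codon.

    Ties, including amino acids with no observed codons, go to the first
    synonymous codon in TCAG order.
    """
    best: dict[str, str] = {}
    for codon, aa in GENETIC_CODE.items():
        if aa not in best or counts.get(codon, 0) > counts.get(best[aa], 0):
            best[aa] = codon
    return best


def optimise(protein: str, counts: dict[str, int]) -> str:
    """Back-translate a protein using only the preferred codon for each residue."""
    table = preferred_codons(counts)
    return "".join(table[aa] for aa in protein.upper())


def translate(cds: str) -> str:
    """Translate an in-frame coding sequence, useful for checking the output."""
    seq = cds.upper()
    return "".join(GENETIC_CODE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))
```

With made-up counts `{"GCC": 10, "GCT": 3, "ATG": 5, "TAA": 2, "TGA": 1}`, `optimise("MA*", ...)` returns `"ATGGCCTAA"`. Translating the result back is a cheap check that the protein sequence was preserved.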