Team:Cambridge-JIC/Informatics

From 2014.igem.org


Latest revision as of 14:41, 6 October 2014

Cambridge iGEM 2014



Informatics

To introduce the novel chassis Marchantia polymorpha and facilitate its future use, we computationally analysed its genome to:
  • determine its codon usage, so that we can optimise our Marchantia-specific registry parts and facilitate future synthetic biology work on Marchantia;
  • submit a small library of Marchantia promoters to the iGEM registry, looking in particular for promoters that are strong, inducible, tissue-specific or active at an early developmental stage.

Codon usage optimisation

Our starting point was the Marchantia genome and the mRNA transcriptome predicted with the Geneious software (http://www.geneious.com/). The data were given to us from Jim's lab by Bernardo. 99 000 ORFs were predicted, which seems too many to be a realistic number of genes expressed by this small liverwort. Half of these were only 100 amino acids long. We set the length threshold for candidate genes among these at 300 amino acids, obtaining the expected normal distribution of lengths.
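The length filter described above can be sketched in a few lines of Python. This is only an illustration, not the script we ran: the function name `filter_orfs`, the dictionary layout (ORF name mapped to protein sequence) and the handling of a trailing `*` stop symbol are all assumptions.

```python
def filter_orfs(orfs, min_length=300):
    """Keep predicted ORFs whose protein has at least min_length amino acids.

    orfs: dict mapping a (hypothetical) ORF name to its protein sequence.
    A trailing '*' stop symbol, if present, is not counted as a residue.
    """
    return {
        name: protein
        for name, protein in orfs.items()
        if len(protein.rstrip("*")) >= min_length
    }


if __name__ == "__main__":
    # Invented toy sequences, used only to show the call.
    predicted = {
        "orf_short": "M" * 100,
        "orf_long": "M" * 300,
        "orf_border": "M" * 299 + "*",
    }
    print(sorted(filter_orfs(predicted)))
```

The 300 amino acid default mirrors the threshold quoted in the text; any other cut-off can be passed as `min_length`.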

Using the BLAST software (http://blast.ncbi.nlm.nih.gov/Blast.cgi), we compared the proteins encoded by these candidate genes to the proteins present in Arabidopsis thaliana, as given on Araport (www.araport.org). Some of the sequences showed incomplete matches, indicating that our predicted ORFs should be treated with some caution. The longest sequence showed a 40% match, a low value, as expected.

We then extracted the relevant DNA sequences from the candidate mRNAs using the BLAST output. The different codons were counted in these genes to obtain a codon table for Marchantia. While the table is not strikingly similar to that of Arabidopsis, the two share a slight preference for C over other bases at the end of codons and a preference for G-p-C sites.
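As an illustration of the counting step, here is a minimal Python sketch that tallies codons over a set of in-frame coding sequences and reports how often a given base occupies the third codon position. It is not the script used to build our table; the function names and the toy sequences are hypothetical.

```python
from collections import Counter


def count_codons(coding_sequences):
    """Count codons over in-frame coding sequences (DNA or RNA strings)."""
    counts = Counter()
    for cds in coding_sequences:
        cds = cds.upper().replace("U", "T")
        usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
        for i in range(0, usable, 3):
            codon = cds[i:i + 3]
            if set(codon) <= set("ACGT"):  # skip codons with ambiguous bases
                counts[codon] += 1
    return counts


def codon_frequencies(counts):
    """Convert raw counts into fractions of all counted codons."""
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()} if total else {}


def third_base_fraction(counts, base):
    """Fraction of counted codons whose third position is `base`."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(n for codon, n in counts.items() if codon[2] == base) / total


if __name__ == "__main__":
    toy = ["ATGGCCGCC", "atggcctaa"]  # invented sequences
    table = count_codons(toy)
    print(table.most_common())
    print(third_base_fraction(table, "C"))
```

Comparing `third_base_fraction(table, "C")` with the same value for the other three bases is one simple way to check the third-position preference mentioned above.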

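The codon table is available as an image (https://static.igem.org/mediawiki/2014/a/a9/Cambridge-JIC_Codon_table.png), and a .xls version of this table with relevant filters can be downloaded (https://static.igem.org/mediawiki/2014/4/42/Marchantia_codon_table_analysis.xls).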

Promoter identification

To identify potential promoter regions in Marchantia's genome, we started from gene sequences from other species, mainly Arabidopsis thaliana and Physcomitrella patens. We were particularly interested in genes that were:

  • Nitrate inducible
  • Sulphate inducible
  • Phosphate inducible
  • Circadian rhythm driven
  • Metabolism related
  • Development related
  • Light responsive
  • Standard templates for proteins
  • Standard templates for genes

We searched recent research papers for these genes and retrieved their sequences from GenBank and ThaleMine, compiling a list of protein sequences to compare against the predicted Marchantia scaffolds. (For genes where only the nucleotide sequence was available, we wrote a C++ program to perform the translation.) We ran a tblastn search in BLAST to find the best matches between our candidate proteins and the Marchantia scaffolds. Then, for the best matches (~60% and above, with some judged by eye), we annotated the 2 kb upstream of the start codon as a potential promoter region.
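Two steps of this procedure lend themselves to a short sketch: translating a nucleotide sequence into protein, and cutting out the region upstream of a start codon. The Python below is an illustration under stated assumptions, not our original C++ program. It assumes the standard genetic code, 0-based scaffold coordinates, and hypothetical function names (`translate`, `upstream_region`). For a gene on the minus strand, `start` is taken to be the forward-strand index of the first base of the start codon.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(nucleotides):
    """Translate an in-frame DNA/RNA sequence, stopping at the first stop codon."""
    seq = nucleotides.upper().replace("U", "T")
    protein = []
    for i in range(0, len(seq) - len(seq) % 3, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")  # 'X' marks ambiguous codons
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def upstream_region(scaffold, start, length=2000, strand="+"):
    """Return up to `length` bases upstream of the start codon.

    start: 0-based forward-strand index of the first base of the start codon.
    The result is always written 5'->3' on the gene's own strand.
    """
    if strand == "+":
        return scaffold[max(0, start - length):start].upper()
    if strand == "-":
        return reverse_complement(scaffold[start + 1:start + 1 + length])
    raise ValueError("strand must be '+' or '-'")


if __name__ == "__main__":
    print(translate("ATGGCCTGGTAA"))
    print(upstream_region("AAAACCCCATGGGG", start=8, length=4))
```

The region is clipped at the scaffold ends, so a gene lying less than 2 kb from the edge of a scaffold yields a shorter candidate promoter.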

Example of a BLAST hit, matching an inducible nitrate transporter sequence to a Marchantia gene (https://static.igem.org/mediawiki/2014/5/5d/Cambridge_JIC_Blast_example.png).

We identified 30 candidate promoters this way, which we plan to screen by inserting each into a construct driving the yellow fluorescent protein Venus. For each promoter, we will make one construct with and one without amplification by GAL4 and GAL4 UAS, to evaluate the promoter strength and get around any leakiness due to the use of GAL4.
