Team:Cambridge-JIC/Marchantia/Promoter

From 2014.igem.org

(Difference between revisions)
 
(7 intermediate revisions not shown)
Line 45: Line 45:
<p>
<p>
-
We reviewed research papers to create a shortlist of inducible plant promoters for which we might find homologues in the <em>marchantia</em> genome. We narrowed our search to promoters regulated under limiting supply of nutrients nitrates, sulphates, phosphates (although light variation, circadian rhythm, metabolism and development related inducers were also considered initially). For the majority of our analyses, we selected <em>Arabidopsis thaliana</em> as a model organism from which to identify target genes given the quality of genetic information that is available for the plant. However, we also used genes from the following organisms were data was available: <em>B. nigra, L. esculentum, B. napus, C. reinhardtii, G. max, N. plumbaginifolia, P. patens.</em>
+
We reviewed research papers to create a shortlist of inducible plant promoters for which we might find homologues in the <em>marchantia</em> genome. We narrowed our search to promoters regulated under limiting supply of essential nutrients including nitrates, sulphates, phosphates (although light variation, circadian rhythm, metabolism and development related inducers were also considered initially). For the majority of our analyses, we selected <em>Arabidopsis thaliana</em> as a model organism from which to identify target genes because of the quality of genetic information that is available for this plant. However, we also used genes from the following organisms where data was available: <em>B. nigra, L. esculentum, B. napus, C. reinhardtii, G. max, N. plumbaginifolia, P. patens.</em>
<br>  
<br>  
-
We identified a shortlist of 27 genes that might be regulated under limiting supply of the essential nutrients nitrates, sulphates, and phosphates.  
+
We identified a shortlist of 27 genes that might be regulated under limiting supply of nitrates, sulphates, and phosphates.  
</p>
</p>
Line 56: Line 56:
TAIR - http://www.arabidopsis.org/ <a href="#Footnote4">[4]</a> <br>
TAIR - http://www.arabidopsis.org/ <a href="#Footnote4">[4]</a> <br>
UniProt - http://www.uniprot.org/ <a href="#Footnote5">[5]</a><br>
UniProt - http://www.uniprot.org/ <a href="#Footnote5">[5]</a><br>
 +
</p>
 +
<p>
 +
(For details on the exact gene functions, sequence sources and reference papers, please <a href="Further_Promoter_Search_Information.xls">click here</a>)
</p>
</p>
<br>
<br>
<p>
<p>
-
We used Geneious™ to run tblastn<a href="#Footnote6">[6]</a> and query the protein coding sequences against the nucleotide sequences of the <em>m. polymorpha</em> scaffolds. Our dataset was made of large gap read mapping transcripts obtained by mRNA sequencing conducted by the Haseloff Lab on the m. polymorpha Cam strain.  Open Reading Frame (ORF) and Coding Sequence (CDS) predictions were made using the CLC bio Transcript Discovery plugin<a href="#Footnote7">[7]</a>.
+
We used Geneious™ to run tblastn<a href="#Footnote6">[6]</a> and query the protein coding sequences against the nucleotide sequences of the <em>m. polymorpha</em> scaffolds. Our dataset was made of large gap read mapping transcripts obtained by mRNA sequencing conducted by the Haseloff Lab on the <em>M. polymorpha</em> Cam strain.  Open Reading Frame (ORF) and Coding Sequence (CDS) predictions were made using the CLC bio Transcript Discovery plugin<a href="#Footnote7">[7]</a>.
 +
<br>
<br>
<br>
<figure>
<figure>
-
<img src="https://static.igem.org/mediawiki/2014/5/5d/Cambridge_JIC_Blast_example.png" width = "550px">
+
<img src="https://static.igem.org/mediawiki/2014/5/5d/Cambridge_JIC_Blast_example.png" width = "650px">
<figcaption>Figure 1: Example of a blast hit, matching a nitrate transporter protein sequence to a <em>Marchantia</em> gene</figcaption>
<figcaption>Figure 1: Example of a blast hit, matching a nitrate transporter protein sequence to a <em>Marchantia</em> gene</figcaption>
</figure>
</figure>
Line 86: Line 90:
        <div class="container">
        <div class="container">
            <div class="row">
            <div class="row">
-
                <div class="col-lg-9 col-sm-push-0.75  col-sm-6">
+
                <div class="col-lg-9 col-sm-push-0.75  col-sm-6">                  
-
                    <hr class="section-heading-spacer">
+
                    <div class="clearfix"></div>
                    <div class="clearfix"></div>
                    <h2 class="section-heading">Results and Discussion</h2>
                    <h2 class="section-heading">Results and Discussion</h2>
Line 98: Line 101:
<u>Notes:</u> <br>
<u>Notes:</u> <br>
• In Figures 2 and 3 the predicted gene numbers are grouped according to the related essential nutrient: genes 603 – 11107 inclusive relate to nitrates; genes 3555 and 3556 relate to sulphates and 170-11169 inclusive relate to phosphates.<br>
• In Figures 2 and 3 the predicted gene numbers are grouped according to the related essential nutrient: genes 603 – 11107 inclusive relate to nitrates; genes 3555 and 3556 relate to sulphates and 170-11169 inclusive relate to phosphates.<br>
-
• The percentage overlaps in Figure 3 measure how much of a gene sequence length was covered by the hit sequence length, and how much of the hit sequence length was covered by an mRNA sequence length in a 1 -to-1 mapping between nucleotides.
+
• The percentage overlaps in Figure 3 measure how much of a gene sequence length was covered by the hit sequence length, and how much of the hit sequence length was covered by an mRNA sequence length in a 1-to-1 mapping between nucleotides.
-
 
+
</p>
</p>
 +
<br>
 +
<figure>
<figure>
-
<img src="https://static.igem.org/mediawiki/2014/a/a9/Cambridge-JIC_Codon_table.png" width = "550px">
+
<img src="https://static.igem.org/mediawiki/2014/a/ad/Cambridge-JIC_Total_number_blast_hits.png" width = "850px">
-
<figcaption>Figure 1: Our codon table for <i>m. polymorpha</i></figcaption>
+
<figcaption>Figure 2: A Comparison of the total number of blast® hit matches of proteins of interest with predicted genes in <em>M. polymorpha</em> </figcaption>
</figure>
</figure>
<br>
<br>
-
<p>The number of Open Reading Frames (ORFs) resulting from our initial predictions totalled 99 000, which seemed too large to be realistic, and half of these were only 100 amino acids (100 aa) long. By filtering the dataset using a threshold of 300 aa for candidate genes we obtained a distribution of lengths that seemed reasonable.
+
 
<figure>
<figure>
-
<img src="https://static.igem.org/mediawiki/2014/a/a9/Cambridge-JIC Gene Lengths.png" width = "550px">
+
<img src="https://static.igem.org/mediawiki/2014/5/5c/Cambridge-JIC_Congruence_Of_blast_hits.png" width = "800px">
-
<figcaption>Figure 2: A histogram of the lengths of predicted genes <i>m. polymorpha</i></figcaption>
+
<figcaption>Figure 3: A comparison of the congruence of blast® hit matches of proteins of interest with predicted genes and with mRNA transcripts in <em>M. polymorpha</em> </figcaption>
</figure>
</figure>
<br>
<br>
 +
 +
<p>
 +
The mean total number of hits for a predicted gene was 5 [see Figure 2]. Of the genes for which the hit value is at this frequency or above, all except three relate to nitrate-inducible responses; the remainder relate to potassium starvation responses. It can be considered that the queried genes with more matches to <em>M. polymorpha</em> along the predicted gene sequence lengths are likelier to be better homologues. However we think that a mapping of the blast® hits to the predicted genes would be needed to confirm this. This is because a spread of hits along the gene length would give stronger evidence than many hits to the same location, which we have not ruled out at this time.
</p>
</p>
 +
<p>
<p>
-
In the blastx output, the longest complete sequence match was 40%. This is small as expected given the phylogenetic distance between </i>m.polymorpha</i> and <i>a. thaliana</i> <a href="#Footnote3">[3]</a><a href="#Footnote4">[4]</a>.  
+
Comparing Figures 2 and 3 , predicted genes 4354 and 11107 are the best candidates for further work as they both have relatively high hit frequency and congruence with mRNA and the queried gene sequences. Gene 11107 also demonstrates a good match with the gene AtNRT2.1 (45.6% congruence) which has been shown to be “induced by very low levels of nitrate (50 μM KNO3)” (Filleur, Daniel-Vedele, 1999)<a href="#Footnote8">[8]</a>. Given this sensitivity, we suggest that this gene would be a very good candidate for experimentally verifying the associated promoter (P_Ni20, iGEM Registry Part:BBa_K1484336). Gene 4354 demonstrates a good match with the gene AtNRT2.2 (57.7% congruence), which is expressed up to 8 days earlier than AtNRT2.1 (Zhuo et al., 1999)<a href="#Footnote9">[9]</a> and so it may be quicker to experimentally verify the promoter for  this potential homologue [P_Ni17, iGEM Registry Part:BBa_K1484333].
</p>
</p>
 +
<p>
<p>
-
The Codon Usage table we obtained for <i> m. polymorpha</i> is not strikingly similar to that of <i>a. thaliana</i>, as expected from the 400 million years of evolutionary divergence between them <a href="#Footnote5">[5]</a>.
+
Although gene 11169 did not match with an mRNA transcript, it has been included in Figure 3 as an observation of interest. It might seem a very good candidate for further work given the startling degree of congruence between the hit and the gene, but we were yet to verify the reliability of the reconstructed regions of the genome scaffolds where mRNA sequence data was unavailable.  
</p>
</p>
 +
<p>
<p>
-
However, there is a similarity in the slight preference for C over other bases at the end of codons and that for G-p-C sites, as can be seen in the codon table.
+
We also observed that the majority of the better hits fell along middling regions of the predicted genes. This seems reasonable given that <em>Arabidopsis thaliana</em> emerged approximately 10 million years ago<a href="#Footnote10">[10]</a> and <em>Marchantia</em> emerged around 400 million years ago<a href="#Footnote1">[1]</a>, so there would have been sufficient time for non-essential amino acids to get knocked out or replaced by genetic drift, with the essential ones (e.g. molecule binding motifs) being conserved.  
</p>
</p>
-
                </div>
+
<br>
 +
       
 +
                            <hr class="section-heading-spacer">                  
 +
                    <div class="clearfix"></div>
 +
                    <h2 class="section-heading">Conclusion</h2>
 +
                    <div>
 +
 
 +
 
 +
<p>
 +
We have produced a shortlist of 27 potential promoter regions in the <em>Marchantia polymorpha</em> genome that may be regulated under phosphate starvation, sulphate limitation or in the presence of nitrates. The sequences for the regions can be found in the iGEM Registry, under the following catalogue numbers:<br>
 +
Part:BBa K1484319 to Part:BBa K1484325 inclusive; Part:BBa K1484327 to Part:BBa K1484334 inclusive; Part:BBa K1484336 to Part:BBa K1484347 inclusive. <br>
 +
 
 +
Our analysis mostly supports promoters related to nitrate regulation as most of the best hits were found in this group. Further experimentation is recommended in order to verify these as useable parts. We would like to highlight Part:BBa_K1484333 and Part:BBa_K1484336 as promising candidates based on the frequency of good blast hits with the corresponding predicted <em>M. polymorpha</em> genes, and the congruence of these genes to mRNA transcripts. <br>
 +
These results form part of the ‘Marchantia Starter Kit’ we have prepared to help future iGEM participants and others who wish to work with this plant chassis.
 +
</p>
 +
<br>
 +
 
 +
                        </div>
                </div>
                </div>
            </div>
            </div>
Line 129: Line 156:
    </div>
    </div>
    <!-- /.content-section-b -->
    <!-- /.content-section-b -->
 +
 +
    <div class="content-section-a">
    <div class="content-section-a">
        <div class="container">
        <div class="container">
            <div class="row">
            <div class="row">
                <div class="col-lg-9 col-sm-push-0.75  col-sm-6">
                <div class="col-lg-9 col-sm-push-0.75  col-sm-6">
-
                    <hr class="section-heading-spacer">
+
                 
                    <div class="clearfix"></div>
                    <div class="clearfix"></div>
                    <div>
                    <div>
Line 144: Line 173:
<p id="Footnote6">6. NCBI Blast ®, http://blast.ncbi.nlm.nih.gov/Blast.cgi <a href="#">back to top</a></p>
<p id="Footnote6">6. NCBI Blast ®, http://blast.ncbi.nlm.nih.gov/Blast.cgi <a href="#">back to top</a></p>
<p id="Footnote7">7. Qiagen, CLC bio Transcript Discovery ®, <a href="http://www.clcbio.com/clc-plugin/transcript-discovery/#description">http://www.clcbio.com/clc-plugin/transcript-discovery/#description</a> <a href="#">back to top</a></p>
<p id="Footnote7">7. Qiagen, CLC bio Transcript Discovery ®, <a href="http://www.clcbio.com/clc-plugin/transcript-discovery/#description">http://www.clcbio.com/clc-plugin/transcript-discovery/#description</a> <a href="#">back to top</a></p>
 +
<p id="Footnote8">8. S. Filleur, F. Daniel-Vedele. 1999. <i>Expression analysis of a high-affinity nitrate transporter isolated from Arabidopsis thaliana by differential display.</i> Planta, 207, pp. 461–469.<a href="#">back to top</a></p>
 +
<p id="Footnote9">9. D. Zhuo et al. 1999. <i>Regulation of a putative high-affinity nitrate transporter (Nrt2;1At) in roots of Arabidopsis thaliana.</i> Plant J., 17, pp. 563–568.<a href="#">back to top</a></p>
 +
<p id="Footnote10">10. Koch MA, Kiefer M. 2005. <i>Genome evolution among cruciferous plants: a lecture from the comparison of the genetic maps of three diploid species--Capsella rubella, Arabidopsis lyrata subsp. petraea, and A. thaliana.</i> Am J Bot, 92(4), pp. 761-7. <a href="#">back to top</a></p>
 +
                             </div>
                             </div>
                </div>
                </div>

Latest revision as of 01:29, 18 October 2014

Cambridge iGEM 2014


Promoter hunt

Finding new parts for Marchantia polymorpha

One of the key aims for our project is to introduce Marchantia polymorpha to iGEM with a toolset that enables future teams to develop it further and capitalise on its benefits. So given the limited knowledge about its genetic makeup at present, we have sought to find possible inducible promoters in Marchantia that could be used for parts.

Method

To start off, we hypothesised that inducible promoters would be associated with responses to environmental cues or pressures and that, given their importance, related genetic motifs would be largely conserved in the evolution of land plants. So as Marchantia polymorpha is an early land plant[1], we thought it was likely that many homologues to its genes could still be found in later plants and that these genes could be used to find promoters.


We reviewed research papers to create a shortlist of inducible plant promoters for which we might find homologues in the Marchantia genome. We narrowed our search to promoters regulated under limiting supply of essential nutrients, including nitrates, sulphates and phosphates (although inducers related to light variation, circadian rhythm, metabolism and development were also considered initially). For the majority of our analyses, we selected Arabidopsis thaliana as the model organism from which to identify target genes because of the quality of genetic information available for this plant. However, we also used genes from the following organisms where data were available: B. nigra, L. esculentum, B. napus, C. reinhardtii, G. max, N. plumbaginifolia, P. patens.
We identified a shortlist of 27 genes that might be regulated under limiting supply of nitrates, sulphates, and phosphates.

We obtained the peptide sequences for these genes from the following online databases:
Thalemine - https://apps.araport.org/thalemine/begin.do [2]
GenBank - http://www.ncbi.nlm.nih.gov/genbank [3]
TAIR - http://www.arabidopsis.org/ [4]
UniProt - http://www.uniprot.org/ [5]

(For details on the exact gene functions, sequence sources and reference papers, please click here)


We used Geneious™ to run tblastn[6], querying the protein sequences against the nucleotide sequences of the M. polymorpha scaffolds. Our dataset consisted of large-gap read mapping transcripts obtained from mRNA sequencing conducted by the Haseloff Lab on the M. polymorpha Cam strain. Open Reading Frame (ORF) and Coding Sequence (CDS) predictions were made using the CLC bio Transcript Discovery plugin[7].

Figure 1: Example of a blast hit, matching a nitrate transporter protein sequence to a Marchantia gene

We treated hits with a grading above 30% as the most convincing and produced a shortlist by ranking them on concurrence with an existing gene prediction, as this concurrence made the selections more reliable. We isolated possible promoter regions as the 2 kbp upstream of the start of each purported gene.
Hits for very short regions of homology were not selected. These generally corresponded to hits shorter than 5% of the sequence length of the predicted gene, although slightly shorter hits were noted as support for the reliability of a good match.
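
The selection rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the workflow we ran in Geneious: the `Hit` record, its field names, the 1-based inclusive coordinates and the strand handling are all hypothetical, and the thresholds are simply the 30% grading, 5% length and 2 kbp values quoted in the text.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Hypothetical record for one tblastn hit against a predicted gene."""
    query: str        # label of the queried protein
    gene_start: int   # 1-based start of the predicted gene on the scaffold
    gene_end: int     # 1-based inclusive end of the predicted gene
    strand: str       # "+" or "-" (assumed to be known from the prediction)
    hit_length: int   # length of the hit in nucleotides
    grade: float      # hit grading in percent


def passes_filters(hit, min_grade=30.0, min_fraction=0.05):
    """Keep hits graded above min_grade that span at least
    min_fraction of the predicted gene length."""
    gene_length = hit.gene_end - hit.gene_start + 1
    return hit.grade > min_grade and hit.hit_length >= min_fraction * gene_length


def promoter_window(hit, scaffold_length, upstream=2000):
    """Return (start, end) of the region `upstream` bases before the gene start,
    clipped to the scaffold. On the minus strand, upstream lies past gene_end."""
    if hit.strand == "+":
        return max(1, hit.gene_start - upstream), hit.gene_start - 1
    return hit.gene_end + 1, min(scaffold_length, hit.gene_end + upstream)


example = Hit("AtNRT2.1", 5000, 5999, "+", 100, 45.6)
if passes_filters(example):
    print(promoter_window(example, scaffold_length=10000))
```

A gene that starts within 2 kbp of the scaffold edge yields a shorter (possibly empty) window, which would need to be checked before designing primers.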

Results and Discussion

For the 27 genes that we shortlisted, PCR on extracted M. polymorpha DNA yielded products with success rates of 50% to 70% on successive runs. We ran out of time to complete this stage satisfactorily and to debug our difficulties with Gibson Assembly, but we think that we have good evidence that these candidates warrant further investigation.

Notes:
• In Figures 2 and 3 the predicted gene numbers are grouped according to the related essential nutrient: genes 603 to 11107 inclusive relate to nitrates; genes 3555 and 3556 relate to sulphates; and genes 170 to 11169 inclusive relate to phosphates.
• The percentage overlaps in Figure 3 measure how much of a gene sequence length was covered by the hit sequence length, and how much of the hit sequence length was covered by an mRNA sequence length in a 1-to-1 mapping between nucleotides.
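
As a hedged illustration of the percentage overlap described in the second note, the sketch below computes how much of one interval is covered by another when nucleotides are mapped 1-to-1. The function name and the use of 1-based inclusive (start, end) pairs are assumptions made for the example; the figures themselves were produced from our Geneious annotations.

```python
def overlap_percent(target, other):
    """Percentage of the `target` interval covered by `other`.

    Both intervals are (start, end) pairs, 1-based and inclusive,
    in the same coordinate system (for example, scaffold positions).
    """
    t_start, t_end = target
    o_start, o_end = other
    shared = min(t_end, o_end) - max(t_start, o_start) + 1
    shared = max(0, shared)  # disjoint intervals share nothing
    return 100.0 * shared / (t_end - t_start + 1)


gene = (1, 1000)
hit = (101, 600)
mrna = (351, 900)
print(overlap_percent(gene, hit))   # share of the gene covered by the hit
print(overlap_percent(hit, mrna))   # share of the hit covered by the mRNA
```

The measure is not symmetric: the first argument is always the sequence whose length forms the denominator.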


Figure 2: A Comparison of the total number of blast® hit matches of proteins of interest with predicted genes in M. polymorpha

Figure 3: A comparison of the congruence of blast® hit matches of proteins of interest with predicted genes and with mRNA transcripts in M. polymorpha

The mean total number of hits for a predicted gene was 5 [see Figure 2]. Of the genes with a hit count at or above this mean, all except three relate to nitrate-inducible responses; the remainder relate to phosphate starvation responses. Queried genes with more matches to M. polymorpha along the length of the predicted gene sequence might be considered likelier to be better homologues. However, we think that a mapping of the blast® hits to the predicted genes would be needed to confirm this, because a spread of hits along the gene length would give stronger evidence than many hits to the same location, which we have not ruled out at this time.
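
One way the proposed mapping could be done is sketched below: merging the hit intervals on a gene and measuring the fraction of the gene they cover separates a spread of hits from many hits stacked on one location, even when the hit counts are equal. This is a suggestion only and was not part of our analysis; the function name and the gene-relative, 1-based inclusive coordinates are assumptions.

```python
def merged_coverage(gene_length, hits):
    """Fraction of a gene covered by the union of its hits.

    `hits` is a list of (start, end) pairs in gene coordinates,
    1-based and inclusive. Overlapping or adjacent hits are merged
    so that stacked hits are counted only once.
    """
    covered = 0
    cur_start = cur_end = None
    for start, end in sorted(hits):
        start, end = max(1, start), min(gene_length, end)  # clip to the gene
        if start > end:
            continue  # hit lies entirely outside the gene
        if cur_end is None or start > cur_end + 1:
            if cur_end is not None:
                covered += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start + 1
    return covered / gene_length


stacked = [(100, 199)] * 5
spread = [(1, 100), (201, 300), (401, 500), (601, 700), (801, 900)]
print(merged_coverage(1000, stacked))
print(merged_coverage(1000, spread))
```

Both example genes have five hits, but the spread set covers five times as much of the gene as the stacked set.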

Comparing Figures 2 and 3, predicted genes 4354 and 11107 are the best candidates for further work, as both have relatively high hit frequency and congruence with mRNA and the queried gene sequences. Gene 11107 also shows a good match with the gene AtNRT2.1 (45.6% congruence), which has been shown to be “induced by very low levels of nitrate (50 μM KNO3)” (Filleur and Daniel-Vedele, 1999)[8]. Given this sensitivity, we suggest that this gene would be a very good candidate for experimentally verifying the associated promoter (P_Ni20, iGEM Registry Part:BBa_K1484336). Gene 4354 shows a good match with the gene AtNRT2.2 (57.7% congruence), which is expressed up to 8 days earlier than AtNRT2.1 (Zhuo et al., 1999)[9], so it may be quicker to experimentally verify the promoter for this potential homologue (P_Ni17, iGEM Registry Part:BBa_K1484333).

Although gene 11169 did not match an mRNA transcript, it has been included in Figure 3 as an observation of interest. It might seem a very good candidate for further work given the startling degree of congruence between the hit and the gene, but we have yet to verify the reliability of the reconstructed regions of the genome scaffolds where mRNA sequence data were unavailable.

We also observed that the majority of the better hits fell in the middle regions of the predicted genes. This seems reasonable given that Arabidopsis thaliana emerged approximately 10 million years ago[10] and Marchantia emerged around 400 million years ago[1], so there would have been sufficient time for non-essential amino acids to be lost or replaced through genetic drift, with the essential ones (e.g. molecule binding motifs) being conserved.



Conclusion

We have produced a shortlist of 27 potential promoter regions in the Marchantia polymorpha genome that may be regulated under phosphate starvation, sulphate limitation or in the presence of nitrates. The sequences for the regions can be found in the iGEM Registry, under the following catalogue numbers:
Part:BBa_K1484319 to Part:BBa_K1484325 inclusive; Part:BBa_K1484327 to Part:BBa_K1484334 inclusive; Part:BBa_K1484336 to Part:BBa_K1484347 inclusive.
Our analysis mostly supports promoters related to nitrate regulation as most of the best hits were found in this group. Further experimentation is recommended in order to verify these as useable parts. We would like to highlight Part:BBa_K1484333 and Part:BBa_K1484336 as promising candidates based on the frequency of good blast hits with the corresponding predicted M. polymorpha genes, and the congruence of these genes to mRNA transcripts.
These results form part of the ‘Marchantia Starter Kit’ we have prepared to help future iGEM participants and others who wish to work with this plant chassis.


References

1. Wellman CH, Osterloff PL, Mohiuddin U. 2003. Fragments of the earliest land plants. Nature 425, pp. 282-285.

2. Thalemine - https://apps.araport.org/thalemine/begin.do [Accessed: July – September 2014]

3. GenBank - http://www.ncbi.nlm.nih.gov/genbank [Accessed: July – September 2014]

4. TAIR - http://www.arabidopsis.org/ [Accessed: July – September 2014]

5. UniProt - http://www.uniprot.org/ [Accessed: July – September 2014]

6. NCBI Blast ®, http://blast.ncbi.nlm.nih.gov/Blast.cgi

7. Qiagen, CLC bio Transcript Discovery ®, http://www.clcbio.com/clc-plugin/transcript-discovery/#description

8. S. Filleur, F. Daniel-Vedele. 1999. Expression analysis of a high-affinity nitrate transporter isolated from Arabidopsis thaliana by differential display. Planta, 207, pp. 461–469.

9. D. Zhuo et al. 1999. Regulation of a putative high-affinity nitrate transporter (Nrt2;1At) in roots of Arabidopsis thaliana. Plant J., 17, pp. 563–568.

10. Koch MA, Kiefer M. 2005. Genome evolution among cruciferous plants: a lecture from the comparison of the genetic maps of three diploid species--Capsella rubella, Arabidopsis lyrata subsp. petraea, and A. thaliana. Am J Bot, 92(4), pp. 761-7.