Team:TU Darmstadt/Results/Modeling/Open Software
From 2014.igem.org
Revision as of 16:35, 17 October 2014
In line with our Open Hardware approach, we would like to contribute an automated workflow for general sequence and structure file analysis. It can also serve as a data visualization tool with a consistent corporate design across iGEM projects.
Our preferred programming language is R because it is easy to use: changing code is simple and intuitive, even for beginners. We implemented the following automated functions, which are free to use and modify.
Sequence Analysis
Bioinformatics relies heavily on sequences and their alignments. A poor sequence alignment will degrade the results of the Shannon entropy and mutual information calculations.
After running the Basic Local Alignment Search Tool (BLAST), you have to align all of your sequences, for instance with a distribution of Clustal Omega. On any Linux system you can use this function once the required software package is installed.1
'MSA_File' pre-aligns your sequences using the tcltk interface. Once the calculation has finished, you should manually refine the output to obtain an optimal alignment.
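The alignment step can be sketched as a call to the Clustal Omega command-line program from R. This is only a minimal illustration and not the code in iGEM_Clustalo.R: the helper names `clustalo_args` and `run_clustalo` are hypothetical, and the sketch assumes that the executable is called `clustalo` and is on the PATH.

```r
# Hypothetical helper (not taken from iGEM_Clustalo.R): builds the argument
# vector for the Clustal Omega command-line program.
clustalo_args <- function(infile, outfile, outfmt = "fasta") {
  c("-i", infile,                 # unaligned input sequences
    "-o", outfile,                # aligned output file
    paste0("--outfmt=", outfmt),  # output format, e.g. "fasta" or "clustal"
    "--force")                    # overwrite an existing output file
}

# Hypothetical wrapper: runs the alignment only if `clustalo` is installed.
# Returns the exit status of the program, or NA if it was not found.
run_clustalo <- function(infile, outfile, outfmt = "fasta") {
  exe <- Sys.which("clustalo")
  if (!nzchar(exe)) {
    message("clustalo not found on PATH; skipping alignment")
    return(invisible(NA_integer_))
  }
  status <- system2(exe, args = shQuote(clustalo_args(infile, outfile, outfmt)))
  invisible(status)
}
```

A call such as `run_clustalo("hits.fasta", "hits_aligned.fasta")` would then write the alignment to the second file; both file names are placeholders.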
The next function, 'Analyse_Start', is an automated version of the sequence analysis. By default it calculates the Shannon entropy, two sets of mutual information (SUMI and ORMI from the BioPhysConnectoR package), and a mutual-information-based contact map. General information such as the consensus sequence and potentially conserved sites is also computed and plotted automatically. You can adjust the analysis at start-up through further options, such as the MI threshold for contact map objects and the nullmod counter, which changes how the mutual information is calculated. Another option is to replace the common amino acid alphabet with a different one. This lets you learn about the distribution of amino acids at a specific position and about the relevant chemical properties.
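The quantities named above can be illustrated on a toy alignment in base R. This is a rough sketch and not the code behind 'Analyse_Start': it computes plain mutual information rather than the SUMI and ORMI variants of BioPhysConnectoR, and the alignment, the threshold value and the reduced alphabet are all invented for illustration.

```r
# Toy alignment (invented): rows are sequences, columns are positions.
aln <- rbind(
  c("A", "C", "D", "K"),
  c("A", "C", "E", "R"),
  c("A", "G", "D", "K"),
  c("A", "G", "E", "R")
)

# Shannon entropy (in bits) of a vector of symbols.
shannon_entropy <- function(x) {
  p <- table(x) / length(x)
  -sum(p * log2(p))
}

# Plain mutual information (in bits) between two alignment columns:
# MI(X;Y) = H(X) + H(Y) - H(X,Y)
mutual_information <- function(x, y) {
  shannon_entropy(x) + shannon_entropy(y) - shannon_entropy(paste(x, y))
}

# Per-position entropy and consensus sequence (ties go to the first symbol).
entropy   <- apply(aln, 2, shannon_entropy)
consensus <- apply(aln, 2, function(x) names(which.max(table(x))))

# Pairwise mutual information between all positions.
n  <- ncol(aln)
mi <- matrix(0, n, n)
for (i in seq_len(n)) for (j in seq_len(n)) {
  mi[i, j] <- mutual_information(aln[, i], aln[, j])
}

# MI-based contact map: off-diagonal pairs above an (assumed) threshold.
mi_threshold <- 0.5
contacts <- mi > mi_threshold & row(mi) != col(mi)

# Hypothetical reduced alphabet: acidic residues become "a", basic ones "b".
reduced <- c(A = "A", C = "C", G = "G", D = "a", E = "a", K = "b", R = "b")
aln_reduced     <- matrix(reduced[as.vector(aln)], nrow = nrow(aln))
entropy_reduced <- apply(aln_reduced, 2, shannon_entropy)
```

In this toy case positions 3 and 4 vary together, so they are the only pair marked in `contacts`. Under the reduced alphabet both positions become fully conserved, which shows how a change of alphabet shifts the focus from residue identity to chemical property.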
Structure Analysis
Not only the distribution of amino acids at a specific position in an alignment is important; knowledge of the three-dimensional structure and its implications for function is also crucial.
As described in our theory section, we used a normal mode analysis based on the bio3d package developed by the Grant Lab. With 'igem_NMA' we can examine the motion of a protein using the different force fields described in the corresponding R documentation. The results are compared automatically, and the residue-wise cross-correlation matrix is plotted, indicating positive or negative correlation. Atomic fluctuations and deformation energy are also quantified and saved as a PDB file. The provided trajectory analysis calculates RMSD and RMSF. Another useful option is the calculation of distances between two different chains, or between a ligand and a chain, as well as the absolute distances between all atoms in a PDB file as a distance matrix. General structural information such as torsion/dihedral angles can also be plotted easily.
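To make the trajectory quantities concrete, the following base R sketch computes RMSD, RMSF and a distance matrix on invented coordinates. It does not reproduce 'igem_NMA' and does not use bio3d, which offers its own functions for these quantities; the function names here are hypothetical, and no structural superposition is performed before the RMSD is taken.

```r
# Toy trajectory (invented coordinates, in Angstrom): 3 frames of 2 atoms.
# Each frame is an atoms-by-xyz matrix.
frames <- list(
  rbind(c(0, 0, 0), c(3,  0, 0)),
  rbind(c(0, 0, 0), c(3,  4, 0)),
  rbind(c(0, 0, 0), c(3, -4, 0))
)

# RMSD between two frames: root of the mean squared atomic displacement.
# Assumes both frames are already superposed.
rmsd_frames <- function(a, b) sqrt(mean(rowSums((a - b)^2)))

# RMSF per atom: root of the mean squared deviation from the average position.
rmsf_atoms <- function(frames) {
  mean_pos <- Reduce(`+`, frames) / length(frames)
  sq_dev <- sapply(frames, function(f) rowSums((f - mean_pos)^2))
  sqrt(rowMeans(sq_dev))
}

# Distance matrix between all atoms of one frame.
dist_matrix <- function(xyz) as.matrix(dist(xyz))

rmsd_to_first <- sapply(frames, rmsd_frames, b = frames[[1]])
rmsf          <- rmsf_atoms(frames)
d             <- dist_matrix(frames[[2]])
```

Here the first atom never moves, so its RMSF is zero, while the second atom swings by 4 Angstrom in both directions around its mean position.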
Plotting
Different data need different plots.
We therefore provide standardized plotting functions in R.2 Although they were designed with a corporate design in mind, the user can easily modify them by adding new layers to an existing graph. Even if you make minor changes, such as a different main title or other fonts, the overall look of the output stays the same.
Most data are visualized in two dimensions, which in R can be done from different input classes such as 'data.frame', 'vector' and 'matrix', although the latter must be converted to class 'data.frame' before the plot is initialized.3 'Save2D_Vec' and 'Save2D_DF' require different input, as indicated by each function name. Both produce dot plots connected by a line of the respective colour. In addition, each creates a corresponding bar or density plot, depending on the input. The bar width can be calculated at runtime or taken from the command line. Other plots describe a three-dimensional space, as in HeatMap and Volcano Plot. These last two functions create plots best suited to short fragments, because the data are automatically highlighted as text inside the plot. For short sequences, setting the ticks manually is preferred in all plots because of possible overplotting.
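The conversion to class 'data.frame' mentioned above can be sketched as follows. The helpers `matrix_to_df` and `vector_to_df` are hypothetical and are not the functions in iGEM_Plot.R; the matrix values are invented, and the ggplot2 part runs only if that package is installed.

```r
# Toy matrix (invented values), e.g. scores between rows and columns.
m <- matrix(1:6, nrow = 2,
            dimnames = list(c("r1", "r2"), c("c1", "c2", "c3")))

# Hypothetical helper: long-format data.frame with one row per matrix cell.
matrix_to_df <- function(m) {
  df <- as.data.frame(as.table(m), stringsAsFactors = FALSE)
  names(df) <- c("row", "col", "value")
  df
}

# Hypothetical helper: a vector has no x values, so its index is used as x.
vector_to_df <- function(v) data.frame(x = seq_along(v), y = v)

df_m <- matrix_to_df(m)
df_v <- vector_to_df(c(2.5, 3.0, 1.5))

# Optional: build a heat map and add a further layer to the existing graph.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(df_m, ggplot2::aes(x = col, y = row, fill = value)) +
    ggplot2::geom_tile()
  p <- p + ggplot2::ggtitle("Toy heat map")  # new layer on the same plot object
}
```

Keeping the conversion in a separate helper means the plotting code only ever has to deal with one input class.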
- <a href="http://www.clustal.org/omega/" target="_blank">http://www.clustal.org/omega/</a>
- <a href="http://www.r-project.org/" target="_blank">http://www.r-project.org/</a>
- <a href="http://docs.ggplot2.org/current/" target="_blank">http://docs.ggplot2.org/current/</a>
Downloadable Content
- <a href="fileadmin/files/iGEM_Analysis.R" target="_blank">iGEM Analysis.R</a>
- <a href="fileadmin/files/iGEM_Calc.R" target="_blank">iGEM Calc.R</a>
- <a href="fileadmin/files/iGEM_Clustalo.R" target="_blank">iGEM Clustalo.R</a>
- <a href="fileadmin/files/iGEM_Plot.R" target="_blank">iGEM Plot.R</a>
- <a href="fileadmin/files/iGEM_ReducingAlphabet.R" target="_blank">iGEM ReducingAlphabet.R</a>
- <a href="fileadmin/files/iGEM_StatFunc.R" target="_blank">iGEM StatFunc.R</a>
References
- Hoffgaard, F., Weil, P., & Hamacher, K. (2010). BioPhysConnectoR: Connecting sequence information and biophysical models. BMC Bioinformatics, 11, 199. doi:10.1186/1471-2105-11-199.
- Sievers, F., Wilm, A., Dineen, D., Gibson, T. J., Karplus, K., Li, W., Lopez, R., et al. (2011). Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Molecular Systems Biology, 7, 539. doi:10.1038/msb.2011.75
- Shannon, C. E. (1948). A Mathematical Theory of Communication. The Bell System Technical Journal, 27, 379-423.
- Grant, B. J., Rodrigues, A. P. C., ElSawy, K. M., McCammon, J. A., & Caves, L. S. D. (2006). Bio3D: An R package for the comparative analysis of protein structures. Bioinformatics, 22, 2695-2696. …org/bio3d/index.php
- Timischl, W. (2012). Biostatistik: Eine Einführung für Biologen und Mediziner (3rd ed.). Springer.
- <a href="http://bio.math-inf.uni-greifswald.de/viscose/html/alphabets.html" target="_blank">bio.math-inf.uni-greifswald.de/viscose/html/alphabets.html</a>