Team:TU Darmstadt/Results/Modeling/Open Software

From 2014.igem.org

Revision as of 17:07, 17 October 2014 by BastianW

Open Software

In line with our Open Hardware approach, we would like to contribute an automated workflow for general sequence and structure file analysis. It can also serve as a data visualization tool with a consistent, corporate-design look during iGEM projects.

Our preferred programming language is R because of its user-friendly interface. Changing code is easy and intuitive, even for beginners. We implemented the following automated functions, which are free to use and to modify.

Sequence Analysis:

  • Clustal Omega
  • Consensus Sequence
  • Conserved Site Detection
  • Shannon Entropy
  • Mutual Information

Structure Analysis:

  • Normal Mode Analysis
  • Comparison of models (NMA)
  • Trajectory Analysis
  • RMSD
  • RMSF
  • Binding estimation
  • Torsion/Dihedral analysis
  • Distance matrix calculation

Plotting:

  • ggplot
  • HeatMap
  • Volcano Plot
  • Wireframe
  • 'Fancy' 3D Scatter Plot

Sequence Analysis

Bioinformatics relies essentially on sequences and their alignment. A poor sequence alignment degrades the results of the Shannon entropy and mutual information calculations.
After running the Basic Local Alignment Search Tool (BLAST), you have to align all your sequences, for instance with a distribution of Clustal Omega. On any Linux system you can use this function after installing the required software package [1].
'MSA_File' lets you pre-align your sequences through the tcltk interface. Once the calculation has finished, you should review and refine the output to obtain an optimal alignment.
The next function, 'Analyse_Start', is an automated version of the sequence analysis. By default it calculates the Shannon entropy, two sets of mutual information (SUMI and ORMI of the BioPhysConnectoR package) and a contact map based on mutual information. General information such as the consensus sequence and potentially conserved sites is also computed and plotted automatically. At start-up you can adjust the analysis through further options, such as the MI threshold for contact map objects and the nullmod counter, which changes how the mutual information is calculated. You can also replace the common amino acid alphabet with a different one. This shows how amino acids are distributed at a specific position and which chemical properties are relevant there.
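To make the quantities above concrete, here is a minimal sketch of the textbook definitions of column entropy, consensus sequence and pairwise mutual information, followed by a thresholded contact list. It is written in Python for illustration only and is not the code behind 'Analyse_Start'. All function names are hypothetical, gaps are assumed to count as ordinary symbols, and the mutual information is the plain definition rather than the SUMI or ORMI variants of BioPhysConnectoR.

```python
import math
from collections import Counter


def columns(alignment):
    """Transpose equal-length aligned sequences into a list of column strings."""
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    return ["".join(col) for col in zip(*alignment)]


def shannon_entropy(column):
    """Shannon entropy (bits) of the symbol distribution in one column."""
    n = len(column)
    return sum((c / n) * math.log2(n / c) for c in Counter(column).values())


def consensus(alignment):
    """Most frequent symbol per column (ties: the symbol seen first wins)."""
    return "".join(Counter(col).most_common(1)[0][0] for col in columns(alignment))


def mutual_information(col_i, col_j):
    """Plain mutual information (bits) between two alignment columns."""
    n = len(col_i)
    count_i, count_j = Counter(col_i), Counter(col_j)
    count_ij = Counter(zip(col_i, col_j))
    return sum(
        (c / n) * math.log2(c * n / (count_i[a] * count_j[b]))
        for (a, b), c in count_ij.items()
    )


def mi_contacts(alignment, threshold):
    """Column pairs (i, j), i < j, whose mutual information reaches the threshold."""
    cols = columns(alignment)
    return [
        (i, j)
        for i in range(len(cols))
        for j in range(i + 1, len(cols))
        if mutual_information(cols[i], cols[j]) >= threshold
    ]


# Toy alignment (made up for illustration): four sequences, four columns.
toy_alignment = ["ACDA", "ACEA", "GCDG", "GCEG"]
entropies = [shannon_entropy(col) for col in columns(toy_alignment)]
```

In the toy alignment, the second column is fully conserved and therefore has zero entropy. The first and last columns vary together, so they are the only pair that a threshold-based contact list picks up.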

Structure Analysis

The distribution of amino acids at a specific position within an alignment is not the only thing that matters. Knowledge of the three-dimensional structure and its implications for function is just as crucial.
As described in our theory section, we used a normal mode analysis (NMA) based on the bio3d package developed by the Grant Lab. With 'igem_NMA' we can validate protein motion using the different force fields described in the corresponding R documentation. The results are compared automatically, and a residue cross-correlation matrix is plotted that indicates positive or negative correlation. Atomic fluctuations and deformation energy are also quantified and saved as a PDB file. The provided trajectory analysis calculates the root-mean-square deviation (RMSD) and the root-mean-square fluctuation (RMSF). Another useful option is the calculation of distances between two different chains or between a ligand and a chain, as well as the absolute distances between all atoms in a PDB file as a distance matrix. General structural information such as torsion/dihedral angles can also be plotted easily.
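The trajectory and distance quantities mentioned above follow directly from their definitions. The sketch below is a Python illustration only and is not the bio3d-based R code of our functions. The function names are hypothetical, coordinates are assumed to be (x, y, z) tuples, and all frames are assumed to be already superposed, since no fitting step is included.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally sized coordinate sets."""
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate sets must contain the same number of atoms")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


def rmsf(trajectory):
    """Per-atom root-mean-square fluctuation around the mean position."""
    n_frames = len(trajectory)
    n_atoms = len(trajectory[0])
    fluctuations = []
    for i in range(n_atoms):
        # Mean position of atom i over all frames.
        mean = [sum(frame[i][k] for frame in trajectory) / n_frames for k in range(3)]
        # Mean squared displacement of atom i from that mean position.
        msf = sum(math.dist(frame[i], mean) ** 2 for frame in trajectory) / n_frames
        fluctuations.append(math.sqrt(msf))
    return fluctuations


def distance_matrix(coords):
    """Symmetric matrix of Euclidean distances between all atoms."""
    return [[math.dist(a, b) for b in coords] for a in coords]


# Toy trajectory (made up for illustration): two atoms, two frames.
frame_0 = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
frame_1 = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0)]
toy_trajectory = [frame_0, frame_1]
```

Here the first atom never moves, so its RMSF is zero, while the second atom oscillates around its mean position. Restricting the distance matrix to rows from one chain and columns from another would give the chain-to-chain distances described above.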

Plotting

References

  1. Hoffgaard, F., Weil, P., & Hamacher, K. (2010). BioPhysConnectoR: Connecting sequence information and biophysical models. BMC Bioinformatics, 11, 199. doi:10.1186/1471-2105-11-199
  2. Sievers, F., Wilm, A., Dineen, D., Gibson, T. J., Karplus, K., Li, W., Lopez, R., et al. (2011). Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Molecular Systems Biology, 7, 539. doi:10.1038/msb.2011.75
  3. Shannon, C. E. (1948). A Mathematical Theory of Communication. The Bell System Technical Journal, 27, 379-423.
  4. Grant, Rodrigues, ElSawy, McCammon, Caves (2006). Bio3D: An R package for the comparative analysis of protein structures. Bioinformatics, 22, 2695-2696. …org/bio3d/index.php
  5. Timischl, W. (2012). Biostatistik: Eine Einführung für Biologen und Mediziner (3rd ed.). Springer.
  6. bio.math-inf.uni-greifswald.de/viscose/html/alphabets.html