Abstract

Peptide linkers are widely used tools in protein modification, not only for connecting protein subdomains but also for joining the termini of a protein to circularize it. Traditionally, flexible linkers such as glycine-serine peptides are used for this purpose. ###Reference needed### However, to keep the domains of chimeric proteins at a defined distance, rigid peptides built from helical patterns are also often applied. Here we present a novel approach to building customized rigid linkers that follow a desired shape. This is achieved by connecting peptides that form rigid helices with amino acids that produce a defined angle between them. To design these linkers, structure databases normally used for protein structure prediction are analyzed and used the other way around, to find fitting building blocks. The potential linkers were tested in a large in silico screen, which showed that the patterns produce the predicted structures. Additionally, they were tested in vitro by circularizing lysozyme from bacteriophage lambda as a model enzyme. Both the modularity and the reliability of our linkers are major advantages over classical linker design.

Introduction

Artificially circularized proteins can gain heat stability because the relative positions of the C- and N-termini are constrained. If the ends are too far from each other, circularization requires a linker that does not change the natural conformation of the protein but restrains the relative position of the ends and thus restricts their degrees of freedom. In addition, the linker should not affect any of the protein's functions. It is therefore important to prevent linkers from passing through the active site or from covering binding sites for other molecules, for example. Consequently, one needs to be able to define the shape of possible linkers.

Classically, protein linkers have been designed in three different ways. ###REFERENCE### The easiest way is to define the length the linker should span and then simply use a flexible glycine-serine peptide with the number of amino acids needed to match this length. Glycine is used for flexibility, as it has no side chain and does not produce any steric hindrance, while serine is used for solubility, as it has a small polar side chain. Solubility is important because the linker should normally not pass through the hydrophobic core of the protein but remain dissolved in the surrounding medium. Such flexible linkers are commonly used for circularization, but also for connecting different proteins when the main requirement is simply that the parts are connected.

A second strategy consists of using rigid helical linkers to keep proteins or protein domains at a certain distance from each other. This is especially important for signalling proteins and fluorescent proteins. ###TODO: Reference### One major property of alpha helices is that they fold in a well-defined way, with well-defined angles and lengths. There are also many different helical patterns that differ in stability and solubility. One big disadvantage of this strategy is that helices alone can only form straight linkers.

The third option consists of designing custom-tailored linkers for each specific application. These linkers can be obtained from protein structure prediction. First, one defines the path the linker should take to connect two amino acids. Next, one designs a linker sequence that might fit this path. Then one predicts the structure of the linker attached to the protein to validate the design. Several linkers with slight changes can be compared, and the procedure is repeated until the linker takes the intended path. ###TODO: Reference, WADE paper### This method is time-consuming: it is not only computationally intensive but also requires solid knowledge of protein folding and protein structure prediction. On the other hand, the benefit can be considerable, as the interaction of the linker with the protein surface can be taken into account and the path taken by the linker can be defined almost completely, up to the accuracy of protein structure prediction.

By combining the advantages of these different approaches to linker design, we have set up a model for building rigid linkers from alpha helices that follow a given path. The main achievement is the modularity of our building system. §§§don't know yet where to put this information§§§ For artificial protein engineering it is also essential to be able to define the conformation of the individual helical building blocks by defining a supersecondary structure.

Background

Primary, secondary, tertiary and quaternary structure are the main levels of protein structure characterization. Primary structure designates the amino acid sequence, while secondary structure describes the arrangement of consecutive amino acids through their two dihedral angles $\phi$ and $\psi$. The Ramachandran plot, which places each amino acid in the space spanned by those two angles, shows two arrangements commonly found in proteins: alpha helices and beta sheets. The next level of protein organization is the tertiary structure, which describes how the protein is organized in the three spatial dimensions, whereas the quaternary structure describes how different protein subunits assemble. Finally, closely related to these standard levels is the supersecondary structure, which describes how secondary structure elements are connected to each other by conformations that appear undefined at first sight. Further analysis revealed that this wide variety of supersecondary structure motifs can be clustered into a limited set of patterns. [5]
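To make the two backbone angles concrete, the following minimal sketch computes a signed dihedral angle from four atom positions. It is an illustration only and not part of the original analysis; the function name `dihedral_deg` and the helper functions are hypothetical. For a residue i, $\phi$ is the dihedral of the atoms C(i-1), N(i), C$\alpha$(i), C(i), and $\psi$ is the dihedral of N(i), C$\alpha$(i), C(i), N(i+1).

```python
import math


def _sub(a, b):
    """Component-wise difference a - b of two 3D points."""
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    """Scalar product of two 3D vectors."""
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    """Vector product of two 3D vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_deg(p0, p1, p2, p3):
    """Signed dihedral angle in degrees around the bond p1-p2.

    Hypothetical helper for illustration: 0 degrees is the cis
    arrangement, +/-180 degrees the trans arrangement.
    """
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal of the plane through p1, p2, p3
    b2_len = math.sqrt(_dot(b2, b2))
    y = _dot(b2, _cross(n1, n2)) / b2_len
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Usage with made-up coordinates (not taken from any structure file):
cis = dihedral_deg((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 1, 0))
trans = dihedral_deg((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, -1, 0))
```

Applying this function to every residue of a structure yields the ($\phi$, $\psi$) pairs that are plotted in a Ramachandran plot.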

Supersecondary structure

When the properties of supersecondary structures were first described, only very few patterns were identified, mainly due to the lack of highly resolved protein structures. At that time the structures were mainly classified by the Ramachandran plot regions ($\alpha$, $\beta$, $\gamma$, etc.) in which the amino acids were found. [6] With the growing number of known crystal structures, the analysis of supersecondary structure has steadily improved, leading to databases with about 300 000 classified loop structures and elaborate clustering. [7] Nowadays, supersecondary structures are simply defined as the structures formed when two secondary structure elements are joined by a short peptide that is not itself assigned to a secondary structure. These loop peptides range from 1 to 9 amino acids in length.

Defining the structure

The aim was to build reliable, stable linkers out of alpha helices connected by supersecondary structure motifs that produce defined angles. To achieve this, we searched for the most reliable alpha helix patterns and for angle patterns covering the whole range of angles from 0 to 180 degrees.

Helix patterns

To be done.

Angle patterns

###Figure for explanation needed###

The angle patterns for our model were obtained from the ArchDB database, which classifies loops from known protein structures. About 17 000 non-homologous proteins from the PDB database were analyzed, and over 300 000 loop structures, i.e. regions connecting two secondary structure elements, were identified. ###Numbers to be checked### The classification took into account not only the length of the loop and its conformation, meaning the $\phi$ and $\psi$ backbone dihedral angles of the residues in the loop, but also the distance between the points where the loop attaches to the surrounding secondary structures. The database also records the secondary structures flanking the loop and the geometry defined by the supersecondary structure motif.

To extract the supersecondary structure motifs relevant for our linker design, the complete ArchDB database was downloaded and helix-loop-helix motifs were extracted using a self-written Python script. We only took into account loops composed of 1 or 2 amino acids, because the longer a loop is, the less frequent and therefore the less reliable a single loop is, and the further its ends are from each other. The information of interest was the angle between the vectors defining the flanking alpha helices, the distance between the ends of the loop, and the type of amino acids surrounding the loop. Furthermore, we analyzed the statistical significance of the conformation. ###still to be done###

For each amino acid combination in the loop region, the distribution of angles between the flanking alpha helices, the loop length distribution and a 2D heatmap of the flanking amino acids were plotted automatically. These distributions were then analyzed visually to identify loops of interest for linker design. We focused on loops that show a narrow angle distribution and that appear frequently in the database.
The corresponding amino acids were analyzed further by extending the amino acid pattern with the amino acids that occur most often next to them. §§§ In practical terms this means..§§§ ###figure needed### This allowed us to narrow down the angle distribution and also to select loops for which no preference for the surrounding amino acids could be seen anymore. We can therefore claim that the angle distribution is due to the identified pattern itself and not to the surrounding structure.
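The selection criterion described above (frequent loops with a narrow angle distribution) can be sketched as follows. This is a simplified illustration and not the original script: the function names, the record layout, the thresholds and the angle values are all assumptions, and the toy records are invented rather than taken from ArchDB. In the original work the distributions were inspected visually instead of being filtered by fixed thresholds.

```python
import math
from statistics import mean, stdev


def angle_between_deg(u, v):
    """Angle in degrees (0 to 180) between two 3D helix axis vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    cos_t = max(-1.0, min(1.0, dot / norm))  # guard against rounding errors
    return math.degrees(math.acos(cos_t))


def summarize_loops(records):
    """Group (loop_label, angle_deg) records and compute simple statistics."""
    by_label = {}
    for label, angle in records:
        by_label.setdefault(label, []).append(angle)
    return {
        label: {
            "n": len(angles),
            "mean": mean(angles),
            # a single observation has no measurable spread
            "spread": stdev(angles) if len(angles) > 1 else float("inf"),
        }
        for label, angles in by_label.items()
    }


def select_loops(summary, min_count, max_spread):
    """Keep loops that are frequent enough and have a narrow distribution."""
    return sorted(label for label, s in summary.items()
                  if s["n"] >= min_count and s["spread"] <= max_spread)


# Invented toy records; the labels stand in for loop amino acid patterns.
toy_records = [
    ("pattern_1", 88.0), ("pattern_1", 90.0), ("pattern_1", 92.0), ("pattern_1", 91.0),
    ("pattern_2", 20.0), ("pattern_2", 95.0), ("pattern_2", 160.0),
    ("pattern_3", 45.0), ("pattern_3", 46.0),
]
selected = select_loops(summarize_loops(toy_records), min_count=3, max_spread=5.0)
```

In this toy example only `pattern_1` passes: `pattern_2` is frequent but spread over a wide range of angles, and `pattern_3` is narrow but observed too rarely.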

In silico refinement

As some of the interesting patterns could not be found often enough to be statistically significant, we decided to refine them further in silico by modelling the structure of proteins with circularizing linkers. To do this for realistic situations, we selected from the RCSB database structures of non-homologous target proteins whose termini are far enough apart to require a linker for circularization. To set up an environment as close as possible to the intended application of the patterns, we designed the following workflow.

First, the ###linker software### generates possible fitting linkers for various proteins. From these candidates, the 100 shortest are taken. Because a given angle can be produced by several different patterns, the angle patterns in each linker are then exchanged for others that should produce the same angle. This increases the variety and lets us test the alternative patterns against each other to find out which one performs best. The linkers connect the ends of the protein without putting tension on it, so that the protein can fold in its natural way. For further information on the generation of the linker sequences, please see ###link###.

Next, the circularized proteins with their specific linkers are modelled using the software Modeller. [8] This software is widely used for comparative structure prediction. It is well established in the scientific community and is well suited to predicting loop regions attached to existing structures. [9] Modeller predicts the 3D structure of a given sequence based on restraints derived from an alignment to existing structures. It therefore needs two inputs, a sequence and a structure: it locates the sequence in the structure and predicts the folding. It is freely available for academic use from the salilab webpages. ###link###

As we only want to determine the properties of our linker patterns attached to proteins, Modeller fitted our purpose perfectly. ???We provided it with the crystal structure of a protein of interest and a linker sequence attached to it and the software returned a model of the structure of the circularized protein through the linker.??? Most important for our purpose is that Modeller does not rely on structural databases like ArchDB, but models our linkers ab initio by minimizing energy functions with different methods such as conjugate gradients and molecular dynamics.

###To calculate the models, we first make an alignment between the structure and the sequence. Starting from this alignment, Modeller generates ???3??? initial models. On each of these models a loop refinement is performed, resulting in about ???12??? models per run. During loop refinement, all parts of the sequence that cannot be found in the structure file are calculated further.### For these refinement steps, one can choose different levels of optimization. We always favored accuracy over speed. Each modelled structure is provided with energy values, which allow different models of the same structure to be compared. From Modeller we received about ###30### different models and chose the one with the best energy scores for further analysis.
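The final selection step, choosing the model with the best energy score, can be sketched as follows. This is a hedged illustration rather than the code that was actually used: the dictionary keys `name`, `score` and `failed`, the file names and the score values are hypothetical, and the sketch assumes that a lower energy-like score indicates a better model.

```python
def pick_best_model(models):
    """Return the successfully built model with the lowest score.

    `models` is a list of dictionaries with the hypothetical keys
    'name', 'score' (lower is assumed to be better) and 'failed'.
    """
    usable = [m for m in models if not m.get("failed", False)]
    if not usable:
        raise ValueError("no successfully built model to choose from")
    return min(usable, key=lambda m: m["score"])


# Invented example: models from two runs are pooled before selection.
run_a = [
    {"name": "run_a_model_1.pdb", "score": -1520.4, "failed": False},
    {"name": "run_a_model_2.pdb", "score": -1498.7, "failed": False},
]
run_b = [
    {"name": "run_b_model_1.pdb", "score": -1533.9, "failed": False},
    {"name": "run_b_model_2.pdb", "score": -1600.0, "failed": True},
]
best = pick_best_model(run_a + run_b)
```

Models flagged as failed are excluded before the comparison, so a low score on an incomplete model cannot win the selection.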

Modeller was run via the ###link to i@h### system. ###further explanation needed### Modelling one linker took about ???10??? hours of computation time on average on the iGEM@home system, but this value depends strongly on the size of the protein. Each Modeller run is carried out twice, which guards against manipulated results and increases the variety of models. The best model is then determined and analyzed by another self-written program to characterize the behaviour of the linker patterns in their natural surroundings.

All these models for the different structures and the different linkers are then analyzed for properties such as the length of the helical patterns, the shape of the attachment structures of the linker and the angles produced by the angle patterns. ###Figure missing### First, the modelled structure and the natural structure are superimposed to see how large the differences between them are. If the protein has been disturbed too much, the model is discarded. ###still needs to be done### Otherwise, the lengths of the attachment sequences are calculated directly from the distances between the atoms. For each helical pattern, a vector is fitted to the C$\alpha$ atoms. For these vectors, the distance between the ends, the length and the angles are calculated. Furthermore, a possible crossing point is estimated. For each helical pattern and each angle pattern we thus obtain a distribution of the different properties, so that we can refine our assumptions about the behaviour of the patterns. With the coordinates of the estimated crossing points, one can furthermore see whether the linker really follows the path predicted by the software, which verifies the results of the linker software.
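The geometric steps of this analysis can be sketched in a few lines. The sketch below is an assumption about how such an analysis might be done, not the original program: it fits the helix axis as the principal axis of the C$\alpha$ coordinates (found here by power iteration on their covariance matrix) and estimates the crossing point of two axes as the midpoint of their shortest connecting segment. All function names are hypothetical, and the test helix uses textbook values for an ideal alpha helix (1.5 Å rise and 100 degrees rotation per residue, C$\alpha$ radius of about 2.3 Å).

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _along(p, d, t):
    """Point p + t * d on a line."""
    return tuple(pi + t * di for pi, di in zip(p, d))


def fit_axis(points, iterations=100):
    """Fit a line to 3D points; returns (centroid, unit direction)."""
    n = len(points)
    centroid = tuple(sum(p[i] for p in points) / n for i in range(3))
    centered = [_sub(p, centroid) for p in points]
    cov = [[sum(c[i] * c[j] for c in centered) for j in range(3)]
           for i in range(3)]
    v = _sub(points[-1], points[0])  # initial guess: first to last atom
    for _ in range(iterations):      # power iteration towards the main axis
        w = tuple(sum(cov[i][j] * v[j] for j in range(3)) for i in range(3))
        length = math.sqrt(_dot(w, w))
        if length == 0.0:
            raise ValueError("degenerate point set, no axis can be fitted")
        v = tuple(x / length for x in w)
    return centroid, v


def crossing_point(p1, d1, p2, d2):
    """Midpoint of the shortest segment between two lines and its length.

    Returns None for (nearly) parallel lines, which have no crossing point.
    """
    w = _sub(p1, p2)
    a, b, c = _dot(d1, d1), _dot(d1, d2), _dot(d2, d2)
    d, e = _dot(d1, w), _dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        return None
    q1 = _along(p1, d1, (b * e - c * d) / denom)  # closest point on line 1
    q2 = _along(p2, d2, (a * e - b * d) / denom)  # closest point on line 2
    midpoint = tuple((x + y) / 2 for x, y in zip(q1, q2))
    return midpoint, math.sqrt(_dot(_sub(q1, q2), _sub(q1, q2)))


def ideal_helix(n_residues, rise=1.5, turn_deg=100.0, radius=2.3):
    """C-alpha positions of an idealized alpha helix along the z axis."""
    return [(radius * math.cos(math.radians(turn_deg * i)),
             radius * math.sin(math.radians(turn_deg * i)),
             rise * i) for i in range(n_residues)]


centroid, axis = fit_axis(ideal_helix(20))
```

For two fitted helices, the angle between their direction vectors gives the angle produced by the connecting pattern, and the returned gap length indicates how far the two axes are from actually intersecting.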

Results

Based on these analyses, we decided to set up a modular system for our linkers. All linkers start with two amino acids that give the ends of the protein some flexibility and that prevent the attached helix from continuing into the protein and thereby making non-helical regions helical. The next building block is one of the alpha helix forming patterns AEAAAK, AEAAAKA, AEAAAKAA, AEAAAKEAAAK, ###... with a well-defined length and shape. Then an angle pattern is attached. All the angle patterns we chose have the same distance from the actual turning point. Thus one can easily exchange different angle patterns and easily calculate the distances between the different turning points, as is done in our ###software###. Another helix pattern can then be attached to this angle pattern. ###figure needed### All our linkers end with either the two exteins resulting from circularization or the sortase scar, both of which are treated as rather unstructured, flexible regions.
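The modular assembly described above can be sketched as simple string concatenation. This is an illustration under stated assumptions, not the team's software: the function names are hypothetical, the bracketed tokens such as `[start]` and `[angle-1a]` are placeholders for sequences that are not given in this text, and the terminal exteins or sortase scar are left out. The helix patterns are the four listed above, and the length estimate uses the textbook rise of 1.5 Å per residue of an ideal alpha helix.

```python
from itertools import product

# Helix-forming patterns named in the text (the list there is open-ended).
HELIX_PATTERNS = ("AEAAAK", "AEAAAKA", "AEAAAKAA", "AEAAAKEAAAK")
RISE_PER_RESIDUE = 1.5  # Angstrom per residue in an ideal alpha helix


def assemble_linker(start, helices, angles):
    """Join start + helix + angle + helix + ... into one sequence string."""
    if len(helices) != len(angles) + 1:
        raise ValueError("need exactly one angle pattern between two helices")
    for helix in helices:
        if helix not in HELIX_PATTERNS:
            raise ValueError("unknown helix pattern: " + helix)
    parts = [start, helices[0]]
    for angle, helix in zip(angles, helices[1:]):
        parts.extend([angle, helix])
    return "".join(parts)


def helix_length(pattern):
    """Approximate length in Angstrom of a helix built from `pattern`."""
    return RISE_PER_RESIDUE * len(pattern)


def angle_variants(start, helices, angle_choices):
    """All linkers obtained by exchanging interchangeable angle patterns.

    `angle_choices` holds one list of alternatives per turning point.
    """
    return [assemble_linker(start, helices, combo)
            for combo in product(*angle_choices)]


# Bracketed tokens are placeholders, not real amino acid sequences.
linker = assemble_linker("[start]", ["AEAAAK", "AEAAAKA"], ["[angle-1a]"])
variants = angle_variants("[start]", ["AEAAAK", "AEAAAKA"],
                          [["[angle-1a]", "[angle-1b]"]])
```

Because every angle pattern occupies the same slot between two helices, exchanging one pattern for another only changes the corresponding element of `angles`, which mirrors the exchangeability described in the text.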

Application

- DNMT1
- Lysozyme

Verification of patterns

Conclusion

- Statistical evidence
- No preference for helix types
- Many different non-homologous proteins

References

[5] Efimov, A. V. Standard structures in proteins. Prog. Biophys. Mol. Biol. 60, 201-239 (1993).

[6] Donate, L. E., Rufino, S. D., Canard, L. H. & Blundell, T. L. Conformational analysis and clustering of short and medium size loops connecting regular secondary structures: a database for modeling and prediction. Protein Sci. 5, 2600-2616 (1996).

[7] Bonet, J. et al. ArchDB 2014: structural classification of loops in proteins. Nucleic Acids Res. 42, D315-D319 (2014).

[8] Fiser, A. et al. Modeling of loops in protein structures. Protein Sci. 9, 1753-1773 (2000).

[9] Fiser, A. & Sali, A. ModLoop: Automated modeling of loops in protein structures. Bioinformatics 19, 2500-2501 (2003).

Team:Heidelberg/pages/Linker Modeling

From 2014.igem.org
