Team:TU Darmstadt/Modeling

From 2012.igem.org

Revision as of 13:15, 10 September 2012 by S jager



Modeling

Homology Modeling

Although our proteins have been described functionally in the literature and within the iGEM competition, no structures for them are available in the Protein Data Bank (PDB). Because protein structures are indispensable for further work and for visualization, we used YASARA Structure [1] to calculate three-dimensional structures of the proteins used in our iGEM project.

Workflow

Our YASARA script calculates a homology model in the following steps [7]:

The target sequence is searched against UniProt with PSI-BLAST [2].
A position-specific scoring matrix (PSSM) is calculated from the related sequences.
The PSSM is used to search the PDB for potential modeling templates.
The templates are ranked based on the alignment score and their structural quality [3].
Additional information is derived for template and target (secondary structure prediction and structure-based alignment correction using SSALN scoring matrices) [4].
A graph of the side-chain rotamer network is built, and dead-end elimination is used to find an initial rotamer solution in the context of a simple repulsive energy function [5].
The loop network is optimized by sampling a large number of different orientations.
Side-chain rotamers are fine-tuned, taking into account electrostatic and knowledge-based packing interactions as well as solvation effects.
An unrestrained high-resolution refinement with explicit solvent molecules is run using the latest knowledge-based force fields [6].
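To make the PSSM step more concrete, the following Python sketch builds a simple log-odds PSSM from a few aligned, gap-free sequences and uses it to score a candidate sequence. It is only an illustration under stated assumptions (a uniform background frequency of 1/20, a flat pseudocount, no sequence weighting) and is not the procedure PSI-BLAST or YASARA actually uses; the function names `build_pssm` and `score` are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(aligned_seqs, pseudocount=1.0):
    """Return one {amino_acid: log2-odds score} dict per alignment column.

    Assumptions (illustrative only): equal-length, gap-free sequences,
    a uniform background frequency of 1/20 and a flat pseudocount.
    """
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("all sequences must have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    total = len(aligned_seqs) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for col in range(length):
        counts = Counter(seq[col] for seq in aligned_seqs)
        # Pseudocount-smoothed frequency versus background, as a log2 ratio
        pssm.append({
            aa: math.log2(((counts.get(aa, 0) + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return pssm


def score(pssm, seq):
    """Sum the column scores of an ungapped sequence of the PSSM's length."""
    return sum(column[aa] for column, aa in zip(pssm, seq))
```

For example, a PSSM built from `["MKV", "MKI", "MRV"]` scores `"MKV"` higher than an unrelated sequence such as `"WWW"`, which is the property exploited when the PSSM is used to search for templates.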

Application

All these steps are applied to every template used for modeling. For our project we set the maximum number of templates to 20. Each resulting model is evaluated using its average per-residue quality Z-score. Finally, a hybrid model is built that combines the best regions of all predictions. This procedure makes the predictions more accurate and thus more realistic.
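The ranking and hybrid steps can be illustrated with the Python sketch below. It assumes a hypothetical input format (one list of per-residue quality Z-scores per model, all aligned to the same target residues), assumes that higher Z-scores mean better quality, and simply splits the target into fixed-length segments. YASARA's own hybrid-model algorithm is more elaborate and is not reproduced here; `rank_models` and `hybrid_assignment` are illustrative names.

```python
from statistics import mean


def rank_models(z_scores):
    """Sort model names by average per-residue Z-score, best (highest) first."""
    return sorted(z_scores, key=lambda name: mean(z_scores[name]), reverse=True)


def hybrid_assignment(z_scores, segment_length):
    """Pick, for each fixed-length target segment, the model with the highest
    mean Z-score in that segment. Returns (start, end, model_name) tuples.

    z_scores: hypothetical mapping model name -> per-residue Z-scores,
    all aligned to the same target residues.
    """
    lengths = {len(scores) for scores in z_scores.values()}
    if len(lengths) != 1:
        raise ValueError("all models must cover the same target residues")
    n_residues = lengths.pop()
    assignment = []
    for start in range(0, n_residues, segment_length):
        end = min(start + segment_length, n_residues)
        best = max(z_scores, key=lambda name: mean(z_scores[name][start:end]))
        assignment.append((start, end, best))
    return assignment
```

With two toy models where one is better in the first half of the target and the other in the second half, `hybrid_assignment` takes each half from the locally better model, even though `rank_models` prefers only one of them overall.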

Results

PnB-Esterase

AroY

TphA1

TphA2

TphA3


References

[1] E. Krieger, G. Koraimann, and G. Vriend, “Increasing the precision of comparative models with YASARA NOVA--a self-parameterizing force field,” Proteins, vol. 47, no. 3, pp. 393–402, 2002.

[2] S. F. Altschul, T. L. Madden, A. A. Schäffer, J. Zhang, Z. Zhang, W. Miller, and D. J. Lipman, “Gapped BLAST and PSI-BLAST: a new generation of protein database search programs,” Nucleic Acids Research, vol. 25, no. 17, pp. 3389–3402, Sep. 1997.

[3] R. W. Hooft, G. Vriend, C. Sander, and E. E. Abola, “Errors in protein structures,” Nature, vol. 381, no. 6580, p. 272, 1996.

[4] D. T. Jones, “Protein secondary structure prediction based on position-specific scoring matrices,” Journal of Molecular Biology, vol. 292, no. 2, pp. 195–202, 1999.

[5] A. A. Canutescu, A. A. Shelenkov, and R. L. Dunbrack, “A graph-theory algorithm for rapid protein side-chain prediction,” Protein Science, vol. 12, no. 9, pp. 2001–2014, 2003.

[6] E. Krieger, K. Joo, J. Lee, J. Lee, S. Raman, J. Thompson, M. Tyka, D. Baker, and K. Karplus, “Improving physical realism, stereochemistry, and side-chain accuracy in homology modeling: Four approaches that performed well in CASP8,” Proteins, vol. 77, suppl. 9, pp. 114–122, 2009.

[7] http://www.yasara.org/homologymodeling.htm

Information Theoretical Analysis

Information Theory

Entropy

Claude Shannon introduced a measure of the uncertainty of a random variable X, now called Shannon entropy H [1]:

$$H(X) = -\sum_{x} p(x)\,\log_2 p(x)$$

Here p(x) denotes the probability mass function of X. When the logarithm to base 2 is used, as here, entropy is measured in bits.
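As a small illustration of the formula, the Python sketch below estimates H(X) in bits from a sequence of observed symbols, using each symbol's relative frequency as p(x). The function name `shannon_entropy` is our own choice for this example.

```python
import math
from collections import Counter


def shannon_entropy(samples):
    """Return H(X) in bits, using the relative frequency of each symbol as p(x)."""
    n = len(samples)
    counts = Counter(samples)
    # H(X) = -sum_x p(x) * log2 p(x); symbols with p(x) = 0 never appear in counts
    return sum(-(c / n) * math.log2(c / n) for c in counts.values())
```

For example, `shannon_entropy("ACGT")` gives 2.0 bits (four equally likely symbols), while a fully conserved string such as `"AAAA"` gives 0.0.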

Mutual Information

In information theory, mutual information (MI) measures the dependence between two random variables X and Y:

$$MI(X, Y) = H(X) + H(Y) - H(X, Y)$$

H(X) and H(Y) are the Shannon entropies of X and Y, and H(X, Y) is their joint (two-point) entropy, $H(X, Y) = -\sum_{x}\sum_{y} p(x, y)\,\log_2 p(x, y)$. The MI quantifies how much information about X is gained by knowing Y, and vice versa.
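The identity MI(X, Y) = H(X) + H(Y) - H(X, Y) translates directly into code. The minimal Python sketch below, with illustrative function names, estimates all three entropies from paired observations using relative frequencies.

```python
import math
from collections import Counter


def entropy(samples):
    """Shannon entropy in bits of the empirical distribution of samples."""
    n = len(samples)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(samples).values())


def mutual_information(xs, ys):
    """MI(X, Y) = H(X) + H(Y) - H(X, Y), estimated from paired observations."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    joint = list(zip(xs, ys))  # each (x, y) pair is one outcome of the joint variable
    return entropy(xs) + entropy(ys) - entropy(joint)
```

Two perfectly coupled series such as `"AACC"` and `"GGTT"` give 1.0 bit, whereas `"AACC"` and `"GTGT"` (every combination equally frequent) give 0.0.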

Application of MI to Sequence Alignments

It is well known that MI can be used to measure co-evolution signals in multiple sequence alignments (MSAs) [2] [3]. An MSA compares three or more sequences and is used to investigate the functional or evolutionary homology of amino acid or nucleotide sequences. The MI of two columns X and Y of an MSA can be computed with the following equation, derived from the Kullback-Leibler divergence ($D_{KL}$) between the joint distribution and the product of the marginal distributions:

$$MI(X, Y) = D_{KL}\big(p(x, y)\,\|\,p(x)\,p(y)\big) = \sum_{x_i \in Q}\sum_{y_j \in Q} p(x_i, y_j)\,\log_2\frac{p(x_i, y_j)}{p(x_i)\,p(y_j)}$$

where $p(x_i)$ and $p(y_j)$ are the relative frequencies of the symbols $x_i$ and $y_j$ in columns X and Y of the MSA, $p(x_i, y_j)$ is the joint frequency with which $x_i$ and $y_j$ occur together in the same sequence, and Q is the set of symbols of the corresponding alphabet (DNA or protein). Computing this value for every pair of columns yields a symmetric matrix M whose entry $M_{ij}$ is the MI of columns i and j. Columns that depend on each other show high MI values.
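A minimal Python sketch of the column-pair computation follows. It assumes the MSA is given as a list of equal-length strings, treats gap characters as ordinary symbols, and uses relative frequencies as probabilities; the names `column_mi` and `mi_matrix` are hypothetical.

```python
import math
from collections import Counter


def column_mi(col_x, col_y):
    """MI of two alignment columns in the KL-divergence form:
    sum over (x, y) of p(x, y) * log2(p(x, y) / (p(x) * p(y))).

    Only symbol pairs that actually occur are summed; pairs with
    p(x, y) = 0 contribute nothing.
    """
    n = len(col_x)
    px = Counter(col_x)
    py = Counter(col_y)
    pxy = Counter(zip(col_x, col_y))
    mi = 0.0
    for (x, y), count in pxy.items():
        p_joint = count / n
        mi += p_joint * math.log2(p_joint / ((px[x] / n) * (py[y] / n)))
    return mi


def mi_matrix(msa):
    """Symmetric matrix M with M[i][j] = MI of columns i and j of the MSA."""
    n_cols = len(msa[0])
    columns = ["".join(seq[i] for seq in msa) for i in range(n_cols)]
    m = [[0.0] * n_cols for _ in range(n_cols)]
    for i in range(n_cols):
        for j in range(i, n_cols):  # compute each pair once, mirror for symmetry
            m[i][j] = m[j][i] = column_mi(columns[i], columns[j])
    return m
```

In the toy alignment `["ACA", "ACG", "GTA", "GTG"]`, columns 0 and 1 co-vary perfectly (MI of 1 bit), while column 2 is independent of both (MI of 0). The diagonal entries equal the column entropies, since MI(X, X) = H(X).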

Normalisation

A standard score (Z-score) indicates by how many standard deviations a value differs from the mean of a normal distribution. MI-based Z-scores can be calculated with a shuffle-null model, in which the symbols within an MSA column are shuffled so that all dependencies between column pairs are eliminated. The expected value under the shuffle-null model is denoted $E(M_{ij})$ and its variance $\mathrm{Var}(M_{ij})$ [4], giving

$$Z_{ij} = \frac{M_{ij} - E(M_{ij})}{\sqrt{\mathrm{Var}(M_{ij})}}$$
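The shuffle-null normalisation can be approximated by sampling. The Python sketch below estimates E(M_ij) and Var(M_ij) empirically from a fixed number of random shuffles (`n_shuffles`, a hypothetical parameter) rather than analytically, so it is not necessarily the method used in [4]. Shuffling one of the two columns is enough to remove the dependency while keeping the symbol frequencies of both columns unchanged.

```python
import math
import random
from collections import Counter
from statistics import mean, pvariance


def column_mi(col_x, col_y):
    """MI in bits of two equal-length columns (relative frequencies as p)."""
    n = len(col_x)
    px, py = Counter(col_x), Counter(col_y)
    return sum(
        (c / n) * math.log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in Counter(zip(col_x, col_y)).items()
    )


def shuffle_null_zscore(col_x, col_y, n_shuffles=1000, seed=0):
    """Z_ij = (M_ij - E(M_ij)) / sqrt(Var(M_ij)), with E and Var estimated
    from the MI of col_x against randomly shuffled copies of col_y."""
    rng = random.Random(seed)
    observed = column_mi(col_x, col_y)
    shuffled = list(col_y)
    null = []
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # destroys any pairing between the two columns
        null.append(column_mi(col_x, shuffled))
    sd = math.sqrt(pvariance(null))
    if sd == 0:
        return math.nan  # e.g. a fully conserved column: Z-score undefined
    return (observed - mean(null)) / sd
```

A perfectly coupled pair of columns yields a large positive Z-score, whereas a fully conserved column produces a null distribution with zero variance, for which this sketch returns `nan`.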