Learning spatial amino acid contacts from many homologous protein sequences

Event details
Date | 26.02.2013 |
Hour | 16:15 › 17:15 |
Speaker | Prof. Erik Aurell, KTH Stockholm |
Location | |
Category | Conferences - Seminars |
Spatially proximate amino acids in a protein can be assumed to co-evolve, and a protein's three-dimensional (3D) structure hence
leaves an echo of correlations in the evolutionary record. Reverse engineering 3D structures from such correlations would be
important in biology, as the number of known protein sequences is much larger and grows much faster than the number
of known protein structures. Within this task lies a statistical inference problem, rooted in the following: correlation between two
sites in a protein sequence can arise from firsthand interaction but can also be network-propagated via intermediate sites; observed
correlation is not enough to guarantee proximity.
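A toy illustration of network-propagated correlation, as a minimal sketch only: it uses a three-site, two-state (Ising) chain rather than a 21-state protein model, and the function name `pair_correlations` and the coupling values are assumptions chosen for illustration. Sites 0 and 2 have no direct coupling, yet they come out correlated because both couple to site 1.

```python
import itertools
import numpy as np

def pair_correlations(J):
    """Exact <s_i s_j> for a small Ising model with symmetric couplings J
    (zero diagonal, no fields), by enumerating all 2^n states."""
    n = J.shape[0]
    states = np.array(list(itertools.product([-1, 1], repeat=n)))
    # For symmetric J with zero diagonal, 0.5 * s^T J s = sum_{i<j} J_ij s_i s_j
    log_weights = np.array([0.5 * s @ J @ s for s in states])
    probs = np.exp(log_weights)
    probs /= probs.sum()
    # Without fields the means vanish, so <s_i s_j> is the correlation itself
    return np.einsum("b,bi,bj->ij", probs, states, states)

# Chain 0 - 1 - 2: direct couplings only between neighbours
J = np.zeros((3, 3))
J[0, 1] = J[1, 0] = 1.0
J[1, 2] = J[2, 1] = 1.0

C = pair_correlations(J)
print("coupling J[0,2]    =", J[0, 2])
print("correlation C[0,2] =", C[0, 2])
```

For this open chain the correlation between the end sites equals tanh(1)^2, about 0.58, even though their direct coupling is zero. Ranking site pairs by correlation alone would therefore flag a contact that is not in the model.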
An approach to separating direct from indirect interactions is to learn a plausible probabilistic model from the data, and then score putative interactions by the corresponding terms in the model. In the context of protein sequences and learning a model of at most pair-wise interactions (a Potts model), this approach has been referred to as direct-coupling analysis.
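For orientation, a pairwise Potts model over an aligned sequence can be written as follows. The notation is an assumption made here for illustration, not taken from the announcement: $\sigma = (\sigma_1, \dots, \sigma_N)$ is a sequence of length $N$ with each $\sigma_i$ in one of $q$ states, $h_i$ are single-site fields, $J_{ij}$ are pair couplings, and $Z$ is the normalization constant. The Frobenius-norm score in the second line is one common way to turn a coupling matrix into a single number per site pair.

```latex
\begin{align}
P(\sigma) &= \frac{1}{Z}\exp\Big(\sum_{i=1}^{N} h_i(\sigma_i)
            + \sum_{1 \le i < j \le N} J_{ij}(\sigma_i,\sigma_j)\Big),
\qquad
Z = \sum_{\sigma}\exp\Big(\sum_{i} h_i(\sigma_i)
            + \sum_{i<j} J_{ij}(\sigma_i,\sigma_j)\Big), \\
S_{ij} &= \lVert J_{ij} \rVert_F
        = \sqrt{\sum_{k=1}^{q}\sum_{l=1}^{q} J_{ij}(k,l)^2}.
\end{align}
```

Site pairs with large $S_{ij}$ are the candidates for spatial contacts. The sum defining $Z$ runs over $q^N$ sequences, which is what makes exact likelihood evaluation impractical for realistic $N$.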
The computational tasks involved are not trivial: in these problems a maximum likelihood approach is infeasible, and one must resort to approximations.
I will discuss this field focusing on our recent result that the pseudolikelihood method somewhat outperforms other approaches to the direct-coupling analysis.
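The idea behind pseudolikelihood can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the released code: the function name `neg_log_pseudolikelihood`, the array layout (`h` of shape `(N, q)`, `J` of shape `(N, N, q, q)` with `J[i, j, a, b] == J[j, i, b, a]`) and the integer encoding of the alignment are all hypothetical. Instead of the full likelihood, each site is predicted from all the others, so the normalization is a sum over only `q` states per site.

```python
import numpy as np

def neg_log_pseudolikelihood(msa, h, J):
    """Average negative log-pseudolikelihood of a Potts model.

    msa : (B, N) integer array, entries in 0..q-1 (B sequences, N sites)
    h   : (N, q) fields
    J   : (N, N, q, q) couplings; J[r, j, a, b] couples state a at r with b at j
    """
    B, N = msa.shape
    rows = np.arange(B)
    total = 0.0
    for r in range(N):
        # logits[b, a] = h[r, a] + sum_{j != r} J[r, j, a, msa[b, j]]
        logits = np.tile(h[r].astype(float), (B, 1))
        for j in range(N):
            if j != r:
                logits += J[r, j][:, msa[:, j]].T
        # Normalise over the q states of site r only (numerically stable)
        logits -= logits.max(axis=1, keepdims=True)
        log_norm = np.log(np.exp(logits).sum(axis=1))
        total -= (logits[rows, msa[:, r]] - log_norm).sum()
    return total / B

# Tiny synthetic alignment (random data, for shape checking only)
rng = np.random.default_rng(0)
q, N, B = 21, 5, 8
msa = rng.integers(0, q, size=(B, N))
h0 = np.zeros((N, q))
J0 = np.zeros((N, N, q, q))
baseline = neg_log_pseudolikelihood(msa, h0, J0)
print(baseline, N * np.log(q))
```

With all parameters at zero every conditional distribution is uniform, so the value is `N * log(q)`. Minimizing this objective over `h` and `J`, usually with an added regularization term that is omitted here, gives coupling estimates that can then be scored pair by pair.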
This is joint work with Magnus Ekeberg, Cecilia Lövkvist, Yueheng Lan and Martin Weigt, published in Phys. Rev. E 87, 012707 (2013), URL: http://link.aps.org/doi/10.1103/PhysRevE.87.012707
Code implementing the pseudolikelihood method for these problems is available at http://plmdca.csc.kth.se/.
Practical information
- Informed public
- Free
Organizer
- IPG
Contact
- Prof. Rüdiger Urbanke