IPG Seminar: "Decoding Genetic Variations: Communications-Inspired Haplotype Assembly"

Event details
Date | 05.12.2014 |
Hour | 14:15 › 15:15 |
Speaker | Haris Vikalo, University of Texas at Austin |
Location | |
Category | Conferences - Seminars |
High-throughput DNA sequencing technologies allow fast and affordable sequencing of individual genomes and thus enable unprecedented studies of genetic variations. The complete information about variations in the genome of an individual is provided by haplotypes, ordered collections of single nucleotide polymorphisms.
Knowledge of haplotypes is instrumental in finding genes associated with diseases, drug development and evolutionary studies; however, their reconstruction is computationally challenging (NP-hard). Our key observation is that the minimum error-correction formulation of the haplotype assembly problem is identical to the task of deciphering a coded message received over a noisy channel – a classical problem in the mature field of communication theory. Exploiting this connection, we develop novel methods that explore the tradeoffs between performance and complexity of haplotype assembly. Motivated by sphere decoding ideas, we propose a novel branch-and-bound haplotyping algorithm and find its expected complexity, demonstrating that the optimal haplotype assembly, often considered impractical, is in fact practically feasible for haplotype blocks of moderate length. For long haplotype blocks, we reformulate the problem as a semi-definite program and exploit its structural features to derive a fast and accurate heuristic solution. Finally, an information-theoretic analysis of haplotype assembly allows us to establish its fundamental
performance limits.
Knowledge of haplotypes is instrumental in finding genes associated with diseases, drug development and evolutionary studies; however, their reconstruction is computationally challenging (NP-hard). Our key observation is that the minimum error-correction formulation of the haplotype assembly problem is identical to the task of deciphering a coded message received over a noisy channel – a classical problem in the mature field of communication theory. Exploiting this connection, we develop novel methods that explore the tradeoffs between performance and complexity of haplotype assembly. Motivated by sphere decoding ideas, we propose a novel branch-and-bound haplotyping algorithm and find its expected complexity, demonstrating that the optimal haplotype assembly, often considered impractical, is in fact practically feasible for haplotype blocks of moderate length. For long haplotype blocks, we reformulate the problem as a semi-definite program and exploit its structural features to derive a fast and accurate heuristic solution. Finally, an information-theoretic analysis of haplotype assembly allows us to establish its fundamental
performance limits.
Practical information
- Informed public
- Free
Organizer
- IPG Seminar - Michael Gastpar (LINX)
Contact
- Michael Gastpar ([email protected])