IC Colloquium : Sparse Bayesian nonparametric models for genomic data analysis

Event details
Date | 18.03.2013 |
Hour | 16:15 › 17:30 |
Speaker |
David Knowles, Stanford University IC faculty candidate |
Location | |
Category | Conferences - Seminars |
Abstract
Motivated by the desire to understand gene regulatory networks, we develop two Bayesian nonparametric models which find modules of co-regulated genes from transcriptomic data. The first, Dirichlet Process Variable Clustering (DPVC), partitions genes into disjoint clusters, whereas the second, Nonparametric Sparse Factor Analysis (NSFA), allows genes to belong to an arbitrary number of modules. The superior predictive performance of the latter model suggests that multiple membership more closely resembles the true nature of gene regulatory networks. We extend DPVC to allow similar but not identical modules in different data views, such as cell types, and extend NSFA to a multitask regression setting where our aim is to predict the sensitivity of cancer cell lines to therapeutic compounds using genetic and molecular characteristics. While we use genomic data analysis problems to motivate these models, they have much wider applicability and correspond to canonical analyses such as variable clustering, dimensionality reduction, and multitask learning.
Biography
I am a post-doctoral researcher with Daphne Koller in the Computer Science Department at Stanford University. I did my PhD with Zoubin Ghahramani in the Machine Learning group of the Cambridge University Engineering Department, during which I worked part-time at Microsoft Research Cambridge developing Infer.NET, a probabilistic inference framework. Prior to my PhD I obtained a master's degree in Bioinformatics and Systems Biology from Imperial College London. For my undergraduate degree at the University of Cambridge, I studied Physics for two years before switching to Engineering to complete an MEng with Professor Ghahramani. My research involves both the development of novel machine learning methods and their application to challenging data analysis problems in biology.
Practical information
- Informed public
- Free
Contact
- Christine Moscioni