Outcome-guided multi-view Bayesian clustering for integrative omic data analysis

Event details

Date 03.11.2023
Hour 15:15-17:00
Speaker Paul Kirk, University of Cambridge
Location
Category Conferences - Seminars
Event Language English
Introduction: Although the challenges presented by high-dimensional data in the context of regression are well-known and the subject of much current research, comparatively little work has been done on this in the context of clustering. In this setting, the key challenge is that often only a small subset of the covariates provides a relevant stratification of the population. Identifying relevant strata can be particularly challenging when dealing with high-dimensional datasets, in which there may be many variables that provide no information whatsoever about population structure, or - perhaps worse - in which there may be (potentially large) variable subsets that define irrelevant stratifications. For example, when dealing with genetic data, there may be some genetic variants that allow us to group patients in terms of disease risk, but others that would provide completely irrelevant stratifications (e.g. which would group patients together on the basis of eye or hair colour).
Methods and Results: Bayesian profile regression is an outcome-guided model-based clustering approach that makes use of a response in order to guide the clustering toward relevant stratifications. Here we show how this approach can be extended to the "multi-view" setting, in which different groups of variables ("views") define different stratifications. We present some results in the context of breast cancer subtyping to illustrate how the approach can be used to perform integrative clustering of multiple 'omics datasets.
Conclusions: When there are multiple clustering structures present in data, existing (single view) clustering approaches can fail to recover the most relevant clustering structure, even when guided by an appropriate response. Moreover, traditional variable selection approaches for clustering do not necessarily improve matters, since they tend to select variables that define the dominant clustering structure, regardless of whether or not it is associated with a response of interest. Real molecular datasets can and do possess multiple clustering structures, and our outcome-guided multi-view model can allow both relevant and irrelevant structures to be identified.
References:
Molitor et al. Bayesian profile regression with an application to the National Survey of Children's Health. Biostatistics. 2010.
Kirk, Pagani, Richardson. Bayesian outcome-guided multi-view mixture models with applications in molecular precision medicine. arXiv. 2023.
Keywords: High dimensional data, Bayesian clustering, Omics
 

Practical information

  • Informed public
  • Free

Organizer

  • Yoav Zemel

Contact

  • Maroussia Schaffner
