Linear regression with unmatched data: A deconvolution perspective
Event details
Date | 15.09.2023 |
Hour | 11:00 › 12:00 |
Speaker | Prof. Fadoua Balabdaoui, ETH Zürich, Switzerland |
Location | ME C2 405 |
Category | Conferences - Seminars |
Event Language | English |
Abstract:
We consider the regression problem where the response $Y\in\mathbb{R}$ and the covariate $X\in\mathbb{R}^d$, $d\geq 1$, are unmatched. In this scenario, one does not have access to pairs of observations from the distribution of $(X, Y)$, but only to separate datasets $\{Y_i\}_{i=1}^n$ and $\{X_j\}_{j=1}^m$, possibly collected from different sources. We study this problem assuming that the regression function is linear and that the noise distribution is known or can be estimated. We introduce an estimator of the regression vector based on deconvolution and demonstrate its consistency and asymptotic normality under an identifiability assumption. We show how the method can be used in semi-supervised learning when one has access to a small sample of matched pairs $(X_k, Y_k)$. Simulation results and applications to real datasets will be presented to illustrate the theory.
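For concreteness, the setting described above can be written out as follows. This is only a sketch under assumptions not stated in the abstract: the symbols $\beta_0$, $\varepsilon$, $F_\varepsilon$ and $\mathbb{F}_n$ are introduced here for illustration, the noise is taken to be independent of $X$, and the criterion shown is one natural deconvolution-type contrast, not necessarily the estimator presented in the talk.

$$
Y \overset{d}{=} \beta_0^\top X + \varepsilon, \qquad \varepsilon \text{ independent of } X, \quad \varepsilon \sim F_\varepsilon \ \text{(known)},
$$

so that the distribution of $Y$ is the convolution of the distribution of $\beta_0^\top X$ with the noise distribution:

$$
F_Y(y) = \mathbb{E}\big[F_\varepsilon(y - \beta_0^\top X)\big].
$$

With $\mathbb{F}_n$ denoting the empirical distribution function of $Y_1,\dots,Y_n$, a candidate $\beta$ can then be scored without any matched pairs, for example through

$$
\hat\beta \in \arg\min_{\beta\in\mathbb{R}^d} \int \Big(\mathbb{F}_n(y) - \frac{1}{m}\sum_{j=1}^m F_\varepsilon(y - \beta^\top X_j)\Big)^2 \, d\mathbb{F}_n(y).
$$

An identifiability assumption is needed because different values of $\beta$ may induce the same distribution for $Y$: for $d=1$ and $X$ symmetric around zero, $\beta$ and $-\beta$ cannot be distinguished from unmatched data.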
Bio:
I obtained my PhD in Statistics in 2004 from the Department of Statistics at the University of Washington, Seattle.
After a postdoc at the Institute of Mathematical Stochastics, University of Goettingen, I was appointed in 2006 as an Assistant Professor at Université Paris-Dauphine, where I served until 2015.
Currently I am an Adjunct Professor at the Seminar for Statistics, Department of Mathematics, ETH Zurich. I specialize in nonparametric statistics, estimation under shape constraints, and mixture models. I have developed a strong interest in AI techniques, with applications in art (especially poetry and music).
Practical information
- General public
- Free
Organizer
- Prof. Maryam Kamgarpour