Statistically robust, scalable and distributed inference methods for large scale data

Event details
Date | 02.05.2019 |
Hour | 13:30 - 14:30 |
Speaker | Prof. Visa Koivunen, Aalto University, Finland |
Location | |
Category | Conferences - Seminars |
In this talk we address the problem of performing statistical inference for large-scale data sets. The volume and dimensionality of the data may be so high that the data cannot be processed or stored on a single computing node. First, we present a scalable, statistically robust and computationally efficient bootstrap method that is compatible with distributed processing and storage systems. Bootstrap resamples are constructed with a smaller number of distinct data points on multiple disjoint subsets of data, similarly to the bag of little bootstraps (BLB) method. A computationally efficient fixed-point estimation equation is solved analytically via a smart approximation stemming from the Fast and Robust Bootstrap (FRB) method. Fixed-point estimation equations lend themselves to highly robust, low-complexity statistical estimators for finding point estimates, constructing confidence intervals and performing variable selection for large-scale data sets. Sparse solutions can be promoted, too.

We also propose a method for performing inference on fields observed by a massive number of spatially distributed sensors in the Internet of Things (IoT). The approach is nonparametric in the sense that it learns the probability models. The actual inference is based on fusing p-values and on multiple hypothesis testing that controls the false discovery rate. A field may be clustered into homogeneous regions using an empirical Bayesian approach that takes the underlying spatial dependencies among the sensors into account. The clustering finds applications in characterizing radio spectrum, environmental monitoring, cyber-physical systems and agriculture.
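To give a sense of what "multiple hypothesis testing controlling false discovery rates" can look like in practice, here is a minimal sketch. The abstract does not say which FDR procedure the talk uses; the Benjamini-Hochberg step-up procedure is assumed here only because it is a standard choice. The function name `benjamini_hochberg` and the list `sensor_p_values` are hypothetical, and the p-values are made-up illustrative numbers, not data from the talk.

```python
import numpy as np


def benjamini_hochberg(p_values, alpha=0.05):
    """Return a boolean mask of hypotheses rejected at FDR level alpha.

    Assumption: the standard Benjamini-Hochberg step-up rule, which is
    valid for independent (or positively dependent) p-values.
    """
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)                        # indices sorting p ascending
    thresholds = alpha * np.arange(1, m + 1) / m  # i * alpha / m for rank i
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        # Step-up: find the largest rank whose p-value meets its threshold,
        # then reject every hypothesis up to and including that rank.
        k = np.max(np.nonzero(below)[0])
        reject[order[:k + 1]] = True
    return reject


# Hypothetical per-sensor p-values (illustrative only).
sensor_p_values = [0.001, 0.008, 0.039, 0.041, 0.042,
                   0.06, 0.074, 0.205, 0.212, 0.216]
mask = benjamini_hochberg(sensor_p_values, alpha=0.05)
print("sensors flagged:", np.nonzero(mask)[0].tolist())
```

With one p-value per sensor, the returned mask marks the sensors whose null hypothesis is rejected. The spatial clustering and empirical Bayesian steps mentioned in the abstract are not covered by this sketch.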
Practical information
- Informed public
- Free
Organizer
- Prof. Ali Sayed
Contact
- Stefan Vlaski, stefan.vlaski@epfl.ch