Decision Trees and CLTs: Inference and Machine Learning
Event details
Date: 03.06.2022
Hour: 15:15 – 17:00
Speaker: Giles Hooker, Department of Statistics, UC Berkeley
Location: Online
Category: Conferences - Seminars
Event Language: English
This talk develops methods of statistical inference based on ensembles of decision trees: bagging, random forests, and boosting. Recent results have shown that when the bootstrap procedure in bagging methods is replaced by sub-sampling, predictions from these methods can be analyzed using the theory of U-statistics, which have a limiting normal distribution. Moreover, the limiting variance can be estimated within the sub-sampling structure.
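As a rough illustration of the sub-sampling idea, the sketch below builds a tree ensemble on subsamples drawn without replacement and estimates the prediction variance from the subsampling structure, loosely following a shared-initial-point construction. The data, subsample sizes, tree depth, and the specific variance formula are all illustrative assumptions, not the exact estimator from the talk.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Toy data (hypothetical): y depends on the first feature only
n = 500
X = rng.uniform(-1, 1, size=(n, 3))
y = X[:, 0] + 0.1 * rng.normal(size=n)

s = 50          # subsample size (drawn without replacement)
n_init = 25     # number of "initial" points shared within inner groups of trees
L = 20          # trees per initial point; B = n_init * L trees in total
x_test = np.zeros((1, 3))

preds = np.empty((n_init, L))
for i in range(n_init):
    init = rng.integers(n)                  # shared first sample point
    for j in range(L):
        rest = rng.choice(np.delete(np.arange(n), init), size=s - 1, replace=False)
        idx = np.append(rest, init)
        tree = DecisionTreeRegressor(max_depth=4).fit(X[idx], y[idx])
        preds[i, j] = tree.predict(x_test)[0]

ensemble_pred = preds.mean()
zeta1 = preds.mean(axis=1).var(ddof=1)      # variability driven by the shared point
zetas = preds.var(ddof=1)                   # full between-tree variability
B = n_init * L
var_hat = (s ** 2 / n) * zeta1 + zetas / B  # illustrative limiting-variance estimate
se = np.sqrt(var_hat)                       # plug-in standard error for the prediction
```

With a normal limit, `ensemble_pred ± 1.96 * se` gives an approximate confidence interval for the ensemble's prediction at `x_test`.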
Using this result, we can compare the predictions made by a model learned with a feature of interest to those made by a model learned without it, and ask whether the differences between them could have arisen by chance. By evaluating the model at a structured set of points, we can also ask whether it differs significantly from an additive model. We demonstrate these results in an application to citizen-science data collected by Cornell's Laboratory of Ornithology.
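The feature-comparison test can be sketched as follows: fit one subsampled ensemble with all features and one with the feature of interest removed, then form a z-statistic for the difference in predictions at a test point. The data are synthetic, and using the between-tree variance as a stand-in for the U-statistic variance estimate is a simplifying assumption (trees share data, so this is only a rough surrogate).

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(1)

# Synthetic data (assumption): feature 0 matters, features 1 and 2 do not
n = 500
X = rng.uniform(-1, 1, size=(n, 3))
y = X[:, 0] + 0.1 * rng.normal(size=n)

def subsampled_preds(cols, x_test, B=200, s=50):
    """Per-tree predictions at x_test from trees fit on subsamples, using `cols`."""
    out = np.empty(B)
    for b in range(B):
        idx = rng.choice(n, size=s, replace=False)
        tree = DecisionTreeRegressor(max_depth=4).fit(X[np.ix_(idx, cols)], y[idx])
        out[b] = tree.predict(x_test[:, cols])[0]
    return out

x_test = np.array([[0.5, 0.0, 0.0]])
full = subsampled_preds([0, 1, 2], x_test)
drop2 = subsampled_preds([0, 1], x_test)   # drop an irrelevant feature
drop0 = subsampled_preds([1, 2], x_test)   # drop the relevant feature

def zstat(a, b):
    """Crude z-statistic for the difference in mean predictions (illustrative only)."""
    d = a.mean() - b.mean()
    se = np.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))
    return d / se

z_relevant = zstat(full, drop0)     # expected: large in magnitude
z_irrelevant = zstat(full, drop2)   # expected: closer to zero
```

Dropping the relevant feature shifts the prediction substantially, so its z-statistic should dominate; in practice the talk's U-statistic variance estimate, rather than this crude surrogate, is what justifies the normal reference distribution.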
Time permitting, we will examine recent developments that extend these distributional results to boosting-type estimators. Boosting allows trees to be incorporated into more structured regression models, such as additive or varying-coefficient models, and often outperforms bagging by reducing bias.
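For context, a minimal L2 gradient-boosting loop with shallow trees and shrinkage is sketched below. This is a generic textbook construction on toy data, not the specific boosted estimators whose distributional theory the talk discusses; each round fits a tree to the residuals, which is how boosting chips away at bias.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(2)

# Toy data (hypothetical): a smooth nonlinear signal plus noise
n = 500
X = rng.uniform(-1, 1, size=(n, 1))
y = np.sin(3 * X[:, 0]) + 0.1 * rng.normal(size=n)

nu, M = 0.1, 100                 # shrinkage rate and number of boosting rounds
F = np.full(n, y.mean())         # initialize at the constant fit
trees = []
for m in range(M):
    resid = y - F                # negative gradient of squared-error loss
    t = DecisionTreeRegressor(max_depth=2).fit(X, resid)
    F += nu * t.predict(X)       # take a shrunken step along the fitted tree
    trees.append(t)

def predict(Xq):
    """Sum the initial constant and all shrunken tree contributions."""
    return y.mean() + nu * sum(t.predict(Xq) for t in trees)

train_mse = np.mean((predict(X) - y) ** 2)
```

Shallow trees keep each step weak, and shrinkage slows the fit; together they let the ensemble drive down bias gradually, which is the behavior contrasted with bagging above.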
Practical information
- Informed public
- Free
Organizer
- Victor Panaretos
Contact
- Maroussia Schaffner