Decision Trees and CLTs: Inference and Machine Learning


Event details

Date 03.06.2022
Hour 15:15–17:00
Speaker Giles Hooker, Department of Statistics, UC Berkeley
Location Online
Category Conferences - Seminars
Event Language English

This talk develops methods of statistical inference based on ensembles of decision trees: bagging, random forests, and boosting. Recent results have shown that when the bootstrap procedure in bagging methods is replaced by sub-sampling, predictions from these methods can be analyzed using the theory of U-statistics, which have a limiting normal distribution. Moreover, the limiting variance can be estimated within the sub-sampling structure.
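
As a rough illustration of the idea, the sketch below grows a subsampled tree ensemble and forms a plug-in variance estimate from the subsampling structure, loosely following the anchored-subsample scheme of Mentch and Hooker (2016). The data, hyperparameters, and the simplified variance formula are illustrative assumptions, not details from the talk.

```python
# Sketch of a subsampled tree ensemble with a plug-in variance estimate,
# loosely following the anchored-subsample scheme of Mentch & Hooker (2016).
# Data, hyperparameters, and the simplified variance formula are illustrative.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Toy regression data standing in for a real dataset.
n, d = 2000, 5
X = rng.uniform(-1, 1, size=(n, d))
y = X[:, 0] ** 2 + np.sin(3 * X[:, 1]) + rng.normal(scale=0.5, size=n)

k = 100      # subsample size (drawn without replacement, k << n)
n_z = 50     # number of anchor observations
n_mc = 40    # trees per anchor
x0 = np.zeros((1, d))  # prediction point of interest

# For each anchor i, grow n_mc trees whose subsamples all contain i.
# Variation across anchors estimates the first U-statistic variance
# component; variation within anchors estimates the full-kernel component.
anchors = rng.choice(n, size=n_z, replace=False)
anchor_means = np.empty(n_z)
within_vars = np.empty(n_z)
for j, i in enumerate(anchors):
    pool = np.delete(np.arange(n), i)
    preds = np.empty(n_mc)
    for b in range(n_mc):
        idx = np.append(rng.choice(pool, size=k - 1, replace=False), i)
        tree = DecisionTreeRegressor().fit(X[idx], y[idx])
        preds[b] = tree.predict(x0)[0]
    anchor_means[j] = preds.mean()
    within_vars[j] = preds.var(ddof=1)

theta_hat = anchor_means.mean()       # ensemble prediction at x0
zeta1_hat = anchor_means.var(ddof=1)  # between-anchor component
zetak_hat = within_vars.mean()        # full-subsample component
B = n_z * n_mc                        # total number of trees
# Simplified plug-in for the limiting variance; Monte Carlo corrections
# from the paper are omitted here.
var_hat = (k ** 2 / n) * zeta1_hat + zetak_hat / B
se = np.sqrt(var_hat)
print(f"prediction {theta_hat:.3f} +/- {1.96 * se:.3f} (approx. 95% CI)")
```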

Using this result, we can compare the predictions made by a model learned with a feature of interest to those made by a model learned without it, and ask whether the differences between them could have arisen by chance. By evaluating the model at a structured set of points, we can also ask whether it differs significantly from an additive model. We demonstrate these results in an application to citizen-science data collected by the Cornell Lab of Ornithology.
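
The sketch below illustrates one hedged version of this comparison, reusing the anchored-subsample setup above: paired trees are grown on identical subsamples with and without the feature of interest, and the prediction difference is treated as a U-statistic whose variance is estimated from the same structure. The data, the feature index, and all settings are assumptions for illustration, not the talk's exact procedure.

```python
# Hedged sketch of a feature-significance test: compare ensembles grown
# with and without feature 0 on shared subsamples, then ask whether the
# prediction difference at x0 could have arisen by chance.
import numpy as np
from scipy.stats import norm
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(1)
n, d = 2000, 5
X = rng.uniform(-1, 1, size=(n, d))
y = X[:, 0] ** 2 + np.sin(3 * X[:, 1]) + rng.normal(scale=0.5, size=n)

k, n_z, n_mc = 100, 50, 40
x0 = np.zeros((1, d))
keep = np.arange(1, d)  # columns kept in the reduced model (feature 0 dropped)

anchors = rng.choice(n, size=n_z, replace=False)
diff_means = np.empty(n_z)
diff_vars = np.empty(n_z)
for j, i in enumerate(anchors):
    pool = np.delete(np.arange(n), i)
    diffs = np.empty(n_mc)
    for b in range(n_mc):
        idx = np.append(rng.choice(pool, size=k - 1, replace=False), i)
        full = DecisionTreeRegressor().fit(X[idx], y[idx])
        reduced = DecisionTreeRegressor().fit(X[idx][:, keep], y[idx])
        # Same subsample for both fits, so the difference isolates the feature.
        diffs[b] = full.predict(x0)[0] - reduced.predict(x0[:, keep])[0]
    diff_means[j] = diffs.mean()
    diff_vars[j] = diffs.var(ddof=1)

D = diff_means.mean()  # ensemble prediction difference at x0
var_hat = (k ** 2 / n) * diff_means.var(ddof=1) + diff_vars.mean() / (n_z * n_mc)
z = D / np.sqrt(var_hat)  # approximately N(0, 1) if the feature is irrelevant
print(f"difference {D:.3f}, z = {z:.2f}, two-sided p = {2 * norm.sf(abs(z)):.3f}")
```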

Time permitting, we will examine recent developments that extend these distributional results to boosting-type estimators. Boosting allows trees to be incorporated into more structured regression models, such as additive or varying-coefficient models, and often outperforms bagging by reducing bias.
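
For concreteness, here is a minimal sketch of boosting with tree base learners (generic stochastic gradient boosting for squared-error loss). The shrinkage rate, tree depth, and subsampling fraction are arbitrary choices, and this is not the specific estimator discussed in the talk.

```python
# Generic stochastic gradient boosting with shallow trees for squared-error
# loss: each round fits a tree to residuals on a fresh subsample, and the
# shrunken additive updates reduce bias as rounds accumulate.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(2)
n, d = 2000, 5
X = rng.uniform(-1, 1, size=(n, d))
y = X[:, 0] ** 2 + np.sin(3 * X[:, 1]) + rng.normal(scale=0.5, size=n)

n_rounds, lr, frac = 200, 0.1, 0.5
F = np.full(n, y.mean())  # initialize at the grand mean
trees = []
for m in range(n_rounds):
    resid = y - F  # negative gradient of squared-error loss
    idx = rng.choice(n, size=int(frac * n), replace=False)  # subsample
    t = DecisionTreeRegressor(max_depth=3).fit(X[idx], resid[idx])
    F += lr * t.predict(X)  # shrunken update
    trees.append(t)

def predict(X_new):
    """Ensemble prediction: initial mean plus shrunken tree contributions."""
    return y.mean() + lr * sum(t.predict(X_new) for t in trees)

print("training MSE:", np.mean((y - F) ** 2))
```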

Practical information

  • Informed public
  • Free

Organizer

  • Victor Panaretos

Contact

  • Maroussia Schaffner
