Decision Trees and CLTs: Inference and Machine Learning


Event details

Date 03.06.2022
Hour 15:15–17:00
Speaker Giles Hooker, Department of Statistics, UC Berkeley
Location Online
Category Conferences - Seminars
Event Language English

This talk develops methods of statistical inference based on ensembles of decision trees: bagging, random forests, and boosting. Recent results have shown that when the bootstrap procedure in bagging methods is replaced by sub-sampling, predictions from these methods can be analyzed using the theory of U-statistics, which have a limiting normal distribution. Moreover, the limiting variance can be estimated within the sub-sampling structure.
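
As a rough illustration of the idea, the sketch below grows a subsampled tree ensemble and forms a plug-in variance estimate from the subsampling structure, loosely following the anchored-subsample scheme of Mentch and Hooker (2016). The data, hyperparameters, and the simplified variance formula are illustrative assumptions, not details from the talk.

```python
# Sketch of a subsampled tree ensemble with a plug-in variance estimate,
# loosely following the anchored-subsample scheme of Mentch & Hooker (2016).
# Data, hyperparameters, and the simplified variance formula are illustrative.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Toy regression data standing in for a real dataset.
n, d = 2000, 5
X = rng.uniform(-1, 1, size=(n, d))
y = X[:, 0] ** 2 + np.sin(3 * X[:, 1]) + rng.normal(scale=0.5, size=n)

k = 100      # subsample size (drawn without replacement, k << n)
n_z = 50     # number of anchor observations
n_mc = 40    # trees per anchor
x0 = np.zeros((1, d))  # prediction point of interest

# For each anchor i, grow n_mc trees whose subsamples all contain i.
# Variation across anchors estimates the first U-statistic variance
# component; variation within anchors estimates the full-kernel component.
anchors = rng.choice(n, size=n_z, replace=False)
anchor_means = np.empty(n_z)
within_vars = np.empty(n_z)
for j, i in enumerate(anchors):
    pool = np.delete(np.arange(n), i)
    preds = np.empty(n_mc)
    for b in range(n_mc):
        idx = np.append(rng.choice(pool, size=k - 1, replace=False), i)
        tree = DecisionTreeRegressor().fit(X[idx], y[idx])
        preds[b] = tree.predict(x0)[0]
    anchor_means[j] = preds.mean()
    within_vars[j] = preds.var(ddof=1)

theta_hat = anchor_means.mean()       # ensemble prediction at x0
zeta1_hat = anchor_means.var(ddof=1)  # between-anchor component
zetak_hat = within_vars.mean()        # full-subsample component
B = n_z * n_mc                        # total number of trees
# Simplified plug-in for the limiting variance; Monte Carlo corrections
# from the paper are omitted here.
var_hat = (k ** 2 / n) * zeta1_hat + zetak_hat / B
se = np.sqrt(var_hat)
print(f"prediction {theta_hat:.3f} +/- {1.96 * se:.3f} (approx. 95% CI)")
```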

Using this result, we can compare the predictions made by a model learned with a feature of interest to those made by a model learned without it, and ask whether the differences between them could have arisen by chance. By evaluating the model at a structured set of points, we can also ask whether it differs significantly from an additive model. We demonstrate these results in an application to citizen-science data collected by the Cornell Lab of Ornithology.
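
The sketch below illustrates one hedged version of this comparison, reusing the anchored-subsample setup above: paired trees are grown on identical subsamples with and without the feature of interest, and the prediction difference is treated as a U-statistic whose variance is estimated from the same structure. The data, the feature index, and all settings are assumptions for illustration, not the talk's exact procedure.

```python
# Hedged sketch of a feature-significance test: compare ensembles grown
# with and without feature 0 on shared subsamples, then ask whether the
# prediction difference at x0 could have arisen by chance.
import numpy as np
from scipy.stats import norm
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(1)
n, d = 2000, 5
X = rng.uniform(-1, 1, size=(n, d))
y = X[:, 0] ** 2 + np.sin(3 * X[:, 1]) + rng.normal(scale=0.5, size=n)

k, n_z, n_mc = 100, 50, 40
x0 = np.zeros((1, d))
keep = np.arange(1, d)  # columns kept in the reduced model (feature 0 dropped)

anchors = rng.choice(n, size=n_z, replace=False)
diff_means = np.empty(n_z)
diff_vars = np.empty(n_z)
for j, i in enumerate(anchors):
    pool = np.delete(np.arange(n), i)
    diffs = np.empty(n_mc)
    for b in range(n_mc):
        idx = np.append(rng.choice(pool, size=k - 1, replace=False), i)
        full = DecisionTreeRegressor().fit(X[idx], y[idx])
        reduced = DecisionTreeRegressor().fit(X[idx][:, keep], y[idx])
        # Same subsample for both fits, so the difference isolates the feature.
        diffs[b] = full.predict(x0)[0] - reduced.predict(x0[:, keep])[0]
    diff_means[j] = diffs.mean()
    diff_vars[j] = diffs.var(ddof=1)

D = diff_means.mean()  # ensemble prediction difference at x0
var_hat = (k ** 2 / n) * diff_means.var(ddof=1) + diff_vars.mean() / (n_z * n_mc)
z = D / np.sqrt(var_hat)  # approximately N(0, 1) if the feature is irrelevant
print(f"difference {D:.3f}, z = {z:.2f}, two-sided p = {2 * norm.sf(abs(z)):.3f}")
```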

Time permitting, we will examine recent developments that extend these distributional results to boosting-type estimators. Boosting allows trees to be incorporated into more structured regression models, such as additive or varying-coefficient models, and often outperforms bagging by reducing bias.
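
For concreteness, here is a minimal sketch of boosting with tree base learners (generic stochastic gradient boosting for squared-error loss). The shrinkage rate, tree depth, and subsampling fraction are arbitrary choices, and this is not the specific estimator discussed in the talk.

```python
# Generic stochastic gradient boosting with shallow trees for squared-error
# loss: each round fits a tree to residuals on a fresh subsample, and the
# shrunken additive updates reduce bias as rounds accumulate.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(2)
n, d = 2000, 5
X = rng.uniform(-1, 1, size=(n, d))
y = X[:, 0] ** 2 + np.sin(3 * X[:, 1]) + rng.normal(scale=0.5, size=n)

n_rounds, lr, frac = 200, 0.1, 0.5
F = np.full(n, y.mean())  # initialize at the grand mean
trees = []
for m in range(n_rounds):
    resid = y - F  # negative gradient of squared-error loss
    idx = rng.choice(n, size=int(frac * n), replace=False)  # subsample
    t = DecisionTreeRegressor(max_depth=3).fit(X[idx], resid[idx])
    F += lr * t.predict(X)  # shrunken update
    trees.append(t)

def predict(X_new):
    """Ensemble prediction: initial mean plus shrunken tree contributions."""
    return y.mean() + lr * sum(t.predict(X_new) for t in trees)

print("training MSE:", np.mean((y - F) ** 2))
```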

Practical information

  • Informed public
  • Free

Organizer

  • Victor Panaretos

Contact

  • Maroussia Schaffner
