Hacker News
Gradient Boosted Decision Trees (simonwardjones.co.uk)
150 points by simonwardjones on Oct 6, 2020 | 7 comments


Very well written article! For further reading, I would also recommend diving into the conceptual overview of LightGBM, a gradient boosting framework. It features some interesting optimization techniques for better overall performance.

https://github.com/microsoft/LightGBM/blob/master/docs/Featu...


see also: Friedman's 1999 paper - "Greedy Function Approximation: A Gradient Boosting Machine" https://projecteuclid.org/download/pdf_1/euclid.aos/10132034...


Nice write-up! Any info on the circumstances under which gradient boosted trees perform better than traditional random forests?


(This answer is from limited practical experience about 10 years ago, but at least the theory doesn't go out of date.)

A random forest is less prone to overfitting because each tree in the ensemble is independent: if the base tree doesn't overfit, then a random forest of such trees also won't overfit. Trees in a boosted model, by contrast, are not independent; boosting trains a sequence of models in which model n depends on the previous models.

This is a double-edged sword: you can probably get better predictive accuracy with boosting if you have enough data and controls in place to prevent overfitting. A random forest is much more idiot-proof with respect to overfitting, but it will not perform as well as a boosted model that is trained on a large dataset without overfitting.
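
To make the "model n depends on the previous models" point concrete, here is a minimal sketch (not taken from the article; the dataset, hyperparameters, and helper function are illustrative assumptions) that fits a scikit-learn random forest of independent trees next to a hand-rolled boosting loop where each tree is fit to the residuals of the ensemble built so far:

    # Sketch only: contrasts independent trees (random forest) with a
    # sequential, residual-fitting ensemble (gradient boosting).
    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.metrics import mean_squared_error
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeRegressor

    X, y = make_regression(n_samples=2000, n_features=20, noise=10.0, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # Random forest: every tree is fit independently on a bootstrap sample.
    forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_train, y_train)

    # Hand-rolled boosting: tree n is fit to the residuals left by trees 1..n-1,
    # so every stage depends on the stages before it.
    learning_rate, trees = 0.1, []
    prediction = np.full(y_train.shape, y_train.mean())
    for _ in range(100):
        residuals = y_train - prediction
        tree = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X_train, residuals)
        trees.append(tree)
        prediction += learning_rate * tree.predict(X_train)

    def boosted_predict(X_new):
        pred = np.full(X_new.shape[0], y_train.mean())
        for tree in trees:
            pred += learning_rate * tree.predict(X_new)
        return pred

    print("forest MSE :", mean_squared_error(y_test, forest.predict(X_test)))
    print("boosted MSE:", mean_squared_error(y_test, boosted_predict(X_test)))

With enough boosting rounds the sequential model can keep driving training error down, which is exactly why it needs controls (learning rate, tree depth, early stopping) that the forest largely doesn't.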


Why not also predict the error of the error prediction, and correct for that too? /s


You mock, but overfitting is how most results are realized.


The whole series is pretty damn good so far, despite the use (bordering on abuse) of emojis, but that's a personal take.

What other ML topics do you plan to cover, Simon?



