Machine learning the first stage in 2SLS: Practical guidance from bias decomposition and simulation

Main: 24 pages; Bibliography: 4 pages; Appendix: 8 pages; 8 figures; 2 tables
Abstract

Machine learning (ML) primarily evolved to solve "prediction problems." The first stage of two-stage least squares (2SLS) is a prediction problem, which suggests potential gains from using ML in the first stage. However, little guidance exists on when ML helps 2SLS, or when it hurts. We investigate the implications of inserting ML into 2SLS, decomposing the bias into three informative components. Mechanically, ML-in-2SLS procedures face issues common to prediction settings, issues common to causal-inference settings, and issues arising from the interaction of the two. Through simulation, we show that linear ML methods (e.g., post-Lasso) work well, while nonlinear methods (e.g., random forests, neural nets) generate substantial bias in second-stage estimates, potentially exceeding the bias of endogenous OLS.
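The plug-in mechanics behind these claims can be illustrated with a toy pure-Python sketch. This is a hypothetical example and not the paper's simulation design. The data-generating process (one instrument `z`, an endogenous regressor `d`, an outcome `y`, error correlation `rho`) and all function names are assumptions made for illustration. With a linear first stage, regressing `y` on the fitted values reproduces the usual just-identified 2SLS slope. At the opposite extreme, an in-sample 1-nearest-neighbour "first stage" memorises `d`, so the second stage collapses exactly to endogenous OLS. That extreme case shows one way an overly flexible first stage can pull estimates toward the OLS bias. It is not a stand-in for the random-forest or neural-net results reported above.

```python
import random


def simulate(n, beta, pi, rho, seed=0):
    """Hypothetical DGP: d = pi*z + v, y = beta*d + u, with corr(u, v) = rho."""
    rng = random.Random(seed)
    z, d, y = [], [], []
    for _ in range(n):
        zi = rng.gauss(0.0, 1.0)
        vi = rng.gauss(0.0, 1.0)
        # u shares the component vi with d, which makes d endogenous; var(u) = 1
        ui = rho * vi + (1.0 - rho ** 2) ** 0.5 * rng.gauss(0.0, 1.0)
        di = pi * zi + vi
        z.append(zi)
        d.append(di)
        y.append(beta * di + ui)
    return z, d, y


def slope(x, y):
    """OLS slope of y on x, with an intercept."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def linear_first_stage(z, d):
    """Fitted values from an OLS regression of d on z (with intercept)."""
    b = slope(z, d)
    a = sum(d) / len(d) - b * sum(z) / len(z)
    return [a + b * zi for zi in z]


def one_nn_first_stage(z, d):
    """In-sample 1-nearest-neighbour fit: each point's nearest neighbour is
    itself (distance 0), so the 'prediction' simply memorises d."""
    idx = range(len(z))
    return [d[min(idx, key=lambda j: abs(z[j] - zi))] for zi in z]


if __name__ == "__main__":
    # beta = 1 is the true effect; OLS converges to 1 + rho / (pi**2 + 1) = 1.4 here
    z, d, y = simulate(n=1000, beta=1.0, pi=1.0, rho=0.8, seed=42)
    ols = slope(d, y)
    tsls = slope(linear_first_stage(z, d), y)  # second stage: y on fitted d
    nn = slope(one_nn_first_stage(z, d), y)    # second stage with a memorised d
    print(f"OLS {ols:.3f}  2SLS (linear) {tsls:.3f}  2SLS (in-sample 1-NN) {nn:.3f}")
```

With a strong instrument, the linear-first-stage estimate should sit near the true `beta`, OLS near its biased limit, and the 1-NN version should match OLS to floating-point precision. Using held-out (cross-fitted) predictions instead of in-sample fits is one common way to avoid the memorisation problem, though whether it suffices for nonlinear learners is exactly the kind of question the abstract raises.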
