Machine learning the first stage in 2SLS: Practical guidance from bias decomposition and simulation

Abstract

Machine learning (ML) primarily evolved to solve "prediction problems." The first stage of two-stage least squares (2SLS) is a prediction problem, suggesting potential gains from ML first-stage assistance. However, little guidance exists on when ML helps 2SLS, or when it hurts. We investigate the implications of inserting ML into 2SLS, decomposing the bias into three informative components. Mechanically, ML-in-2SLS procedures face issues common to prediction and causal-inference settings, as well as their interaction. Through simulation, we show linear ML methods (e.g., post-Lasso) work well, while nonlinear methods (e.g., random forests, neural nets) generate substantial bias in second-stage estimates, potentially exceeding the bias of endogenous OLS.
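As a rough illustration of the setting the abstract describes, the following sketch simulates an endogenous regressor and compares naive OLS with a conventional 2SLS estimate whose first stage is an ordinary linear projection on the instruments. It is not the paper's simulation design: the data-generating process, the parameter values, and the function names (`simulate`, `ols_slope`, `two_stage`) are assumptions made here for illustration only. The first stage is the step where an ML predictor would be substituted.

```python
import numpy as np


def simulate(n=5000, n_instruments=5, beta=1.0, seed=0):
    """Hypothetical DGP: u confounds x and y; z shifts x but not y directly."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=(n, n_instruments))    # instruments
    u = rng.normal(size=n)                     # unobserved confounder
    pi = np.full(n_instruments, 0.5)           # assumed first-stage coefficients
    x = z @ pi + u + rng.normal(size=n)        # endogenous regressor
    y = beta * x + u + rng.normal(size=n)      # outcome
    return z, x, y


def ols_slope(x, y):
    """Slope from regressing y on an intercept and x."""
    X = np.column_stack([np.ones(len(x)), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1]


def two_stage(z, x, y):
    """2SLS point estimate with a linear (OLS) first stage."""
    Z = np.column_stack([np.ones(len(x)), z])
    first, *_ = np.linalg.lstsq(Z, x, rcond=None)  # stage 1: predict x from z
    x_hat = Z @ first
    return ols_slope(x_hat, y)                     # stage 2: regress y on x_hat


z, x, y = simulate()
beta_ols = ols_slope(x, y)
beta_2sls = two_stage(z, x, y)
print(f"OLS:  {beta_ols:.3f}")
print(f"2SLS: {beta_2sls:.3f}")
```

Under these assumed parameters the true coefficient is 1, the OLS slope is pushed upward by the confounder, and the 2SLS estimate should land close to 1. The sketch returns point estimates only; the second-stage regression on fitted values does not yield valid standard errors without the usual 2SLS correction.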

@article{lennon2025_2505.13422,
  title={Machine learning the first stage in 2SLS: Practical guidance from bias decomposition and simulation},
  author={Connor Lennon and Edward Rubin and Glen Waddell},
  journal={arXiv preprint arXiv:2505.13422},
  year={2025}
}