Learning causal effects from many randomized experiments using regularized instrumental variables
- CML

We study how to meta-analyze a large collection of randomized experiments (e.g., those run during routine improvements of an online service) to learn general causal relationships. We focus on the case where the number of tests is large, the analyst has no metadata about the context of each test, and only summary statistics (not the raw data) are available. We apply instrumental variable analysis in the form of two-stage least squares regression (TSLS). We show that a form of L0 regularization in the first stage can improve learning by reducing bias in some situations. This holds even in some situations where standard tests (e.g., the first-stage F > 10 rule of thumb) would suggest that two-stage least squares is hopeless. We propose a cross-validation procedure to set the regularization parameter.
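As a rough illustration of the idea (not the authors' implementation, which the abstract does not spell out), the sketch below computes a TSLS-style estimate from per-experiment summary statistics alone. It makes several assumptions: each experiment contributes one estimated effect on the treatment variable X and one on the outcome Y, both measured relative to control; experiments are equally sized; and the L0 penalty is realized as hard thresholding of the first-stage effects. The function name `tsls_from_summaries` and the `threshold` parameter are hypothetical.

```python
import numpy as np


def tsls_from_summaries(x_effects, y_effects, threshold=0.0):
    """TSLS-style slope from per-experiment effects on X and Y.

    Assumptions (not taken from the paper): equal-sized experiments,
    effects expressed relative to control (so no intercept), and an
    L0-style first stage implemented as hard thresholding.
    """
    x = np.asarray(x_effects, dtype=float)
    y = np.asarray(y_effects, dtype=float)

    # First stage: with experiment indicators as instruments, the fitted
    # value of X in each experiment is its estimated effect on X.
    # Hard thresholding zeroes out experiments with a weak first stage.
    keep = np.abs(x) > threshold
    if not keep.any():
        raise ValueError("threshold removes every experiment")
    x_hat = np.where(keep, x, 0.0)

    # Second stage: regress the Y effects on the fitted X effects.
    return float(np.dot(x_hat, y) / np.dot(x_hat, x_hat))


# Hand-made example: two experiments barely move X, two move it strongly.
x_eff = [0.0, 0.1, 2.0, -2.0]
y_eff = [1.0, 1.0, 4.0, -4.0]

print(tsls_from_summaries(x_eff, y_eff))                 # uses all experiments
print(tsls_from_summaries(x_eff, y_eff, threshold=1.0))  # → 2.0
```

In this toy input the weak experiments pull the unregularized estimate slightly above 2, while the thresholded estimate uses only the two strong experiments. How to pick `threshold` is left open here; the abstract proposes cross-validation for that choice but gives no details that could be reproduced.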