I am a PhD Candidate in the Economics Department at Northwestern University. I will join Stanford University as a Data Science Postdoctoral Fellow in Summer 2026, working with Guido Imbens and Susan Athey. In Fall 2027, I will join CEMFI as an Assistant Professor. My main field is econometrics, and I also work on empirical development economics, with a focus on machine learning, causal inference, and microcredit.
I am broadly interested in how applied researchers can use predictive algorithms to answer new questions and improve empirical analyses, and I develop statistical methods for machine learning applications that rely on weak or verifiable assumptions.
Job Market Paper
Training and Testing with Multiple Splits: A Central Limit Theorem for Split-Sample Estimators [pdf][R Package][code guide]
As predictive algorithms grow in popularity, using the same dataset to both train and test a new model has become routine across research, policy, and industry. Sample-splitting achieves valid inference on model properties by using separate subsamples to estimate the model and to evaluate it. However, this approach has two drawbacks: each task uses only part of the data, and different splits can lead to widely different estimates. I develop an inference approach that averages across multiple splits, using more data for training and the entire sample for testing while improving reproducibility. I address the statistical dependence created by reusing observations across splits by proving a new central limit theorem for a large class of split-sample estimators under arguably mild and general conditions. Importantly, I impose no restrictions on model complexity or convergence rates. I show that confidence intervals based on the normal approximation are valid for many applications but may undercover in important cases of interest, such as comparing the performance of two models. For such cases, I develop a new inference approach that explicitly accounts for the dependence across splits. Moreover, I provide a measure of reproducibility for p-values obtained from split-sample estimators. Finally, I apply my results to two important problems in development and public economics: predicting poverty and learning heterogeneous treatment effects in randomized experiments. I show that my inference approach with repeated cross-fitting achieves better power than existing alternatives, often enough to reveal statistical significance that would otherwise be missed.
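The core idea of repeated cross-fitting can be sketched in a few lines. The sketch below is purely illustrative and is not the paper's estimator or R package: it uses a toy OLS "model", 5 folds, and 20 repetitions, and simply averages an out-of-sample mean squared error across folds and random splits so that every observation is used for both training and testing.

```python
# Illustrative sketch of repeated cross-fitting (not the paper's implementation).
import numpy as np

rng = np.random.default_rng(0)

# Toy data: outcome depends linearly on one covariate plus unit-variance noise.
n = 500
x = rng.normal(size=(n, 1))
y = 2.0 * x[:, 0] + rng.normal(size=n)

def fit_ols(x_train, y_train):
    """Train step: ordinary least squares, a stand-in for any ML model."""
    X = np.column_stack([np.ones(len(x_train)), x_train])
    beta, *_ = np.linalg.lstsq(X, y_train, rcond=None)
    return beta

def predict(beta, x_eval):
    X = np.column_stack([np.ones(len(x_eval)), x_eval])
    return X @ beta

def cross_fit_mse(x, y, k, rng):
    """One cross-fit: each observation is evaluated exactly once,
    by a model trained on the other k-1 folds."""
    idx = rng.permutation(len(y))
    folds = np.array_split(idx, k)
    sq_err = np.empty(len(y))
    for fold in folds:
        train = np.setdiff1d(idx, fold)
        beta = fit_ols(x[train], y[train])
        sq_err[fold] = (y[fold] - predict(beta, x[fold])) ** 2
    return sq_err.mean()

# Repeated cross-fitting: average the estimate over many random splits,
# reducing the dependence of the reported number on any single split.
estimates = [cross_fit_mse(x, y, k=5, rng=rng) for _ in range(20)]
print(np.mean(estimates))  # averaged out-of-sample MSE
```

Because observations are reused across splits, the 20 estimates above are statistically dependent; handling that dependence when building confidence intervals is precisely what the paper's central limit theorem addresses.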
Working Papers
Predicting the Distribution of Treatment Effects via Covariate-Adjustment, with an Application to Microcredit [arxiv][Presentation MEG 2024]
Best Student Paper Award at the 32nd Midwest Econometrics Group Annual Conference (2024)
Important questions for impact evaluation require knowledge not only of average effects, but of the distribution of treatment effects. The inability to observe individual counterfactuals makes answering these empirical questions challenging. I propose an inference approach for points of the distribution of treatment effects by incorporating predicted counterfactuals through covariate adjustment. I provide finite-sample valid inference using sample-splitting, and asymptotically valid inference using cross-fitting, under arguably weak conditions. Revisiting five randomized controlled trials on microcredit that reported null average effects, I find important distributional impacts, with some individuals helped and others harmed by the increased credit access.
Profits and Social Impacts: Complements vs. Tradeoffs for Lenders in Three Countries [pdf] (with Susan Athey, Dean Karlan, Adam Osman, and Jonathan Zinman)
The canonical approach to corporate governance posits a tradeoff between maximizing shareholder profits and maximizing other aspects of shareholder welfare such as social impact. Advances in machine learning exacerbate this tradeoff: algorithms can maximize profits via prediction, whereas targeting impact requires the more difficult task of estimating causal effects. We estimate these tradeoffs using data from randomized microcredit approvals in South Africa, the Philippines, and Bosnia. Two of the three lenders could have increased profits by 10–14% through machine-learning-based targeting, but doing so would have reduced access for groups microcredit seeks to help, including female and lower-income borrowers, suggesting a tradeoff between lender profit and distributional objectives. However, we find no evidence that maximizing profits would lead to lower treatment effects on borrower income; if anything, lender profits are increasing in those effects, suggesting complementarity between lender profit and economic efficiency objectives. Simulating effects of policy constraints on lender targeting, we find that holding borrower average income fixed at baseline levels would reduce potential lender profit by two-thirds. These findings highlight the importance of quantifying tradeoffs and complementarities when deciding what to maximize.
Work in Progress
Is Participant Feedback Predictive of Impact? (with Gharad Bryan, Dean Karlan, Isabel Oñate, and Christopher Udry)
What Can We Learn from Harmonizing and Analyzing RCTs of Grant and Training Programs to Promote Entrepreneurship? (with Florian de Bundel, Dean Karlan, William Parienté, and Christopher Udry)
Publications
Probabilistic Nearest Neighbors Classification [pdf][R Package] (with Paulo C. Marques F. and Hedibert F. Lopes) Entropy, 2024, 26(1), 39.
Analysis of the currently established Bayesian nearest neighbors classification model points to a connection between the computation of its normalizing constant and issues of NP-completeness. An alternative predictive model constructed by aggregating the predictive distributions of simpler nonlocal models is proposed, and analytic expressions for the normalizing constants of these nonlocal models are derived, ensuring polynomial time computation without approximations. Experiments with synthetic and real datasets showcase the predictive performance of the proposed predictive model.
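For context, the simplest nearest-neighbors classifier already produces class probabilities by counting neighbor labels. The sketch below shows that standard baseline only; it is not the paper's Bayesian model or its aggregation of nonlocal predictive distributions, and the data and choice of k are illustrative.

```python
# Baseline k-NN class-probability estimate (not the paper's Bayesian model).
import numpy as np

rng = np.random.default_rng(1)

# Two Gaussian classes in 2D, centered at (-1, -1) and (+1, +1).
n = 100
x0 = rng.normal(loc=-1.0, size=(n, 2))
x1 = rng.normal(loc=+1.0, size=(n, 2))
X = np.vstack([x0, x1])
y = np.array([0] * n + [1] * n)

def knn_proba(X_train, y_train, x_query, k=15):
    """Estimate P(class = 1 | x_query) as the share of the k nearest
    training points (Euclidean distance) that carry label 1."""
    dists = np.linalg.norm(X_train - x_query, axis=1)
    nearest = np.argsort(dists)[:k]
    return y_train[nearest].mean()

p = knn_proba(X, y, np.array([1.0, 1.0]))
print(p)  # close to 1 near the class-1 center
```

The Bayesian formulation studied in the paper replaces this simple neighbor count with model averaging, which is where the normalizing-constant computation discussed in the abstract arises.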
The Illusion of the Illusion of Sparsity: An exercise in prior sensitivity [pdf][code] (with Hedibert F. Lopes) Brazilian Journal of Probability and Statistics, 2021, Vol. 35, No. 4, 699-720.
The emergence of Big Data raises the question of how to model economic relations when there is a large number of possible explanatory variables. We revisit the issue by comparing the possibility of using dense or sparse models in a Bayesian approach, allowing for variable selection and shrinkage. More specifically, we discuss the results reached by Giannone, Lenza and Primiceri (2020) through a “Spike-and-Slab” prior, which suggest an “illusion of sparsity” in Economics datasets, as no clear patterns of sparsity could be detected. We examine the posterior distributions of the model in further detail, and propose three experiments to evaluate the robustness of the adopted prior distribution. We find that the pattern of sparsity is sensitive to the prior distribution of the regression coefficients, and present evidence that the model indirectly induces variable selection and shrinkage, which suggests that the “illusion of sparsity” could be, itself, an illusion. Code is available on GitHub.
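The spike-and-slab prior at the center of this exercise can be sketched in a few lines: each coefficient is exactly zero with some probability (the "spike") and is drawn from a diffuse normal (the "slab") otherwise. The inclusion probability and slab scale below are arbitrary illustrative values, not those of the paper, which studies how the inferred sparsity pattern responds to such prior choices.

```python
# Illustrative draw from a spike-and-slab prior (hyperparameters are arbitrary).
import numpy as np

rng = np.random.default_rng(42)

def draw_spike_and_slab(p, q=0.2, slab_sd=1.0, rng=rng):
    """Draw p regression coefficients: with probability q a coefficient
    comes from the N(0, slab_sd^2) 'slab'; otherwise it is exactly
    zero (the 'spike')."""
    include = rng.random(p) < q
    beta = np.where(include, rng.normal(scale=slab_sd, size=p), 0.0)
    return beta, include

beta, include = draw_spike_and_slab(p=1000)
# Realized sparsity pattern: the share of nonzero coefficients is about q.
print(include.mean())
```

In a posterior analysis, the inclusion indicators are updated by the data, so the prior on q and on the slab scale directly shapes how sparse the fitted model appears, which is the sensitivity the paper's three experiments probe.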