I am a PhD Candidate in the Economics Department at Northwestern University, and a job market candidate for the 2025/26 season. My main field is econometrics, and I also work on empirical development economics, with a focus on machine learning, causal inference, and microcredit.
I am broadly interested in how applied researchers can use predictive algorithms to answer new questions and improve empirical analyses, and I develop statistical methods for machine learning applications that rely on weak or verifiable assumptions. My econometrics research is motivated by empirical challenges in development economics, while my empirical work builds on theoretical insights and contextual knowledge to identify and answer new questions. In my job market paper, I address a scenario common in research, policy, and industry: using the same dataset to train a predictive model and to evaluate its accuracy or fairness, a task typically done by splitting the sample into two parts. I develop valid confidence intervals based on estimators that average over multiple sample splits, effectively using the entire sample for both training and testing and improving reproducibility, all under weak assumptions.
Job Market Paper
Training and Testing with Multiple Splits: A Central Limit Theorem for Split-Sample Estimators [pdf]
As predictive algorithms grow in popularity, using the same dataset to both train and test a new model has become routine across research, policy, and industry. Sample splitting achieves valid inference on model properties by using separate subsamples to estimate the model and to evaluate it. However, this approach has two drawbacks: each task uses only part of the data, and different splits can lead to widely different estimates. I develop an inference approach based on averaging across multiple splits, which uses more data for training, uses the entire sample for testing, and improves reproducibility. I address the statistical dependence that arises from reusing observations across splits by proving a new central limit theorem for a large class of split-sample estimators under arguably mild and general conditions. Importantly, I place no restrictions on model complexity or convergence rates. I show that confidence intervals based on the normal approximation are valid for many applications but may undercover in important cases of interest, such as comparing the performance of two models. For such cases, I develop a new inference approach that explicitly accounts for the dependence across splits. Moreover, I provide a measure of reproducibility for p-values obtained from split-sample estimators. Finally, I apply my results to two important problems in development and public economics: predicting poverty and learning heterogeneous treatment effects in randomized experiments. I show that my inference approach with repeated cross-fitting achieves better power than existing alternatives, often enough to reveal statistical significance that would otherwise be missed.
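To illustrate the basic idea of averaging over multiple splits, here is a minimal sketch, not the paper's estimator or inference procedure: on toy simulated data, a model's out-of-sample mean squared error is estimated once with a single half-split, and then again by averaging the same split-sample estimator over many random half-splits, so that every observation eventually serves in both the training and testing roles. All data, model choices (OLS), and the number of splits are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: linear signal plus unit-variance noise (illustrative only)
n, p = 200, 3
X = rng.normal(size=(n, p))
y = X @ np.array([1.0, -0.5, 0.25]) + rng.normal(size=n)

def split_mse(X, y, test_idx):
    """Fit OLS on the training half; return MSE on the held-out half."""
    train = np.setdiff1d(np.arange(len(y)), test_idx)
    beta, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
    resid = y[test_idx] - X[test_idx] @ beta
    return float(np.mean(resid ** 2))

# Average the split-sample estimator over S random half-splits, so the
# estimate no longer hinges on one arbitrary partition of the data.
S = 50
estimates = []
for _ in range(S):
    test_idx = rng.choice(n, size=n // 2, replace=False)
    estimates.append(split_mse(X, y, test_idx))

single_split = estimates[0]       # classic one-shot sample splitting
multi_split = np.mean(estimates)  # averaged over many splits
print(single_split, multi_split)
```

Because observations are reused across splits, the `S` estimates are statistically dependent, which is exactly why averaging them requires the kind of central limit theorem the paper develops before standard confidence intervals can be trusted.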
Working Papers
- Predicting the Distribution of Treatment Effects via Covariate-Adjustment, with an Application to Microcredit [arxiv] [Presentation MEG 2024]
Best Student Paper Award at the 32nd Midwest Econometrics Group Annual Conference (2024)
- Algorithmic Bias in Microcredit: Consequences of Data-Driven Lending Practices
(with Susan Athey, Dean Karlan, Adam Osman, and Jonathan Zinman)
Draft available soon
Work in Progress
- Is Participant Feedback Predictive of Impact?
(with Gharad Bryan, Dean Karlan, Isabel Oñate, and Christopher Udry)
- What Can We Learn from Harmonizing and Analyzing RCTs of Grant and Training Programs to Promote Entrepreneurship?
(with Florian de Bundel, Dean Karlan, William Parienté, and Christopher Udry)
Publications
- Probabilistic Nearest Neighbors Classification [pdf] [R Package]
(with Paulo C. Marques F. and Hedibert F. Lopes) Entropy, 2024, 26(1), 39.
- The Illusion of the Illusion of Sparsity: An exercise in prior sensitivity [pdf] [code]
(with Hedibert F. Lopes) Brazilian Journal of Probability and Statistics, 2021, Vol. 35, No. 4, 699-720.
