I am a PhD Candidate in the Economics Department at Northwestern University, and a job market candidate for the 2025/26 season. My main field is econometrics, and I also work on empirical development economics, with a focus on machine learning, causal inference, and microcredit.
I am broadly interested in how applied researchers can use predictive algorithms to answer new questions and improve empirical analyses, and I develop statistical methods for machine learning applications that rely on weak or verifiable assumptions. My econometrics research is motivated by empirical challenges in development economics, while my empirical work builds on theoretical insights and contextual knowledge to identify and answer new questions. In my job market paper, I address a scenario common in research, policy, and industry: using the same dataset to train a predictive model and to evaluate its accuracy or fairness, a task typically done by splitting the sample into two parts. I develop valid confidence intervals based on estimators that average over multiple sample splits, effectively using the entire sample for both training and testing and improving reproducibility, all under weak assumptions.
Job Market Paper
Training and Testing with Multiple Splits: A Central Limit Theorem for Split-Sample Estimators [pdf]
As predictive algorithms grow in popularity, using the same dataset to both train and test a new model has become routine across research, policy, and industry. Sample splitting achieves valid inference on model properties by using separate subsamples to estimate the model and to evaluate it. However, this approach has two drawbacks: each task uses only part of the data, and different splits can lead to widely different estimates. I develop an inference approach based on averaging across multiple splits, which uses more data for training, uses the entire sample for testing, and improves reproducibility. I address the statistical dependence that arises from reusing observations across splits by proving a new central limit theorem for a large class of split-sample estimators under arguably mild and general conditions. Importantly, I place no restrictions on model complexity or convergence rates. I show that confidence intervals based on the normal approximation are valid for many applications but may undercover in important cases of interest, such as comparing the performance of two models. For such cases, I develop a new inference approach that explicitly accounts for the dependence across splits. Moreover, I provide a measure of reproducibility for p-values obtained from split-sample estimators. Finally, I apply my results to two important problems in development and public economics: predicting poverty and learning heterogeneous treatment effects in randomized experiments. I show that my inference approach with repeated cross-fitting achieves better power than existing alternatives, often enough to reveal statistical significance that would otherwise be missed.
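To illustrate the basic idea of averaging over multiple splits, here is a minimal sketch, not the paper's estimator or inference procedure: on toy simulated data, a model's out-of-sample mean squared error is estimated once with a single half-split, and then again by averaging the same split-sample estimator over many random half-splits, so that every observation eventually serves in both the training and testing roles. All data, model choices (OLS), and the number of splits are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: linear signal plus unit-variance noise (illustrative only)
n, p = 200, 3
X = rng.normal(size=(n, p))
y = X @ np.array([1.0, -0.5, 0.25]) + rng.normal(size=n)

def split_mse(X, y, test_idx):
    """Fit OLS on the training half; return MSE on the held-out half."""
    train = np.setdiff1d(np.arange(len(y)), test_idx)
    beta, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
    resid = y[test_idx] - X[test_idx] @ beta
    return float(np.mean(resid ** 2))

# Average the split-sample estimator over S random half-splits, so the
# estimate no longer hinges on one arbitrary partition of the data.
S = 50
estimates = []
for _ in range(S):
    test_idx = rng.choice(n, size=n // 2, replace=False)
    estimates.append(split_mse(X, y, test_idx))

single_split = estimates[0]       # classic one-shot sample splitting
multi_split = np.mean(estimates)  # averaged over many splits
print(single_split, multi_split)
```

Because observations are reused across splits, the `S` estimates are statistically dependent, which is exactly why averaging them requires the kind of central limit theorem the paper develops before standard confidence intervals can be trusted.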
Working Papers
- Predicting the Distribution of Treatment Effects via Covariate-Adjustment, with an Application to Microcredit [arxiv] [Presentation MEG 2024]
Best Student Paper Award at the 32nd Midwest Econometrics Group Annual Conference (2024)
- Algorithmic Bias in Microcredit: Consequences of Data-Driven Lending Practices
(with Susan Athey, Dean Karlan, Adam Osman, and Jonathan Zinman)
Draft available soon
Work in Progress
- Is Participant Feedback Predictive of Impact?
(with Gharad Bryan, Dean Karlan, Isabel Oñate, and Christopher Udry)
- What Can We Learn from Harmonizing and Analyzing RCTs of Grant and Training Programs to Promote Entrepreneurship?
(with Florian de Bundel, Dean Karlan, William Parienté, and Christopher Udry)
Publications
- Probabilistic Nearest Neighbors Classification [pdf] [R Package]
(with Paulo C. Marques F. and Hedibert F. Lopes) Entropy, 2024, 26(1), 39.
- The Illusion of the Illusion of Sparsity: An exercise in prior sensitivity [pdf] [code]
(with Hedibert F. Lopes) Brazilian Journal of Probability and Statistics, 2021, Vol. 35, No. 4, 699-720.
