Computes the Best Linear Predictor (BLP) of the Conditional Average Treatment Effect (CATE), an estimand introduced by Chernozhukov et al. (2025) to test whether machine learning predictions capture meaningful treatment effect heterogeneity.

This function works with both ensemble_hte_fit (uses predicted ITE) and ensemble_pred_fit (uses predicted Y). For prediction fits, a treatment variable must be specified.

This function implements the multiple-split estimation strategy developed by Fava (2025), which combines predictions from multiple machine learning algorithms into an ensemble and averages BLP estimates across M repetitions of K-fold cross-fitting to improve statistical power.

Usage

blp(
  ensemble_fit,
  outcome = NULL,
  treatment = NULL,
  prop_score = NULL,
  controls = NULL,
  subset = NULL
)

Arguments

ensemble_fit

An object of class ensemble_hte_fit from ensemble_hte() or ensemble_pred_fit from ensemble_pred().

outcome

Either:

  • NULL (default): uses the same outcome as in the ensemble function

  • Character string: column name in the data used in the ensemble function

  • Numeric vector: custom outcome variable (must have same length as data)

This allows computing BLP for a different outcome than the one used for prediction.

treatment

For ensemble_pred_fit only. The treatment variable:

  • Character string: column name in the data used in ensemble_pred()

  • Numeric vector: binary treatment variable (must have same length as data)

Ignored for ensemble_hte_fit (uses the treatment from the fit).

prop_score

For ensemble_pred_fit only. Propensity score:

  • NULL (default): estimated as mean of treatment variable

  • Character string: column name in the data used in the ensemble function

  • Numeric value: constant propensity score for all observations

  • Numeric vector: observation-specific propensity scores

For ensemble_hte_fit, uses the propensity score from the fit.

controls

Character vector of control variable names to include in regression. These must be column names present in the data argument used when calling the ensemble function.

subset

Which observations to use for the BLP analysis. Options:

  • NULL (default): uses all observations (or, for an ensemble_pred_fit trained on a subset with the default outcome, only the training observations)

  • "train": uses only training observations (for ensemble_pred_fit only)

  • "all": explicitly uses all observations

  • Logical vector: TRUE/FALSE for each observation (must have same length as data)

  • Integer vector: indices of observations to include (1-indexed)

This allows evaluating BLP on a subset of observations. Note that BLP requires observations from both treatment and control groups in the subset.
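As an illustration of the arguments above, the following sketch assumes `fit` and `dat` are the objects constructed in the Examples section below; the subset condition (`gender == 1`) and the chosen controls are arbitrary and only demonstrate the calling conventions:

```r
# Restrict the BLP regression to a subset and adjust for baseline controls
# (column names must exist in the data passed to the ensemble function):
res_sub <- blp(fit,
               subset   = dat$gender == 1,          # logical vector form
               controls = c("age", "education"))

# Equivalent integer-index form of the same subset:
res_idx <- blp(fit, subset = which(dat$gender == 1),
               controls = c("age", "education"))
```

Remember that the chosen subset must contain both treated and control observations, or the regression in step 2 of the Estimation Procedure cannot be run.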

Value

An object of class blp_results containing:

  • estimates: data.table with BLP estimates averaged across repetitions

  • outcome: outcome variable used

  • targeted_outcome: original outcome from ensemble fitting

  • fit_type: "hte" or "pred" depending on input

  • controls: control variables used (if any)

  • M: number of repetitions

  • call: the function call
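The listed components can be accessed directly from the returned object; a brief sketch, again assuming `fit` from the Examples section below:

```r
result <- blp(fit)
result$estimates   # data.table of BLP estimates averaged across repetitions
result$fit_type    # "hte" or "pred", depending on the input fit
result$M           # number of cross-fitting repetitions
```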

Estimation Procedure

For each repetition \(m = 1, \ldots, M\):

  1. The ensemble predictions (ITEs or predicted Y) from repetition \(m\) are used as regressors. These predictions were generated by cross-fitting in ensemble_hte() or ensemble_pred(): for each fold \(k\), the model was trained on all folds except \(k\) and predicted on fold \(k\), so each observation's prediction is out-of-sample.

  2. A single weighted least squares regression is run on all observations: $$Y_i = \alpha + \beta_1 W_{1i} + \beta_2 W_{2i} + \varepsilon_i$$ where \(W_{1i} = D_i - e(X_i)\) and \(W_{2i} = (D_i - e(X_i))(\hat{s}(X_i) - \bar{\hat{s}})\), with \(e(X_i)\) the propensity score and \(\hat{s}(X_i)\) the predicted ITE (or predicted Y). Weights are \(1 / (e(X_i)(1 - e(X_i)))\).

  3. HC1 heteroskedasticity-robust standard errors are computed (or cluster-robust SEs when individual_id was specified in the ensemble fit).

The final reported estimates and standard errors are the simple averages of the per-repetition estimates and standard errors across all \(M\) repetitions.
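Step 2 can be reproduced in base R on simulated data. This is a minimal sketch of a single repetition with a known propensity score and a stand-in for the predicted ITE; the package additionally handles cross-fitting, robust standard errors, and averaging over repetitions:

```r
# One BLP repetition on simulated data (illustrative only).
set.seed(1)
n <- 500
x <- rnorm(n)
e <- rep(0.5, n)                      # propensity score e(X) (randomized design)
d <- rbinom(n, 1, e)                  # treatment indicator D
s <- 2 * x                            # stand-in for the predicted ITE s-hat(X)
y <- 1 + d * (1 + 2 * x) + rnorm(n)   # outcome with true CATE = 1 + 2x

w1 <- d - e                           # D - e(X)
w2 <- (d - e) * (s - mean(s))         # (D - e(X)) * (s-hat(X) - mean)
wt <- 1 / (e * (1 - e))               # BLP weights

fit_wls <- lm(y ~ w1 + w2, weights = wt)
coef(fit_wls)[c("w1", "w2")]
# beta1 estimates the ATE (1 by construction); beta2 the heterogeneity
# loading (1 by construction, since CATE = 1 + s-hat).

# HC1 robust standard errors as in step 3 (requires the sandwich package):
# sqrt(diag(sandwich::vcovHC(fit_wls, type = "HC1")))
```

Because the simulated CATE is a linear function of the stand-in predictions, both coefficients recover their true values up to sampling noise.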

Interpretation:

  • \(\beta_1\) (ATE): estimates the Average Treatment Effect.

  • \(\beta_2\) (HET): a significant \(\beta_2\) indicates that the ML predictions capture meaningful treatment effect heterogeneity.

References

Chernozhukov, V., Demirer, M., Duflo, E., & Fernández-Val, I. (2025). Fisher–Schultz Lecture: Generic Machine Learning Inference on Heterogeneous Treatment Effects in Randomized Experiments, with an Application to Immunization in India. Econometrica, 93(4), 1121-1164.

Fava, B. (2025). Training and Testing with Multiple Splits: A Central Limit Theorem for Split-Sample Estimators. arXiv preprint arXiv:2511.04957.

Examples

# \donttest{
data(microcredit)
covars <- c("age", "gender", "education", "hhinc_yrly_base",
            "css_creditscorefinal")
dat <- microcredit[, c("hhinc_yrly_end", "treat", covars)]

fit <- ensemble_hte(
  hhinc_yrly_end ~ ., treatment = treat, data = dat,
  prop_score = microcredit$prop_score,
  algorithms = c("lm", "grf"), M = 3, K = 3
)
#> Warning: Some propensity scores are below 0.20 or above 0.80. This package is designed for randomized controlled trials (RCTs), where propensity scores are typically well-balanced. Extreme propensity scores may indicate an observational study or a heavily unbalanced design. Please verify your experimental design.
result <- blp(fit)
print(result)
#> BLP Results (Best Linear Predictor of CATE)
#> ============================================
#> 
#> Fit type: HTE (ensemble_hte)
#> Outcome analyzed: hhinc_yrly_end
#> Repetitions: 3
#> 
#> Coefficients:
#>   beta1 (ATE): Average Treatment Effect
#>   beta2 (HET): Heterogeneity loading (significant = ML captures heterogeneity)
#> 
#>     Term    Estimate   Std.Error   t value    Pr(>|t|)
#>   ----------------------------------------------------
#>    beta1     1614.59     1689.40      0.96       0.339 
#>    beta2       -1.05        0.76     -1.38       0.167 
#> 
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# }