This repository contains replication files for Farrell, Liang, and Misra (2026), arXiv:2010.14694, version 4.
A separate Python implementation is available at the link below, although it has not been tested with this repository:
https://deep-inference.readthedocs.io/en/latest/index.html
The repository includes replication code and example datasets that illustrate post-machine-learning inference through two empirical applications, one of which builds on nonlinear binscatter methods, and a simulation study.
The current data dictionary covers two empirical datasets and one simulation study:
| Application | Main data or target | Source |
|---|---|---|
| Bertrand et al. consumer credit marketing experiment | adcontentworth_qjecsv.csv | Bertrand, Karlan, Mullainathan, Shafir, and Zinman (2010), The Quarterly Journal of Economics |
| American Community Survey ZIP Code Tabulation Area data | CCFF_2024_ACS_2.csv | Cattaneo, Crump, Farrell, and Feng, “Nonlinear Binscatter Methods” |
| Simulation study | mu = E[beta(X)] in a linear-in-treatment model | Monte Carlo design for Farrell, Liang, and Misra (2026) |
File name: adcontentworth_qjecsv.csv
This dataset comes from Bertrand, Karlan, Mullainathan, Shafir, and Zinman (2010), “What’s Advertising Content Worth? Evidence from a Consumer Credit Marketing Field Experiment,” The Quarterly Journal of Economics, 125(1):263–306.
The data are from a large-scale field experiment run on behalf of a financial institution in South Africa. Consumers were sent marketing materials for short-term loans in which both the interest rate and features of the advertising content were randomized.
| File | Description |
|---|---|
| FLM2_Bertrand_step0_functions.R | Defines the structured neural-network estimator for the Bertrand application and the H functions used to define semiparametric inference targets of the form mu = E[H(...)]. |
| FLM2_Bertrand_step1_fittingDNNs.R | Fits the first-stage structured neural networks for the Bertrand loan application data and saves cross-fitted DNN model objects for later inference. |
| FLM2_Bertrand_step2_InferenceStep.R | Loads the saved first-stage DNN fits, estimates the Lambda(x) objects needed for the influence-function correction, and computes semiparametric inference for targets such as marginal effects and optimal profits. |
| FLM2_Bertrand_valueOfStructure.R | Demonstrates the value of structural restrictions by comparing random forests, neural networks, and a structural binary-choice logit model for demand estimation and profit optimization. |
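Step 1 saves cross-fitted first-stage fits for use in step 2. The generic cross-fitting pattern can be sketched in Python as below. The helper names (cross_fit, ols_fit, ols_predict) are hypothetical, and ordinary least squares stands in for the structured neural network only to keep the example self-contained; the fold scheme and model objects in the R scripts may differ.

```python
import numpy as np


def cross_fit(features, target, fit, predict, n_folds=5, seed=0):
    """Return out-of-fold predictions plus the fold labels and fitted models.

    Each observation is predicted by a model fit on the other folds only,
    so no prediction uses the observation's own outcome.
    """
    n = len(target)
    rng = np.random.default_rng(seed)
    # Balanced random fold labels 0, ..., n_folds - 1.
    folds = rng.permutation(n) % n_folds
    out_of_fold = np.empty(n)
    models = []
    for k in range(n_folds):
        held_out = folds == k
        model = fit(features[~held_out], target[~held_out])
        out_of_fold[held_out] = predict(model, features[held_out])
        models.append(model)
    return out_of_fold, folds, models


def ols_fit(X, y):
    """Hypothetical stand-in learner: least-squares coefficients with an intercept."""
    Z = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return coef


def ols_predict(coef, X):
    """Predictions from ols_fit coefficients."""
    return np.column_stack([np.ones(len(X)), X]) @ coef


# Usage on synthetic data (not the Bertrand data).
rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 2))
y = 1.0 + 2.0 * X[:, 0] - X[:, 1] + rng.normal(scale=0.1, size=1000)
preds, folds, models = cross_fit(X, y, ols_fit, ols_predict)
```

Keeping the fitted models per fold mirrors the idea of saving first-stage objects so that a later step can reuse them without refitting.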
| Variable | Description |
|---|---|
| offer4 | Randomly assigned monthly offer interest rate for the four-month loan, measured in percentage-point units, for example 8.2 for 8.2% per month. |
| speak_trt | Indicator that the mailer included the language-affinity message “We speak [client’s language]” for eligible clients whose primary language was not English. |
| stripany | Indicator that the mailer included a rate-description strip or banner saying either “A special rate for you” or “A low rate for you.” |
| dphoto_none | Indicator that the mailer did not include a person’s photograph. |
| dphoto_black | Indicator that the mailer included a photograph of a Black person, as opposed to another photo race category or no photo. |
| dphoto_female | Indicator that the mailer included a photograph of a woman, as opposed to a male photograph; no-photo cases are separately captured by dphoto_none. |
| prize | Indicator that the mailer mentioned the promotional cell-phone raffle. |
| oneln_trt | Indicator that the example-loan table showed one example loan rather than four example loans. |
| use_any | Indicator that the suggested-use line gave only the general message that the client could use the cash or loan for anything, rather than naming a specific use such as school, debt repayment, appliance purchase, or home repair. |
| intshown | Indicator that the example-loan table displayed the interest rate in addition to the monthly repayment information. |
| comploss_n | Indicator that the competitor-rate comparison was framed as a loss, for example “If you borrow elsewhere, you will pay … more,” rather than as a gain. |
| comp_n | Indicator that the mailer included any comparison to a competitor or outside rate, as opposed to no competitor-rate comparison. |
| waved3 | Indicator for the later mailer/randomization wave, corresponding to the October mailing wave rather than the September wave. |
| dormancy | Number of months since the client’s most recent prior loan from the lender. |
| trcount | Number of previous loans the client had taken from the lender. |
| female | Indicator that the client is female. |
| race | Client race category, used both as a covariate and for photo-race matching; the paper reports African, Indian, White, and Mixed/“Colored” categories. |
| nspeakeligible | Indicator that the client was eligible for the language-affinity treatment because the client’s primary language was not English. |
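Step 2 targets quantities such as marginal effects of the offered rate offer4. As a hypothetical illustration only, assume the structured network outputs a take-up (loan demand) index that is linear in the rate, P(take-up | x, r) = sigma(alpha(x) + beta(x) r), with sigma the logistic function and r = offer4. The per-client marginal effect is then beta(x) p (1 - p), and its sample average is a plug-in estimate of a target mu = E[H(...)] before any influence-function correction. The function names and the first-stage values are hypothetical, and the take-up outcome itself is not listed in the dictionary above.

```python
import numpy as np


def takeup_prob(alpha, beta, rate):
    """Assumed logit structure: sigma(alpha(x) + beta(x) * rate)."""
    return 1.0 / (1.0 + np.exp(-(alpha + beta * rate)))


def marginal_effect(alpha, beta, rate):
    """Derivative of the take-up probability with respect to the rate."""
    p = takeup_prob(alpha, beta, rate)
    return beta * p * (1.0 - p)


def plug_in_average_marginal_effect(alpha, beta, rate):
    """Uncorrected plug-in average of the marginal effect across clients."""
    return float(np.mean(marginal_effect(alpha, beta, rate)))


# Hypothetical first-stage outputs for three clients; rates in offer4 units.
alpha_hat = np.array([0.5, -0.2, 1.0])
beta_hat = np.array([-0.15, -0.05, -0.30])
offer4 = np.array([8.2, 7.0, 9.5])
ame = plug_in_average_marginal_effect(alpha_hat, beta_hat, offer4)
```

Because the rate enters only through the index, the sign of each client's marginal effect is the sign of beta(x).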
File name: CCFF_2024_ACS_2.csv
This dataset comes from Cattaneo, Crump, Farrell, and Feng, “Nonlinear Binscatter Methods,” arXiv:2407.15276.
The original data construction and replication repository are linked from the nppackages replication page:
https://nppackages.github.io/replication/
The data are the American Community Survey (ACS) five-year estimates for 2013–2017, available from the U.S. Census Bureau. The analyses are performed at the ZIP Code Tabulation Area (ZCTA) level for the United States, excluding Puerto Rico.
| File | Description |
|---|---|
| FLM2_ACS(1).R | Replicates the CCFF binscatter-style figure for uninsured rates by income and population-density group, then applies the FLM2 estimation and inference procedure to the ACS application. |
| Variable | Description |
|---|---|
| uninsuredRate | Percent of people without health insurance. |
| perCapitaIncome | Per capita income. |
| idxpopdens | Indicator splitting states into low- and high-population-density groups at a cutoff of 100 people per square mile, where a state's population density is its average population per square mile from Census Bureau data. |
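FLM2_ACS(1).R replicates a figure of uninsured rates against income, drawn separately by population-density group. A classic binscatter (quantile-spaced bins of x, with the within-bin mean of y) can be sketched as follows. This is only the simple construction, not the CCFF nonlinear binscatter estimator or its inference, and the function names are hypothetical.

```python
import numpy as np


def binscatter(x, y, n_bins=10):
    """Split x into quantile-spaced bins; return mean x and mean y per non-empty bin."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    edges = np.quantile(x, np.linspace(0.0, 1.0, n_bins + 1))
    # Bin index 0..n_bins-1; a point lying on an interior edge goes to the upper bin.
    idx = np.searchsorted(edges[1:-1], x, side="right")
    x_means, y_means = [], []
    for b in range(n_bins):
        in_bin = idx == b
        if in_bin.any():
            x_means.append(x[in_bin].mean())
            y_means.append(y[in_bin].mean())
    return np.array(x_means), np.array(y_means)


def binscatter_by_group(x, y, group, n_bins=10):
    """Separate binscatter for each value of a group indicator such as idxpopdens."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    group = np.asarray(group)
    return {g.item(): binscatter(x[group == g], y[group == g], n_bins)
            for g in np.unique(group)}


# Synthetic demo with a known linear relationship (not the ACS data).
x_demo = np.arange(100.0)
y_demo = 2.0 * x_demo
group_demo = (x_demo >= 50).astype(int)
curves = binscatter_by_group(x_demo, y_demo, group_demo)
```

With the ACS columns loaded as arrays, the analogous call would be binscatter_by_group(perCapitaIncome, uninsuredRate, idxpopdens); this usage has not been checked against the actual file.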
The simulation study evaluates inference for mu = E[beta(X)] in the linear-in-treatment model Y = alpha(X) + beta(X)T + epsilon, with a scalar covariate X and a scalar treatment T. The design compares full-sample and cross-fitted neural-network estimators, automatic-differentiation corrections, auto-DML, and generalized random forest benchmarks, with configurable treatment assignment and heteroskedasticity.
| File | Description |
|---|---|
| FLM2_simuls_1_functions.R | Defines the simulation data-generating process, neural-network estimators, auto-DML routines, one-replication simulation function, and helper functions for summarizing results. |
| FLM2_simuls_2_run.R | Sets the simulation design parameters, runs the Monte Carlo replications in parallel, saves the resulting simulation objects, and writes LaTeX tables and diagnostic output. |
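For the simulation target, the influence-function correction can be written out explicitly. A minimal sketch, assuming squared-error loss 0.5 (y - alpha(x) - beta(x) t)^2 with theta(x) = (alpha(x), beta(x)) and H = beta(x): then Lambda(x) = E[(1, T)(1, T)' | X = x], and the corrected value is psi = beta(x) + (0, 1) Lambda(x)^{-1} (1, t)' (y - alpha(x) - beta(x) t), whose sample mean estimates mu. The data-generating process (binary T with P(T = 1 | x) = 0.3 + 0.4x, alpha(x) = x^2, beta(x) = 1 + x) and the deliberate +0.2 bias in the first-stage beta are hypothetical choices for illustration, not the repository's design.

```python
import numpy as np


def influence_values(y, t, alpha_hat, beta_hat, e_t, e_t2):
    """Corrected values psi_i whose mean estimates mu = E[beta(X)].

    Assumes squared-error loss, so
    Lambda(x) = [[1, E[T|x]], [E[T|x], E[T^2|x]]] and
    psi = beta(x) + (0, 1) Lambda(x)^{-1} (1, t)' (y - alpha(x) - beta(x) t).
    """
    n = len(y)
    resid = y - alpha_hat - beta_hat * t
    # One 2x2 Lambda matrix per observation, shape (n, 2, 2).
    lam = np.empty((n, 2, 2))
    lam[:, 0, 0] = 1.0
    lam[:, 0, 1] = e_t
    lam[:, 1, 0] = e_t
    lam[:, 1, 1] = e_t2
    # Solve Lambda(x_i) v_i = (1, t_i)' for every i at once.
    rhs = np.stack([np.ones(n), t], axis=1)[:, :, None]
    v = np.linalg.solve(lam, rhs)[:, :, 0]
    # Lambda is symmetric, so (0, 1) Lambda^{-1} (1, t)' is the second entry of v.
    return beta_hat + v[:, 1] * resid


# Hypothetical design: binary T with P(T = 1 | x) = 0.3 + 0.4 x.
rng = np.random.default_rng(0)
n = 200_000
x = rng.uniform(0.0, 1.0, n)
p = 0.3 + 0.4 * x
t = rng.binomial(1, p).astype(float)
alpha = x ** 2
beta = 1.0 + x  # true mu = E[1 + X] = 1.5
y = alpha + beta * t + rng.normal(0.0, 1.0, n)

beta_biased = beta + 0.2  # deliberately biased first-stage estimate of beta(x)
# For binary T, E[T^2 | x] = E[T | x] = p.
psi = influence_values(y, t, alpha, beta_biased, e_t=p, e_t2=p)
plug_in = beta_biased.mean()
corrected = psi.mean()
std_error = psi.std(ddof=1) / np.sqrt(n)
```

The plug-in average inherits the 0.2 bias, while the corrected average centers near 1.5: with alpha(x) and Lambda(x) correct, the correction term has conditional mean beta(x) - beta_hat(x), which exactly offsets the first-stage error. In the simulation study these nuisance functions are estimated rather than known.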
Bertrand, Marianne, Dean Karlan, Sendhil Mullainathan, Eldar Shafir, and Jonathan Zinman. 2010. “What’s Advertising Content Worth? Evidence from a Consumer Credit Marketing Field Experiment.” The Quarterly Journal of Economics 125(1):263–306.
Cattaneo, Matias D., Richard K. Crump, Max H. Farrell, and Yingjie Feng. 2024. “Nonlinear Binscatter Methods.” arXiv:2407.15276.
Farrell, Max H., Tengyuan Liang, and Sanjog Misra. 2026. arXiv:2010.14694, version 4.