FLM2 Replication Files

This repository contains replication files for Farrell, Liang, and Misra (2026), arXiv:2010.14694, version 4.

A separate Python implementation is available at the link below; it has not been tested against this repository:

https://deep-inference.readthedocs.io/en/latest/index.html


Repository Overview

The repository includes replication code and example datasets used to illustrate methods related to post-machine-learning inference, nonlinear binscatter methods, empirical applications, and simulation evidence.

The current data dictionary covers two empirical datasets and one simulation study:

| Application | Main data or target | Source |
| --- | --- | --- |
| Bertrand et al. consumer credit marketing experiment | adcontentworth_qjecsv.csv | Bertrand, Karlan, Mullainathan, Shafir, and Zinman (2010), The Quarterly Journal of Economics |
| American Community Survey zip-code-level data | CCFF_2024_ACS_2.csv | Cattaneo, Crump, Farrell, and Feng, “Nonlinear Binscatter Methods” |
| Simulation study | mu = E[beta(X)] in a linear-in-treatment model | Monte Carlo design for Farrell, Liang, and Misra (2025) |

Data Dictionary and Code Guide

Dataset 1: Bertrand et al. Consumer Credit Marketing Experiment

File name: adcontentworth_qjecsv.csv

This dataset comes from Bertrand, Karlan, Mullainathan, Shafir, and Zinman (2010), “What’s Advertising Content Worth? Evidence from a Consumer Credit Marketing Field Experiment,” The Quarterly Journal of Economics, 125(1):263–306.

The data are from a large-scale field experiment run on behalf of a financial institution in South Africa. Consumers were sent marketing materials for short-term loans in which both the interest rate and features of the advertising content were randomized.

Code Files

| File | Description |
| --- | --- |
| FLM2_Bertrand_step0_functions.R | Defines the structured neural-network estimator for the Bertrand application and the H functions used to define semiparametric inference targets of the form mu = E[H(...)]. |
| FLM2_Bertrand_step1_fittingDNNs.R | Fits the first-stage structured neural networks for the Bertrand loan application data and saves cross-fitted DNN model objects for later inference. |
| FLM2_Bertrand_step2_InferenceStep.R | Loads the saved first-stage DNN fits, estimates the Lambda(x) objects needed for the influence-function correction, and computes semiparametric inference for targets such as marginal effects and optimal profits. |
| FLM2_Bertrand_valueOfStructure.R | Demonstrates the value of structural restrictions by comparing random forests, neural networks, and a structural binary-choice logit model for demand estimation and profit optimization. |
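To make the idea of a target mu = E[H(...)] concrete, here is a minimal sketch in base R. It assumes a structured logit of the form P(Y = 1 | X = x, T = t) = G(alpha(x) + beta(x) t), and the names `G`, `H_slope`, `H_marginal`, `alpha_x`, `beta_x`, and `t_star` are hypothetical, not taken from the repository's step0 file. The parameter functions are toy values rather than fitted networks, and only plug-in averages are computed, with no influence-function correction.

```r
# Hypothetical structured logit: P(Y = 1 | X = x, T = t) = G(alpha(x) + beta(x) * t)
G <- function(u) 1 / (1 + exp(-u))

# Target 1: average slope on the treatment, H = beta(x)
H_slope <- function(alpha, beta, t_star) beta

# Target 2: marginal effect of the treatment on the choice probability at t = t_star,
# d/dt G(alpha + beta * t) = beta * G(u) * (1 - G(u)) with u = alpha + beta * t_star
H_marginal <- function(alpha, beta, t_star) {
  p <- G(alpha + beta * t_star)
  beta * p * (1 - p)
}

# Toy parameter functions evaluated on simulated covariates (illustrative only,
# standing in for first-stage estimates of alpha(x) and beta(x))
set.seed(42)
x <- runif(500)
alpha_x <- -1 + 0.5 * x
beta_x  <- -0.2 - 0.1 * x

# Plug-in estimates of mu = E[H(...)] for each target
mu_slope    <- mean(H_slope(alpha_x, beta_x, t_star = 8))
mu_marginal <- mean(H_marginal(alpha_x, beta_x, t_star = 8))
```

Writing each target as a function of (alpha, beta, t_star) keeps the first-stage fit separate from the choice of target, so a new target only needs a new H.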

Variables

| Variable | Description |
| --- | --- |
| offer4 | Randomly assigned monthly offer interest rate for the four-month loan, measured in percentage-point units, for example 8.2 for 8.2% per month. |
| speak_trt | Indicator that the mailer included the language-affinity message “We speak [client’s language]” for eligible clients whose primary language was not English. |
| stripany | Indicator that the mailer included a rate-description strip or banner saying either “A special rate for you” or “A low rate for you.” |
| dphoto_none | Indicator that the mailer did not include a person’s photograph. |
| dphoto_black | Indicator that the mailer included a photograph of a Black person, as opposed to another photo race category or no photo. |
| dphoto_female | Indicator that the mailer included a photograph of a woman, as opposed to a male photograph; no-photo cases are separately captured by dphoto_none. |
| prize | Indicator that the mailer mentioned the promotional cell-phone raffle. |
| oneln_trt | Indicator that the example-loan table showed one example loan rather than four example loans. |
| use_any | Indicator that the suggested-use line gave only the general message that the client could use the cash or loan for anything, rather than naming a specific use such as school, debt repayment, appliance purchase, or home repair. |
| intshown | Indicator that the example-loan table displayed the interest rate in addition to the monthly repayment information. |
| comploss_n | Indicator that the competitor-rate comparison was framed as a loss, for example “If you borrow elsewhere, you will pay … more,” rather than as a gain. |
| comp_n | Indicator that the mailer included any comparison to a competitor or outside rate, as opposed to no competitor-rate comparison. |
| waved3 | Indicator for the later mailer/randomization wave, corresponding to the October mailing wave rather than the September wave. |
| dormancy | Number of months since the client’s most recent prior loan from the lender. |
| trcount | Number of previous loans the client had taken from the lender. |
| female | Indicator that the client is female. |
| race | Client race category, used both as a covariate and for photo-race matching; the paper reports African, Indian, White, and Mixed/“Colored” categories. |
| nspeakeligible | Indicator that the client was eligible for the language-affinity treatment because the client’s primary language was not English. |
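Before running the Bertrand scripts, it can help to confirm that the loaded data contain every variable documented above. The sketch below assumes the CSV column names match the dictionary exactly; `missing_vars` is a hypothetical helper, not a function from this repository, and a one-row toy data frame stands in for the real file.

```r
# Variable names as documented in the data dictionary
expected_vars <- c("offer4", "speak_trt", "stripany", "dphoto_none", "dphoto_black",
                   "dphoto_female", "prize", "oneln_trt", "use_any", "intshown",
                   "comploss_n", "comp_n", "waved3", "dormancy", "trcount",
                   "female", "race", "nspeakeligible")

# Hypothetical helper: documented variables that are absent from a data frame
missing_vars <- function(df, expected = expected_vars) setdiff(expected, names(df))

# With the real file in the working directory one would use:
# bertrand <- read.csv("adcontentworth_qjecsv.csv")
# Here a one-row toy data frame stands in for it, with one column dropped on purpose.
toy <- as.data.frame(as.list(setNames(rep(0, length(expected_vars)), expected_vars)))
toy$prize <- NULL

missing_vars(toy)
```

An empty result from `missing_vars` means every documented variable is present.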

Dataset 2: American Community Survey

File name: CCFF_2024_ACS_2.csv

This dataset comes from Cattaneo, Crump, Farrell, and Feng, “Nonlinear Binscatter Methods,” arXiv:2407.15276.

The original data construction and replication repository are linked from the nppackages replication page:

https://nppackages.github.io/replication/

The data are obtained from the American Community Survey using five-year survey estimates beginning in 2013 and ending in 2017, available from the U.S. Census Bureau. The analyses are performed at the zip code tabulation area level for the United States, excluding Puerto Rico.

Code Files

| File | Description |
| --- | --- |
| FLM2_ACS(1).R | Replicates the CCFF binscatter-style figure for uninsured rates by income and population-density group, then applies the FLM2 estimation and inference procedure to the ACS application. |

Variables

| Variable | Description |
| --- | --- |
| uninsuredRate | Percent of people without health insurance. |
| perCapitaIncome | Per capita income. |
| idxpopdens | Indicator that divides states into low- and high-population-density groups, using 100 people per square mile as the cutoff; state population density is defined as average population per square mile using Census Bureau data. |
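As a rough illustration of how these three variables fit together, the sketch below computes binned means of uninsuredRate across quantile bins of perCapitaIncome, separately by idxpopdens. This is a crude base-R stand-in, not the binscatter procedure used in FLM2_ACS(1).R. The data are simulated toy values, and both the 0/1 coding of idxpopdens and the choice of ten bins are assumptions.

```r
set.seed(7)
n <- 1000

# Toy stand-in for the ACS file (simulated; real data come from CCFF_2024_ACS_2.csv)
acs <- data.frame(
  perCapitaIncome = exp(rnorm(n, mean = 10.2, sd = 0.4)),
  idxpopdens      = rbinom(n, 1, 0.5)   # 0/1 coding is an assumption
)
acs$uninsuredRate <- pmax(0, 25 - 4 * (log(acs$perCapitaIncome) - 9) +
                             2 * acs$idxpopdens + rnorm(n, sd = 3))

# Quantile-spaced bins of income
n_bins <- 10
breaks <- quantile(acs$perCapitaIncome, probs = seq(0, 1, length.out = n_bins + 1))
acs$bin <- cut(acs$perCapitaIncome, breaks = breaks,
               include.lowest = TRUE, labels = FALSE)

# Mean income and mean uninsured rate within each bin and density group
bin_means <- aggregate(cbind(perCapitaIncome, uninsuredRate) ~ bin + idxpopdens,
                       data = acs, FUN = mean)
```

Plotting `uninsuredRate` against `perCapitaIncome` from `bin_means`, with one series per `idxpopdens` group, gives a simple binned scatter for comparing the two groups.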

Simulation Study

The simulation study evaluates inference for mu = E[beta(X)] in a linear-in-treatment model of the form Y = alpha(X) + beta(X)T + epsilon, with one-dimensional covariates and treatment. The design compares full-sample and cross-fitted neural-network estimators, automatic-differentiation corrections, auto-DML, and generalized random forest benchmarks under configurable treatment assignment and heteroskedasticity.
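One replication of such a design can be sketched in base R as follows. This is not the repository's simulation code: the functions `alpha_fn` and `beta_fn`, the treatment assignment, and the sample size are hypothetical, and a polynomial regression stands in for the neural network. The sketch uses the full sample, with no cross-fitting. Under squared-error loss, with Lambda(x) = E[(1, T)(1, T)' | X = x], the corrected score simplifies to beta(X) + (T - E[T | X]) / Var(T | X) * (Y - alpha(X) - beta(X) T), which is what the code computes.

```r
set.seed(1)
n <- 2000

# Hypothetical DGP (not the paper's design): scalar X and continuous treatment
alpha_fn <- function(x) 1 + x - x^2
beta_fn  <- function(x) 1 + x^2            # mu = E[beta(X)] = 1 + 1/3 for X ~ U(0, 1)

x   <- runif(n)
trt <- rnorm(n, mean = 0.5 * x, sd = 1)    # treatment assignment depends on X
y   <- alpha_fn(x) + beta_fn(x) * trt + rnorm(n)

# First stage: quadratic-in-x regression standing in for the neural network
fit <- lm(y ~ (x + I(x^2)) * trt)
alpha_hat <- predict(fit, newdata = data.frame(x = x, trt = 0))
beta_hat  <- predict(fit, newdata = data.frame(x = x, trt = 1)) - alpha_hat

# Conditional moments of the treatment, which pin down Lambda(x)
m_hat <- fitted(lm(trt ~ x))               # estimate of E[T | X]
v_hat <- mean((trt - m_hat)^2)             # Var(T | X), assumed constant here

# Influence-function-corrected scores for squared-error loss
resid_y <- y - alpha_hat - beta_hat * trt
psi <- beta_hat + (trt - m_hat) / v_hat * resid_y

mu_plugin <- mean(beta_hat)                # uncorrected plug-in estimate
mu_hat    <- mean(psi)                     # corrected point estimate
se_hat    <- sd(psi) / sqrt(n)
ci        <- mu_hat + c(-1, 1) * qnorm(0.975) * se_hat
```

Repeating this over many seeds and recording how often `ci` covers 4/3 would give a Monte Carlo coverage estimate for this toy design.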

Code Files

| File | Description |
| --- | --- |
| FLM2_simuls_1_functions.R | Defines the simulation data-generating process, neural-network estimators, auto-DML routines, one-replication simulation function, and helper functions for summarizing results. |
| FLM2_simuls_2_run.R | Sets the simulation design parameters, runs the Monte Carlo replications in parallel, saves the resulting simulation objects, and writes LaTeX tables and diagnostic output. |

References

Bertrand, Marianne, Dean Karlan, Sendhil Mullainathan, Eldar Shafir, and Jonathan Zinman. 2010. “What’s Advertising Content Worth? Evidence from a Consumer Credit Marketing Field Experiment.” The Quarterly Journal of Economics 125(1):263–306.

Cattaneo, Matias D., Richard K. Crump, Max H. Farrell, and Yingjie Feng. 2024. “Nonlinear Binscatter Methods.” arXiv:2407.15276.

Farrell, Max H., Tengyuan Liang, and Sanjog Misra. 2025. arXiv:2010.14694, version 3.
