Chapter 7. Synthetic Control

7.1 The problem DiD cannot solve

Consider the shape of an evaluation request that hits a DFI economist’s desk regularly. A single country passes a national rural credit reform in a single year. The Minister of Finance wants to know what it did. The available data are annual household consumption series for every regional peer going back a decade. Open the toolkit and an awkward feature appears immediately: there is one treated unit. Just one. The country reformed; nobody else did, or at least nobody else did at the same time.

Difference-in-differences (Chapter 4) needs a credible control group whose pre-trend looks like the treated unit’s pre-trend. With one treated country, that “group” collapses to whichever donor you pick, and the answer becomes a function of taste. Pick Tanzania and you get one number. Pick Zambia and you get another. Pick the average of all SADC countries and you get a third. None of these is defensible, because the choice is yours and not the data’s.

Synthetic Control (SC) is the answer to this exact situation. Instead of choosing one comparison country or averaging blindly, let the data choose a weighted combination of donor countries such that the weighted average reproduces the treated country’s pre-reform behavior. That weighted combination is the synthetic counterfactual. The treatment effect is the gap between actual and synthetic in the post-period.

The method has a clean origin story. Abadie and Gardeazabal (2003) introduced it to estimate the economic cost of terrorism in the Basque Country, building a synthetic Basque region from a weighted combination of other Spanish regions. Abadie, Diamond, and Hainmueller (2010) refined the framework and applied it to California’s Proposition 99 tobacco control law, building a synthetic California from 38 other US states. Those two papers are the canon. Everything since builds on them.

For a blended-finance career this matters because the unit of analysis is almost always one country or one province. DFIs do not run randomized national reforms. They evaluate single-country interventions where DiD is hopeless and SC is the only credible quantitative counterfactual.

7.2 The intuition

Hold the math for a moment and think about what you actually want.

You have one treated unit with an outcome path Y_{1t} over time. You have J donor units (untreated, similar enough to be plausible comparisons) with outcome paths Y_{jt}. Before treatment, both are observed. After treatment, Y_{1t} is observed under the policy but the counterfactual without the policy is what you need. That counterfactual is what SC builds.

The construction is this: find non-negative weights w_j summing to one such that

\sum_{j=2}^{J+1} w_j Y_{jt} \approx Y_{1t} \quad \text{for } t \text{ before treatment}

If the weighted donor average tracks the treated unit through the entire pre-period, there is a reasonable claim that it would have continued to track in the post-period absent treatment. The post-period gap

\hat\tau_t = Y_{1t} - \sum_{j=2}^{J+1} \hat w_j Y_{jt}

is the estimated treatment effect at time t.

Two features of the weight constraints earn their keep. First, w_j \geq 0 rules out extrapolation. Synthetic California is some convex combination of Utah, Nevada, Colorado, and so on. It is not California minus 0.4 times New York plus 1.2 times Texas. Negative weights would let the synthetic match the pre-period perfectly by exploiting cancellations that have no economic meaning. Second, \sum_j w_j = 1 keeps the synthetic on the same scale as the treated unit. You are picking a point inside the convex hull of donor outcome paths, not outside it.

The trade-off is that of any convex combination: if the treated unit sits outside the convex hull of the donors, the closest point in the hull lies on its boundary and may still be far away, so the pre-period fit will be poor. We come back to this as the convex hull problem in section 7.5.
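To see the two constraints at work, here is a minimal sketch in base R on simulated data. It is not how the Synth package solves the problem internally: the softmax reparameterization, the toy data, and every object name (Y0, Y1, w_true, sc_weights) are assumptions made for illustration. The treated path is built as a known convex combination of the donors, so the optimizer should recover weights close to the truth.

```r
# Minimal sketch: SC weights on the simplex, base R only.
# All data are simulated; names are hypothetical.
set.seed(1)
T0 <- 10                                   # pre-treatment periods
J  <- 4                                    # donors
Y0 <- matrix(rnorm(T0 * J), nrow = T0,
             dimnames = list(NULL, paste0("donor", 1:J)))
w_true <- c(0.50, 0.30, 0.15, 0.05)        # assumed "true" recipe
Y1 <- as.vector(Y0 %*% w_true)             # treated path lies inside the hull

# Softmax maps unconstrained theta to non-negative weights summing to one
softmax <- function(theta) {
  e <- exp(theta - max(theta))
  e / sum(e)
}

sc_weights <- function(y, Y) {
  # Pre-period sum of squared gaps between treated and synthetic
  loss <- function(theta) sum((y - Y %*% softmax(theta))^2)
  # Analytic gradient via the softmax Jacobian
  grad <- function(theta) {
    w <- softmax(theta)
    g <- as.vector(2 * t(Y) %*% (Y %*% w - y))
    w * (g - sum(w * g))
  }
  fit <- optim(rep(0, ncol(Y)), loss, grad, method = "BFGS",
               control = list(maxit = 2000, reltol = 1e-14))
  setNames(softmax(fit$par), colnames(Y))
}

w_hat <- sc_weights(Y1, Y0)
print(round(w_hat, 3))
```

If Y1 is shifted so that it lies outside the range spanned by the donors, the same code still returns valid weights, but the minimized loss stays well above zero. That is the poor-fit case taken up in section 7.5.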

7.3 The math, compactly

Let T_0 be the number of pre-treatment periods. Let X_1 be a k \times 1 vector of pre-treatment characteristics of the treated unit. These can be lagged outcomes, covariates, or both. Let X_0 be the k \times J matrix of the same characteristics for the donor pool. The weight vector W = (w_2, \dots, w_{J+1})' solves

\hat W = \arg\min_W \|X_1 - X_0 W\|_V \quad \text{s.t. } w_j \geq 0, \ \sum_j w_j = 1

where \|x\|_V = \sqrt{x'Vx} and V is a k \times k positive semidefinite matrix that decides how much each characteristic matters in the fit.

The second-stage choice of V matters and is sometimes overlooked. Abadie, Diamond, and Hainmueller propose choosing V to minimize the mean squared prediction error of the outcome over the pre-period:

\hat V = \arg\min_V \sum_{t=1}^{T_0} \left(Y_{1t} - \sum_{j=2}^{J+1} \hat w_j(V) \, Y_{jt}\right)^2

This is a nested optimization: outer loop over V, inner loop over W given V. Software handles it, but the user should know it is happening because pre-period fit can change a lot when V is fixed at the identity versus when it is data-driven.

A useful intuition for V: it is telling you which characteristics in X_1 predict the outcome best. Characteristics that move with the outcome get more weight; noise variables get downweighted.
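It can help to write the inner objective out for the special case of a diagonal V, which is an assumption made here for illustration (it is the common choice in practice). The symbols v_m (the m-th diagonal entry of V) and X_{jm} (the m-th characteristic of unit j) are introduced only for this expansion:

```latex
\|X_1 - X_0 W\|_V^2
  = \sum_{m=1}^{k} v_m \left( X_{1m} - \sum_{j=2}^{J+1} w_j X_{jm} \right)^2,
  \qquad v_m \geq 0 .
```

Each characteristic contributes its own squared imbalance, scaled by v_m. Setting v_m = 0 removes characteristic m from the fit entirely, and setting all v_m equal gives the unweighted (identity-V) fit.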

7.4 Inference by placebo

SC’s standard errors are not standard. With one treated unit, there is no cross-sectional variation to bootstrap over and the classical sampling distribution does not apply. Abadie, Diamond, and Hainmueller introduced placebo inference, which is now the default.

The procedure:

  1. Treat each donor in turn as if it were the treated unit. Run SC on it using the remaining donors as its donor pool.
  2. Record the post-period gap \hat\tau^{placebo}_j(t) for each placebo run.
  3. Compare the actual treated unit’s gap \hat\tau_1(t) to the distribution of placebo gaps.

Because the ranking uses absolute values, the test is two-sided: the p-value is the rank of |\hat\tau_1| (counting from the largest) in the pooled distribution of |\hat\tau_j| across the treated unit and all donors, divided by J+1. If California’s tobacco effect is larger in magnitude than 37 of the 38 placebo state effects, it ranks second of 39 and the p-value is 2/39 \approx 0.05.

One refinement matters. Donors with very poor pre-period fit will have huge post-period gaps simply because the synthetic was never tracking them well, not because of any treatment. Either filter placebos by pre-period RMSPE or, alternatively, report the ratio

\text{RMSPE ratio} = \frac{\text{post-RMSPE}}{\text{pre-RMSPE}}

and rank treated and placebos by this ratio. A high ratio means the gap opened up after treatment relative to the noise level before, which is what the analyst actually cares about.
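The ranking step can be sketched in a few lines of base R. The gaps below are simulated rather than estimated, the injected effect of 0.10 is an assumption, and the object names (gaps, rmspe, ratio, p_value) are hypothetical. In a real analysis each column would be the actual-minus-synthetic series from one SC run.

```r
# Sketch: permutation-style p-value from RMSPE ratios (simulated gaps).
set.seed(42)
years <- 2005:2020
pre   <- years <= 2014
units <- c("treated", paste0("donor", 1:12))

# Rows = years, columns = units; each column is actual minus synthetic
gaps <- matrix(rnorm(length(years) * length(units), sd = 0.01),
               nrow = length(years),
               dimnames = list(as.character(years), units))
gaps[!pre, "treated"] <- gaps[!pre, "treated"] + 0.10   # assumed effect

rmspe <- function(g) sqrt(mean(g^2))

# Post-period RMSPE relative to pre-period RMSPE, unit by unit
ratio <- apply(gaps, 2, function(g) rmspe(g[!pre]) / rmspe(g[pre]))

# p-value: share of units (treated included) with a ratio at least as large
p_value <- mean(ratio >= ratio["treated"])

print(round(sort(ratio, decreasing = TRUE), 2))
print(p_value)
```

Note that the smallest attainable p-value is 1/(J+1). With 12 donors that floor is 1/13, so no result can clear a 5 percent threshold however large the gap.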

7.5 Common traps

A blunt summary. SC is easy to run and easy to misuse. These are the failure modes that get manuscripts rejected.

Donor pool selection bias. You decide which countries enter the donor pool. Cherry-picking donors that resemble the treated unit can engineer almost any pre-period fit. The discipline is to specify the donor pool ex ante from a transparent rule (same region, same income tier, same data availability) and not to drop donors that produce inconvenient placebos.

Overfitting in the pre-period. With many covariates and a short pre-period, the optimization can produce weights that fit pre-period outcomes perfectly while telling you nothing about the post-period counterfactual. The cleanest defense: keep the covariate set lean, use lagged outcomes sparingly, and report which donors got positive weight. A synthetic built from one or two donors is suspicious.

Pre-period must be long enough. Ten pre-treatment periods is a common rule-of-thumb minimum. Five is rarely defensible. The reason is mechanical: with fewer pre-periods than donors, there are more free weights than pre-period observations to match, so a close fit can arise by construction and carries little information. Long pre-periods let the user see whether the weighted donor average actually tracks the treated unit’s idiosyncratic movements, not just its average level.

The convex hull problem. If the treated unit’s pre-period characteristics are extreme relative to all donors (lowest GDP per capita, highest inflation, unusual sectoral composition), no convex combination of donors will match it. The optimization will return sparse weights concentrated on one or two donors, the fit will be poor, and any post-period gap is uninterpretable. Abadie’s (2021) review essay treats this as the central practical obstacle to SC and recommends checking it before estimation by inspecting where the treated unit sits in the donor distribution of each covariate.

Anticipation effects. If actors expected the reform and changed behavior three years ahead of implementation, the “pre-period” includes treatment-induced movement and the synthetic is fitted to contaminated data. Define the pre-period to end before the earliest plausible anticipation date, or do a sensitivity analysis varying the cutoff.

Interpolation bias. Even with positive weights summing to one, the synthetic combines economies that may have very different production structures. If the outcome is sensitive to that structure (rural credit responding differently in coffee-exporting versus cotton-exporting economies), the weighted average may be a Frankenstein economy with no real-world analog.
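The pre-estimation check described under the convex hull problem can be sketched as follows. The covariate values are made up for illustration and the names (covs, hull_check) are hypothetical. A per-covariate range check is a necessary condition only: a unit can sit inside every marginal range and still lie outside the joint convex hull.

```r
# Sketch: where does the treated unit sit in each donor covariate distribution?
# Hypothetical covariate values, for illustration only.
covs <- data.frame(
  unit     = c("treated", paste0("donor", 1:6)),
  gdp_pc   = c(9.2, 9.8, 10.1, 9.9, 10.4, 10.0, 9.7),
  ag_share = c(0.12, 0.05, 0.03, 0.08, 0.02, 0.04, 0.15),
  stringsAsFactors = FALSE
)
treated <- covs[covs$unit == "treated", -1]
donors  <- covs[covs$unit != "treated", -1]

hull_check <- data.frame(
  covariate     = names(donors),
  treated_value = unlist(treated),
  donor_min     = sapply(donors, min),
  donor_max     = sapply(donors, max),
  # Share of donors at or below the treated value (0 or 1 means the edge)
  pct_rank      = sapply(names(donors),
                         function(v) mean(donors[[v]] <= treated[[v]]))
)
hull_check$outside_range <- with(
  hull_check,
  treated_value < donor_min | treated_value > donor_max
)
print(hull_check, row.names = FALSE)
```

Any covariate flagged as outside the donor range is one the synthetic cannot match by interpolation, and it belongs in the limitations section.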

7.6 Modern extensions

The literature has not stood still. Three extensions matter for what is currently being published.

Augmented SC (Ben-Michael, Feller, Rothstein 2021). When pre-period fit is poor, augmented SC adds a regression-based bias correction. The idea: run SC, fit an outcome model on the donors (ridge regression in the leading case), and use that model to estimate and remove the bias caused by the remaining pre-period imbalance between the treated unit and its synthetic. The correction vanishes when pre-period fit is exact, so the estimator reduces to SC in that case. When fit is poor, the ridge version in effect allows some negative weights, extrapolating outside the convex hull by an amount the ridge penalty controls. The R package is augsynth.
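In symbols, and as a sketch of the general form rather than a full statement of the paper’s estimator, the augmented counterfactual for a post-period t can be written as below. Here \hat m_t is the fitted outcome model and X_j collects unit j’s pre-period outcomes (and covariates, if used); both symbols are introduced for this illustration.

```latex
\hat Y^{\text{aug}}_{1t}(0)
  = \sum_{j=2}^{J+1} \hat w_j Y_{jt}
  + \left( \hat m_t(X_1) - \sum_{j=2}^{J+1} \hat w_j \, \hat m_t(X_j) \right)
```

The first term is the classical SC prediction. The term in parentheses is the model’s estimate of the bias left by imperfect pre-period balance, and it is zero when the synthetic matches X_1 exactly and \hat m_t is linear.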

Generalized SC and matrix completion (Athey, Bayati, Doudchenko, Imbens, Khosravi 2021). Classical SC handles one treated unit and one treatment date. Real interventions are often staggered (different countries adopt the same reform at different times) or multi-unit (a pilot rolled out across several districts). Matrix completion treats each counterfactual outcome as a missing entry in an approximately low-rank matrix and fills it in by nuclear-norm-regularized estimation. The authors show that SC-style and panel-regression estimators can be viewed as versions of the same objective that differ in their restrictions and regularization.

Synthetic DiD (Arkhangelsky, Athey, Hirshberg, Imbens, Wager 2021). SC matches levels; DiD matches trends. Synthetic DiD combines them by reweighting both units (like SC) and time periods (like a time-weighted DiD). The estimator is designed to remain reliable when neither the pure SC nor the pure DiD identifying assumptions hold cleanly, and it is increasingly common in applied papers. The R package is synthdid.
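The estimator can be stated compactly as a weighted two-way fixed effects regression. The notation follows the synthetic DiD paper rather than this chapter: i indexes all N units, t indexes all T periods, W_{it} is the treatment indicator, \hat\omega_i are unit weights, and \hat\lambda_t are time weights.

```latex
\left( \hat\tau^{\text{sdid}}, \hat\mu, \hat\alpha, \hat\beta \right)
  = \arg\min_{\tau, \mu, \alpha, \beta}
    \sum_{i=1}^{N} \sum_{t=1}^{T}
    \left( Y_{it} - \mu - \alpha_i - \beta_t - W_{it}\tau \right)^2
    \hat\omega_i \, \hat\lambda_t
```

Read this way, DiD is the same regression with no weights, and SC is the version that keeps the unit weights but drops the unit fixed effects \alpha_i and the time weights.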

For a paper aimed at a current audience, the author’s recommendation is to estimate the headline number with classical SC, report augmented SC and synthetic DiD as robustness checks, and frame the discussion around which method’s assumptions are most defensible for the specific setting.

7.7 Worked example: rural credit in Portugal

To make this concrete, take a hypothetical. Portugal passes a national rural credit reform in 2015 expanding subsidized lending to farm households. The outcome of interest is log rural-household consumption per capita. Available data: annual series 2005-2020 for Portugal and 12 EU donor countries (Spain, Italy, Greece, France, Ireland, Belgium, Netherlands, Germany, Austria, Denmark, Finland, Sweden). Pre-period is 2005-2014 (ten years). Post-period is 2015-2020 (six years).

The estimand is the average effect of the reform on Portugal’s rural consumption over 2015-2020. It is the effect on the one treated unit, not a population-wide average treatment effect.

7.7.1 R using Synth

library(Synth)
library(tidyverse)

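# Assume `data` is a balanced long-format data frame with columns:
# country (string), country_id (integer; Portugal = 1, donors = 2..13),
# year, log_rural_cons, gdp_pc, ag_share, pop_rural_share, baseline_credit,
# treated (0/1; equals 1 for Portugal in 2015 and later, 0 otherwise;
#          used by augsynth and synthdid in the later subsections)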

# Step 1: data prep
dataprep_out <- dataprep(
  foo = as.data.frame(data),
  predictors = c("gdp_pc", "ag_share", "pop_rural_share", "baseline_credit"),
  predictors.op = "mean",
  time.predictors.prior = 2005:2014,
  special.predictors = list(
    list("log_rural_cons", 2008, "mean"),
    list("log_rural_cons", 2011, "mean"),
    list("log_rural_cons", 2014, "mean")
  ),
  dependent = "log_rural_cons",
  unit.variable = "country_id",
  unit.names.variable = "country",
  time.variable = "year",
  treatment.identifier = 1,            # Portugal's country_id
  controls.identifier = c(2:13),       # the 12 donors
  time.optimize.ssr = 2005:2014,
  time.plot = 2005:2020
)

# Step 2: estimate weights
synth_out <- synth(dataprep_out)

# Step 3: inspect weights
synth.tables <- synth.tab(dataprep.res = dataprep_out, synth.res = synth_out)
print(synth.tables$tab.w)    # donor weights
print(synth.tables$tab.v)    # predictor weights V

# Step 4: plot actual vs synthetic
path.plot(synth.res = synth_out, dataprep.res = dataprep_out,
          Ylab = "log rural cons. per capita", Xlab = "year",
          Legend = c("Portugal", "Synthetic Portugal"),
          Legend.position = "bottomright")
abline(v = 2015, lty = 2)

# Step 5: gap plot
gaps.plot(synth.res = synth_out, dataprep.res = dataprep_out,
          Ylab = "Gap in log rural cons.", Xlab = "year",
          Main = "Portugal minus Synthetic Portugal")
abline(v = 2015, lty = 2)

# Step 6: placebo inference (loop over donors)
placebo_gaps <- list()
for (j in 2:13) {
  dp_j <- dataprep(
    foo = as.data.frame(data),
    predictors = c("gdp_pc", "ag_share", "pop_rural_share", "baseline_credit"),
    predictors.op = "mean",
    time.predictors.prior = 2005:2014,
    special.predictors = list(
      list("log_rural_cons", 2008, "mean"),
      list("log_rural_cons", 2011, "mean"),
      list("log_rural_cons", 2014, "mean")
    ),
    dependent = "log_rural_cons",
    unit.variable = "country_id",
    unit.names.variable = "country",
    time.variable = "year",
    treatment.identifier = j,
    controls.identifier = setdiff(2:13, j),   # remaining donors only; Portugal (id 1) is excluded
    time.optimize.ssr = 2005:2014,
    time.plot = 2005:2020
  )
  s_j <- synth(dp_j)
  placebo_gaps[[j]] <- dp_j$Y1plot - (dp_j$Y0plot %*% s_j$solution.w)
}

7.7.2 R using augsynth

library(augsynth)

# Augmented SC with ridge correction
asyn <- augsynth(log_rural_cons ~ treated,
                 unit = country, time = year,
                 data = data, t_int = 2015,
                 progfunc = "Ridge", scm = TRUE)

summary(asyn)             # ATT estimates, pre-period fit, and inference (default method depends on package version)
plot(asyn)                # gap plot with confidence band

The progfunc = "Ridge" argument tells augsynth to use ridge regression for the bias correction. scm = TRUE keeps the SC weighting layer on top.

7.7.3 R using synthdid

library(synthdid)

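# Build the outcome and treatment matrices: rows = units, cols = years.
# panel.matrices expects a balanced panel; as.data.frame guards against tibble input.
setup <- panel.matrices(as.data.frame(data), unit = "country", time = "year",
                        outcome = "log_rural_cons", treatment = "treated")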

# Estimate synthetic DiD
tau_sdid <- synthdid_estimate(setup$Y, setup$N0, setup$T0)
summary(tau_sdid)

# Plot
plot(tau_sdid, overlay = 1)

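synthdid provides three estimators through parallel functions that take the same arguments: synthdid_estimate, sc_estimate, and did_estimate. Report all three side by side so the reader can see how much the unit-weighting and time-weighting choices move the answer. With a single treated unit, standard errors come from the package’s placebo method, vcov(tau_sdid, method = "placebo").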

7.7.4 Stata using synth

* Install once
ssc install synth, replace
net install st0470_1, from(http://www.stata-journal.com/software/sj18-1/) replace

* Declare the panel structure (data must already be in long format)
tsset country_id year

* Estimate
synth log_rural_cons ///
    gdp_pc ag_share pop_rural_share baseline_credit ///
    log_rural_cons(2008) log_rural_cons(2011) log_rural_cons(2014), ///
    trunit(1) trperiod(2015) ///
    xperiod(2005(1)2014) mspeperiod(2005(1)2014) ///
    keep(synth_portugal.dta) replace fig

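* Placebo loop: each donor in turn, with the remaining donors as its pool
forvalues j = 2/13 {
    * Donor list: units 2-13 except the placebo unit; Portugal (unit 1) is left out
    local donors
    forvalues k = 2/13 {
        if `k' != `j' local donors `donors' `k'
    }
    synth log_rural_cons ///
        gdp_pc ag_share pop_rural_share baseline_credit ///
        log_rural_cons(2008) log_rural_cons(2011) log_rural_cons(2014), ///
        trunit(`j') counit(`donors') trperiod(2015) ///
        xperiod(2005(1)2014) mspeperiod(2005(1)2014) ///
        keep(placebo_`j'.dta) replace
}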

7.7.5 Stata using synth2

* Install (community-contributed; supports placebo inference and plots natively)
ssc install synth2, replace

synth2 log_rural_cons ///
    gdp_pc ag_share pop_rural_share baseline_credit ///
    log_rural_cons(2008) log_rural_cons(2011) log_rural_cons(2014), ///
    trunit(1) trperiod(2015) ///
    preperiod(2005(1)2014) postperiod(2015(1)2020) ///
    placebo(unit cut(2)) ///
    nested allopt

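synth2 extends the original synth with built-in placebo inference, robustness diagnostics, and cleaner plotting. The cut(2) option discards placebo units whose pre-period prediction error is more than twice the treated unit’s. The synth2 help file states this threshold in terms of MSPE rather than RMSPE, so check the documentation for your installed version. A cutoff of this kind is a common way to keep poor-fit cases from contaminating the placebo distribution; Abadie, Diamond, and Hainmueller (2010) report results at several cutoffs.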

7.8 Reporting checklist

When writing this up for a DFI client or a peer-reviewed journal, the checklist is short and non-negotiable.

  1. Pre-period fit (RMSPE). Report the root mean squared prediction error of the synthetic on the pre-period. Plot actual versus synthetic for the full pre-period so readers can see the fit visually.
  2. Weights table. List the donors that received positive weight and the value of each weight. Sparse weights concentrated on one or two donors are a red flag; report them honestly. Also report the V matrix or the predictor balance table.
  3. Placebo distribution. Plot the gap for the treated unit alongside gaps for all donors run as placebos. Filter out placebos whose pre-period prediction error exceeds a stated multiple of the treated unit’s (Abadie, Diamond, and Hainmueller 2010 show cutoffs of 20, 5, and 2 times the pre-period MSPE), and report the filter rule.
  4. RMSPE ratio. Report the post-period to pre-period RMSPE ratio for the treated unit and where it sits in the placebo ranking. The treated unit’s rank in that ordering, divided by the number of units ranked, is the permutation p-value.
  5. Sensitivity to donor pool. Drop each high-weight donor in turn and report whether the estimate is stable. If removing one donor flips the sign or doubles the magnitude, the result is fragile.
  6. Sensitivity to method. Report classical SC, augmented SC, and synthetic DiD side by side. If they disagree, discuss why.
  7. Convex hull check. Show where the treated unit sits in the donor distribution of each covariate. If it is at the edge, flag it as a limitation.
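Item 5 is the check most often skipped, so here is a minimal leave-one-out sketch in base R. Everything in it is an assumption for illustration: the data are simulated with a known effect of 0.5, the weights are fitted with a softmax reparameterization rather than the Synth package, and the names (sc_weights, sc_att, loo) are hypothetical. The 0.1 threshold for a "high-weight" donor is arbitrary.

```r
# Sketch: leave-one-out sensitivity to high-weight donors (simulated data).
set.seed(7)
T0 <- 10; T1 <- 6; J <- 5
Y0 <- matrix(rnorm((T0 + T1) * J), ncol = J,
             dimnames = list(NULL, paste0("donor", 1:J)))
w_true <- c(0.45, 0.30, 0.15, 0.06, 0.04)
effect <- 0.5                                   # assumed post-period effect
Y1   <- as.vector(Y0 %*% w_true) + c(rep(0, T0), rep(effect, T1))
pre  <- seq_len(T0)
post <- T0 + seq_len(T1)

softmax <- function(theta) { e <- exp(theta - max(theta)); e / sum(e) }

# Simplex-constrained least squares on the pre-period
sc_weights <- function(y, Y) {
  loss <- function(theta) sum((y - Y %*% softmax(theta))^2)
  grad <- function(theta) {
    w <- softmax(theta)
    g <- as.vector(2 * t(Y) %*% (Y %*% w - y))
    w * (g - sum(w * g))
  }
  fit <- optim(rep(0, ncol(Y)), loss, grad, method = "BFGS",
               control = list(maxit = 2000, reltol = 1e-14))
  setNames(softmax(fit$par), colnames(Y))
}

# Fit on the pre-period, then average the post-period gap
sc_att <- function(Y1, Y0) {
  w <- sc_weights(Y1[pre], Y0[pre, , drop = FALSE])
  list(w = w, att = mean(Y1[post] - Y0[post, , drop = FALSE] %*% w))
}

base  <- sc_att(Y1, Y0)
heavy <- names(base$w)[base$w > 0.1]            # donors to drop in turn
loo   <- sapply(heavy, function(d) {
  sc_att(Y1, Y0[, colnames(Y0) != d, drop = FALSE])$att
})

print(round(base$w, 3))
print(round(c(baseline = base$att, loo), 3))
```

The table to report is the last line: the baseline estimate next to the estimate obtained after dropping each high-weight donor. Large swings across columns are the fragility that item 5 asks you to disclose.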

Hit those seven things and the paper is already ahead of much of the published SC work.

7.9 Career relevance

When a DFI hires an evaluator for a single-country reform, there is no randomized rollout, no multiple treated countries, and the parallel-trends assumption that DiD requires does not hold credibly. SC is the only credible quantitative counterfactual in that setting, and it is what serious development economists use when the unit of analysis is a country or a province.

Three settings where this comes up:

  • National policy adoption. A single low- or middle-income country (Mozambique, Cape Verde, East Timor, Bhutan) adopts a reform and the estimand is the effect on a macro-level outcome (rural consumption, school enrollment, child mortality). Donor pool is regional peers.
  • Sub-national pilot. A pilot intervention is rolled out in one district or municipality before national scale-up. Donor pool is other districts in the same country. This is where SC pairs naturally with administrative data on the pilot district.
  • Concession or PPP impact. A single port, mine, or special economic zone enters operation and the question is the effect on the surrounding region. Donor pool is comparable regions without such an installation.

Pair this with what came out of DiD (Chapter 4): the two can sometimes be combined by using SC to build a synthetic control unit and then running a DiD-style estimate on the actual versus synthetic series. Synthetic DiD formalizes this.

A pragmatic note: the politics of single-country evaluation can be hostile. If a finance minister commissioned a reform evaluation and the answer is “the reform did nothing,” the method will be attacked. SC is defensible because the synthetic recipe is transparent (the weights table is the recipe) and the placebo distribution is reproducible. DiD with a hand-picked control country is not. Choose the method that survives scrutiny, not the one that gives the loudest answer.

7.10 References

Foundational:

  • Abadie, A., and Gardeazabal, J. (2003). “The Economic Costs of Conflict: A Case Study of the Basque Country.” American Economic Review 93(1): 113-132.
  • Abadie, A., Diamond, A., and Hainmueller, J. (2010). “Synthetic Control Methods for Comparative Case Studies: Estimating the Effect of California’s Tobacco Control Program.” Journal of the American Statistical Association 105(490): 493-505.
  • Abadie, A., Diamond, A., and Hainmueller, J. (2015). “Comparative Politics and the Synthetic Control Method.” American Journal of Political Science 59(2): 495-510.
  • Abadie, A. (2021). “Using Synthetic Controls: Feasibility, Data Requirements, and Methodological Aspects.” Journal of Economic Literature 59(2): 391-425. (Read this one before designing any SC study.)

Modern extensions:

  • Ben-Michael, E., Feller, A., and Rothstein, J. (2021). “The Augmented Synthetic Control Method.” Journal of the American Statistical Association 116(536): 1789-1803.
  • Athey, S., Bayati, M., Doudchenko, N., Imbens, G., and Khosravi, K. (2021). “Matrix Completion Methods for Causal Panel Data Models.” Journal of the American Statistical Association 116(536): 1716-1730.
  • Arkhangelsky, D., Athey, S., Hirshberg, D. A., Imbens, G. W., and Wager, S. (2021). “Synthetic Difference-in-Differences.” American Economic Review 111(12): 4088-4118.

Applied development economics using SC:

  • Cavallo, E., Galiani, S., Noy, I., and Pantano, J. (2013). “Catastrophic Natural Disasters and Economic Growth.” Review of Economics and Statistics 95(5): 1549-1561. (SC applied to country-level GDP after large disasters; useful template for single-country external shocks.)
  • Billmeier, A., and Nannicini, T. (2013). “Assessing Economic Liberalization Episodes: A Synthetic Control Approach.” Review of Economics and Statistics 95(3): 983-1001. (SC applied to growth after trade liberalization in 30 countries; the canonical applied dev econ example.)
  • Pinotti, P. (2015). “The Economic Costs of Organised Crime: Evidence from Southern Italy.” Economic Journal 125(586): F203-F232. (Sub-national application; weights tell the story.)
  • Acemoglu, D., Johnson, S., Kermani, A., Kwak, J., and Mitton, T. (2016). “The Value of Connections in Turbulent Times: Evidence from the United States.” Journal of Financial Economics 121(2): 368-391. (Firm-level SC; transfers cleanly to PPP and concession settings.)

That is what you need to know to run SC competently in a blended-finance setting. Next chapter we move from a treated unit to treated parts of a distribution: quantile regression.