Chatpter 5 (cont.). Advanced Causal Inference in Julia
import Pkg;
Pkg.add(["DiffinDiffs", "RegressionDiscontinuity", "SynthControl", "Dates", "Plots"]);
[32m[1m Resolving[22m[39m package versions...
[32m[1m No Changes[22m[39m to `~/.julia/environments/v1.11/Project.toml`
[32m[1m No Changes[22m[39m to `~/.julia/environments/v1.11/Manifest.toml`
Instrumental Variable
using DataFrames, RDatasets, FixedEffectModels
df = dataset("plm", "Cigar")
first(df, 5)
5×9 DataFrame
Int64
Int64
Float64
Float64
Float64
Float64
Float64
Float64
Float64
1
1
63
28.6
3383.0
2236.5
30.6
1558.3
93.9
26.1
2
1
64
29.8
3431.0
2276.7
31.0
1684.07
95.4
27.5
3
1
65
29.8
3486.0
2327.5
31.5
1809.84
98.5
28.9
4
1
66
31.5
3524.0
2369.7
32.4
1915.16
96.4
29.5
5
1
67
31.6
3533.0
2393.7
33.4
2023.55
95.5
29.6
For example, if we have an instrumental variable Price
predicting CPI
which predicts Sales
(theoretically incorrect, just for illustration), the formula here should be:
# DV ~ exogenous variables + (endogenous variables ~ instrumental variables) + fe(fixedeffect variable)
reg(df, @formula(Sales ~ NDI + (CPI ~ Price ) + fe(State)))
FixedEffectModel
=======================================================================
Number of obs: 1380 Converged: true
dof (model): 2 dof (residuals): 1333
R²: -0.327 R² adjusted: -0.373
F-statistic: 42.695 P-value: 0.000
F-statistic (first stage): 39.4447 P-value (first stage): 0.000
R² within: -3.259 Iterations: 1
=======================================================================
Estimate Std. Error t-stat Pr(>|t|) Lower 95% Upper 95%
───────────────────────────────────────────────────────────────────────
NDI 0.0271309 0.00634835 4.2737 <1e-04 0.0146771 0.0395848
CPI -3.71621 0.817692 -4.54475 <1e-05 -5.32031 -2.1121
=======================================================================
Difference-in-difference
using DiffinDiffs
hrs = DiffinDiffsBase.exampledata("hrs")
3280×11 VecColumnTable:
[1m Row [0m│[1m hhidpn [0m[1m wave [0m[1m wave_hosp [0m[1m oop_spend [0m[1m riearnsemp [0m[1m rwthh [0m[1m male [0m[1m spouse [0m[1m[0m ⋯
│[90m Int64 [0m[90m Int64 [0m[90m Int64 [0m[90m Float64 [0m[90m Float64 [0m[90m Int64 [0m[90m Int64 [0m[90m Int64 [0m[90m[0m ⋯
──────┼─────────────────────────────────────────────────────────────────────────
1 │ 1 10 10 6532.91 6.37159e5 4042 0 0 ⋯
2 │ 1 8 10 1326.93 3.67451e5 3975 0 0 ⋯
3 │ 1 11 10 1050.33 74130.5 3976 0 0 ⋯
4 │ 1 9 10 979.418 84757.4 3703 0 0 ⋯
5 │ 1 7 10 5498.68 1.66128e5 5295 0 0 ⋯
6 │ 2 8 8 41504.0 0.0 5187 0 1 ⋯
7 │ 2 7 8 3672.86 0.0 4186 0 1 ⋯
8 │ 2 10 8 1174.19 0.0 3729 0 1 ⋯
9 │ 2 11 8 6909.59 0.0 3906 0 1 ⋯
10 │ 2 9 8 1130.1 0.0 5453 0 1 ⋯
11 │ 3 9 10 2773.45 8475.74 4419 0 1 ⋯
⋮ │ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋱
3270 │ 654 10 10 2876.32 45511.3 4372 0 0 ⋯
3271 │ 655 7 9 1910.74 25476.5 7682 0 1 ⋯
3272 │ 655 11 9 12402.5 34721.6 9203 0 1 ⋯
3273 │ 655 8 9 1530.0 45000.0 8461 0 1 ⋯
3274 │ 655 9 9 7373.89 10359.2 9345 0 1 ⋯
3275 │ 655 10 9 673.568 38229.5 8420 0 1 ⋯
3276 │ 656 11 8 3020.78 0.0 1930 0 0 ⋯
3277 │ 656 8 8 2632.0 0.0 4810 0 0 ⋯
3278 │ 656 9 8 657.34 0.0 4768 0 0 ⋯
3279 │ 656 10 8 782.795 0.0 1909 0 0 ⋯
3280 │ 656 7 8 4182.39 0.0 4374 0 0 ⋯
r = @did(Reg, data = hrs, dynamic(:wave, -1), notyettreated(11),
vce = Vcov.cluster(:hhidpn), yterm = term(:oop_spend), treatname = :wave_hosp,
treatintterms = (), xterms = (fe(:wave) + fe(:hhidpn)))
──────────────────────────────────────────────────────────────────────
Summary of results: Regression-based DID
──────────────────────────────────────────────────────────────────────
Number of obs: 2624 Degrees of freedom: 14
F-statistic: 6.42 p-value: <1e-07
──────────────────────────────────────────────────────────────────────
Cohort-interacted sharp dynamic specification
──────────────────────────────────────────────────────────────────────
Number of cohorts: 3 Interactions within cohorts: 0
Relative time periods: 5 Excluded periods: -1
──────────────────────────────────────────────────────────────────────
Fixed effects: fe_hhidpn fe_wave
──────────────────────────────────────────────────────────────────────
Converged: true Singletons dropped: 0
──────────────────────────────────────────────────────────────────────
a = agg(r, :rel)
───────────────────────────────────────────────────────────────────
Estimate Std. Error t Pr(>|t|) Lower 95% Upper 95%
───────────────────────────────────────────────────────────────────
rel: -3 591.046 1273.08 0.46 0.6425 -1905.3 3087.39
rel: -2 352.639 697.78 0.51 0.6133 -1015.62 1720.9
rel: 0 2960.04 540.989 5.47 <1e-07 1899.23 4020.86
rel: 1 529.767 586.831 0.90 0.3667 -620.935 1680.47
rel: 2 800.106 1010.81 0.79 0.4287 -1181.97 2782.18
───────────────────────────────────────────────────────────────────
Regression Discontinuity
using RegressionDiscontinuity
data = RDData(RegressionDiscontinuity.Lee08())
RDData{Vector{Float64}, RunningVariable{Float64, Float64, Vector{Float64}}}([0.0, 0.0, 0.0, 0.251144230365753, 0.23875193297862998, 0.267181247472763, 0.262862056493759, 0.0, 0.34071296453476, 0.6447104215621949 … 0.467111736536026, 0.712594926357269, 0.718958139419556, 0.791856050491333, 0.8541748523712159, 0.677568554878235, 0.808446884155273, 0.505452692508698, 1.0, 0.760513544082642], [-1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0 … 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0])
fit(NaiveLocalLinearRD(kernel=Rectangular(), bandwidth=ImbensKalyanaraman()), data.ZsR, data.Ys)
Local linear regression for regression discontinuity design
⋅⋅⋅⋅ Naive inference (not accounting for bias)
⋅⋅⋅⋅ Rectangular kernel (U[-0.5,0.5])
⋅⋅⋅⋅ Imbens Kalyanaraman bandwidth
⋅⋅⋅⋅ Eicker White Huber variance
────────────────────────────────────────────────────────────────────────────────────────────────
h τ̂ se bias z p-val Lower 95% Upper 95%
────────────────────────────────────────────────────────────────────────────────────────────────
Sharp RD estimand 0.462024 0.08077 0.0087317 unaccounted 9.25 <1e-99 0.0636562 0.0978838
────────────────────────────────────────────────────────────────────────────────────────────────
Synthetic Control
using SynthControl, Dates
df = load_brexit()
897×3 DataFrame872 rows omitted
String
Date
Float64
1
Australia
2009-01-01
1.04
2
Austria
2009-01-01
-1.53
3
Belgium
2009-01-01
-1.15
4
Canada
2009-01-01
-2.28
5
Denmark
2009-01-01
-1.42
6
Finland
2009-01-01
-6.8
7
France
2009-01-01
-1.67
8
Germany
2009-01-01
-4.49
9
Greece
2009-01-01
-4.74
10
Iceland
2009-01-01
-7.61
11
Ireland
2009-01-01
-0.34
12
Italy
2009-01-01
-2.75
13
Japan
2009-01-01
-4.88
⋮
⋮
⋮
⋮
886
Italy
2018-07-01
-0.82
887
Japan
2018-07-01
9.54
888
Luxembourg
2018-07-01
28.41
889
Netherlands
2018-07-01
10.69
890
New Zealand
2018-07-01
24.06
891
Norway
2018-07-01
12.82
892
Portugal
2018-07-01
2.23
893
Spain
2018-07-01
5.74
894
Sweden
2018-07-01
22.48
895
Switzerland
2018-07-01
14.35
896
United Kingdom
2018-07-01
15.72
897
United States
2018-07-01
19.32
bp = BalancedPanel(df, "United Kingdom" => Date(2016, 7, 1); id_var = :country, t_var = :quarter, outcome_var = :realgdp)
Balanced Panel - single treated unit, continuous treatment
Treated unit: United Kingdom
Number of untreated units: 22
First treatment period: 2016-07-01
Number of pretreatment periods: 30
Number of treatment periods: 9
s = SimpleSCM(bp)
Synthetic Control Model
Treatment panel:
Model is not fitted
Balanced Panel - single treated unit, continuous treatment
Treated unit: United Kingdom
Number of untreated units: 22
First treatment period: 2016-07-01
Number of pretreatment periods: 30
Number of treatment periods: 9
fit!(s)
Synthetic Control Model
Treatment panel:
Model is fitted
Impact estimates: [-0.54, -0.31, -0.206, -0.732, -1.241, -1.482, -1.818, -2.327, -1.994]
Balanced Panel - single treated unit, continuous treatment
Treated unit: United Kingdom
Number of untreated units: 22
First treatment period: 2016-07-01
Number of pretreatment periods: 30
Number of treatment periods: 9
using Plots
plot(s)
SyntheticDiD
sp = load_smoking_panel()
Balanced Panel - single treated unit, continuous treatment
Treated unit: 3
Number of untreated units: 38
First treatment period: 1989
Number of pretreatment periods: 19
Number of treatment periods: 12
sdid_model = SyntheticDiD(sp)
Synthetic Difference-in-Differences Model
Model is not fitted
fit!(sdid_model)
Synthetic Difference-in-Differences Model
Model is fitted
Impact estimate: -15.604
Last updated