Title: | Extended Two-Way Fixed Effects |
---|---|
Description: | Convenience functions for implementing extended two-way fixed effect regressions a la Wooldridge (2021, 2023) <doi:10.2139/ssrn.3906345>, <doi:10.1093/ectj/utad016>. |
Authors: | Grant McDermott [aut, cre] , Frederic Kluser [ctb] |
Maintainer: | Grant McDermott <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.5.0 |
Built: | 2025-01-15 06:28:06 UTC |
Source: | https://github.com/grantmcdermott/etwfe |
Companion function to etwfe
, enabling the recovery of aggregate treatment
effects along different dimensions of interest (e.g, an event study of
dynamic average treatment effects). emfx
is a light wrapper around the
slopes
function from the marginaleffects
package.
emfx( object, type = c("simple", "group", "calendar", "event"), by_xvar = "auto", compress = "auto", collapse = compress, predict = c("response", "link"), post_only = TRUE, lean = FALSE, ... )
emfx( object, type = c("simple", "group", "calendar", "event"), by_xvar = "auto", compress = "auto", collapse = compress, predict = c("response", "link"), post_only = TRUE, lean = FALSE, ... )
object |
An |
type |
Character. The desired type of post-estimation aggregation. |
by_xvar |
Logical. Should the results account for heterogeneous
treatment effects? Only relevant if the preceding |
compress |
Logical. Compress the data by (period by cohort) groups
before calculating marginal effects? This trades off a slight loss in
precision (typically around the 1st or 2nd significant decimal point) for a
substantial improvement in estimation time for large datasets. The default
behaviour ( |
collapse |
Logical. An alias for |
predict |
Character. The type (scale) of prediction used to compute the
marginal effects. If |
post_only |
Logical. Drop pre-treatment ATTs? Only evaluated if (a)
|
lean |
Logical. Default is |
... |
Additional arguments passed to
|
A data.frame
of aggregated treatment effects along the
dimension(s) of interested. Note that this data.frame will have been
overloaded with the slopes
class, and so
will come with a special print method. But the underlying columns will
usually include:
term
contrast
<type>
(i.e., the name of your type
string)
estimate
std.error
statistic
p.value
s.value
conf.low
conf.high
Under most situations, etwfe
should complete very quickly. For its part,
emfx
is quite performant too and should take a few seconds or less for
datasets under 100k rows. However, emfx
's computation time does tend to
scale non-linearly with the size of the original data, as well as the
number of interactions from the underlying etwfe
model. Without getting
too deep into the weeds, the numerical delta method used to recover the
ATEs of interest has to estimate two prediction models for each
coefficient in the model and then compute their standard errors. So, it's
a potentially expensive operation that can push the computation time for
large datasets (> 1m rows) up to several minutes or longer.
Fortunately, there are two complementary strategies that you can use to
speed things up. The first is to turn off the most expensive part of the
whole procedure—standard error calculation—by calling emfx(..., vcov = FALSE)
. Doing so should bring the estimation time back down to a few
seconds or less, even for datasets in excess of a million rows. While the
loss of standard errors might not be an acceptable trade-off for projects
where statistical inference is critical, the good news is this first
strategy can still be combined our second strategy. It turns out that
collapsing the data by groups prior to estimating the marginal effects can
yield substantial speed gains of its own. Users can do this by invoking
the emfx(..., collapse = TRUE)
argument. While the effect here is not as
dramatic as the first strategy, our second strategy does have the virtue
of retaining information about the standard errors. The trade-off this
time, however, is that collapsing our data does lead to a loss in accuracy
for our estimated parameters. On the other hand, testing suggests that
this loss in accuracy tends to be relatively minor, with results
equivalent up to the 1st or 2nd significant decimal place (or even
better).
Summarizing, here's a quick plan of attack for you to try if you are worried about the estimation time for large datasets and models:
Estimate mod = etwfe(...)
as per usual.
Run emfx(mod, vcov = FALSE, ...)
.
Run emfx(mod, vcov = FALSE, collapse = TRUE, ...)
.
Compare the point estimates from steps 1 and 2. If they are are similar
enough to your satisfaction, get the approximate standard errors by
running emfx(mod, collapse = TRUE, ...)
.
Specifying etwfe(..., xvar = <xvar>)
will generate interaction effects
for all levels of <xvar>
as part of the main regression model. The
reason that this is useful (as opposed to a regular, non-interacted
covariate in the formula RHS) is that it allows us to estimate
heterogeneous treatment effects as part of the larger ETWFE framework.
Specifically, we can recover heterogeneous treatment effects for each
level of <xvar>
by passing the resulting etwfe
model object on to
emfx()
.
For example, imagine that we have a categorical variable called "age" in
our dataset, with two distinct levels "adult" and "child". Running
emfx(etwfe(..., xvar = age))
will tell us how the efficacy of treatment
varies across adults and children. We can then also leverage the in-built
hypothesis testing infrastructure of marginaleffects
to test whether
the treatment effect is statistically different across these two age
groups; see Examples below. Note the same principles carry over to
categorical variables with multiple levels, or even continuous variables
(although continuous variables are not as well supported yet).
Wong, Jeffrey et al. (2021). You Only Compress Once: Optimal Data Compression for Estimating Linear Models. Working paper (version: March 16, 2021). Available: https://doi.org/10.48550/arXiv.2102.11297
marginaleffects::slopes which does the heavily lifting behind the
scenes. etwfe
is the companion estimating function that should be run
before emfx
.
## Not run: # We’ll use the mpdta dataset from the did package (which you’ll need to # install separately). # install.packages("did") data("mpdta", package = "did") # # Basic example # # The basic ETWFE workflow involves two consecutive function calls: # 1) `etwfe` and 2) `emfx` # 1) `etwfe`: Estimate a regression model with saturated interaction terms. mod = etwfe( fml = lemp ~ lpop, # outcome ~ controls (use 0 or 1 if none) tvar = year, # time variable gvar = first.treat, # group variable data = mpdta, # dataset vcov = ~countyreal # vcov adjustment (here: clustered by county) ) # mod ## A fixest model object with fully saturated interaction effects. # 2) `emfx`: Recover the treatment effects of interest. (mod_es = emfx(mod, type = "event")) # dynamic ATE a la an event study # Etc. Other aggregation type options are "simple" (the default), "group" # and "calendar" # To visualize results, use the native plot method (see `?plot.emfx`) plot(mod_es) # Notice that we don't get any pre-treatment effects with the default # "notyet" treated control group. Switch to the "never" treated control # group if you want this. etwfe( lemp ~ lpop, tvar = year, gvar = first.treat, data = mpdta, vcov = ~countyreal, cgroup = "never" ## <= use never treated group as control ) |> emfx("event") |> plot() # # Heterogeneous treatment effects # # Example where we estimate heterogeneous treatment effects for counties # within the 8 US Great Lake states (versus all other counties). gls = c("IL" = 17, "IN" = 18, "MI" = 26, "MN" = 27, "NY" = 36, "OH" = 39, "PA" = 42, "WI" = 55) mpdta$gls = substr(mpdta$countyreal, 1, 2) %in% gls hmod = etwfe( lemp ~ lpop, tvar = year, gvar = first.treat, data = mpdta, vcov = ~countyreal, xvar = gls ## <= het. TEs by gls ) # Heterogeneous ATEs (could also specify "event", etc.) emfx(hmod) # To test whether the ATEs across these two groups (non-GLS vs GLS) are # statistically different, simply pass an appropriate "hypothesis" argument. emfx(hmod, hypothesis = "b1 = b2") plot(emfx(hmod)) # # Nonlinear model (distribution / link) families # # Poisson example mpdta$emp = exp(mpdta$lemp) etwfe( emp ~ lpop, tvar = year, gvar = first.treat, data = mpdta, vcov = ~countyreal, family = "poisson" ## <= family arg for nonlinear options ) |> emfx("event") ## End(Not run)
## Not run: # We’ll use the mpdta dataset from the did package (which you’ll need to # install separately). # install.packages("did") data("mpdta", package = "did") # # Basic example # # The basic ETWFE workflow involves two consecutive function calls: # 1) `etwfe` and 2) `emfx` # 1) `etwfe`: Estimate a regression model with saturated interaction terms. mod = etwfe( fml = lemp ~ lpop, # outcome ~ controls (use 0 or 1 if none) tvar = year, # time variable gvar = first.treat, # group variable data = mpdta, # dataset vcov = ~countyreal # vcov adjustment (here: clustered by county) ) # mod ## A fixest model object with fully saturated interaction effects. # 2) `emfx`: Recover the treatment effects of interest. (mod_es = emfx(mod, type = "event")) # dynamic ATE a la an event study # Etc. Other aggregation type options are "simple" (the default), "group" # and "calendar" # To visualize results, use the native plot method (see `?plot.emfx`) plot(mod_es) # Notice that we don't get any pre-treatment effects with the default # "notyet" treated control group. Switch to the "never" treated control # group if you want this. etwfe( lemp ~ lpop, tvar = year, gvar = first.treat, data = mpdta, vcov = ~countyreal, cgroup = "never" ## <= use never treated group as control ) |> emfx("event") |> plot() # # Heterogeneous treatment effects # # Example where we estimate heterogeneous treatment effects for counties # within the 8 US Great Lake states (versus all other counties). gls = c("IL" = 17, "IN" = 18, "MI" = 26, "MN" = 27, "NY" = 36, "OH" = 39, "PA" = 42, "WI" = 55) mpdta$gls = substr(mpdta$countyreal, 1, 2) %in% gls hmod = etwfe( lemp ~ lpop, tvar = year, gvar = first.treat, data = mpdta, vcov = ~countyreal, xvar = gls ## <= het. TEs by gls ) # Heterogeneous ATEs (could also specify "event", etc.) emfx(hmod) # To test whether the ATEs across these two groups (non-GLS vs GLS) are # statistically different, simply pass an appropriate "hypothesis" argument. emfx(hmod, hypothesis = "b1 = b2") plot(emfx(hmod)) # # Nonlinear model (distribution / link) families # # Poisson example mpdta$emp = exp(mpdta$lemp) etwfe( emp ~ lpop, tvar = year, gvar = first.treat, data = mpdta, vcov = ~countyreal, family = "poisson" ## <= family arg for nonlinear options ) |> emfx("event") ## End(Not run)
Estimates an "extended" two-way fixed effects regression, with fully
saturated interaction effects a la Wooldridge (2021, 2023). At its heart,
etwfe
is a convenience function that automates a number of tedious and
error prone preparation steps involving both the data and model formulae.
Computation is passed on to the feols
(linear) /
feglm
(nonlinear) functions from the fixest
package. etwfe
should be paired with its companion emfx
function.
etwfe( fml = NULL, tvar = NULL, gvar = NULL, data = NULL, ivar = NULL, xvar = NULL, tref = NULL, gref = NULL, cgroup = c("notyet", "never"), fe = c("vs", "feo", "none"), family = NULL, ... )
etwfe( fml = NULL, tvar = NULL, gvar = NULL, data = NULL, ivar = NULL, xvar = NULL, tref = NULL, gref = NULL, cgroup = c("notyet", "never"), fe = c("vs", "feo", "none"), family = NULL, ... )
fml |
A two-side formula representing the outcome (lhs) and any control
variables (rhs), e.g. |
tvar |
Time variable. Can be a string (e.g., "year") or an expression (e.g., year). |
gvar |
Group variable. Can be either a string (e.g., "first_treated") or an expression (e.g., first_treated). In a staggered treatment setting, the group variable typically denotes treatment cohort. |
data |
The data frame that you want to run ETWFE on. |
ivar |
Optional index variable. Can be a string (e.g., "country") or an
expression (e.g., country). Leaving as NULL (the default) will result in
group-level fixed effects being used, which is more efficient and
necessary for nonlinear models (see |
xvar |
Optional interacted categorical covariate for estimating
heterogeneous treatment effects. Enables recovery of the marginal
treatment effect for distinct levels of |
tref |
Optional reference value for |
gref |
Optional reference value for |
cgroup |
What control group do you wish to use for estimating treatment effects. Either "notyet" treated (the default) or "never" treated. |
fe |
What level of fixed effects should be used? Defaults to "vs" (varying slopes), which is the most efficient in terms of estimation and terseness of the return model object. The other two options, "feo" (fixed effects only) and "none" (no fixed effects whatsoever), trade off efficiency for additional information on other (nuisance) model parameters. Note that the primary treatment parameters of interest should remain unchanged regardless of choice. |
family |
Which |
... |
Additional arguments passed to |
A fixest
object with fully saturated
interaction effects, and a few additional attributes used for
post-estimation in emfx
.
Specifying etwfe(..., xvar = <xvar>)
will generate interaction effects
for all levels of <xvar>
as part of the main regression model. The
reason that this is useful (as opposed to a regular, non-interacted
covariate in the formula RHS) is that it allows us to estimate
heterogeneous treatment effects as part of the larger ETWFE framework.
Specifically, we can recover heterogeneous treatment effects for each
level of <xvar>
by passing the resulting etwfe
model object on to
emfx()
.
For example, imagine that we have a categorical variable called "age" in
our dataset, with two distinct levels "adult" and "child". Running
emfx(etwfe(..., xvar = age))
will tell us how the efficacy of treatment
varies across adults and children. We can then also leverage the in-built
hypothesis testing infrastructure of marginaleffects
to test whether
the treatment effect is statistically different across these two age
groups; see Examples below. Note the same principles carry over to
categorical variables with multiple levels, or even continuous variables
(although continuous variables are not as well supported yet).
Under most situations, etwfe
should complete very quickly. For its part,
emfx
is quite performant too and should take a few seconds or less for
datasets under 100k rows. However, emfx
's computation time does tend to
scale non-linearly with the size of the original data, as well as the
number of interactions from the underlying etwfe
model. Without getting
too deep into the weeds, the numerical delta method used to recover the
ATEs of interest has to estimate two prediction models for each
coefficient in the model and then compute their standard errors. So, it's
a potentially expensive operation that can push the computation time for
large datasets (> 1m rows) up to several minutes or longer.
Fortunately, there are two complementary strategies that you can use to
speed things up. The first is to turn off the most expensive part of the
whole procedure—standard error calculation—by calling emfx(..., vcov = FALSE)
. Doing so should bring the estimation time back down to a few
seconds or less, even for datasets in excess of a million rows. While the
loss of standard errors might not be an acceptable trade-off for projects
where statistical inference is critical, the good news is this first
strategy can still be combined our second strategy. It turns out that
collapsing the data by groups prior to estimating the marginal effects can
yield substantial speed gains of its own. Users can do this by invoking
the emfx(..., collapse = TRUE)
argument. While the effect here is not as
dramatic as the first strategy, our second strategy does have the virtue
of retaining information about the standard errors. The trade-off this
time, however, is that collapsing our data does lead to a loss in accuracy
for our estimated parameters. On the other hand, testing suggests that
this loss in accuracy tends to be relatively minor, with results
equivalent up to the 1st or 2nd significant decimal place (or even
better).
Summarizing, here's a quick plan of attack for you to try if you are worried about the estimation time for large datasets and models:
Estimate mod = etwfe(...)
as per usual.
Run emfx(mod, vcov = FALSE, ...)
.
Run emfx(mod, vcov = FALSE, collapse = TRUE, ...)
.
Compare the point estimates from steps 1 and 2. If they are are similar
enough to your satisfaction, get the approximate standard errors by
running emfx(mod, collapse = TRUE, ...)
.
Wooldridge, Jeffrey M. (2021). Two-Way Fixed Effects, the Two-Way Mundlak Regression, and Difference-in-Differences Estimators. Working paper (version: August 16, 2021). Available: http://dx.doi.org/10.2139/ssrn.3906345
Wooldridge, Jeffrey M. (2023). Simple Approaches to Nonlinear Difference-in-Differences with Panel Data. The Econometrics Journal, 26(3), C31-C66. Available: https://doi.org/10.1093/ectj/utad016
fixest::feols()
, fixest::feglm()
which power the underlying
estimation routines. emfx
is a companion function that handles
post-estimation aggregation to extract quantities of interest.
## Not run: # We’ll use the mpdta dataset from the did package (which you’ll need to # install separately). # install.packages("did") data("mpdta", package = "did") # # Basic example # # The basic ETWFE workflow involves two consecutive function calls: # 1) `etwfe` and 2) `emfx` # 1) `etwfe`: Estimate a regression model with saturated interaction terms. mod = etwfe( fml = lemp ~ lpop, # outcome ~ controls (use 0 or 1 if none) tvar = year, # time variable gvar = first.treat, # group variable data = mpdta, # dataset vcov = ~countyreal # vcov adjustment (here: clustered by county) ) # mod ## A fixest model object with fully saturated interaction effects. # 2) `emfx`: Recover the treatment effects of interest. (mod_es = emfx(mod, type = "event")) # dynamic ATE a la an event study # Etc. Other aggregation type options are "simple" (the default), "group" # and "calendar" # To visualize results, use the native plot method (see `?plot.emfx`) plot(mod_es) # Notice that we don't get any pre-treatment effects with the default # "notyet" treated control group. Switch to the "never" treated control # group if you want this. etwfe( lemp ~ lpop, tvar = year, gvar = first.treat, data = mpdta, vcov = ~countyreal, cgroup = "never" ## <= use never treated group as control ) |> emfx("event") |> plot() # # Heterogeneous treatment effects # # Example where we estimate heterogeneous treatment effects for counties # within the 8 US Great Lake states (versus all other counties). gls = c("IL" = 17, "IN" = 18, "MI" = 26, "MN" = 27, "NY" = 36, "OH" = 39, "PA" = 42, "WI" = 55) mpdta$gls = substr(mpdta$countyreal, 1, 2) %in% gls hmod = etwfe( lemp ~ lpop, tvar = year, gvar = first.treat, data = mpdta, vcov = ~countyreal, xvar = gls ## <= het. TEs by gls ) # Heterogeneous ATEs (could also specify "event", etc.) emfx(hmod) # To test whether the ATEs across these two groups (non-GLS vs GLS) are # statistically different, simply pass an appropriate "hypothesis" argument. emfx(hmod, hypothesis = "b1 = b2") plot(emfx(hmod)) # # Nonlinear model (distribution / link) families # # Poisson example mpdta$emp = exp(mpdta$lemp) etwfe( emp ~ lpop, tvar = year, gvar = first.treat, data = mpdta, vcov = ~countyreal, family = "poisson" ## <= family arg for nonlinear options ) |> emfx("event") ## End(Not run)
## Not run: # We’ll use the mpdta dataset from the did package (which you’ll need to # install separately). # install.packages("did") data("mpdta", package = "did") # # Basic example # # The basic ETWFE workflow involves two consecutive function calls: # 1) `etwfe` and 2) `emfx` # 1) `etwfe`: Estimate a regression model with saturated interaction terms. mod = etwfe( fml = lemp ~ lpop, # outcome ~ controls (use 0 or 1 if none) tvar = year, # time variable gvar = first.treat, # group variable data = mpdta, # dataset vcov = ~countyreal # vcov adjustment (here: clustered by county) ) # mod ## A fixest model object with fully saturated interaction effects. # 2) `emfx`: Recover the treatment effects of interest. (mod_es = emfx(mod, type = "event")) # dynamic ATE a la an event study # Etc. Other aggregation type options are "simple" (the default), "group" # and "calendar" # To visualize results, use the native plot method (see `?plot.emfx`) plot(mod_es) # Notice that we don't get any pre-treatment effects with the default # "notyet" treated control group. Switch to the "never" treated control # group if you want this. etwfe( lemp ~ lpop, tvar = year, gvar = first.treat, data = mpdta, vcov = ~countyreal, cgroup = "never" ## <= use never treated group as control ) |> emfx("event") |> plot() # # Heterogeneous treatment effects # # Example where we estimate heterogeneous treatment effects for counties # within the 8 US Great Lake states (versus all other counties). gls = c("IL" = 17, "IN" = 18, "MI" = 26, "MN" = 27, "NY" = 36, "OH" = 39, "PA" = 42, "WI" = 55) mpdta$gls = substr(mpdta$countyreal, 1, 2) %in% gls hmod = etwfe( lemp ~ lpop, tvar = year, gvar = first.treat, data = mpdta, vcov = ~countyreal, xvar = gls ## <= het. TEs by gls ) # Heterogeneous ATEs (could also specify "event", etc.) emfx(hmod) # To test whether the ATEs across these two groups (non-GLS vs GLS) are # statistically different, simply pass an appropriate "hypothesis" argument. emfx(hmod, hypothesis = "b1 = b2") plot(emfx(hmod)) # # Nonlinear model (distribution / link) families # # Poisson example mpdta$emp = exp(mpdta$lemp) etwfe( emp ~ lpop, tvar = year, gvar = first.treat, data = mpdta, vcov = ~countyreal, family = "poisson" ## <= family arg for nonlinear options ) |> emfx("event") ## End(Not run)
Visualize the results of an emfx
call.
## S3 method for class 'emfx' plot( x, type = c("pointrange", "errorbar", "ribbon"), pch = 16, zero = TRUE, grid = TRUE, ref = -1, ... )
## S3 method for class 'emfx' plot( x, type = c("pointrange", "errorbar", "ribbon"), pch = 16, zero = TRUE, grid = TRUE, ref = -1, ... )
x |
An |
type |
Character. The type of plot display. One of |
pch |
Integer or character. Which plotting character or symbol to use
(see |
zero |
Logical. Should 0-zero line be emphasized? Default is |
grid |
Logical. Should a background grid be displayed? Default is
|
ref |
Integer. Reference line marker for event-study plot. Default is
|
... |
Additional arguments passed to |
No return value, called for side effect of producing a plot.
## Not run: # We’ll use the mpdta dataset from the did package (which you’ll need to # install separately). # install.packages("did") data("mpdta", package = "did") # # Basic example # # The basic ETWFE workflow involves two consecutive function calls: # 1) `etwfe` and 2) `emfx` # 1) `etwfe`: Estimate a regression model with saturated interaction terms. mod = etwfe( fml = lemp ~ lpop, # outcome ~ controls (use 0 or 1 if none) tvar = year, # time variable gvar = first.treat, # group variable data = mpdta, # dataset vcov = ~countyreal # vcov adjustment (here: clustered by county) ) # mod ## A fixest model object with fully saturated interaction effects. # 2) `emfx`: Recover the treatment effects of interest. (mod_es = emfx(mod, type = "event")) # dynamic ATE a la an event study # Etc. Other aggregation type options are "simple" (the default), "group" # and "calendar" # To visualize results, use the native plot method (see `?plot.emfx`) plot(mod_es) # Notice that we don't get any pre-treatment effects with the default # "notyet" treated control group. Switch to the "never" treated control # group if you want this. etwfe( lemp ~ lpop, tvar = year, gvar = first.treat, data = mpdta, vcov = ~countyreal, cgroup = "never" ## <= use never treated group as control ) |> emfx("event") |> plot() # # Heterogeneous treatment effects # # Example where we estimate heterogeneous treatment effects for counties # within the 8 US Great Lake states (versus all other counties). gls = c("IL" = 17, "IN" = 18, "MI" = 26, "MN" = 27, "NY" = 36, "OH" = 39, "PA" = 42, "WI" = 55) mpdta$gls = substr(mpdta$countyreal, 1, 2) %in% gls hmod = etwfe( lemp ~ lpop, tvar = year, gvar = first.treat, data = mpdta, vcov = ~countyreal, xvar = gls ## <= het. TEs by gls ) # Heterogeneous ATEs (could also specify "event", etc.) emfx(hmod) # To test whether the ATEs across these two groups (non-GLS vs GLS) are # statistically different, simply pass an appropriate "hypothesis" argument. emfx(hmod, hypothesis = "b1 = b2") plot(emfx(hmod)) # # Nonlinear model (distribution / link) families # # Poisson example mpdta$emp = exp(mpdta$lemp) etwfe( emp ~ lpop, tvar = year, gvar = first.treat, data = mpdta, vcov = ~countyreal, family = "poisson" ## <= family arg for nonlinear options ) |> emfx("event") ## End(Not run)
## Not run: # We’ll use the mpdta dataset from the did package (which you’ll need to # install separately). # install.packages("did") data("mpdta", package = "did") # # Basic example # # The basic ETWFE workflow involves two consecutive function calls: # 1) `etwfe` and 2) `emfx` # 1) `etwfe`: Estimate a regression model with saturated interaction terms. mod = etwfe( fml = lemp ~ lpop, # outcome ~ controls (use 0 or 1 if none) tvar = year, # time variable gvar = first.treat, # group variable data = mpdta, # dataset vcov = ~countyreal # vcov adjustment (here: clustered by county) ) # mod ## A fixest model object with fully saturated interaction effects. # 2) `emfx`: Recover the treatment effects of interest. (mod_es = emfx(mod, type = "event")) # dynamic ATE a la an event study # Etc. Other aggregation type options are "simple" (the default), "group" # and "calendar" # To visualize results, use the native plot method (see `?plot.emfx`) plot(mod_es) # Notice that we don't get any pre-treatment effects with the default # "notyet" treated control group. Switch to the "never" treated control # group if you want this. etwfe( lemp ~ lpop, tvar = year, gvar = first.treat, data = mpdta, vcov = ~countyreal, cgroup = "never" ## <= use never treated group as control ) |> emfx("event") |> plot() # # Heterogeneous treatment effects # # Example where we estimate heterogeneous treatment effects for counties # within the 8 US Great Lake states (versus all other counties). gls = c("IL" = 17, "IN" = 18, "MI" = 26, "MN" = 27, "NY" = 36, "OH" = 39, "PA" = 42, "WI" = 55) mpdta$gls = substr(mpdta$countyreal, 1, 2) %in% gls hmod = etwfe( lemp ~ lpop, tvar = year, gvar = first.treat, data = mpdta, vcov = ~countyreal, xvar = gls ## <= het. TEs by gls ) # Heterogeneous ATEs (could also specify "event", etc.) emfx(hmod) # To test whether the ATEs across these two groups (non-GLS vs GLS) are # statistically different, simply pass an appropriate "hypothesis" argument. emfx(hmod, hypothesis = "b1 = b2") plot(emfx(hmod)) # # Nonlinear model (distribution / link) families # # Poisson example mpdta$emp = exp(mpdta$lemp) etwfe( emp ~ lpop, tvar = year, gvar = first.treat, data = mpdta, vcov = ~countyreal, family = "poisson" ## <= family arg for nonlinear options ) |> emfx("event") ## End(Not run)