### Main Assumptions

**Staggered Treatment Adoption Assumption** Recall that \(D_{it} = 1\) if unit \(i\) has been treated by time \(t\) and \(D_{it}=0\) otherwise. Then, for \(t=1,...,\mathcal{T}-1\), \(D_{it} = 1 \implies D_{i,t+1} = 1\).

Staggered treatment adoption implies that once a unit participates in the treatment, it remains treated. In other words, units do not “forget” about their treatment experience. This is a leading case in many applications in economics. For example, it holds for policies that roll out to different locations over some period of time. It also holds for many unit-level treatments that have a “scarring” effect. For example, in the context of job training, many applications define treatment as *ever* having participated in the program.
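
The condition \(D_{it} = 1 \implies D_{i,t+1} = 1\) can be checked directly on a panel of treatment indicators. The following is a minimal sketch, assuming the indicators are stored as a units-by-periods array; the function name `is_staggered` and the example arrays are hypothetical and only for illustration.

```python
import numpy as np

def is_staggered(D):
    """Return True if every unit's treatment path is weakly increasing over time.

    D is a (units x periods) array of 0/1 indicators, where D[i, t] = 1 means
    unit i has been treated by period t (layout assumed for this sketch).
    """
    D = np.asarray(D, dtype=int)
    # D[i, t] = 1 must imply D[i, t+1] = 1, so no period-to-period drop is allowed.
    return bool(np.all(np.diff(D, axis=1) >= 0))

# Hypothetical panels: three units that stay treated once treated (or never are),
# and one unit that switches out of treatment.
staggered = np.array([[0, 0, 1, 1],
                      [0, 1, 1, 1],
                      [0, 0, 0, 0]])
switcher = np.array([[0, 1, 0, 1]])

print(is_staggered(staggered))
print(is_staggered(switcher))
```

A panel that fails this check falls outside the staggered adoption setup considered here.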

Within the DiD context, we believe it is hard to analyze non-staggered treatment setups **without** further restricting treatment effect heterogeneity across time, groups, treatment sequences, etc. That is the main reason we focus on this leading case.

**Parallel Trends Assumption based on never-treated units** For all \(g=2,...,\mathcal{T}\), \(t=2,...,\mathcal{T}\) with \(t \ge g\), \[
E[ Y_t(0) - Y_{t-1}(0) | G=g] = E[ Y_t(0) - Y_{t-1}(0)| C=1]
\]

This is a natural extension of the parallel trends assumption in the two-period, two-group case. It says that, in the absence of treatment, average untreated potential outcomes for the group first treated in time \(g\) and for the “never treated” group would have followed parallel paths in all post-treatment periods \(t \ge g\).
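
To make the equality concrete, here is a small sketch on made-up data in which untreated potential outcomes are a unit effect plus a common time trend, so the assumption holds by construction. In real data \(Y_t(0)\) is not observed for treated units after treatment, so this is an illustration of what the assumption says, not a test of it. The helper `mean_untreated_change`, the encoding of never-treated units as `np.inf`, and all numbers are assumptions of this example.

```python
import numpy as np

def mean_untreated_change(Y0, members, t):
    """Average of Y_t(0) - Y_{t-1}(0) over the units flagged in `members`.

    Y0 is a (units x periods) array of untreated potential outcomes; column
    t - 1 holds period t, so periods are numbered 1, ..., T as in the text.
    """
    Y0 = np.asarray(Y0, dtype=float)
    members = np.asarray(members, dtype=bool)
    return float(np.mean(Y0[members, t - 1] - Y0[members, t - 2]))

# Hypothetical untreated outcomes: unit effect + common time trend (periods 1..4).
unit_effect = np.array([1.0, 2.0, 5.0, 7.0])
time_trend = np.array([0.0, 0.5, 1.5, 2.0])
Y0 = unit_effect[:, None] + time_trend[None, :]

# G = first treatment period; np.inf marks never-treated units (assumed encoding).
G = np.array([3, 3, np.inf, np.inf])
g = 3
treated_group = G == g
never_treated = np.isinf(G)  # plays the role of C = 1

# Difference between the two sides of the assumption for each t >= g.
gaps = [
    mean_untreated_change(Y0, treated_group, t)
    - mean_untreated_change(Y0, never_treated, t)
    for t in range(g, Y0.shape[1] + 1)
]
print(gaps)
```

The levels of the two groups differ (the unit effects are not the same), but their period-to-period changes coincide, which is all the assumption requires.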

Note that the aforementioned parallel trends assumption relies on using the “never treated” units as the comparison group for all “eventually treated” groups. This presumes that (i) a (large enough) “never treated” group is available in the data, and (ii) these units are “similar enough” to the eventually treated units that they can indeed be used as a valid comparison group. In situations where these conditions are not satisfied, one can use an alternative parallel trends assumption that uses the **not-yet treated** units as valid comparison groups.

**Parallel Trends Assumption based on not-yet treated units** For all \(g=2,...,\mathcal{T}\), \(s,t=2,...,\mathcal{T}\) with \(t \ge g\) and \(s \ge t\) \[
E[ Y_t(0) - Y_{t-1}(0) | G=g] = E[ Y_t(0) - Y_{t-1}(0)| D_s=0, G\not=g]
\]

In plain English, this assumption states that one can use the units not yet treated by time \(s\) (with \(s \ge t\)) as valid comparison groups when computing the average treatment effect for the group first treated in time \(g\). In general, this assumption uses more data when constructing comparison groups. However, as noted in Marcus and Sant’Anna (2020), this assumption does restrict some pre-treatment trends across different groups. In other words, there is no free lunch.
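
The conditioning set \(D_s=0, G\not=g\) can be sketched as a selection rule on the first-treatment period of each unit. The snippet below is an illustration only: the function name `not_yet_treated`, the encoding of never-treated units as `np.inf`, and the example values are assumptions, not part of any package API.

```python
import numpy as np

def not_yet_treated(G, g, s):
    """Flag units with D_s = 0 and G != g.

    G holds each unit's first treatment period, with np.inf for units that
    are never treated (an encoding assumed for this sketch). Under staggered
    adoption, D_s = 0 is equivalent to G > s.
    """
    G = np.asarray(G, dtype=float)
    # When s >= g, G > s already rules out G == g; the second condition is
    # kept only to mirror the notation in the text.
    return (G > s) & (G != g)

# Hypothetical first-treatment periods for five units.
G = np.array([2, 3, 4, 4, np.inf])

# Comparison units for the group first treated in period g = 2, using s = 3.
mask = not_yet_treated(G, g=2, s=3)
print(mask)
```

Here the units first treated in period 4 join the never-treated unit in the comparison group, which is the sense in which this assumption uses more data than the never-treated version.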