This vignette shows how to use healthcare.antitrust to implement the semiparametric demand estimation technique developed in Raval, Rosenbaum, and Tenn (2017) “A Semiparametric Discrete Choice Model: An Application to Hospital Mergers” https://doi.org/10.1111/ecin.12454. This model is useful to estimate “diversion ratios” which are used to assess substitutability between merging parties’ products in antitrust analysis.

The method is illustrated using a simulated dataset of hospital discharges that is included with the healthcare.antitrust package. This simulated dataset mimics the basic structure of real hospital discharge data, with each observation including a hospital identifier and patient characteristics (DRG diagnosis code, patient age, and patient zip code of residence). The data includes 1,200 discharges from eight hospitals.

Suppose that the analyst is interested in estimating diversion ratios between hospitals in System 1, which includes Hospitals 1 and 2, and Hospital 5, which is an independent hospital. This vignette will show how to implement each step to arrive at the estimate.

The first step is to assign observations to ‘cells,’ within which the
substitution proportional to share will be assumed to hold. This
assignment of observations can be done with the `cell_defn`

function. Begin by loading the example simulation dataset of hospital
discharges:

```
library(healthcare.antitrust)
data(discharge_data, package = "healthcare.antitrust")
```

The observations will be grouped into cells based on variables in the
dataset. Let’s define cells based on three variables: the DRG diagnosis
code, the patient age, and the patient zip code of residence. We do this
by specifying the variables and then saving them in a list variable
called `layers`

.

```
<- c("drg","age","zip5")
list1 <- list(list1) layers
```

A minimum cell size threshold also needs to be set. Observations will be assigned to a cell only if that cell includes at least the minimum threshold number of observations. Let’s set the minimum size to 15:

`<- 15 th `

Then use `cell_defn`

to allocate observations to cells,
based on the variables in the layer. The one variable that needs to be
included in the dataframe is a variable specifying the number of
admissions represented by each observation. By default, this variable
name is `count`

, and if present the function
`cell_defn`

will use it as the admission count.
Alternatively, another variable can be indicated as the count variable.
If no variable is specified as the count and `count`

is not
present in the dataframe, `cell_defn`

will assume each
observation represents one admission and create a variable
`count`

always equal to one. We will go ahead and create the
variable in our dataframe, and then run `cell_defn`

.

```
$count <- 1
discharge_data<- cell_defn(discharge_data,th,layers)
outList #> Layer 1: 1116 obs allocated
#> Number of Excluded Obs: 84
```

The `cell_defn`

function prints a message indicating the
number of observations assigned to cells, and also returns a list. The
first item in the return list is a new dataset that contains the
observations that have been allocated to a cell, as well as their
assignment.

```
<- outList$assigned
D0 print(nrow(D0))
#> [1] 1116
print(nrow(discharge_data))
#> [1] 1200
```

The new dataset has `nrow(D0)`

observations.
`cell_defn`

stated that it excluded some observations, those
that were not allocated to a cell that met the minimum size threshold.
The number of unassigned observations added to the number of assigned
observations in the new dataset equals the number of observations in the
original dataset. A data frame of unassigned observations is the second
item in the return list and can be accessed with
`outList$unassigned`

.

Because some observations may not be assigned to a cell, it is useful
to specify several “layers” of variables in order to assign observations
to cells. Using a more coarse set of variables will assign more
observations, since the cells will be larger and more likely to exceed
the minimum size threshold. But finer layers allow one to better account
for heterogeneity in preferences across different types of people. That
is why `cell_defn`

allows for input of many layers, and they
should be ordered by increasing coarseness. This will assign
observations by more granular categories at first, and then for
remaining unassigned individuals, they will be assigned into more coarse
groupings.

Instead of using only one layer, we can modify our example to instead use three layers to define cells. Adding more coarse layers will allow more observations to be allocated to cells that meet the minimum size threshold. Let’s add layers that use three digit zip codes instead of the five digit zip code.

```
$zip3 <- floor(discharge_data$zip5/100)
discharge_data<- c("drg","age","zip5")
list1 <- c("drg","age","zip3")
list2 <- c("zip3")
list3 <- list(list1, list2, list3) layers
```

With this new definition of layers, we can allocate observations like before:

```
<- cell_defn(discharge_data,th,layers)
outList #> Layer 1: 1116 obs allocated
#> Layer 2: 0 obs allocated
#> Layer 3: 84 obs allocated
#> Number of Excluded Obs: 0
<- outList$assigned
D0 print(nrow(D0))
#> [1] 1200
print(nrow(discharge_data))
#> [1] 1200
```

The observations are first allocated to cells as before, and the remaining observations are allocated to a cell based on three digit zip code only. With these definitions, every observation is allocated to a cell that meets the minimum size threshold that was set.

Note that in each layer, `cell_defn`

only assigns
observations that have not been assigned in previous layers. That is,
the function assigns observations without replacement. An alternative
method can be used, in which assignment at each layer is done by
sampling with replacement from all observation that meet the cell
criteria, even if they have been previously assigned in a finer layer.
This option of assigning with replacement is currently not
implemented.

We are now able to calculate diversion ratios based on the cell definitions.

Once observations have been assigned to cells, we can now calculate
diversion ratios with the function `div_calc`

. Within each
cell \(c\), the diversion from a
hospital \(k\) to hospital \(h\) is given by \(d^c_{k,h} = \frac{s^c_h}{1 - s^c_{m(k)}}\)
where \(s^c_h\) is the share of cell
\(c\) admissions to hospital \(h\), and \(s^c_{m(k)}\) is the share of cell \(c\) admissions at hospital \(k\) or any other hospital in the same
system as \(k\). The overall diversions
are then calculated by aggregating over cells (See Raval, Rosenbaum, and
Tenn (2017) for details.)

The required variables in the input dataframe are `cell`

,
which has been defined by the `cell_defn`

function, as well
as hospital and system identifiers: `hosp_id`

,
`Hospital`

, `sys_id`

, and `System`

. The
function also requires `count`

, which is created by
`cell_defn`

if it does not exist.

Finally, we need to specify an indicator for the set of hospitals we
are interested in diversions among, for example the hospitals that are
parties in a proposed merger. The variable to indicate such hospitals is
`party_ind`

, and these are the hospitals for which diversion
ratios will be calculated. We are interested in studying a proposed
merger including hospitals 1, 2, and 5, and so we assign them the party
indicator variable.

```
$party_ind <- 0
D0$party_ind[D0$hosp_id==1] <- 1
D0$party_ind[D0$hosp_id==2] <- 1
D0$party_ind[D0$hosp_id==5] <- 1 D0
```

Now we can calculate diversions using the `div_calc`

function.

`<- div_calc(D0) out `

The function `div_calc`

returns a list with two
components, the first is a matrix of hospital-level diversions and the
second is a matrix aggregated to system-level diversions. The
hospital-level matrix gives diversions from each of the merging
hospitals to all other hospitals. The diversions should sum to 1. We can
verify that our diversions sum to one for Hospital 1:

```
<- out$hosp_level
divratio_hosp sum(divratio_hosp$div_from_1, na.rm = TRUE)
#> [1] 1
```

We can also print the hospital-level and system-level diversions.

```
print(out$hosp_level)
#> hosp_id hospital party_sys_id sys_id N_h div_from_1 div_from_2 div_from_5
#> 1 1 Hospital 1 1 1 126 NA NA 0.020029200
#> 2 2 Hospital 2 1 1 62 NA NA 0.108043702
#> 5 5 Hospital 5 5 5 281 0.030760752 0.21885190 NA
#> 3 3 Hospital 3 NA 3 68 0.011270687 0.06452584 0.405450090
#> 4 4 Hospital 4 NA 3 67 0.350897381 0.05426687 0.003693519
#> 6 6 Hospital 6 NA 6 198 0.049305381 0.17531098 0.167909726
#> 7 7 Hospital 7 NA 7 107 0.008281941 0.17351964 0.034753384
#> 8 8 Hospital 8 NA 8 291 0.549483858 0.31352477 0.260120379
print(out$sys_level)
#> hosp_id hospital party_sys_id sys_id N_h div_from_sys_1 div_from_sys_5
#> 1 1 Hospital 1 1 1 126 NA 0.020029200
#> 2 2 Hospital 2 1 1 62 NA 0.108043702
#> 5 5 Hospital 5 5 5 281 0.09279081 NA
#> 3 3 Hospital 3 NA 3 68 0.02883356 0.405450090
#> 4 4 Hospital 4 NA 3 67 0.25307242 0.003693519
#> 6 6 Hospital 6 NA 6 198 0.09086042 0.167909726
#> 7 7 Hospital 7 NA 7 107 0.06277523 0.034753384
#> 8 8 Hospital 8 NA 8 291 0.47166756 0.260120379
```

If the total diversion for a given hospital does not sum to 1, that means that at least one degenerate cell exists, meaning that there exists a cell where every individual assigned to that cell visits the same hospital. Such cells cause problems for the method used here, which predicts diversions for individuals in a cell based on the shares within that cell. If the share is 100% in a cell, the model cannot make any prediction about to where those individuals would substitute.

By default, `div_calc`

will drop any degenerate cells so
that diversions will still sum to 1. If a degenerate cell exists and the
`dropDegenerateCell`

option is set to `FALSE`

, the
diversion for a hospital may sum to less than one because degenerate
cells will not be dropped. A notification flag will be printed whenever
a degenerate cell exists for a given hospital.

Once observations have been assigned to cells, one can also use the
function `wtp_calc`

to calculate the willingness-to-pay (WTP)
of a hospital system (see Capps, Dranove, and Satterthwaite (2003)
“Competition and Market Power in Option Demand Markets” https://www.jstor.org/stable/1593786). This can be used
to estimate the change in WTP that would occur after a merger, using the
discharge data as we have been using.

Within this semiparametric estimation method, the formulas for WTP are as follows. For individuals in a given cell \(c\), the choice probabiities for a given hospital \(j\) are estimated by \(s_j^c = \frac{N^c_j}{N^c}\), where \(N^c\) is the number of admissions belonging to cell \(c\), and \(N^c_j\) is the number of admissions from cell \(c\) that visit hospital \(j\). The WTP for the cell is then given by \(WTP_c(j) = \ln \frac{1}{1-s_j^c}\). The population WTP for hospital \(j\) is then simply the weighted sum across cells:

\(WTP(j) = \sum_c N^c \times WTP_c(j)\)

To calculate the change in WTP resulting from a merger, first note the general formula for calcuating WTP for a system \(M\) is:

\(WTP_c(M) = \ln \frac{1}{1- \sum_{j \in M} s_j^c}\)

Aggregating across cells gives the population WTP. The changing in WTP from a merger can then be given as the WTP of the post-merger system minus the WTP of the sum of the systems pre-merger. If we denote by \(M\) the hypothetical post-merger system and \(M_1\) and \(M_2\) the systems before they merge, then as a percentage increase the change in willingness-to-pay is given by \(\Delta WTP = \frac{WTP(M) - (WTP(M_1) + WTP(M_2)) }{WTP(M_1) + WTP(M_2)}\)

To implement this calculation using the `wtp_calc`

function, we calculate the WTP for both systems pre-merger as well as
for a hypothetical combined system.

```
<- wtp_calc(D0)
out <- subset(out, party == 1)
y_pre
<- D0
D0_post $sys_id[D0_post$party_ind == 1] <- 1
D0_post
<- wtp_calc(D0_post)
out <- subset(out, party == 1)
y_post
<- aggregate(list(WTP_s = y_pre$WTP_s, WTP_s_wt = y_pre$WTP_s_wt, N_s=y_pre$N_s),by=list(y_party=y_pre$party),sum)
y_pre
print("% Change in WTP")
#> [1] "% Change in WTP"
print((y_post$WTP_s-y_pre$WTP_s)/(y_pre$WTP_s)*100)
#> [1] 6.504824
```

This standard approach makes the assumption that WTP per dollar is
constant across diagnoses. One alternative approach is to weight each
admission by the diagnoses code (such as DRG) for each admission. This
can be done with `wtp_calc`

by providing a variable
`weight`

. For example, it is sometimes desirable to weight
the WTP calculation by DRG weights. If weight is provided,
`wtp_calc`

will provide both a weighted an unweighted
calculation for WTP. Otherwise, `wtp_calc`

will assume a
weight of 1 and both the unweighted and weighted WTP estimates will be
equivalent.