In Bayesian statistics, the gamma distribution is the conjugate prior
distribution for a Poisson likelihood. ‘*Conjugate*’ means that
the posterior distribution will follow the same general form as the
prior distribution. DuMouchel (1999) used a model with a Poisson(\(\mu_{ij}\)) likelihood for the counts (for
row *i* and column *j* of the contingency table). We are
interested in the ratio \(\lambda_{ij}=\frac{\mu_{ij}}{E_{ij}}\),
where \(E_{ij}\) are the expected
counts. The \(\lambda_{ij}\)s are
considered random draws from a mixture of two gamma distributions (our
prior) with hyperparameter \(\theta=(\alpha_1,\beta_1,\alpha_2,\beta_2,P)\),
where \(P\) is the prior probability
that \(\lambda\) came from the first
component of the prior mixture (i.e., the mixture fraction). The prior
is a single distribution that models all the cells in the table;
however, there is a separate posterior distribution for each cell in the
table. The posterior distribution of \(\lambda\), given count \(N=n\), is a mixture of two gamma
distributions with parameters \(\theta=(\alpha_1+n,\beta_1+E,\alpha_2+n,\beta_2+E,Q_n)\)
(subscripts suppressed for clarity), where \(Q_n\) is the probability that \(\lambda\) came from the first component of
the posterior, given \(N=n\) (i.e., the
mixture fraction).

The posterior distribution, in a sense, is a Bayesian representation
of the relative reporting ratio, \(RR\)
(note the similarity in the equations \(RR_{ij}=\frac{N_{ij}}{E_{ij}}\) and \(\lambda_{ij}=\frac{\mu_{ij}}{E_{ij}}\)).
The Empirical Bayes (EB) metrics are taken from the posterior
distribution. The Empirical Bayes Geometric Mean \((EBGM)\) is the antilog of the mean of the
log_{2}-transformed posterior distribution. The \(EBGM\) is therefore a measure of central
tendency of the posterior distribution. The 5% and 95% quantiles of the
posterior distributions can be used to create two-sided 90% credibility
intervals for \(\lambda_{ij}\), given
\(N_{ij}\) (i.e, our “sort of” RR).
Alternatively, since we are most interested in the lower bound, we could
ignore the upper bound and create a one-sided 95% credibility
interval.

Due to Bayesian shrinkage (please see the **Background**
section of the *Introduction to openEBGM* vignette), the EB
scores are much more stable than \(RR\)
for small counts.

Once the product/event combinations have been counted and the hyperparameters have been estimated, we can calculate the EB scores:

```
library(openEBGM)
data(caers) #subset of publicly available CAERS data
processed <- processRaw(caers, stratify = FALSE, zeroes = FALSE)
squashed <- squashData(processed)
squashed2 <- squashData(squashed, count = 2, bin_size = 10, keep_pts = 50)
theta_init <- data.frame(alpha1 = c(0.2, 0.1, 0.3, 0.5, 0.2),
beta1 = c(0.1, 0.1, 0.5, 0.3, 0.2),
alpha2 = c(2, 10, 6, 12, 5),
beta2 = c(4, 10, 6, 12, 5),
p = c(1/3, 0.2, 0.5, 0.8, 0.4)
)
hyper_estimates <- autoHyper(squashed2, theta_init = theta_init)
(theta_hat <- hyper_estimates$estimates)
#> alpha1 beta1 alpha2 beta2 P
#> 3.25379356 0.39988854 2.02612692 1.90807970 0.06534557
```

`Qn()`

The `Qn()`

function calculates the mixture fractions for
the posterior distributions. The values returned by `Qn()`

correspond to the probability that \(\lambda\) came from the first component of
the posterior mixture distribution, given \(N=n\) (recall there is a \(\lambda|N=n\) for each cell in the table,
but that each \(\lambda\) comes from a
common distribution). Thus, the output from `Qn()`

returns a
numeric vector of length equal to the total number of product-symptom
combinations, which is also the number of rows in the data frame
returned by `processRaw()`

. When calculating the \(Q_n\)s, be sure to use the full data set
from `processRaw()`

– not the squashed data set or the raw
data.

`ebgm()`

The `ebgm()`

function calculates the Empirical Bayes
Geometric Mean \((EBGM)\) scores. \(EBGM\) is a measure of central tendency of
the posterior distributions, \(\lambda_{ij}|N=n\). Scores much larger than
one indicate product/adverse event pairs that are reported at an
unusually high rate.

```
processed$ebgm <- ebgm(theta_hat, N = processed$N, E = processed$E, qn = qn)
head(processed)
#> var1 var2 N E RR PRR
#> 1 1-PHENYLALANINE HEART RATE INCREASED 1 0.0360548272 27.74 27.96
#> 2 11 UNSPECIFIED VITAMINS ASTHMA 1 0.0038736591 258.15 279.58
#> 3 11 UNSPECIFIED VITAMINS CARDIAC FUNCTION TEST 1 0.0002979738 3356.00 Inf
#> 4 11 UNSPECIFIED VITAMINS CHEST PAIN 1 0.0360548272 27.74 27.96
#> 5 11 UNSPECIFIED VITAMINS DYSPNOEA 1 0.0765792610 13.06 13.11
#> 6 11 UNSPECIFIED VITAMINS HYPERSENSITIVITY 1 0.0527413588 18.96 19.06
#> ebgm
#> 1 2.23
#> 2 2.58
#> 3 2.63
#> 4 2.23
#> 5 1.92
#> 6 2.09
```

`quantBisect()`

The `quantBisect()`

function calculates quantiles of the
posterior distribution using the bisection method.
`quantBisect()`

can calculate any quantile of the posterior
distribution between 1 and 99%, and these quantiles can be used as
limits for credibility intervals. Below, *QUANT_05* is the
5^{th} percentile; *QUANT_95* is the 95^{th}
percentile. These form the lower and upper bounds of 90% credibility
intervals for the Empirical Bayes (EB) scores.

```
processed$QUANT_05 <- quantBisect(5, theta_hat = theta_hat,
N = processed$N, E = processed$E, qn = qn)
processed$QUANT_95 <- quantBisect(95, theta_hat = theta_hat,
N = processed$N, E = processed$E, qn = qn)
head(processed)
#> var1 var2 N E RR PRR
#> 1 1-PHENYLALANINE HEART RATE INCREASED 1 0.0360548272 27.74 27.96
#> 2 11 UNSPECIFIED VITAMINS ASTHMA 1 0.0038736591 258.15 279.58
#> 3 11 UNSPECIFIED VITAMINS CARDIAC FUNCTION TEST 1 0.0002979738 3356.00 Inf
#> 4 11 UNSPECIFIED VITAMINS CHEST PAIN 1 0.0360548272 27.74 27.96
#> 5 11 UNSPECIFIED VITAMINS DYSPNOEA 1 0.0765792610 13.06 13.11
#> 6 11 UNSPECIFIED VITAMINS HYPERSENSITIVITY 1 0.0527413588 18.96 19.06
#> ebgm QUANT_05 QUANT_95
#> 1 2.23 0.49 13.85
#> 2 2.58 0.52 15.78
#> 3 2.63 0.52 16.02
#> 4 2.23 0.49 13.85
#> 5 1.92 0.47 11.77
#> 6 2.09 0.48 12.95
```

The EB-scores (\(EBGM\) and quantile
scores) can be used to look for “signals” in the data. As stated in the
**Background** section of the *Introduction to
openEBGM* vignette, Bayesian shrinkage causes the EB-scores to be
far more stable than their \(RR\)
counterparts, which allows for better separation between signal and
noise. One could, for example, look at all product-symptom combinations
where *QUANT_05* (the lower part of the 90% two-sided credibility
interval) is 2 or greater. This is often used as a conservative
alternative to \(EBGM\) since
*QUANT_05* scores are naturally smaller than \(EBGM\) scores. We can say with high
confidence that the “true relative reporting ratios” of product/adverse
event combinations above this threshold are much greater than 1, so
those combinations are truly reported more than expected. The value of 2
is arbitrarily chosen, and depends on the context. Below is an example
of how one may identify product-symptom combinations that require
further investigation based on the EB-scores.

```
suspicious <- processed[processed$QUANT_05 >= 2, ]
nrow(suspicious); nrow(processed); nrow(suspicious)/nrow(processed)
#> [1] 131
#> [1] 17189
#> [1] 0.007621153
```

From above we see that less than 1% of product-symptom pairs are
suspect based on the *QUANT_05* score. One may look more closely
at these product-symptom combinations to ascertain which products may
need further investigation. Subject matter knowledge is required to
determine which signals might identify a possible causal relationship.
The EB-scores find statistical associations – not necessarily causal
relationships.

```
suspicious <- suspicious[order(suspicious$QUANT_05, decreasing = TRUE),
c("var1", "var2", "N", "E", "QUANT_05", "ebgm",
"QUANT_95")]
head(suspicious, 5)
#> var1 var2 N
#> 13924 REUMOFAN PLUS WEIGHT INCREASED 16
#> 8187 HYDROXYCUT REGULAR RAPID RELEASE CAPLETS EMOTIONAL DISTRESS 19
#> 13886 REUMOFAN PLUS IMMOBILE 6
#> 7793 HYDROXYCUT HARDCORE CAPSULES CARDIO-RESPIRATORY DISTRESS 8
#> 8220 HYDROXYCUT REGULAR RAPID RELEASE CAPLETS INJURY 11
#> E QUANT_05 ebgm QUANT_95
#> 13924 0.40643623 15.68 23.26 33.48
#> 8187 0.89690107 11.65 16.78 23.55
#> 13886 0.07866508 10.16 18.28 30.83
#> 7793 0.30482718 9.00 15.25 24.52
#> 8220 0.56317044 8.98 14.28 21.78
tabbed <- table(suspicious$var1)
head(tabbed[order(tabbed, decreasing = TRUE)])
#>
#> HYDROXYCUT REGULAR RAPID RELEASE CAPLETS
#> 26
#> HYDROXYCUT HARDCORE CAPSULES
#> 13
#> REUMOFAN PLUS
#> 8
#> HYDROXYCUT CAPSULES
#> 5
#> HYDROXYCUT MAX LIQUID CAPS
#> 5
#> HYDROXYCUT CAFFEINE FREE CAPLETS
#> 4
```

The output above suggests some products which may require further investigation.

Next, the *openEBGM Objects and Class Functions* vignette will
demonstrate the object-oriented features of the *openEBGM*
package.