In Bayesian statistics, the gamma distribution is the conjugate prior distribution for a Poisson likelihood. ‘*Conjugate*’ means that the posterior distribution will follow the same general form as the prior distribution. DuMouchel (1999) used a model with a Poisson(\(\mu_{ij}\)) likelihood for the counts (for row *i* and column *j* of the contingency table). We are interested in the ratio \(\lambda_{ij}=\frac{\mu_{ij}}{E_{ij}}\), where \(E_{ij}\) are the expected counts. The \(\lambda_{ij}\)s are considered random draws from a mixture of two gamma distributions (our prior) with hyperparameter \(\theta=(\alpha_1,\beta_1,\alpha_2,\beta_2,P)\), where \(P\) is the prior probability that \(\lambda\) came from the first component of the prior mixture (i.e., the mixture fraction). The prior is a single distribution that models all the cells in the table; however, there is a separate posterior distribution for each cell in the table. The posterior distribution of \(\lambda\), given count \(N=n\), is a mixture of two gamma distributions with parameters \(\theta=(\alpha_1+n,\beta_1+E,\alpha_2+n,\beta_2+E,Q_n)\) (subscripts suppressed for clarity), where \(Q_n\) is the probability that \(\lambda\) came from the first component of the posterior, given \(N=n\) (i.e., the mixture fraction).

The posterior distribution, in a sense, is a Bayesian representation of the relative reporting ratio, \(RR\) (note the similarity in the equations \(RR_{ij}=\frac{N_{ij}}{E_{ij}}\) and \(\lambda_{ij}=\frac{\mu_{ij}}{E_{ij}}\)). The Empirical Bayes (EB) metrics are taken from the posterior distribution. The Empirical Bayes Geometric Mean \((EBGM)\) is the antilog of the mean of the log_{2}-transformed posterior distribution. The \(EBGM\) is therefore a measure of central tendency of the posterior distribution. The 5% and 95% quantiles of the posterior distributions can be used to create two-sided 90% credibility intervals for \(\lambda_{ij}\), given \(N_{ij}\) (i.e, our “sort of” RR). Alternatively, since we are most interested in the lower bound, we could ignore the upper bound and create a one-sided 95% credibility interval.

Due to Bayesian shrinkage (please see the **Background** section of the *Introduction to openEBGM* vignette), the EB scores are much more stable than \(RR\) for small counts.

Once the product/event combinations have been counted and the hyperparameters have been estimated, we can calculate the EB scores:

```
library(openEBGM)
data(caers) #subset of publicly available CAERS data
processed <- processRaw(caers, stratify = FALSE, zeroes = FALSE)
squashed <- squashData(processed)
squashed2 <- squashData(squashed, count = 2, bin_size = 10, keep_pts = 50)
theta_init <- data.frame(alpha1 = c(0.2, 0.1, 0.3, 0.5, 0.2),
beta1 = c(0.1, 0.1, 0.5, 0.3, 0.2),
alpha2 = c(2, 10, 6, 12, 5),
beta2 = c(4, 10, 6, 12, 5),
p = c(1/3, 0.2, 0.5, 0.8, 0.4))
hyper_estimates <- autoHyper(squashed2, theta_init = theta_init)
(theta_hat <- hyper_estimates$estimates)
#> alpha1 beta1 alpha2 beta2 P
#> 3.25379064 0.39988801 2.02613219 1.90808279 0.06534557
```

`Qn()`

The `Qn()`

function calculates the mixture fractions for the posterior distributions. The values returned by `Qn()`

correspond to the probability that \(\lambda\) came from the first component of the posterior mixture distribution, given \(N=n\) (recall there is a \(\lambda|N=n\) for each cell in the table, but that each \(\lambda\) comes from a common distribution). Thus, the output from `Qn()`

returns a numeric vector of length equal to the total number of product-symptom combinations, which is also the number of rows in the data frame returned by `processRaw()`

. When calculating the \(Q_n\)s, be sure to use the full data set from `processRaw()`

– not the squashed data set or the raw data.

```
qn <- Qn(theta_hat, N = processed$N, E = processed$E)
head(qn)
#> [1] 0.2819736 0.3409652 0.3482316 0.2819736 0.2226354 0.2556668
identical(length(qn), nrow(processed))
#> [1] TRUE
summary(qn)
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> 0.0001205 0.2340247 0.3089647 0.2846020 0.3397683 0.9999997
```

`ebgm()`

The `ebgm()`

function calculates the Empirical Bayes Geometric Mean \((EBGM)\) scores. \(EBGM\) is a measure of central tendency of the posterior distributions, \(\lambda_{ij}|N=n\). Scores much larger than one indicate product/adverse event pairs that are reported at an unusually high rate.

```
processed$ebgm <- ebgm(theta_hat, N = processed$N, E = processed$E, qn = qn)
head(processed)
#> var1 var2 N E RR
#> 1 1-PHENYLALANINE HEART RATE INCREASED 1 0.0360548272 27.74
#> 2 11 UNSPECIFIED VITAMINS ASTHMA 1 0.0038736591 258.15
#> 3 11 UNSPECIFIED VITAMINS CARDIAC FUNCTION TEST 1 0.0002979738 3356.00
#> 4 11 UNSPECIFIED VITAMINS CHEST PAIN 1 0.0360548272 27.74
#> 5 11 UNSPECIFIED VITAMINS DYSPNOEA 1 0.0765792610 13.06
#> 6 11 UNSPECIFIED VITAMINS HYPERSENSITIVITY 1 0.0527413588 18.96
#> PRR ebgm
#> 1 27.96 2.23
#> 2 279.58 2.58
#> 3 Inf 2.63
#> 4 27.96 2.23
#> 5 13.11 1.92
#> 6 19.06 2.09
```

`quantBisect()`

The `quantBisect()`

function calculates quantiles of the posterior distribution using the bisection method. `quantBisect()`

can calculate any quantile of the posterior distribution between 1 and 99%, and these quantiles can be used as limits for credibility intervals. Below, *QUANT_05* is the 5^{th} percentile; *QUANT_95* is the 95^{th} percentile. These form the lower and upper bounds of 90% credibility intervals for the Empirical Bayes (EB) scores.

```
processed$QUANT_05 <- quantBisect(5, theta_hat = theta_hat,
N = processed$N, E = processed$E, qn = qn)
processed$QUANT_95 <- quantBisect(95, theta_hat = theta_hat,
N = processed$N, E = processed$E, qn = qn)
head(processed)
#> var1 var2 N E RR
#> 1 1-PHENYLALANINE HEART RATE INCREASED 1 0.0360548272 27.74
#> 2 11 UNSPECIFIED VITAMINS ASTHMA 1 0.0038736591 258.15
#> 3 11 UNSPECIFIED VITAMINS CARDIAC FUNCTION TEST 1 0.0002979738 3356.00
#> 4 11 UNSPECIFIED VITAMINS CHEST PAIN 1 0.0360548272 27.74
#> 5 11 UNSPECIFIED VITAMINS DYSPNOEA 1 0.0765792610 13.06
#> 6 11 UNSPECIFIED VITAMINS HYPERSENSITIVITY 1 0.0527413588 18.96
#> PRR ebgm QUANT_05 QUANT_95
#> 1 27.96 2.23 0.49 13.85
#> 2 279.58 2.58 0.52 15.78
#> 3 Inf 2.63 0.52 16.02
#> 4 27.96 2.23 0.49 13.85
#> 5 13.11 1.92 0.47 11.77
#> 6 19.06 2.09 0.48 12.95
```

The EB-scores (\(EBGM\) and quantile scores) can be used to look for “signals” in the data. As stated in the **Background** section of the *Introduction to openEBGM* vignette, Bayesian shrinkage causes the EB-scores to be far more stable than their \(RR\) counterparts, which allows for better separation between signal and noise. One could, for example, look at all product-symptom combinations where *QUANT_05* (the lower part of the 90% two-sided credibility interval) is 2 or greater. This is often used as a conservative alternative to \(EBGM\) since *QUANT_05* scores are naturally smaller than \(EBGM\) scores. We can say with high confidence that the “true relative reporting ratios” of product/adverse event combinations above this threshold are much greater than 1, so those combinations are truly reported more than expected. The value of 2 is arbitrarily chosen, and depends on the context. Below is an example of how one may identify product-symptom combinations that require further investigation based on the EB-scores.

```
suspicious <- processed[processed$QUANT_05 >= 2, ]
nrow(suspicious); nrow(processed); nrow(suspicious)/nrow(processed)
#> [1] 131
#> [1] 17189
#> [1] 0.007621153
```

From above we see that less than 1% of product-symptom pairs are suspect based on the *QUANT_05* score. One may look more closely at these product-symptom combinations to ascertain which products may need further investigation. Subject matter knowledge is required to determine which signals might identify a possible causal relationship. The EB-scores find statistical associations – not necessarily causal relationships.

```
suspicious <- suspicious[order(suspicious$QUANT_05, decreasing = TRUE),
c("var1", "var2", "N", "E", "QUANT_05", "ebgm",
"QUANT_95")]
head(suspicious, 5)
#> var1 var2
#> 13924 REUMOFAN PLUS WEIGHT INCREASED
#> 8187 HYDROXYCUT REGULAR RAPID RELEASE CAPLETS EMOTIONAL DISTRESS
#> 13886 REUMOFAN PLUS IMMOBILE
#> 7793 HYDROXYCUT HARDCORE CAPSULES CARDIO-RESPIRATORY DISTRESS
#> 8220 HYDROXYCUT REGULAR RAPID RELEASE CAPLETS INJURY
#> N E QUANT_05 ebgm QUANT_95
#> 13924 16 0.40643623 15.68 23.26 33.48
#> 8187 19 0.89690107 11.65 16.78 23.55
#> 13886 6 0.07866508 10.16 18.28 30.83
#> 7793 8 0.30482718 9.00 15.25 24.52
#> 8220 11 0.56317044 8.98 14.28 21.78
tabbed <- table(suspicious$var1)
head(tabbed[order(tabbed, decreasing = TRUE)])
#>
#> HYDROXYCUT REGULAR RAPID RELEASE CAPLETS
#> 26
#> HYDROXYCUT HARDCORE CAPSULES
#> 13
#> REUMOFAN PLUS
#> 8
#> HYDROXYCUT CAPSULES
#> 5
#> HYDROXYCUT MAX LIQUID CAPS
#> 5
#> HYDROXYCUT CAFFEINE FREE CAPLETS
#> 4
```

The output above suggests some products which may require further investigation.

Next, the *openEBGM Objects and Class Functions* vignette will demonstrate the object-oriented features of the *openEBGM* package.