# VAERS_data_analysis

library(zGPS.AO)

# Introduction

The package name zGPS.AO stands for Zero Inflated Gamma Poisson Shrinker with AE ontology, and in this vignettes, we will introduce how to use this package’s main function zinb_analysis_tool, by applying it to a real dataset, the VAERS dataset.

# Datasets

## Report data

The VAERS dataset is built in our package, and here’s what the data looks like:

data(vaers_data)
dim(vaers_data)
#> [1] 1017599       3
vaers_data[1:10,]
#> # A tibble: 10 x 3
#>        ID VAX_TYPE AE_NAME
#>     <dbl> <chr>    <chr>
#>  1 223972 DTAP     Swelling
#>  2 223972 DTAP     Feeling hot
#>  3 223972 DTAP     Erythema
#>  4 223972 DTAP     Tenderness
#>  5 231988 VARCEL   Pruritus
#>  6 231988 VARCEL   Urticaria
#>  7 231988 MMR      Pruritus
#>  8 231988 MMR      Urticaria
#>  9 231988 HEPA     Pruritus
#> 10 231988 HEPA     Urticaria

This data contains information about report ids, and all the vaccine-AE combinations in each reports. For example, in this dataset, report 223972 contains one vaccine DTAP, and four AEs.

## Group data

We are interested in identifying Relative Risks (RR) between vaccines and AEs. However, we are not only interested in the RR between vaccines and individual AEs, also we investigate the RR of vaccines and AE groups. The number of reports for a single vaccine and a single AE is too small and may have too much randomness. However, if we combine similar AEs into groups, and invest the vaccine-AE group associations, we could get more reliable results.

Here’s an example of group structure data from Medical Dictionary for Regulatory Activities (MedDRA) dataset.

data("dd.meddra")
dim(dd.meddra)
#> [1] 12482     2
dd.meddra[1:10,]
#>                           AE_NAME                                   GROUP_NAME
#> 1                         Anaemia Anaemias nonhaemolytic and marrow depression
#> 2       Anaemia folate deficiency Anaemias nonhaemolytic and marrow depression
#> 3              Anaemia macrocytic Anaemias nonhaemolytic and marrow depression
#> 4           Anaemia megaloblastic Anaemias nonhaemolytic and marrow depression
#> 5                Anaemia neonatal Anaemias nonhaemolytic and marrow depression
#> 6      Anaemia of chronic disease Anaemias nonhaemolytic and marrow depression
#> 7  Anaemia vitamin B12 deficiency Anaemias nonhaemolytic and marrow depression
#> 8           Aplasia pure red cell Anaemias nonhaemolytic and marrow depression
#> 9                Aplastic anaemia Anaemias nonhaemolytic and marrow depression
#> 10                     Babesiosis Anaemias nonhaemolytic and marrow depression

The group data used in this package is a data.frame containing two columns, AE_NAME and GROUP_NAME, specifying which AEs belong to which AE groups. It doesn’t hurt if there are some AEs belongs to more than one groups, but it is recommended to have non-overlapping groups. Those AE names in the report data should appear in the group data.

## Vaccines of interest

Reports of all vaccines will be used to do the analysis, but our model requires users to specify vaccines of their interest. Here is an example of how to specify them by an special list merge_list (will be used in the main function later):

data("merge_list")
merge_list[1:3]
#> [[1]]
#> [[1]][[1]]
#> [1] "FLUN3" "FLUN4"
#>
#> [[1]][[2]]
#> [1] "FLUN"
#>
#>
#> [[2]]
#> [[2]][[1]]
#> [1] "FLU3" "FLU4"
#>
#> [[2]][[2]]
#> [1] "FLU"
#>
#>
#> [[3]]
#> [[3]][[1]]
#> [1] "MMR"
#>
#> [[3]][[2]]
#> [1] "MMR"

The merge_list is a list containing sublists. For example, The first sublist here shows we are interested in FLUN3, FLUN4, but we would like to merge those vaccines to one type FLUN. The third sublist shows we are interested in MMR, and we want to keep it as one type.

# Our main function

Our main function zinb_analysis_tool is an all-in-one function that calculates both the group and AE level RRs as well as evaluating the p values. The first three parameters of this function are discussed above, and the min_freq and min_size determines the criteria for scarce AEs and small AE groups. Those AEs and AE groups will be excluded in the study. However, the data of those AEs and groups will not the deleted and still be used to calculate the baseline frequency table and permutation tests.

In evaluating the p values, we usually generate a large number of permuted datasets, like 2000. However, for the purples of this vignette, we only permute the dataset for 2 times by specifying n_perm = 2 in the function. When permuting the dataset for a large number of times, one may consider the built-in parallelization in our function. For example, here we use n_copies = 2 to imply that we want to break the job into 2 pieces and do them in parallel.

big_data = zinb_analysis_tool(dd.meddra,
vaers_data,
merge_list,
n_perm = 2,
n_copies = 2)

The output of zinb_analysis_tool contains group level RR s, group level parameters r, ‘beta’, and p, AE level RR lambda_hat and p values. ## Visualization

We have designed four functions to visualize the result on both AE and group levels.

### The Group level RR heatmap

The first plot one may want to look at is the group level RR heatmap. It can be generated by either plot_heatmap, or the more interactive version plotly_heatmap.

plot_heatmap(big_data)

The usage of plotly_heatmap is the same as plot_heatmap, users may execute the following code on their own computer:

plotly_heatmap(big_data)

### The AE level RR scatter plot

From the heam map, one may notice that the most significant group level RR is between FLUN and Hepatobiliary investigations. One may use the plot_lambdas to explore the AE level RR within this AE group specificly:

plot_lambdas('HEPAB', 'Hepatobiliary investigations', big_data)

Also, there is a more advanced interactive version of plot_lambdas called plotly_lambdas

plotly_lambdas('HEPAB', 'Hepatobiliary investigations', big_data)