# EnrichIntersect Tutorial

This is a flexible tool for enrichment analysis based on user-defined sets. It allows users to perform over-representation analysis of the custom sets among any specified ranked feature list, hence making enrichment analysis applicable to various types of data from different scientific fields. EnrichIntersect also enables an interactive means to visualize identified associations based on, for example, the mix-lasso model (Zhao et al., 2022) or similar methods.

To get started, load the package with

library("EnrichIntersect")

## Examples

### Plot enrichment map

An example input data object cancers_drug_groups is an R list provided in our package, which includes a data.frame object with 147 cancer drugs as rows and nine cancer types as columns, and another data.frame that groups the 147 drugs (second column) into nine user-defined drug classes (first column). The default setup of enrichment() uses the classic K-S test statistic to calculate the normalized enrichment score that quantifies the degree to which the features in a feature set are over-represented at the top or bottom of the entire ranked list of features (e.g. a list of drugs), using as default 100 permutations for the empirical null test statistic. In the visualization, the statistically significantly enriched feature sets are marked with red coloured circles given a pre-specified significance level. The pre-specified significance level will be adjusted automatically if argument padj.method is one of c("holm","hochberg","hommel","bonferroni","BH","BY","fdr","none"). Users can specify the argument alpha for calculating a weighted enrichment score, argument normalize=FALSE for using the standard enrichment score rather than the normalized score, argument permute.n for the number of permutations of the ranked feature list used for estimating the empirical null test statistic, and the argument pvalue.cutoff for marking enriched categories at a specific significance level. See the following code for an example.

x <- cancers_drug_groups$score custom.set <- cancers_drug_groups$custom.set
set.seed(123)
enrich <- enrichment(x, custom.set, permute.n = 100)

### Plot Sankey diagram for intersecting set through an array

EnrichIntersect function intersectSankey() creates a Sankey diagram to visualize intersecting sets from an array object, in which the first dimension represents intermediate variables, and the second and third dimensions represent multiple levels and multiple tasks, respectively. One intersecting set is a list of intermediate variables associated with a combination of a subset of levels and a subset of tasks, which is not easy to visualize when all possible combinations of the two are many. Our function intersectSankey() has adapted sankeyNetwork() from R package networkD3 to create a D3'JavaScript’ interactive Sankey diagram in order to be suitable for several levels, multiple tasks and many intermediate variables. Besides saving the Sankey diagram as an interactive html file, similarly to networkD3, the user can also save the Sankey diagram as a pdf or png file via R package webshot2. The argument out.fig=c(NA,"html","pdf","png") in the function intersectSankey() determines the figure on the user’s R graphics device, to be saved either as a html, pdf or png file.

An example input data object cancers_genes_drugs in the package is an array with associations between, e.g., 56 genes (first dimension), two cancer types (second dimension) and two drugs (third dimension) provided in our package. The user can adjust the Sankey diagram argument out.fig for different output graph types and use argument step.names to indicate the labels of the three kinds of variables in a Sankey diagram, i.e., name of multiple levels, name of intermediate variables, and name of multiple tasks, see the following code for an example.

data(cancers_genes_drugs, package = "EnrichIntersect")
intersectSankey(cancers_genes_drugs, step.names = c("Cancers","Genes","Drugs"))

## Citation

Zhi Zhao, Shixiong Wang, Manuela Zucknick, Tero Aittokallio (2022). Tissue-specific identification of multi-omics features for pan-cancer drug response prediction. iScience, 25(8): 104767. DOI: https://doi.org/10.1016/j.isci.2022.104767.