arules — Mining Association Rules and Frequent Itemsets with R

CRAN version CRAN RStudio mirror downloads R build status

The arules package for R provides the infrastructure for representing, manipulating and analyzing transaction data and patterns using frequent itemsets and association rules. The package also provides a wide range of interest measures and mining algorithms including a interfaces and the code of Christian Borgelt’s popular and efficient C implementations of the association mining algorithms Apriori and Eclat. Examples can be found in Chapter 5 of the web book R Companion for Introduction to Data Mining.

arules core packages:

Additional mining algorithms

In-database analytics

Interface

Classification

Outlier Detection

Recommendation/Prediction

Installation

Stable CRAN version: install from within R with

install.packages("arules")

Current development version: install from GitHub (needs devtools and Rtools for Windows).

devtools::install_github("mhahsler/arules")

Usage

Load package and mine some association rules.

library("arules")
data("IncomeESL")

trans <- transactions(IncomeESL)
trans
## transactions in sparse format with
##  8993 transactions (rows) and
##  84 items (columns)
rules <- apriori(trans, parameter = list(supp = 0.1, conf = 0.9, target = "rules"))
## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.9    0.1    1 none FALSE            TRUE       5     0.1      1
##  maxlen target  ext
##      10  rules TRUE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 899 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[84 item(s), 8993 transaction(s)] done [0.01s].
## sorting and recoding items ... [42 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 5 6 done [0.03s].
## writing ... [457 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].

Inspect the rules with the highest lift.

inspect(head(rules, n = 3, by = "lift"))
##     lhs                           rhs                      support confidence coverage lift count
## [1] {dual incomes=no,                                                                            
##      householder status=own}   => {marital status=married}    0.10       0.97     0.10  2.6   914
## [2] {years in bay area=>10,                                                                      
##      dual incomes=yes,                                                                           
##      type of home=house}       => {marital status=married}    0.10       0.96     0.10  2.6   902
## [3] {dual incomes=yes,                                                                           
##      householder status=own,                                                                     
##      type of home=house,                                                                         
##      language in home=english} => {marital status=married}    0.11       0.96     0.11  2.6   988

Using arule and tidyverse

arules works seamlessly with tidyverse. For example, dplyr can be used for cleaning and preparing the transactions and then functions in arules can be used with %>%.

library("tidyverse")
library("arules")
data("IncomeESL")

trans <- IncomeESL %>% transactions()

rules <- trans %>% apriori(parameter = list(supp = 0.1, conf = 0.9, target = "rules"))
## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.9    0.1    1 none FALSE            TRUE       5     0.1      1
##  maxlen target  ext
##      10  rules TRUE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 899 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[84 item(s), 8993 transaction(s)] done [0.01s].
## sorting and recoding items ... [42 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 5 6 done [0.03s].
## writing ... [457 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].
rules %>% head(n = 3, by = "lift") %>% inspect()
##     lhs                           rhs                      support confidence coverage lift count
## [1] {dual incomes=no,                                                                            
##      householder status=own}   => {marital status=married}    0.10       0.97     0.10  2.6   914
## [2] {years in bay area=>10,                                                                      
##      dual incomes=yes,                                                                           
##      type of home=house}       => {marital status=married}    0.10       0.96     0.10  2.6   902
## [3] {dual incomes=yes,                                                                           
##      householder status=own,                                                                     
##      type of home=house,                                                                         
##      language in home=english} => {marital status=married}    0.11       0.96     0.11  2.6   988

Using arules from Python

See Getting started with arules using Python.

Support

Please report bugs here on GitHub. Questions should be posted on stackoverflow and tagged with arules.

References