To get value from a clustering algorithm, it is important to understand how the algorithm assigns instances to clusters. FACT is an algorithm-agnostic framework that provides feature attribution while preserving the integrity of the data.
- SMART (Scoring Metric After Permutation) permutes feature sets to measure the sensitivity of algorithms to changes in cluster assignments.
- IDEA (Isolated Effect on Assignment) visualizes local and global changes in cluster assignments over one- and two-dimensional feature spaces.

You can install the development version of FACT like so:
# Development version
remotes::install_github("henrifnk/FACT")
We aim to divide American states into 3 clusters by their standardized crime rates.
library(FACT)
library(mlr3cluster)
#> Loading required package: mlr3
attributes_scale = attributes(scale(USArrests))
State | Murder | Assault | UrbanPop | Rape |
---|---|---|---|---|
Alabama | 1.24 | 0.78 | -0.52 | 0.00 |
Alaska | 0.51 | 1.11 | -1.21 | 2.48 |
Arizona | 0.07 | 1.48 | 1.00 | 1.04 |
Arkansas | 0.23 | 0.23 | -1.07 | -0.18 |
California | 0.28 | 1.26 | 1.76 | 2.07 |
Colorado | 0.03 | 0.40 | 0.86 | 1.86 |
USArrests Data Set
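The attributes stored by `scale()` make it possible to translate standardized values back to the original units, which helps when reading plots on the standardized scale. The following is a minimal base R sketch of that back-transformation; the helper `to_original()` is hypothetical and not part of FACT.

```r
# Standardize the data and keep the attributes that scale() attaches.
scaled <- scale(USArrests)
attributes_scale <- attributes(scaled)

# "scaled:center" holds the column means, "scaled:scale" the column
# standard deviations used for the standardization.
centers <- attributes_scale[["scaled:center"]]
sds <- attributes_scale[["scaled:scale"]]

# Undo the standardization: multiply each column by its sd, then add its mean.
restored <- sweep(sweep(scaled, 2, sds, `*`), 2, centers, `+`)

# Hypothetical helper (not part of FACT): map one standardized value of a
# feature back to its original units.
to_original <- function(z, feature) z * sds[[feature]] + centers[[feature]]

# A standardized value of 0 corresponds to the column mean.
to_original(0, "Assault")
```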
Therefore, we use a c-means algorithm from mlr3cluster.
tsk_usa = TaskClust$new(id = "usarest", backend = data.frame(scale(USArrests)))
c_lrn = lrn("clust.cmeans", centers = 3, predict_type = "prob")
c_lrn$train(tsk_usa)
Then, we create a ClustPredictor that wraps the information needed for our methods.
predictor = ClustPredictor$new(c_lrn, data = tsk_usa$data(), y = c_lrn$model$membership)
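The permutation idea behind SMART can be illustrated without the package. The sketch below is not FACT's implementation: it assumes hard k-means from base R as a stand-in for the c-means learner, and it uses the share of unchanged assignments as a deliberately simple scoring metric. The names `assign_cluster()` and `permutation_score()` are hypothetical.

```r
set.seed(1)
X <- scale(USArrests)

# Stand-in clustering (assumption): hard k-means instead of c-means.
km <- kmeans(X, centers = 3, nstart = 10)

# Assign each row to its nearest cluster center (squared Euclidean distance).
assign_cluster <- function(newdata, centers) {
  d <- sapply(seq_len(nrow(centers)), function(k) {
    rowSums(sweep(newdata, 2, centers[k, ], `-`)^2)
  })
  max.col(-d, ties.method = "first")
}

original <- assign_cluster(X, km$centers)

# Share of unchanged assignments after shuffling one feature,
# averaged over several random permutations.
permutation_score <- function(feature, n_rep = 20) {
  mean(replicate(n_rep, {
    X_perm <- X
    X_perm[, feature] <- sample(X_perm[, feature])
    mean(assign_cluster(X_perm, km$centers) == original)
  }))
}

scores <- sapply(colnames(X), permutation_score)

# A lower score means the assignments react more strongly to the feature.
sort(scores)
```

Because the cluster centers stay fixed and only the data are shuffled, the clustering is never refit, so the score reflects only how the existing assignment rule responds to the feature.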
How does Assault affect the partitions created by c-means clustering?
The sIDEA plot shows:
- the values of Assault where realizations of observations can be found (visualized by the geom_rug);
- the corresponding soft cluster score f(k).
idea_assault = IDEA$new(predictor, "Assault", grid.size = 50)
idea_assault$plot_globals(0.5)
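To make the quantity behind such a plot more concrete, the global effect can be approximated by hand. This is a rough base R sketch under stated assumptions, not the computation FACT performs: the centers come from k-means rather than the trained c-means learner, and the hypothetical helper `soft_assign()` uses the standard fuzzy membership formula with fuzzifier m = 2.

```r
set.seed(1)
X <- scale(USArrests)

# Stand-in centers (assumption): k-means instead of the trained c-means model.
centers <- kmeans(X, centers = 3, nstart = 10)$centers

# Fuzzy membership for fixed centers with fuzzifier m = 2: inverse squared
# distances, normalized so that each row sums to 1.
soft_assign <- function(newdata, centers) {
  d2 <- sapply(seq_len(nrow(centers)), function(k) {
    rowSums(sweep(newdata, 2, centers[k, ], `-`)^2)
  })
  inv <- 1 / pmax(d2, 1e-12)  # guard against division by zero
  inv / rowSums(inv)
}

# Global effect: set Assault to each grid value for all observations,
# recompute the soft scores, and average them per cluster.
grid <- seq(min(X[, "Assault"]), max(X[, "Assault"]), length.out = 50)
global_effect <- t(sapply(grid, function(v) {
  X_mod <- X
  X_mod[, "Assault"] <- v
  colMeans(soft_assign(X_mod, centers))
}))

# One curve per cluster, with a rug marking the observed Assault values.
matplot(grid, global_effect, type = "l", lty = 1,
        xlab = "Assault (standardized)", ylab = "mean soft score f(k)")
rug(X[, "Assault"])
```

The cluster numbering in this sketch comes from k-means and need not match the numbering of the c-means model above.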
Short Interpretation:
- [...] Assault rate.
- [...] Assault rate.
- [...] Assault rate.

If you use FACT in a scientific publication, please cite it as:
Scholbeck, C. A., Funk, H., & Casalicchio, G. (2022). Algorithm-Agnostic Interpretations for Clustering. arXiv preprint arXiv:2209.10578.
BibTeX:
@article{FACT_22,
title={Algorithm-Agnostic Interpretations for Clustering},
author={Scholbeck, Christian A and Funk, Henri and Casalicchio, Giuseppe},
journal={arXiv preprint arXiv:2209.10578},
year={2022}
}