Author: Tal Galili ( Tal.Galili@gmail.com )
tl;dr: the dendextend package lets you create figures like this:
The dendextend package offers a set of functions for extending dendrogram objects in R, letting you visualize and compare trees of hierarchical clusterings. You can:
The goal of this document is to introduce you to the basic functions that dendextend provides, and show how they may be applied. We will make extensive use of “chaining” (explained next).
This package was made possible by the support of my thesis adviser Yoav Benjamini, as well as code contributions from many R users. They are:
#> [1] "Tal Galili <tal.galili@gmail.com> [aut, cre, cph] (https://www.r-statistics.com)"
#> [2] "Gavin Simpson [ctb]"
#> [3] "Gregory Jefferis <jefferis@gmail.com> [ctb] (imported code from his dendroextras package)"
#> [4] "Marco Gallotta [ctb] (a.k.a: marcog)"
#> [5] "Johan Renaudie [ctb] (https://github.com/plannapus)"
#> [6] "R core team [ctb] (Thanks for the Infastructure, and code in the examples)"
#> [7] "Kurt Hornik [ctb]"
#> [8] "Uwe Ligges [ctb]"
#> [9] "Andrej-Nikolai Spiess [ctb]"
#> [10] "Steve Horvath <SHorvath@mednet.ucla.edu> [ctb]"
#> [11] "Peter Langfelder <Peter.Langfelder@gmail.com> [ctb]"
#> [12] "skullkey [ctb]"
#> [13] "Mark Van Der Loo <mark.vanderloo@gmail.com> [ctb] (https://github.com/markvanderloo d3dendrogram)"
#> [14] "Yoav Benjamini [ths]"
The design of the dendextend package (and this manual!) is heavily inspired by Hadley Wickham’s work, especially his text on writing an R package, the devtools package, and the dplyr package (specifically the use of chaining, and the Introduction text to dplyr).
Function calls in dendextend often take a dendrogram and return a (modified) dendrogram. This doesn’t lead to particularly elegant code if you want to do many operations at once. The same is true even in the first stage of creating a dendrogram.
In order to construct a dendrogram, you will (often) need to go through several steps. You can either do so while keeping the intermediate results:
d1 <- c(1:5) # some data
d2 <- dist(d1)
d3 <- hclust(d2, method = "average")
dend <- as.dendrogram(d3)
Or, you can also wrap the function calls inside each other:
dend <- as.dendrogram(hclust(dist(c(1:5)), method = "average"))
However, neither solution is ideal: the first includes redundant intermediate objects, while the second is difficult to read (since the order of the operations is from inside to out, while the arguments are a long way away from the function).
To get around this problem, dendextend encourages the use of the %>% (“pipe” or “chaining”) operator (imported from the magrittr package). This turns x %>% f(y) into f(x, y) so you can use it to rewrite (“chain”) multiple operations such that they can be read from left-to-right, top-to-bottom.
For example, the following will be written as it would be explained:
dend <- c(1:5) %>% # take a vector from 1 to 5
  dist %>% # calculate a distance matrix,
  hclust(method = "average") %>% # on it compute hierarchical clustering using the "average" method,
  as.dendrogram # and lastly, turn that object into a dendrogram.
For more details, you may look at:
The first step in working with dendrograms is to understand that they are just a nested list of lists with attributes. Let us explore this for the following (tiny) tree:
# Create a dend:
dend <- 1:2 %>% dist %>% hclust %>% as.dendrogram
# and plot it:
dend %>% plot
And here is its structure (a nested list of lists with attributes):
dend %>% unclass %>% str
#> List of 2
#> $ : int 1
#> ..- attr(*, "label")= int 1
#> ..- attr(*, "members")= int 1
#> ..- attr(*, "height")= num 0
#> ..- attr(*, "leaf")= logi TRUE
#> $ : int 2
#> ..- attr(*, "label")= int 2
#> ..- attr(*, "members")= int 1
#> ..- attr(*, "height")= num 0
#> ..- attr(*, "leaf")= logi TRUE
#> - attr(*, "members")= int 2
#> - attr(*, "midpoint")= num 0.5
#> - attr(*, "height")= num 1
dend %>% class
#> [1] "dendrogram"
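Because a dendrogram is a nested list with attributes, its pieces can be reached with base R alone. The following is a minimal sketch (the variable name left is only illustrative) that reads the same attributes shown in the structure above using attr() and [[ ]]:

```r
# Build the same tiny two-leaf tree with base R only (no pipe needed).
dend <- as.dendrogram(hclust(dist(1:2)))

# Attributes stored on the root node:
attr(dend, "height")   # 1, as in the str() output above
attr(dend, "members")  # 2

# Children are reached by list indexing; each child is itself a dendrogram.
left <- dend[[1]]
attr(left, "label")    # 1
is.leaf(left)          # TRUE
```

The same indexing can be nested (for example dend[[2]][[1]] on a larger tree) to walk down to any node.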
To install the stable version on CRAN use:
install.packages('dendextend')
To install the GitHub version:
require2 <- function (package, ...) {
  # character.only = TRUE is needed because the package name is passed as a string
  if (!require(package, character.only = TRUE)) install.packages(package)
  library(package, character.only = TRUE)
}
## require2('installr')
## install.Rtools() # run this if you are using Windows and don't have Rtools installed
# Load devtools:
require2("devtools")
devtools::install_github('talgalili/dendextend')
# Having colorspace is also useful, since it is used
# In various examples in the vignettes
require2("colorspace")
And then you may load the package using:
library(dendextend)
For the following simple tree:
# Create a dend:
dend <- 1:5 %>% dist %>% hclust %>% as.dendrogram
# Plot it:
dend %>% plot
Here are some basic parameters we can get:
dend %>% labels # get the labels of the tree
#> [1] 1 2 5 3 4
dend %>% nleaves # get the number of leaves of the tree
#> [1] 5
dend %>% nnodes # get the number of nodes in the tree (including leaves)
#> [1] 9
dend %>% head # A combination of "str" with "head"
#> --[dendrogram w/ 2 branches and 5 members at h = 4]
#> |--[dendrogram w/ 2 branches and 2 members at h = 1]
#> | |--leaf 1
#> | `--leaf 2
#> `--[dendrogram w/ 2 branches and 3 members at h = 2]
#> |--leaf 5
#> `--[dendrogram w/ 2 branches and 2 members at h = 1]
#> |--leaf 3
#> `--leaf 4
#> etc...
Next let us look at more sophisticated outputs.
When extracting (or inserting) attributes from a dendrogram’s nodes, it is often done in “depth-first search” order. Depth-first search is an algorithm for traversing or searching tree or graph data structures: one starts at the root and explores as far as possible along each branch before backtracking.
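To make the traversal order concrete, here is a small sketch in base R. The helper dfs_attr is hypothetical (it is not part of dendextend); it visits a node first and then each of its children from left to right, which is the depth-first order described above:

```r
# A 5-leaf tree built with base R only.
dend <- as.dendrogram(hclust(dist(1:5)))

# Hypothetical helper: collect one attribute in depth-first (pre-order) order.
dfs_attr <- function(node, name) {
  value <- attr(node, name, exact = TRUE)
  if (is.null(value)) value <- NA       # attribute not set on this node
  if (is.leaf(node)) return(value)      # leaves have no children to visit
  out <- value
  for (i in seq_along(node)) {          # visit children left to right
    out <- c(out, dfs_attr(node[[i]], name))
  }
  out
}

dfs_attr(dend, "height")   # 4 1 0 0 2 0 1 0 0
dfs_attr(dend, "members")  # 5 2 1 1 3 1 2 1 1
```

The root comes first, then the whole left branch, and only then the right branch, so the first value is always the root's attribute.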
Here is a plot of a tree, illustrating the order in which you should read the “nodes attributes”:
We can get several node attributes using get_nodes_attr (notice the order corresponds with what is shown in the above figure):
# Create a dend:
dend <- 1:5 %>% dist %>% hclust %>% as.dendrogram
# Get various attributes
dend %>% get_nodes_attr("height") # node's height
#> [1] 4 1 0 0 2 0 1 0 0
dend %>% hang.dendrogram %>% get_nodes_attr("height") # node's height (after raising the leaves)
#> [1] 4.0 1.0 0.6 0.6 2.0 1.6 1.0 0.6 0.6
dend %>% get_nodes_attr("members") # number of members (leaves) under that node
#> [1] 5 2 1 1 3 1 2 1 1
dend %>% get_nodes_attr("members", id = c(2,5)) # number of members for nodes 2 and 5
#> [1] 2 3
dend %>% get_nodes_attr("midpoint") # how much "left" is this node from its left-most child's location
#> [1] 1.625 0.500 NA NA 0.750 NA 0.500 NA NA
dend %>% get_nodes_attr("leaf") # is this node a leaf
#> [1] NA NA TRUE TRUE NA TRUE NA TRUE TRUE
dend %>% get_nodes_attr("label") # what is the label on this node
#> [1] NA NA 1 2 NA 5 NA 3 4
dend %>% get_nodes_attr("nodePar") # empty (for now...)
#> [1] NA NA NA NA NA NA NA NA NA
dend %>% get_nodes_attr("edgePar") # empty (for now...)
#> [1] NA NA NA NA NA NA NA NA NA
A similar function for leaves only is get_leaves_attr.
The fastest way to start changing parameters with dendextend is by using the set function. It is written as set(object, what, value), and accepts the following parameters:
The what parameter accepts many options, each of which uses some general function in the background. These options deal with labels, nodes and branches. The background functions they rely on are: labels<-.dendrogram, color_labels, assign_values_to_leaves_nodePar, assign_values_to_nodes_nodePar, hang.dendrogram, color_branches, assign_values_to_branches_edgePar, branches_attr_by_labels, and remove_branches_edgePar.
For illustration purposes, we will create several small trees, and demonstrate these functions on them.
dend13 <- c(1:3) %>% # take some data
  dist %>% # calculate a distance matrix,
  hclust(method = "average") %>% # on it compute hierarchical clustering using the "average" method,
  as.dendrogram # and lastly, turn that object into a dendrogram.
# same, but for 5 leaves:
dend15 <- c(1:5) %>% dist %>% hclust(method = "average") %>% as.dendrogram

par(mfrow = c(1,2))
dend13 %>% plot(main="dend13")
dend15 %>% plot(main="dend15")
# we could have also used plot(dend)
We can get a vector with the tree’s labels:
# get the labels:
dend15 %>% labels
#> [1] 1 2 5 3 4
# this is just like labels(dend)
Notice how the tree’s labels are not 1 to 5 by order, since the tree happened to place them in a different order. We can change the names of the labels:
# change the labels, and then print them:
dend15 %>% set("labels", c(111:115)) %>% labels
#> [1] 111 112 113 114 115
# could also be done using:
# labels(dend) <- c(111:115)
We can change the type of labels to be characters. Not doing so may be a source of various bugs and problems in many functions.
dend15 %>% labels
#> [1] 1 2 5 3 4
dend15 %>% set("labels_to_char") %>% labels
#> [1] "1" "2" "5" "3" "4"
We may also change their color and size:
par(mfrow = c(1,2))
dend15 %>% set("labels_col", "blue") %>% plot(main = "Change label's color") # change color
dend15 %>% set("labels_cex", 2) %>% plot(main = "Change label's size") # change size
The function recycles, from left to right, the vector of values we give it. We can use this to create more complex patterns:
# Produce a more complex dendrogram:
dend15_2 <- dend15 %>%
  set("labels", c(111:115)) %>% # change labels
  set("labels_col", c(1,2,3)) %>% # change color
  set("labels_cex", c(2,1)) # change size
par(mfrow = c(1,2))
dend15 %>% plot(main = "Before")
dend15_2 %>% plot(main = "After")
Notice how these “labels parameters” are nested within the nodePar attribute:
# looking at only the left-most node of the "after tree":
dend15_2[[1]][[1]] %>% unclass %>% str
#> int 1
#> - attr(*, "label")= int 111
#> - attr(*, "members")= int 1
#> - attr(*, "height")= num 0
#> - attr(*, "leaf")= logi TRUE
#> - attr(*, "nodePar")=List of 3
#> ..$ lab.col: num 1
#> ..$ pch : logi NA
#> ..$ lab.cex: num 2
# looking at only the nodePar attributes in this sub-tree:
dend15_2[[1]][[1]] %>% get_nodes_attr("nodePar")
#> [,1]
#> lab.col 1
#> pch NA
#> lab.cex 2
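The left-to-right recycling mentioned earlier can be pictured with base R's rep_len(). This is only an illustration of the idea, under the assumption that values are repeated in order until every one of the five leaves has one; it is not a description of how set is implemented internally:

```r
n_leaves <- 5

# c(1,2,3) stretched over five leaves, left to right:
label_colors <- rep_len(c(1, 2, 3), n_leaves)
label_colors  # 1 2 3 1 2

# c(2,1) stretched over five leaves, left to right:
label_sizes <- rep_len(c(2, 1), n_leaves)
label_sizes   # 2 1 2 1 2
```

The first leaf therefore gets color 1 and size 2, which agrees with the lab.col and lab.cex values printed for the left-most node above.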
When it comes to color, we can also set the parameter “k”, which will cut the tree into k clusters, and assign a different color to each label (based on its cluster):
par(mfrow = c(1,2))
dend15 %>% set("labels_cex", 2) %>% set("labels_col", value = c(3,4)) %>%
  plot(main = "Recycles color \nfrom left to right")
dend15 %>% set("labels_cex", 2) %>% set("labels_col", value = c(3,4), k=2) %>%
  plot(main = "Color labels \nper cluster")
abline(h = 2, lty = 2)
Each node in a tree can be represented and controlled using the assign_values_to_nodes_nodePar function, and for the special case of the nodes of leaves, the assign_values_to_leaves_nodePar function is more appropriate (and faster) to use. We can control the following properties: pch (point type), cex (point size), and col (point color). For pch we can additionally set bg (“background”, although it’s really a fill for the shape). When bg is set, the outline of the point is defined by col and the internal fill is determined by bg. For example:
par(mfrow = c(2,3))
dend13 %>% set("nodes_pch", 19) %>% plot(main = "(1) Show the\n nodes (as a dot)") #1
dend13 %>% set("nodes_pch", 19) %>% set("nodes_cex", 2) %>%
  plot(main = "(2) Show (larger)\n nodes") #2
dend13 %>% set("nodes_pch", 19) %>% set("nodes_cex", 2) %>% set("nodes_col", 3) %>%
  plot(main = "(3) Show (larger+colored)\n nodes") #3
dend13 %>% set("leaves_pch", 21) %>% plot(main = "(4) Show the leaves\n (as empty circles)") #4
dend13 %>% set("leaves_pch", 21) %>% set("leaves_cex", 2) %>%
  plot(main = "(5) Show (larger)\n leaf circles") #5
dend13 %>%
  set("leaves_pch", 21) %>%
  set("leaves_bg", "gold") %>%
  set("leaves_cex", 2) %>%
  set("leaves_col", "darkred") %>%
  plot(main = "(6) Show (larger+colored+filled)\n leaves") #6
And with recycling we can produce more complex outputs:
par(mfrow = c(1,2))
dend15 %>% set("nodes_pch", c(19,1,4)) %>% set("nodes_cex", c(2,1,2)) %>% set("nodes_col", c(3,4)) %>%
  plot(main = "Adjust nodes")
dend15 %>% set("leaves_pch", c(19,1,4)) %>% set("leaves_cex", c(2,1,2)) %>% set("leaves_col", c(3,4)) %>%
  plot(main = "Adjust nodes\n(but only for leaves)")
Notice how recycling works in a depth-first order (which is just left to right, when we only adjust the leaves). Here are the node’s parameters after adjustment:
dend15 %>% set("nodes_pch", c(19,1,4)) %>%
  set("nodes_cex", c(2,1,2)) %>% set("nodes_col", c(3,4)) %>% get_nodes_attr("nodePar")
#> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
#> pch 19 1 4 19 1 4 19 1 4
#> cex 2 1 2 2 1 2 2 1 2
#> col 3 4 3 4 3 4 3 4 3
We can also change the height of the leaves by using the hang.dendrogram function:
par(mfrow = c(1,3))
dend13 %>% set("leaves_pch", 19) %>% set("leaves_cex", 2) %>% set("leaves_col", 2) %>% # adjust the leaves
  hang.dendrogram %>% # hang the leaves
  plot(main = "Hanging a tree")
dend13 %>% set("leaves_pch", 19) %>% set("leaves_cex", 2) %>% set("leaves_col", 2) %>% # adjust the leaves
  hang.dendrogram(hang_height = .6) %>% # hang the leaves (at some height)
  plot(main = "Hanging a tree (but lower)")
dend13 %>% set("leaves_pch", 19) %>% set("leaves_cex", 2) %>% set("leaves_col", 2) %>% # adjust the leaves
  hang.dendrogram %>% # hang the leaves
  hang.dendrogram(hang = -1) %>% # un-hanging the leaves
  plot(main = "Not hanging a tree")
An example of what this function does to the leaves heights:
dend13 %>% get_leaves_attr("height")
#> [1] 0 0 0
dend13 %>% hang.dendrogram %>% get_leaves_attr("height")
#> [1] 1.35 0.85 0.85
We can also control the general heights of nodes using raise.dendrogram:
par(mfrow = c(1,3))
dend13 %>% plot(main = "First tree", ylim = c(0,3))
dend13 %>%
  raise.dendrogram(-1) %>%
  plot(main = "One point lower", ylim = c(0,3))
dend13 %>%
  raise.dendrogram(1) %>%
  plot(main = "One point higher", ylim = c(0,3))
If you wish to make the branches under the root have the same height, you can use the flatten.dendrogram function.
Similar to adjusting nodes, we can also control line width (lwd), line type (lty), and color (col) for branches:
par(mfrow = c(1,3))
dend13 %>% set("branches_lwd", 4) %>% plot(main = "Thick branches")
dend13 %>% set("branches_lty", 3) %>% plot(main = "Dashed branches")
dend13 %>% set("branches_col", 2) %>% plot(main = "Red branches")
We may also use recycling to create more complex patterns:
# Produce a more complex dendrogram:
dend15 %>%
  set("branches_lwd", c(4,1)) %>%
  set("branches_lty", c(1,1,3)) %>%
  set("branches_col", c(1,2,3)) %>%
  plot(main = "Complex branches", edge.root = TRUE)
Notice how the first branch (the root) is considered when going through and creating the tree, but it is ignored in the actual plotting (this is actually a “missing feature” in plot.dendrogram).
We may also control the colors of the branches based on clustering:
par(mfrow = c(1,2))
dend15 %>% set("branches_k_color", k = 3) %>% plot(main = "Nice defaults")
dend15 %>% set("branches_k_color", value = 3:1, k = 3) %>%
  plot(main = "Controlling branches' colors\n(via clustering)")
# This is like using the `color_branches` function
The most powerful way to control branches is through the branches_attr_by_labels function (with variations through the set function). The function allows you to change col/lwd/lty of branches if they match some “labels condition”. Follow carefully:
par(mfrow = c(1,2))
dend15 %>% set("by_labels_branches_col", value = c(1,4)) %>%
  plot(main = "Adjust the branch\n if ALL (default) of its\n labels are in the list")
dend15 %>% set("by_labels_branches_col", value = c(1,4), type = "any") %>%
  plot(main = "Adjust the branch\n if ANY of its\n labels are in the list")
We can use this to change the size/type/color of the branches:
# Using "Inf" in "TF_values" means to let the parameters stay as they are.
par(mfrow = c(1,3))
dend15 %>% set("by_labels_branches_col", value = c(1,4), TF_values = c(3,Inf)) %>%
  plot(main = "Change colors")
dend15 %>% set("by_labels_branches_lwd", value = c(1,4), TF_values = c(8,1)) %>%
  plot(main = "Change line width")
dend15 %>% set("by_labels_branches_lty", value = c(1,4), TF_values = c(3,Inf)) %>%
  plot(main = "Change line type")
The highlight_branches function helps to more easily see the topological structure of a tree, by adjusting the branches' appearance (color and line width) based on their height in the tree. For example:
dat <- iris[1:20,-5]
hca <- hclust(dist(dat))
hca2 <- hclust(dist(dat), method = "single")
dend <- as.dendrogram(hca)
dend2 <- as.dendrogram(hca2)

par(mfrow = c(1,3))
dend %>% highlight_branches_col %>% plot(main = "Coloring branches")
dend %>% highlight_branches_lwd %>% plot(main = "Emphasizing line-width")
dend %>% highlight_branches %>% plot(main = "Emphasizing color\n and line-width")
Tanglegrams are even easier to compare when using
library(viridis)
par(mfrow = c(1,3))
dend %>% highlight_branches_col %>% plot(main = "Coloring branches \n (default is reversed viridis)")
dend %>% highlight_branches_col(viridis(100)) %>% plot(main = "It is better to use \n lighter colors in the leaves")
dend %>% highlight_branches_col(rev(magma(1000))) %>% plot(main = "The magma color palette\n is also good")
dl <- dendlist(dend, dend2)
tanglegram(dl, sort = TRUE, common_subtrees_color_lines = FALSE, highlight_distinct_edges = FALSE, highlight_branches_lwd = FALSE)
tanglegram(dl)
tanglegram(dl, fast = TRUE)
dl <- dendlist(highlight_branches(dend), highlight_branches(dend2))
tanglegram(dl, sort = TRUE, common_subtrees_color_lines = FALSE, highlight_distinct_edges = FALSE)
# dend %>% set("highlight_branches_col") %>% plot
dl <- dendlist(dend, dend2) %>% set("highlight_branches_col")
tanglegram(dl, sort = TRUE, common_subtrees_color_lines = FALSE, highlight_distinct_edges = FALSE)
A dendrogram is an object which can be rotated on its hinges without changing its topology. Rotating a dendrogram in base R can be done using the reorder function. The problem with this function is that it is not very intuitive. For this reason the rotate function was written. It has two main arguments: the “object” (a dendrogram), and the “order” we wish to rotate it by. The “order” parameter can be either a numeric vector, used in a similar way to how we would order a simple character vector, or a character vector of the labels of the tree, given in the new desired order of the tree. It is also worth noting that some orders are impossible to achieve for a given tree’s topology. In such cases, the function will do its “best” to get as close as possible to the requested rotation.
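The claim that rotation leaves the topology unchanged can be checked with base R alone. The sketch below does not use rotate; it uses rev() from the stats package, which flips every hinge, as a stand-in for one particular rotation. The data values and the helper coph are illustrative assumptions:

```r
# A small tree with character labels and no tied distances.
dend <- as.dendrogram(hclust(dist(c(a = 1, b = 2, c = 4, d = 7, e = 11))))

# Flip every hinge of the tree; this reverses the leaf order.
flipped <- rev(dend)
labels(dend)
labels(flipped)

# Hypothetical helper: cophenetic distances with rows/columns in a fixed order,
# so the two trees can be compared regardless of their leaf order.
coph <- function(d) {
  m <- as.matrix(cophenetic(d))
  m[letters[1:5], letters[1:5]]
}

# The merge heights between every pair of leaves are unchanged by the flip.
all.equal(coph(dend), coph(flipped))
```

Only the left-to-right drawing order differs between the two trees; which leaves merge, and at what height, stays the same.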
par(mfrow = c(1,3))
dend15 %>%
  set("labels_colors") %>%
  set("branches_k_color") %>%
  plot(main = "First tree")
dend15 %>%
  set("labels_colors") %>%
  set("branches_k_color") %>%
  rotate(as.character(5:1)) %>% # rotate to match labels new order
  plot(main = "Rotated tree\n based on labels")
dend15 %>%
  set("labels_colors") %>%
  set("branches_k_color") %>%
  rotate(5:1) %>% # the fifth label to go first is "4"
  plot(main = "Rotated tree\n based on order")
A new convenience S3 function for sort (sort.dendrogram) was added:
dend110 <- c(1, 3:5, 7,9,10) %>% dist %>% hclust(method = "average") %>%
  as.dendrogram %>% color_labels %>% color_branches

par(mfrow = c(1,3))
dend110 %>% plot(main = "Original tree")
dend110 %>% sort %>% plot(main = "labels sort")
dend110 %>% sort(type = "nodes") %>% plot(main = "nodes (ladderize) sort")
We can unbranch a tree:
par(mfrow = c(1,3))
dend15 %>% plot(main = "First tree", ylim = c(0,3))
dend15 %>%
  unbranch %>%
  plot(main = "Unbranched tree", ylim = c(0,3))
dend15 %>%
  unbranch(2) %>%
  plot(main = "Unbranched tree (2)", ylim = c(0,3))
We can prune a tree based on the labels:
par(mfrow = c(1,2))
dend15 %>% set("labels_colors") %>%
  plot(main = "First tree", ylim = c(0,3))
dend15 %>% set("labels_colors") %>%
  prune(c("1","5")) %>%
  plot(main = "Pruned tree", ylim = c(0,3))
For pruning two trees to have matching labels, we can use the intersect_trees function:
par(mfrow = c(1,2))
dend_intersected <- intersect_trees(dend13, dend15)
dend_intersected[[1]] %>% plot
dend_intersected[[2]] %>% plot
We can collapse branches under a tolerance level using the collapse_branch function:
# ladderize is like sort(..., type = "node")
dend <- iris[1:5,-5] %>% dist %>% hclust %>% as.dendrogram
par(mfrow = c(1,3))
dend %>% ladderize %>% plot(horiz = TRUE); abline(v = .2, col = 2, lty = 2)
dend %>% collapse_branch(tol = 0.2) %>% ladderize %>% plot(horiz = TRUE)
dend %>% collapse_branch(tol = 0.2) %>% ladderize %>% hang.dendrogram(hang = 0) %>% plot(horiz = TRUE)
Earlier we have seen how to highlight clusters in a dendrogram by coloring branches. We can also draw rectangles around the branches of a dendrogram in order to highlight the corresponding clusters. First the dendrogram is cut at a certain level, then a rectangle is drawn around selected branches. This is done using the rect.dendrogram function, which is modeled on the rect.hclust function. One advantage of rect.dendrogram over rect.hclust is that it also works on horizontally plotted trees:
layout(t(c(1,1,1,2,2)))
dend15 %>% set("branches_k_color") %>% plot
dend15 %>% rect.dendrogram(k=3,
                           border = 8, lty = 5, lwd = 2)
dend15 %>% set("branches_k_color") %>% plot(horiz = TRUE)
dend15 %>% rect.dendrogram(k=3, horiz = TRUE,
                           border = 8, lty = 5, lwd = 2)
Adding colored bars to a dendrogram may be useful to show clusters or some outside categorization of the items. For example:
is_odd <- ifelse(labels(dend15) %% 2, 2,3)
is_345 <- ifelse(labels(dend15) > 2, 3,4)
is_12 <- ifelse(labels(dend15) <= 2, 3,4)
k_3 <- cutree(dend15,k = 3, order_clusters_as_data = FALSE)
# The FALSE above makes sure we get the clusters in the order of the
# dendrogram, and not in that of the original data. It is like:
# cutree(dend15, k = 3)[order.dendrogram(dend15)]
the_bars <- cbind(is_odd, is_345, is_12, k_3)
the_bars[the_bars==2] <- 8

dend15 %>% plot
colored_bars(colors = the_bars, dend = dend15, sort_by_labels_order = FALSE)
# we use sort_by_labels_order = FALSE since "the_bars" were set based on the
# labels order. The more common use case is when the bars are based on a second variable
# from the same data.frame as dend was created from. Thus, the default
# sort_by_labels_order = TRUE would make more sense.
Another example, based on mtcars (in which the default of sort_by_labels_order = TRUE makes sense):
dend_mtcars <- mtcars[, c("mpg", "disp")] %>% dist %>% hclust(method = "average") %>% as.dendrogram

par(mar = c(10,2,1,1))
plot(dend_mtcars)
the_bars <- ifelse(mtcars$am, "grey", "gold")
colored_bars(colors = the_bars, dend = dend_mtcars, rowLabels = "am")
The core process is to transform a dendrogram into a ggdend object using as.ggdend, and then plot it using ggplot (a new S3 ggplot.ggdend function is available). These two steps can be done in one command with either the function ggplot or ggdend.
The reason we want to have as.ggdend (and not only ggplot.dendrogram) is (1) so that you could create your own mapping of ggdend and, (2) since as.ggdend might be slow for large trees, it is probably better to be able to run it only once for such cases.
A ggdend class object is a list with 3 components: segments, labels, nodes. Each one contains the graphical parameters from the original dendrogram, but in a tabular form that can be used by ggplot2+geom_segment+geom_text to create a dendrogram plot.
The function prepare.ggdend is used by plot.ggdend to take the ggdend object and prepare it for plotting. This is because the defaults of various parameters in dendrograms are not always stored in the object itself, but are built into the plot.dendrogram function. For example, the color of the labels is not (by default) specified in the dendrogram (only if we change it from black to something else). Hence, when taking the object into a different plotting engine (say ggplot2), we want to prepare the object by filling in various defaults. This function is automatically invoked within the plot.ggdend function. You would probably use it only if you wish to build your own ggplot2 mapping.
# Create a complex dend:
dend <- iris[1:30,-5] %>% dist %>% hclust %>% as.dendrogram %>%
  set("branches_k_color", k=3) %>% set("branches_lwd", c(1.5,1,1.5)) %>%
set("branches_lty", c(1,1,3,1,1,2)) %>%
set("labels_colors") %>% set("labels_cex", c(.9,1.2)) %>%
set("nodes_pch", 19) %>% set("nodes_col", c("orange", "black", "plum", NA))
# plot the dend in usual "base" plotting engine:
plot(dend)
# Now let's do it in ggplot2 :)
ggd1 <- as.ggdend(dend)
library(ggplot2)
# the nodes are not implemented yet.
ggplot(ggd1) # reproducing the above plot in ggplot2 :)
ggplot(ggd1, horiz = TRUE, theme = NULL) # horiz plot (and let's remove theme) in ggplot2
# Adding some extra spice to it...
# creating a radial plot:
# ggplot(ggd1) + scale_y_reverse(expand = c(0.2, 0)) + coord_polar(theta="x")
# The text doesn't look so great, so let's remove it:
ggplot(ggd1, labels = FALSE) + scale_y_reverse(expand = c(0.2, 0)) + coord_polar(theta="x")
Credit: These functions are extended versions of the functions ggdendrogram, dendro_data (and the hidden dendrogram_data) from Andrie de Vries’s ggdendro package.
The motivation for this fork is the need to add more graphical
parameters to the plotted tree. This required a strong mixture of
functions from ggdendro and dendextend (to the point that it seemed
better to just fork the code into its current form).
The dendextend package aims to extend and enhance features from the R ecosystem. Let us take a look at several examples.
The DendSer package helps in re-arranging a dendrogram to optimize visualization-based cost functions. Until now it was only used for hclust objects, but it can easily be connected to dendrogram objects by trying to turn the dendrogram into hclust, on which it runs DendSer. This can be used to rotate the dendrogram easily by using the rotate_DendSer function:
if(require(DendSer)) {
par(mfrow = c(1,2))
DendSer.dendrogram(dend15)
  dend15 %>% color_branches %>% plot
  dend15 %>% color_branches %>% rotate_DendSer %>% plot
}
The gplots package brings us the heatmap.2 function. In it, we can use our modified dendrograms to get more informative heat-maps:
library(gplots)
x <- as.matrix(datasets::mtcars)

heatmap.2(x)
# now let's spice up the dendrograms a bit:
Rowv <- x %>% dist %>% hclust %>% as.dendrogram %>%
  set("branches_k_color", k = 3) %>% set("branches_lwd", 4) %>%
  ladderize
# rotate_DendSer(ser_weight = dist(x))
Colv <- x %>% t %>% dist %>% hclust %>% as.dendrogram %>%
  set("branches_k_color", k = 2) %>% set("branches_lwd", 4) %>%
  ladderize
# rotate_DendSer(ser_weight = dist(t(x)))
heatmap.2(x, Rowv = Rowv, Colv = Colv)
Like gplots, NMF offers a heatmap function called aheatmap. We can update it just as we would heatmap.2.
Since NMF was removed from CRAN (it can still be installed from source), the example code is still available but is not run in this vignette.
# library(NMF)
#
# x <- as.matrix(datasets::mtcars)
#
# # now let's spice up the dendrograms a bit:
# Rowv <- x %>% dist %>% hclust %>% as.dendrogram %>%
# set("branches_k_color", k = 3) %>% set("branches_lwd", 4) %>%
# ladderize
# # rotate_DendSer(ser_weight = dist(x))
# Colv <- x %>% t %>% dist %>% hclust %>% as.dendrogram %>%
# set("branches_k_color", k = 2) %>% set("branches_lwd", 4) %>%
# ladderize
# # rotate_DendSer(ser_weight = dist(t(x)))
#
# aheatmap(x, Rowv = Rowv, Colv = Colv)
The heatmaply package creates interactive heat-maps that are usable from the R console, in the ‘RStudio’ viewer pane, in ‘R Markdown’ documents, and in ‘Shiny’ apps. Hover the mouse pointer over a cell or a dendrogram to show details, and drag a rectangle to zoom.
The use is very similar to what we’ve seen before; we just use heatmaply instead of heatmap.2:
x <- as.matrix(datasets::mtcars)
# heatmaply(x)
# now let's spice up the dendrograms a bit:
Rowv <- x %>% dist %>% hclust %>% as.dendrogram %>%
  set("branches_k_color", k = 3) %>% set("branches_lwd", 4) %>%
  ladderize
# rotate_DendSer(ser_weight = dist(x))
Colv <- x %>% t %>% dist %>% hclust %>% as.dendrogram %>%
  set("branches_k_color", k = 2) %>% set("branches_lwd", 4) %>%
  ladderize
# rotate_DendSer(ser_weight = dist(t(x)))
Here we need to use cache=FALSE in the markdown:
library(heatmaply)
heatmaply(x, Rowv = Rowv, Colv = Colv)
I avoided running the code from above due to space issues on CRAN. For live examples, please go to:
The cutreeDynamic function offers a wrapper for two methods of adaptive branch pruning of hierarchical clustering dendrograms. The results can now be visualized both by updating the branches and by using the colored_bars function (which was adjusted for use with plots of dendrograms):
# let's get the clusters
library(dynamicTreeCut)
data(iris)
x <- iris[,-5] %>% as.matrix
hc <- x %>% dist %>% hclust
dend <- hc %>% as.dendrogram

# Find special clusters:
clusters <- cutreeDynamic(hc, distM = as.matrix(dist(x)), method = "tree")
# we need to sort them to the order of the dendrogram:
clusters <- clusters[order.dendrogram(dend)]
clusters_numbers <- unique(clusters) - (0 %in% clusters)
n_clusters <- length(clusters_numbers)

library(colorspace)
cols <- rainbow_hcl(n_clusters)
true_species_cols <- rainbow_hcl(3)[as.numeric(iris[,][order.dendrogram(dend),5])]
dend2 <- dend %>%
  branches_attr_by_clusters(clusters, values = cols) %>%
  color_labels(col = true_species_cols)
plot(dend2)
clusters <- factor(clusters)
levels(clusters)[-1] <- cols[-5][c(1,4,2,3)]
# Get the clusters to have proper colors.
# fix the order of the colors to match the branches.
colored_bars(clusters, dend, sort_by_labels_order = FALSE)
# here we used sort_by_labels_order = FALSE since the clusters were already sorted based on the dendrogram's order
The pvclust library calculates “p-values” for hierarchical clustering via multiscale bootstrap re-sampling. Hierarchical clustering is done for given data and p-values are computed for each of the clusters. The dendextend package lets us reproduce the plot from pvclust, but with a dendrogram (instead of an hclust object), which also lets us extend the visualization.
par(mfrow = c(1,2))
library(pvclust)
data(lung) # 916 genes for 73 subjects
set.seed(13134)
result <- pvclust(lung[1:100, 1:10],
                  method.dist="cor", method.hclust="average", nboot=10)
# with pvrect
plot(result)
pvrect(result)
# with a dendrogram of pvrect
dend <- as.dendrogram(result)
result %>% as.dendrogram %>%
  plot(main = "Cluster dendrogram with AU/BP values (%)\n reproduced plot with dendrogram")
result %>% text
result %>% pvrect
Let’s color and thicken the branches based on the p-values:
par(mfrow = c(2,2))
# with a modified dendrogram of pvrect
dend %>% pvclust_show_signif(result) %>%
  plot(main = "Cluster dendrogram \n bp values are highlighted by signif")
dend %>% pvclust_show_signif(result, show_type = "lwd") %>%
  plot(main = "Cluster dendrogram with AU/BP values (%)\n bp values are highlighted by signif")
result %>% text
result %>% pvrect(alpha=0.95)

dend %>% pvclust_show_signif_gradient(result) %>%
  plot(main = "Cluster dendrogram with AU/BP values (%)\n bp values are colored by signif")
dend %>%
  pvclust_show_signif_gradient(result) %>%
  pvclust_show_signif(result) %>%
  plot(main = "Cluster dendrogram with AU/BP values (%)\n bp values are colored+highlighted by signif")
result %>% text
result %>% pvrect(alpha=0.95)
Circular layout is an efficient way for the visualization of huge amounts of information. The circlize package provides an implementation of circular layout generation in R, including a solution for dendrogram objects produced using dendextend:
library(circlize)
dend <- iris[1:40,-5] %>% dist %>% hclust %>% as.dendrogram %>%
  set("branches_k_color", k=3) %>% set("branches_lwd", c(5,2,1.5)) %>%
set("branches_lty", c(1,1,3,1,1,2)) %>%
set("labels_colors") %>% set("labels_cex", c(.6,1.5)) %>%
set("nodes_pch", 19) %>% set("nodes_col", c("orange", "black", "plum", NA))
par(mar = rep(0,4))
circlize_dendrogram(dend)