Trees are ubiquitous in mathematics, computer science, data science, finance, and many other fields. They are especially useful for hierarchical data. For examples of where trees are used, see the applications vignette by typing
vignette("applications", package = "data.tree")
Tree-like structures are already used in R. For example, environments can be seen as nodes in a tree. CRAN also provides numerous packages that deal with tree-like structures, especially in the area of decision theory. Yet there is no general-purpose hierarchical data structure that can be used as conveniently and generically as, say, a data.frame.
As a result, people often try to solve hierarchical problems in a tabular fashion, for instance with data.frames. But hierarchies often do not map well onto tables, and workarounds are usually required.
This package offers an alternative. The data.tree package lets you create hierarchies, called data.tree structures. The building blocks of these structures are Node objects. The package provides basic traversal, search, and sort operations, and an infrastructure for recursive tree programming. You can decorate Nodes with your own fields and methods, so as to extend the package to your needs.
The package also provides convenience methods for neatly printing and plotting trees. It supports conversion from and to lists, and other tree structures such as phylo objects from the ape package, igraph, and other packages.
data.tree structures are bi-directional, ordered trees. Bi-directional means that you can navigate from parent to children and vice versa. Ordered means that the sort order of the children of a parent node is well-defined.
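As a minimal sketch of what bi-directional and ordered mean in practice, the following builds a small tree and navigates it in both directions. The node names (root, first, second) are hypothetical and chosen only for illustration.

```r
library(data.tree)

# Hypothetical three-node tree, built only to illustrate navigation
root <- Node$new("root")
first <- root$AddChild("first")
second <- root$AddChild("second")

# Parent to children: the children keep a well-defined order
# (here, the order in which they were added)
childNames <- names(root$children)
print(childNames)

# Child to parent: every non-root Node knows its parent
parentName <- first$parent$name
print(parentName)
```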
data.tree structure: a tree, consisting of multiple Node objects. Often, the entry point to a data.tree structure is the root Node.
Node: both a class and the basic building block of data.tree structures.
attribute: an active, a field, or a method of a Node. Not to be confused with standard R attributes (see ?attr), which have a different meaning. Many methods and functions have an attribute argument, which can refer to an active, a field, or a method.
active: a property of a Node that can be called like an attribute, but behaves like a function without arguments.
field: a value stored on a Node, for example node$cost <- 2500.
method: a function acting on an object (a Node in this context). Many methods are available in OO style (e.g. node$Revert()) or in traditional style (e.g. Revert(node)).
inheritance: a situation in which a Node inherits e.g. an attribute from one of its ancestors.
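To make these terms concrete, here is a short sketch that touches a field, two actives, a method in both calling styles, and inheritance from an ancestor. The tree (Department, Project A, Project B) and the cost value are made up for illustration; isRoot, isLeaf, Get and its inheritFromAncestors argument are taken from the package documentation as I understand it.

```r
library(data.tree)

# Hypothetical tree used only for illustration
dept <- Node$new("Department")
projA <- dept$AddChild("Project A")
projB <- dept$AddChild("Project B")

# Field: a value stored on one Node
dept$cost <- 2500

# Actives: read like fields, but computed on access
print(dept$isRoot)
print(projA$isLeaf)

# Method: OO style and traditional style (each call reverses the child order)
dept$Revert()
Revert(dept)

# Attribute argument: Get() collects an attribute from every Node in the tree;
# Nodes that do not have the field yield NA
costs <- dept$Get("cost")
print(costs)

# Inheritance: Nodes without their own cost take it from an ancestor
inherited <- dept$Get("cost", inheritFromAncestors = TRUE)
print(inherited)
```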
There are different ways to create a data.tree structure. For example, you can create a tree programmatically, by conversion from other R objects, or from a file.
Let’s start by creating a tree programmatically. We do this by creating Node objects, and linking them together so as to define the parent-child relationships.
In this example, we are looking at a company, Acme Inc., and the tree reflects its organisational structure. The root (level 1) is the company. On level 2, the nodes represent departments, and the leaves of the tree represent projects that the company is considering for next year:
library(data.tree)

acme <- Node$new("Acme Inc.")
  accounting <- acme$AddChild("Accounting")
    software <- accounting$AddChild("New Software")
    standards <- accounting$AddChild("New Accounting Standards")
  research <- acme$AddChild("Research")
    newProductLine <- research$AddChild("New Product Line")
    newLabs <- research$AddChild("New Labs")
  it <- acme$AddChild("IT")
    outsource <- it$AddChild("Outsource")
    agile <- it$AddChild("Go agile")
    goToR <- it$AddChild("Switch to R")

print(acme)
##                           levelName
## 1  Acme Inc.
## 2   ¦--Accounting
## 3   ¦   ¦--New Software
## 4   ¦   °--New Accounting Standards
## 5   ¦--Research
## 6   ¦   ¦--New Product Line
## 7   ¦   °--New Labs
## 8   °--IT
## 9       ¦--Outsource
## 10      ¦--Go agile
## 11      °--Switch to R
As you can see from the previous example, each Node is identified by its name, i.e. the argument you pass into the Node$new(name) constructor. The name needs to be unique among siblings, such that paths to Nodes are unambiguous.
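Because names are unique among siblings, a sequence of names identifies exactly one Node. The sketch below rebuilds a reduced version of the Acme tree and uses the path and pathString actives, as well as Climb, to illustrate this. It is an illustration under those assumptions, not the only way to address a Node.

```r
library(data.tree)

# Reduced version of the Acme tree, just enough to show paths
acme <- Node$new("Acme Inc.")
it <- acme$AddChild("IT")
outsource <- it$AddChild("Outsource")
agile <- it$AddChild("Go agile")

# The path from the root down to a Node, as a character vector
# and as a single delimited string
print(outsource$path)
pathStr <- outsource$pathString
print(pathStr)

# Going the other way: follow a sequence of names down from the root
found <- acme$Climb("IT", "Go agile")
print(found$name)
```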
Node is implemented as an R6 reference class. This has the following implications:
You can call methods on a Node in OO style, e.g. acme$AddChild("IT").
A Node exhibits reference semantics. Thus, multiple variables in R can point to the same Node, and modifying a Node will modify it for all referencing variables. In the above code example, both it and the IT child of acme reference the same object. This differs from value semantics, which is much more widely used in R.
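The effect of reference semantics can be seen in a short sketch. The variable alias and the cost values are hypothetical. Clone is the package's function for making an independent deep copy, which is what you would reach for when value-like behaviour is wanted.

```r
library(data.tree)

acme <- Node$new("Acme Inc.")
it <- acme$AddChild("IT")

# A second variable pointing to the same Node (no copy is made)
alias <- it
alias$cost <- 1000

# The change is visible through every reference to that Node
print(it$cost)
print(acme$IT$cost)

# Clone() creates an independent copy of the tree;
# changing the copy leaves the original untouched
copy <- Clone(acme)
copy$IT$cost <- 5
print(it$cost)
```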
Creating a tree programmatically is especially useful in the context of algorithms. Most of the time, however, you will create a tree by conversion: from a nested list-of-lists, from another R tree structure (e.g. an ape phylo), or from a data.frame. For more details on all the options, type ?as.Node and refer to the See Also section.
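As an illustration of conversion from a nested list-of-lists, the sketch below uses FromListSimple, which (as I understand the package documentation) turns each nested list into a child Node and each non-list element into a field. The list contents, including the headcount and cost values, are invented for the example.

```r
library(data.tree)

# Hypothetical nested list: list elements become children,
# non-list elements become fields, "name" names the root
company <- list(
  name = "Acme Inc.",
  Accounting = list(headcount = 5),
  IT = list(
    headcount = 12,
    Outsource = list(cost = 400000)
  )
)

tree <- FromListSimple(company)
print(tree, "headcount", "cost")

# Children and fields are reachable by name
print(tree$IT$headcount)
print(tree$IT$Outsource$cost)
```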
One of the most common conversions is the one from a data.frame in table format. The following code illustrates this. We load the GNI2014 data from the treemap package. This data.frame is in table format, meaning that each row will represent a leaf in the data.tree structure:
library(treemap)
data(GNI2014)
head(GNI2014)
##   iso3          country     continent population    GNI
## 3  BMU          Bermuda North America      67837 106140
## 4  NOR           Norway        Europe    4676305 103630
## 5  QAT            Qatar          Asia     833285  92200
## 6  CHE      Switzerland        Europe    7604467  88120
## 7  MAC Macao SAR, China          Asia     559846  76270
## 8  LUX       Luxembourg        Europe     491775  75990
Let’s convert that into a data.tree structure! We start by defining a pathString. The pathString describes the hierarchy by defining a path from the root to each leaf. In this example, the hierarchy comes very naturally:
GNI2014$pathString <- paste("world", GNI2014$continent, GNI2014$country, sep = "/")
Once our pathString is defined, conversion to Node is very easy:
population <- as.Node(GNI2014)
print(population, "iso3", "population", "GNI", limit = 20)
##                                  levelName iso3 population    GNI
## 1  world                                               NA     NA
## 2   ¦--North America                                   NA     NA
## 3   ¦   ¦--Bermuda                         BMU      67837 106140
## 4   ¦   ¦--United States                   USA  313973000  55200
## 5   ¦   ¦--Canada                          CAN   33487208  51630
## 6   ¦   ¦--Bahamas, The                    BHS     309156  20980
## 7   ¦   ¦--Trinidad and Tobago             TTO    1310000  20070
## 8   ¦   ¦--Puerto Rico                     PRI    3971020  19310
## 9   ¦   ¦--Barbados                        BRB     284589  15310
## 10  ¦   ¦--St. Kitts and Nevis             KNA      40131  14920
## 11  ¦   ¦--Antigua and Barbuda             ATG      85632  13300
## 12  ¦   ¦--Panama                          PAN    3360474  11130
## 13  ¦   ¦--Costa Rica                      CRI    4253877  10120
## 14  ¦   ¦--Mexico                          MEX  111211789   9870
## 15  ¦   ¦--Grenada                         GRD      90739   7910
## 16  ¦   ¦--St. Lucia                       LCA     160267   7260
## 17  ¦   ¦--Dominica                        DMA      72660   6930
## 18  ¦   ¦--St. Vincent and the Grenadines  VCT     104574   6610
## 19  ¦   ¦--Dominican Republic              DOM    9650054   6040
## 20  ¦   °--... 7 nodes w/ 0 sub                        NA     NA
## 21  °--... 6 nodes w/ 171 sub                          NA     NA
This is a simple example, and more options are available. Type ?FromDataFrameTable for all the details.
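For readers without the treemap package at hand, the same conversion can be sketched on a small hand-made data.frame. The project names are borrowed from the Acme example; the cost figures are invented purely for illustration. Each row describes one leaf, and the pathString column determines where that leaf sits in the hierarchy.

```r
library(data.tree)

# Hypothetical table: one row per leaf, with made-up cost values
projects <- data.frame(
  pathString = c(
    "Acme Inc./IT/Outsource",
    "Acme Inc./IT/Go agile",
    "Acme Inc./Research/New Labs"
  ),
  cost = c(400000, 250000, 750000),
  stringsAsFactors = FALSE
)

# Intermediate Nodes (Acme Inc., IT, Research) are created from the paths;
# the remaining columns become fields on the leaves
acme <- FromDataFrameTable(projects)
print(acme, "cost")

print(acme$IT$Outsource$cost)
```

Only the leaves carry a cost here, since the table has no rows for the intermediate Nodes.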