alike is similar to all.equal from base R, except that it only compares
object structure. As with all.equal, the first argument (target) must be
matched by the second (current).
library(vetr)
alike(integer(5), 1:5)      # different values, but same structure
alike(integer(5), 1:4) # wrong size
 "`length(1:4)` should be 5 (is 4)"
alike(integer(26), letters) # same size, but different types
 "`letters` should be type \"integer-like\" (is \"character\")"
alike only compares structural elements that are defined in target
(a.k.a. the template). This allows “wildcard” templates. For example, we
consider length zero vectors to have undefined length, so those match
vectors of any length:
alike(integer(), letters) # type is still defined and must match
 "`letters` should be type \"integer-like\" (is \"character\")"
Similarly, if a template does not specify an attribute, objects with any value for that attribute will match:
alike(list(), data.frame()) # a data frame is a list with attributes
alike(data.frame(), list()) # but a list does not have the data.frame attributes
 "`list()` should be class \"data.frame\" (is \"list\")"
As an extension to the wildcard concept, we interpret partially specified core R attributes. Here we allow any three-column integer matrix to match:
mx.tpl <- matrix(integer(), ncol=3)   # partially specified matrix
alike(mx.tpl, matrix(sample(1:12), nrow=4))   # any number of rows match
alike(mx.tpl, matrix(sample(1:12), nrow=3)) # but column count must match
 "`matrix(sample(1:12), nrow = 3)` should have 3 columns (has 4)"
Or a data frame with an arbitrary number of rows, but the same column structure:
iris.tpl <- iris[0, ]   # no rows, but structure is defined
alike(iris.tpl, iris[1:10, ])   # any number of rows match
alike(iris.tpl, CO2) # but column structure must match
 "`names(CO2)` should be \"Sepal.Length\" (is \"Plant\")"
“Alikeness” is complex to describe, but should be intuitive to grasp.
We recommend you look at example(alike) to get a sense of
“alikeness”. If you want to understand the specifics, read on.
alike’s template-based comparison is declarative. You
declare what structure an object is expected to implement, and
vetr infers all the computations required to verify that this is
so. This makes it particularly well suited for enforcing structural
requirements for S3 objects. The S4 system does this and more, but S3
objects are still used extensively in R code, and sometimes S4 classes
are not appropriate.
There are several advantages to template based comparisons:
The template concept was inspired by
alike compares objects on type, length, and attributes. Recursive
structures are compared element by element. Language objects and functions are compared specially because the
concept of a value within those is more complex (e.g., is the
x + y just a value?).
We will defer discussion of attribute comparison to the attributes section.
Objects must be the same length to be alike, unless the template
(target) is zero length, in which case the object may be any length.
Environments are an exception: we only require that all the elements
present in target be present in current. Also, note that parentheses
(calls to `(`) are ignored in language objects, which may affect length
comparisons.
Type comparison is done on type (i.e. the result of typeof), with
some adjustments to better align comparisons with “perceived” types as
opposed to internal storage types.
We allow integer vectors to be considered numeric, and short integer-like numerics to be treated as integers:
alike(1L, 1) # `1` is not technically integer, but we treat it as such
alike(1L, 1.1) # 1.1 is not integer-like
 "`1.1` should be type \"integer-like\" (is \"double\")"
alike(1.1, 1L) # integers can match numerics
This feature is designed to simplify checks for integer-like numbers. The following two expressions are roughly equivalent:
stopifnot(length(x) == 1L && (is.integer(x) || is.numeric(x) && floor(x) == x))
stopifnot(alike(integer(1L), x))
Note that we only check numerics of length <= 100 for integerness, to
avoid full scans of large vectors. We expect that the primary source of
these integer-like numerics is hand-typed vectors (e.g. c(1, 2, 3)), so
hopefully this compromise is not too limiting. You can modify the
threshold length for this treatment via the fuzzy.int.max.len parameter
to the settings object.
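To make the integer-like rule concrete, here is a minimal base-R sketch of the check for a single vector. The helper is_integer_like is hypothetical and not a vetr function; its fuzzy.int.max.len argument only mirrors the name of the setting, and treating NA as acceptable while treating infinite values as not integer-like are our own assumptions.

```r
# Hypothetical helper, NOT part of vetr: approximates the "integer-like"
# relaxation described above for a single vector.
is_integer_like <- function(x, fuzzy.int.max.len = 100L) {
  if (is.integer(x)) return(TRUE)                   # true integers always qualify
  if (!is.double(x)) return(FALSE)                  # only doubles get the relaxation
  if (length(x) > fuzzy.int.max.len) return(FALSE)  # avoid scanning long vectors
  vals <- x[!is.na(x)]                              # assumption: NAs do not disqualify
  all(is.finite(vals) & vals == floor(vals))        # assumption: Inf is not integer-like
}

is_integer_like(c(1, 2, 3))          # hand-typed doubles
is_integer_like(1.1)
is_integer_like(as.numeric(1:101))   # longer than the threshold
```

Raising the threshold trades scan time for leniency, which is the same trade-off the length cap above is making.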
Closures, builtins, and specials are all treated as a single type, even though internally they are stored as different types.
alike will recurse through lists (and by extension data
frames), pairlists, expressions, and environments, and will check
pairwise alikeness between the corresponding elements of the target and
current objects. Environments are handled slightly differently: current
may have additional items not present in target, and if target is the
global environment, current must be too (this is because the global
environment is often littered with many objects, and explicitly comparing
it to another environment could be computationally expensive).
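The environment rules can be sketched in base R as follows. env_alike_sketch is a hypothetical helper, not part of vetr, and as a simplifying assumption it only compares the typeof of shared elements rather than recursing into full alikeness.

```r
# Hypothetical helper, NOT vetr's implementation of environment comparison.
env_alike_sketch <- function(target, current) {
  if (!is.environment(current)) return(FALSE)
  # the global environment is only matched by itself
  if (identical(target, globalenv())) return(identical(current, globalenv()))
  tpl.names <- ls(target, all.names = TRUE)
  # every object in the template must exist in `current`; extras are allowed
  if (!all(tpl.names %in% ls(current, all.names = TRUE))) return(FALSE)
  # simplification: shared objects only need the same type
  all(vapply(tpl.names, function(nm)
    identical(typeof(get(nm, envir = target, inherits = FALSE)),
              typeof(get(nm, envir = current, inherits = FALSE))),
    logical(1L)))
}
```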
NULL elements within templates in recursive objects are
considered undefined and as such act like wildcards:
## two NULLs match a two-length list
alike(list(NULL, NULL), list(1:10, letters))
## but not a three-length list
alike(list(NULL, NULL), list(1:10, letters, iris))
 "`length(list(1:10, letters, iris))` should be 2 (is 3)"
Note that top level
NULLs do not act as wildcards:
alike(NULL, 1:10) # NULL only matches NULL
 "`1:10` should be `NULL` (is \"integer\")"
Treating NULL inconsistently depending on whether it is nested or not is
a compromise designed to make alike a better fit for argument validation,
because arguments that are NULL by default are fairly common.
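A minimal base-R model of the length and NULL rules for lists follows. It assumes strict typeof comparison and ignores attributes, whereas alike itself is more lenient on types and also checks attributes; list_alike_sketch is a hypothetical helper.

```r
# Hypothetical helper, NOT part of vetr: length and NULL-wildcard rules only.
list_alike_sketch <- function(target, current, nested = FALSE) {
  # NULL acts as a wildcard only when nested inside a recursive template
  if (is.null(target)) return(nested || is.null(current))
  if (!identical(typeof(target), typeof(current))) return(FALSE)
  n <- length(target)
  if (n > 0L && n != length(current)) return(FALSE)   # zero length = any length
  if (is.list(target)) {
    for (i in seq_len(n)) {
      if (!list_alike_sketch(target[[i]], current[[i]], nested = TRUE))
        return(FALSE)
    }
  }
  TRUE
}

list_alike_sketch(list(NULL, NULL), list(1:10, letters))   # nested wildcards
list_alike_sketch(NULL, 1:10)                              # top-level NULL
```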
alike will check for self-referential loops in nested
environments and prevent infinite recursion. If you somehow introduce a
self-referential structure in a template without using environments then
alike will get stuck in an infinite recursion loop.
We are currently considering adding new comparison modes for lists that would allow for checks more similar to environments (see #29).
Alikeness for these types of objects is a little harder to define. We
have settled on somewhat arbitrary semantics, though hopefully they are
intuitive. These may change in the future as we gain experience using
alike with these types of objects. This is particularly
true of functions.
Language objects are also compared recursively, but alikeness has a slightly different meaning for them:
alike(quote(sum(a, b)), quote(sum(x, y))) # calls are consistent
alike(quote(sum(a, b)), quote(sum(x, x))) # calls are inconsistent
 "`quote(sum(x, x))[]` should not be `x`"
alike(quote(mean(a, b)), quote(sum(x, y))) # functions are different
 "`quote(sum(x, y))[]` should be a call to `mean` (is a call to `sum`)"
Since variables can contain anything, we do not require them to match
directly across calls. In the examples above the second call fails
because the template defines different variables for each argument, but
the current object uses the same variable twice. The third call fails
because the functions are different, and as such the calls are not
alike. If a function is defined in the calling frame, alike will
match.call it prior to testing alikeness:
fun <- function(a, b, c) NULL
alike(quote(fun(p, q, p)), quote(fun(y, x, x)))
 "`quote(fun(y, x, x))[]` should be `y` (is `x`)"
# `match.call` re-orders arguments
alike(quote(fun(p, q, p)), quote(fun(b=y, x, x)))
Constants match any constants, but keep in mind that expressions like
c(1, 2, 3) are calls (to c), not constants, in the context of language
objects.
NULL is a wild card in calls as well:
str(one.arg.tpl <- as.call(list(NULL, NULL)))
alike(one.arg.tpl, quote(log(10, 10)))
 "`quote(log(10, 10))` should have 1 arguments (has 2)"
Parentheses (calls to `(`) are ignored when comparing calls, since they
are redundant in call trees: the tree structure encodes operation
precedence independently of operator precedence.
We concede that the rules for “alikeness” of language objects are arbitrary, but hope the outcomes of those rules are generally intuitive. Unfortunately, value and structure are somewhat intertwined for language objects, so we must impose our own view of what is value and what is structure.
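The symbol-consistency rule for calls amounts to requiring a one-to-one mapping between template symbols and current symbols. The hypothetical calls_alike_sketch below illustrates this in base R. It is a simplification: it skips the match.call step, does not drop parentheses, and only lets a template symbol match a symbol.

```r
# Hypothetical helper, NOT vetr's implementation of call comparison.
calls_alike_sketch <- function(target, current) {
  fwd <- character()   # template symbol -> current symbol
  bwd <- character()   # current symbol -> template symbol
  walk <- function(tn, cn) {
    if (is.null(tn)) return(TRUE)                       # NULL is a wildcard
    if (is.call(tn)) {
      if (!is.call(cn) || length(tn) != length(cn)) return(FALSE)
      fn <- tn[[1L]]
      # the function being called must match unless it is a NULL wildcard
      if (!is.null(fn) && !identical(fn, cn[[1L]])) return(FALSE)
      for (i in seq_along(tn)[-1L]) {
        if (!walk(tn[[i]], cn[[i]])) return(FALSE)
      }
      return(TRUE)
    }
    if (is.symbol(tn)) {
      if (!is.symbol(cn)) return(FALSE)
      tsym <- as.character(tn)
      csym <- as.character(cn)
      # a template symbol seen before must map to the same current symbol
      if (tsym %in% names(fwd)) return(identical(fwd[[tsym]], csym))
      # a current symbol already claimed by another template symbol fails
      if (csym %in% names(bwd)) return(FALSE)
      fwd[[tsym]] <<- csym
      bwd[[csym]] <<- tsym
      return(TRUE)
    }
    is.atomic(cn) && !is.null(cn)                       # constants match constants
  }
  walk(target, current)
}

calls_alike_sketch(quote(sum(a, b)), quote(sum(x, y)))
calls_alike_sketch(quote(sum(a, b)), quote(sum(x, x)))
```

Keeping both directions of the mapping is what makes sum(a, b) reject sum(x, x) while fun(p, q, p) still rejects fun(y, x, x).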
Formulas are treated like calls, except that constants must match:
alike(y ~ x ^ 2, a ~ b ^ 2)
alike(y ~ x ^ 2, a ~ b ^ 3)
 "`(a ~ b^3)[][]` should have identical constant values"
Functions are considered alike if the signature of the current function
can reasonably be interpreted as a valid method for the target function:
alike(print, print.default) # print can be the generic for print.default
alike(print.default, print) # but not vice versa
 "`print` should have argument `digits` after argument `x`"
A method of a generic must have all arguments present in the generic,
with the same default values if those are defined. If the generic has a
... argument, then the method may have additional arguments, but must
also contain ... itself.
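The argument-name part of the method rule can be sketched with formals(). method_args_ok is a hypothetical helper; it checks argument names and their order only and, as a stated simplification, does not compare default values.

```r
# Hypothetical helper, NOT vetr's implementation of function comparison.
method_args_ok <- function(generic, method) {
  g <- as.character(names(formals(generic)))
  m <- as.character(names(formals(method)))
  if (!all(g %in% m)) return(FALSE)         # every generic argument must exist
  if (!("..." %in% g) && length(m) != length(g))
    return(FALSE)                           # no dots: no extra arguments allowed
  identical(m[m %in% g], g)                 # shared arguments in the same order
}

method_args_ok(print, print.default)   # print can be the generic
method_args_ok(print.default, print)   # but not vice versa
```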
Potential changes / improvements for function comparison are being considered in #35.
S4 and RC objects are considered alike if
class(target). Since these objects embed
structural information in their definitions,
alike relies on
class alone to establish alikeness.
Objects of the following types are actually references to specific memory locations:
These are typically attached as attributes to other objects that
contain the information required to establish alikeness (e.g.
data.table, byte-compiled functions), so we only check their type.
Much of the structure of an object is determined by attributes.
alike recursively compares object attributes and requires
them to be
alike, unless the attribute is a special attribute or an environment.
Environments within attributes in the template must be matched by an
environment, but nothing is checked about the environments to avoid
expensive computations on objects that commonly include environments in
their attributes (e.g. formulas); note this is different from the
treatment of environments as actual objects.
Only attributes present in the template object are checked:
alike(structure(logical(1L), a=integer(3L)), structure(TRUE, a=1:3, b=letters))
alike(structure(TRUE, a=1:3, b=letters), structure(logical(1L), a=integer(3L)))
 "`structure(logical(1L), a = integer(3L))` should have attribute \"b\""
Attributes present in
current but missing in
target may be anything at all.
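A sketch of this asymmetry: attributes required by the template must exist on current, while extra attributes on current are never looked at. attrs_alike_sketch is hypothetical and, for brevity, compares only the typeof of each required attribute rather than applying the full recursive and special-attribute rules.

```r
# Hypothetical helper, NOT part of vetr: only template attributes are checked.
attrs_alike_sketch <- function(target, current) {
  tpl.attr <- attributes(target)
  cur.attr <- attributes(current)
  for (nm in names(tpl.attr)) {
    if (!nm %in% names(cur.attr)) return(FALSE)    # required attribute missing
    if (!identical(typeof(tpl.attr[[nm]]), typeof(cur.attr[[nm]])))
      return(FALSE)                                # simplification: type only
  }
  TRUE   # attributes present only on `current` are never inspected
}
```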
The special attributes are discussed in sections 2.2 and 2.3 of the R
Language Definition, and have well-defined and consistently applied
semantics in R. Since the semantics of these attributes are well known,
we are able to define “alikeness” for them in a more granular way than
we can for arbitrary attributes.
We also consider
srcref to be a special attribute. This
attribute is not checked.
If present in target, names and row.names must be matched exactly by the
corresponding attribute in current, except that a zero-length character
vector (character(0L)) will match any character vector, and an empty
string ("") in a names or row.names character vector will allow any
value to match at the corresponding position of the current vector:
alike(setNames(integer(), character()), 1:3)
 "`1:3` should have attribute \"names\""
alike(setNames(integer(), character()), c(a=1, b=2, c=3))
alike(setNames(integer(3), c("", "", "Z")), c(a=1, b=2, c=3))
 "`names(c(a = 1, b = 2, c = 3))` should be \"Z\" (is \"c\")"
alike(setNames(integer(3), c("", "", "Z")), c(a=1, b=2, Z=3))
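The names and row.names exceptions reduce to a few checks on the two character vectors. names_alike_sketch is a hypothetical helper that takes the names vectors directly; it is not vetr's code.

```r
# Hypothetical helper, NOT vetr's implementation of the names rules.
names_alike_sketch <- function(tpl.names, cur.names) {
  if (is.null(tpl.names)) return(TRUE)        # template does not define names
  if (is.null(cur.names)) return(FALSE)       # names required but missing
  if (length(tpl.names) == 0L) return(TRUE)   # character(0) matches any names
  if (length(tpl.names) != length(cur.names)) return(FALSE)
  isTRUE(all(tpl.names == "" | tpl.names == cur.names))   # "" matches anything
}

names_alike_sketch(c("", "", "Z"), c("a", "b", "Z"))
```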
dim attributes must be identical between target and current, except
that if a value of the dim vector is zero in target, the corresponding
value in current can be any value. This is how comparisons like the
following succeed:
mx.tpl <- matrix(integer(), ncol=3)   # partially specified matrix
alike(mx.tpl, matrix(sample(1:12), nrow=4))
alike(mx.tpl, matrix(sample(1:12), nrow=3)) # wrong number of columns
 "`matrix(sample(1:12), nrow = 3)` should have 3 columns (has 4)"
str(mx.tpl) # notice 0 for 1st dimension
int[0 , 1:3]
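The zero-as-wildcard rule for dim fits in a couple of lines. dim_alike_sketch is a hypothetical helper that operates on the dim vectors themselves:

```r
# Hypothetical helper, NOT vetr's implementation: zeros in the template's
# dim act as wildcards, all other entries must match exactly.
dim_alike_sketch <- function(tpl.dim, cur.dim) {
  if (is.null(tpl.dim)) return(TRUE)
  if (is.null(cur.dim) || length(tpl.dim) != length(cur.dim)) return(FALSE)
  all(tpl.dim == 0L | tpl.dim == cur.dim)
}

mx.tpl <- matrix(integer(), ncol = 3)                   # dim is c(0, 3)
dim_alike_sketch(dim(mx.tpl), dim(matrix(1:12, nrow = 4)))
```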
dimnames must also be identical, except that if the dimnames list for a
particular dimension is NULL, then the corresponding dimnames value in
current may be anything. As with names, empty string ("") dimnames
elements match any name.
mx.tpl <- matrix(integer(), ncol=3, dimnames=list(row.id=NULL, c("R", "G", "")))
mx.cur <- matrix(sample(0:255, 12), ncol=3, dimnames=list(row.id=1:4, rgb=c("R", "G", "Blue")))
mx.cur2 <- matrix(sample(0:255, 12), ncol=3, dimnames=list(1:4, c("R", "G", "b")))
alike(mx.tpl, mx.cur)
alike(mx.tpl, mx.cur2)
 "`dimnames(mx.cur2)` should have attribute \"names\""
Note that dimnames can have a names attribute, which is treated as
described above for row.names and names:
names(dimnames(mx.tpl))
 "row.id" ""
S3 objects are considered alike if the current class inherits from the
target class. Note that “inheritance” here is used in a stricter sense
than in typical S3 usage: every class in target must be present in
current, and the last class in current must be the same as the last
class in target:
tpl <- structure(TRUE, class=c("a", "b", "c"))
cur <- structure(TRUE, class=c("x", "a", "b", "c"))
cur2 <- structure(TRUE, class=c("a", "b", "c", "x"))
alike(tpl, cur)
alike(tpl, cur2)
 "`class(cur2)` should be \"a\" (is \"b\")"
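One way to read these class rules is that the template's class vector must form the trailing part of the current object's class vector. The hypothetical class_alike_sketch below implements that reading; it may be stricter than what alike actually accepts in some cases.

```r
# Hypothetical helper, NOT vetr's implementation: requires the template's
# classes to appear, in order, at the end of the current object's classes.
class_alike_sketch <- function(target, current) {
  tpl.cls <- class(target)
  cur.cls <- class(current)
  n <- length(tpl.cls)
  length(cur.cls) >= n && identical(utils::tail(cur.cls, n), tpl.cls)
}

tpl <- structure(TRUE, class = c("a", "b", "c"))
class_alike_sketch(tpl, structure(TRUE, class = c("x", "a", "b", "c")))
```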
The tsp attribute of ts objects behaves similarly to the dim attribute.
Any component (i.e. start, end, frequency) that is set to zero will act
as a wild card. Other components must be identical. It is illegal to set
tsp components to zero through the standard R interface, but you may use
abstract as a work-around.
Levels are compared like row.names and names.
This attribute is completely ignored.
If an object contains one of the special attributes, but the
attribute value is inconsistent with the standard definition of the
attribute, then alike will silently treat that attribute as any
other normal attribute.
You can use the settings parameter to alike to modify comparison
behavior.
You can always create your own templates by manually building R structures:
int.scalar <- integer(1L)
int.mat.2.by.4 <- matrix(integer(), 2, 4)
# A df without column names
df.chr.num.num <- structure(
  list(character(), numeric(), numeric()), class="data.frame"
)
Alternatively, you can start with a known structure, and abstract away the instance-specific details. For example, suppose we are sending sample collectors out into the field to record information about iris flowers:
iris.tpl <- iris[0, ]
alike(iris.tpl, iris.sample.1)   # make sure they submit data correctly
abstract is an S3 generic defined by vetr, along with methods for common
objects. It sets the length of atomic vectors to zero:
abstract(list(c(a=1, b=2, c=3), letters))
[[1]]
named numeric(0)

[[2]]
character(0)
It also abstracts certain special attributes, such as tsp, if present.
Other attributes are left untouched unless a specific abstract method
exists for a particular object that also modifies attributes. One example
of such a method is abstract.lm, which does some minor tweaking to the
base abstractions to allow us to match models produced by lm:
df.dummy <- data.frame(x=runif(3), y=runif(3), z=runif(3))
mdl.tpl <- abstract(lm(y ~ x + z, df.dummy))
# TRUE, expecting bi-variate model
alike(mdl.tpl, lm(Sepal.Length ~ Sepal.Width + Petal.Width, iris))
alike(mdl.tpl, lm(Sepal.Length ~ Sepal.Width, iris))
 "`lm(Sepal.Length ~ Sepal.Width, iris)$terms[]` should be a call to `+` (is \"symbol\")"
The error message is telling us that, at the indicated position within
lm(Sepal.Length ~ Sepal.Width, iris)$terms, alike was expecting a call to
+ instead of a symbol (i.e. Sepal.Width + <somevar> instead of
Sepal.Width). The message could certainly be more eloquent, but with a
little context it should provide enough information to figure out the
problem.
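For plain vectors, lists, and data frames, the default abstraction can be approximated in base R. abstract_sketch is a hypothetical stand-in, not vetr's abstract(): it does not abstract dim or tsp (matrices simply lose their dimensions here) and has no equivalent of methods such as abstract.lm.

```r
# Hypothetical helper, NOT vetr's abstract(): plain vectors, lists and data
# frames only; dim and tsp are dropped rather than abstracted.
abstract_sketch <- function(x) {
  if (is.data.frame(x)) return(x[0L, , drop = FALSE])  # keep columns, drop rows
  if (is.list(x)) return(lapply(x, abstract_sketch))   # recurse, keeping names
  if (is.atomic(x)) return(x[0L])                      # zero length, type kept
  x                                                    # anything else unchanged
}

str(abstract_sketch(list(c(a = 1, b = 2, c = 3), letters)))
```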
We have gone to great lengths to make alike fast so that it can be
included in other functions without concern about the overhead it adds:
type_and_len <- function(a, b)
  typeof(a) == typeof(b) && length(a) == length(b)   # for reference
bench_mark(times=1e4,
  identical(rivers, rivers),
  alike(rivers, rivers),
  type_and_len(rivers, rivers)
)
Mean eval time from 10000 iterations, in microseconds:
identical(rivers, rivers)    ~ 0.7
alike(rivers, rivers)        ~ 2.2
type_and_len(rivers, rivers) ~ 1.3
While alike is slower than identical, it is competitive with a
bare-bones R function that checks types and length. As objects grow more
complex, identical will obviously pull ahead, though alike should be
sufficiently fast for most uses:
bench_mark(times=1e4,
  identical(mtcars, mtcars),
  alike(mtcars, mtcars)
)
Mean eval time from 10000 iterations, in microseconds:
identical(mtcars, mtcars) ~ 0.5
alike(mtcars, mtcars)     ~ 12.6
In the above example, we are comparing the data frames, their attributes, and the 11 columns individually.
Keep in mind that the complexity of the alike comparison is driven by
the complexity of the template, not of the object we are checking, so we
can always manage the expense of the check through the choice of
template.
Comparisons that succeed will be substantially faster than comparisons that fail as the construction of error messages is non-trivial and we have prioritized optimization in the success case.
Language object comparison is relatively slow. We intend to optimize this some day.
Templates with large numbers of attributes (e.g. > 25) may scale non-linearly. We intend to optimize this some day, though in our experience objects with that many attributes are rare (note that having multiple objects, each with a handful of attributes, nested in recursive structures is not a problem).
Large objects will be slower to evaluate. Let us revisit the lm example,
though this time we compare our template to itself to ensure that the
comparisons succeed for all three functions:
mdl.tpl <- abstract(lm(y ~ x + z, data.frame(x=runif(3), y=runif(3), z=runif(3))))
# compare mdl.tpl to itself to ensure success in all three scenarios
bench_mark(
  alike(mdl.tpl, mdl.tpl),
  all.equal(mdl.tpl, mdl.tpl),   # for reference
  identical(mdl.tpl, mdl.tpl)
)
Mean eval time from 1000 iterations, in microseconds:
alike(mdl.tpl, mdl.tpl)     ~ 123
all.equal(mdl.tpl, mdl.tpl) ~ 1586
identical(mdl.tpl, mdl.tpl) ~ 1
Even with a template as large as an lm result (check str(mdl.tpl)), a
single alike call takes on the order of 100 microseconds here, so it can
be evaluated many times before the overhead becomes noticeable.
Some fairly innocuous R expressions carry substantial overhead. Consider:
df.tpl <- data.frame(a=integer(), b=numeric())
df.cur <- data.frame(a=1:10, b=1:10 + .1)
bench_mark(
  alike(df.tpl, df.cur),
  alike(data.frame(integer(), numeric()), df.cur)
)
Mean eval time from 1000 iterations, in microseconds:
alike(df.tpl, df.cur)                    ~ 6
alike(data.frame(integer(), numeric()).. ~ 277
data.frame is a particularly slow constructor, but in general you are
best served by defining your templates (including calls to abstract)
outside of your function, so they are created on package load rather
than every time your function is called.
alike as an S3 generic
alike is not currently an S3 generic, but will likely become one in the
future, provided we can create an implementation with an acceptable
performance profile.