Welcome to the ‘Get started’ page of the
package. This vignette provides detailed examples of how to use the
functions provided by the package.
The workhorse of the package is the
extract_digits() function. This function takes a vector of numbers and
returns the requested digits (with or without including zeros).
x <- c(0.00, 0.20, 1.23, 40.00, 54.04)
extract_digits(x, check = 'first', include.zero = FALSE)
## [1] NA  2  1  4  5
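To make the idea of a "first digit" concrete, the following is a minimal base R sketch of how a leading digit could be extracted. The helper `first_digit()` is hypothetical and is not part of the package; it is not necessarily how `extract_digits()` works internally. It formats each number in scientific notation and reads off the first character, treating a leading zero as missing.

```r
# Hypothetical helper (not part of the package): first significant digit
first_digit <- function(x) {
  # Scientific notation puts the first significant digit up front, e.g. "2.00e-01"
  sci <- formatC(abs(x), format = "e", digits = 10)
  d <- as.integer(substr(sci, 1, 1))
  # A value of exactly zero has no significant digit; report it as NA
  d[!is.na(d) & d == 0L] <- NA
  d
}

x <- c(0.00, 0.20, 1.23, 40.00, 54.04)
first_digit(x)
```

Going through a formatted string avoids the rounding problems that can occur when dividing by powers of ten in floating point.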
The functions distr.test() and distr.btest() take a vector of numeric
values, extract the requested digits, and compare the frequencies of
these digits to a reference distribution. The function
distr.test() performs
a frequentist hypothesis test of the null hypothesis that the digits are
distributed according to the reference distribution and produces a
p-value. The function
distr.btest() performs a
Bayesian hypothesis test of the null hypothesis that the digits are
distributed according to the reference distribution against the
alternative hypothesis (using the prior parameters specified in
alpha) that the digits are not distributed according to the
reference distribution and produces a Bayes factor (Kass & Raftery,
1995). The possible options for the
check argument are
taken over from extract_digits().
Benford’s law (Benford, 1938) is a principle that describes a pattern
in many naturally occurring numbers. According to Benford’s law, each
possible leading digit d (1 through 9) in a naturally occurring, or
non-manipulated, set of numbers occurs with a probability p(d) = log10(1
+ 1/d). The distribution of leading digits in a data set of financial
transaction values (e.g., the
sinoForest data) can be
extracted and tested against the expected frequencies under Benford’s
law using the code below.
# Frequentist hypothesis test
distr.test(sinoForest$value, check = 'first', reference = 'benford')
##
##  Digit distribution test
##
## data:  sinoForest$value
## n = 772, X-squared = 7.6517, df = 8, p-value = 0.4682
## alternative hypothesis: leading digit(s) are not distributed according to the benford distribution.
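The reference probabilities and the chi-squared statistic can also be illustrated in base R. The sketch below computes p(d) = log10(1 + 1/d) for d = 1, ..., 9 and runs a goodness-of-fit test with `chisq.test()`. It uses simulated digits rather than the sinoForest data, so its results will not match the output above, and it is not necessarily the computation the package performs internally.

```r
# Expected first-digit probabilities under Benford's law: p(d) = log10(1 + 1/d)
digits <- 1:9
p_benford <- log10(1 + 1 / digits)
sum(p_benford)  # the nine probabilities sum to 1

# Simulated leading digits (NOT the sinoForest data), drawn from Benford's law
set.seed(1)
sim_digits <- sample(digits, size = 500, replace = TRUE, prob = p_benford)

# Observed counts per digit; factor() keeps digits that happen to have zero counts
observed <- as.vector(table(factor(sim_digits, levels = digits)))

# Chi-squared goodness-of-fit test against the Benford probabilities
res <- chisq.test(observed, p = p_benford)
res$statistic
res$parameter  # degrees of freedom: 9 categories - 1 = 8
res$p.value
```

With nine digit categories the test has 8 degrees of freedom, which matches the `df = 8` reported in the output above.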
# Bayesian hypothesis test using default prior
distr.btest(sinoForest$value, check = 'first', reference = 'benford', BF10 = FALSE)
##
##  Digit distribution test
##
## data:  sinoForest$value
## n = 772, BF01 = 6899678
## alternative hypothesis: leading digit(s) are not distributed according to the benford distribution.
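For intuition about what a Bayes factor of this kind measures, here is a minimal sketch of the standard multinomial Bayes factor for a point null hypothesis against a Dirichlet alternative. The helper `log_bf01()` is hypothetical, the uniform prior `alpha = rep(1, K)` is an assumption made for illustration, and the counts are invented. Whether distr.btest() uses exactly this formula or this default prior is not confirmed here.

```r
# Hypothetical helper: log Bayes factor in favour of the null (BF01).
# H0: digit probabilities equal p0; H1: probabilities ~ Dirichlet(alpha).
log_bf01 <- function(counts, p0, alpha = rep(1, length(counts))) {
  # Log of the multivariate beta function B(a) = prod(gamma(a)) / gamma(sum(a))
  log_mbeta <- function(a) sum(lgamma(a)) - lgamma(sum(a))
  # The multinomial coefficient is the same under H0 and H1 and cancels
  sum(counts * log(p0)) + log_mbeta(alpha) - log_mbeta(alpha + counts)
}

# Invented counts of leading digits 1 to 9 (for illustration only)
counts <- c(30, 18, 12, 10, 8, 7, 6, 5, 4)
p_benford <- log10(1 + 1 / (1:9))

# Values above 1 favour the reference distribution, values below 1 favour H1
exp(log_bf01(counts, p_benford))
```

Working on the log scale with `lgamma()` keeps the computation stable for large samples, where the raw gamma functions would overflow.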
rv.test() analyzes the frequency with which
values get repeated within a set of numbers. Unlike Benford’s law and
its generalizations, this approach examines the entire number at once,
not only the first or last digit. For the technical details of this
procedure, see Simonsohn (2019). The possible options for the
check argument are taken over from extract_digits().
In this example we analyze a data set from a (retracted) paper that
describes three experiments run in Chinese factories, where workers were
nudged to use more hand sanitizer. These data were shown to exhibit
two classic markers of data tampering: impossibly similar means and an
uneven distribution of last digits (Yu, Nelson, & Simonsohn, 2018).
We can use the
rv.test() function to test if these data
also contain a greater amount of repeated values than expected if the
data were not tampered with.
rv.test(sanitizer$value, check = 'lasttwo', B = 2000)
##
##  Repeated values test
##
## data:  sanitizer$value
## n = 1600, AF = 1.5225, p-value = 0.0025
## alternative hypothesis: average frequency in data is greater than for random data.
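To give a sense of what an average frequency statistic captures, the sketch below computes it for a small made-up vector. It assumes that AF is defined as sum(f^2) / sum(f), where f holds the number of times each distinct value occurs; this definition is an assumption here, so consult Simonsohn (2019) for the exact procedure. The helper `average_frequency()` is hypothetical and not part of the package.

```r
# Hypothetical helper: average frequency, assumed to be sum(f^2) / sum(f),
# where f is the number of occurrences of each distinct value
average_frequency <- function(x) {
  f <- as.numeric(table(x))
  sum(f^2) / sum(f)
}

# Made-up values: 1 occurs twice, 2 once, 3 three times, so f = (2, 1, 3)
x <- c(1, 1, 2, 3, 3, 3)
average_frequency(x)  # (4 + 1 + 9) / 6 = 14 / 6

# With no repeated values every f equals 1, so the statistic equals 1
average_frequency(c(10, 20, 30))
```

Under this definition the statistic grows as values are repeated more often, which is why the test above uses a one-sided alternative of a greater average frequency than for random data.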