In this package there are currently two functions that provide robust alternatives to the `t_TOST` function.

The Wilcoxon group of tests (which includes the Mann-Whitney U-test) provides a non-parametric test of differences between groups, or within samples, based on ranks. These tests assess a location shift, which is a fancy way of saying a difference in the center of the distribution (i.e., in parametric tests the location is the mean). With TOST, there are two separate tests of directional location shift to determine whether the location shift is within (equivalence) or outside (minimal effect) the equivalence bounds. The exact calculations can be explored via the documentation of the `wilcox.test` function.

In the TOSTER package, we accomplish this with the `wilcox_TOST` function. Overall, this function operates very similarly to the `t_TOST` function. However, the standardized mean difference (SMD) is *not* calculated. Instead, the rank-biserial correlation is calculated for *all* types of comparisons (e.g., two sample, one sample, and paired samples). Also, there is no plotting capability at this time for the output of this function.

As an example, we can use the sleep data to make a non-parametric comparison of equivalence.

```
library(TOSTER)
data('sleep')

test1 = wilcox_TOST(formula = extra ~ group,
                    data = sleep,
                    paired = FALSE,
                    low_eqbound = -.5,
                    high_eqbound = .5)
print(test1)
```

```
##
## Wilcoxon rank sum test with continuity correction
## Hypothesis Tested: Equivalence
## Equivalence Bounds (raw):-0.500 & 0.500
## Alpha Level:0.05
## The equivalence test was non-significant W = 20.000, p = 8.94e-01
## The null hypothesis test was non-significant W = 25.500, p = 6.93e-02
## NHST: don't reject null significance hypothesis that the effect is equal to zero
## TOST: don't reject null equivalence hypothesis
##
## TOST Results
## statistic p.value
## NHST 25.5 0.06932758
## TOST Lower 34.0 0.89385308
## TOST Upper 20.0 0.01287404
##
## Effect Sizes
## estimate lower.ci upper.ci conf.level
## Median of Differences -1.346388 -3.3999651 -0.09995341 0.9
## rank-biserial correlation -0.490000 -0.7492521 -0.10053222 0.9
```

The standardized effect size reported for the `wilcox_TOST` procedure is the rank-biserial correlation. This is a fairly intuitive measure of effect size, and it has the same interpretation as the common language effect size (Kerby 2014). However, instead of assuming normality and equal variances, the rank-biserial correlation counts the number of favorable (positive) and unfavorable (negative) pairs based on their respective ranks.

Overall, the correlation is calculated as the proportion of favorable pairs minus the proportion of unfavorable pairs.

\[ r_{biserial} = f_{pairs} - u_{pairs} \]
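To make the pair-counting concrete, here is a minimal base-R sketch of this calculation for two independent samples (the `rank_biserial` helper is hypothetical, for illustration only, and is not the TOSTER implementation):

```r
# Illustrative rank-biserial correlation for two independent samples:
# the proportion of favorable pairs minus the proportion of unfavorable
# pairs. (Hypothetical helper, not the TOSTER internals.)
rank_biserial <- function(x, y) {
  d <- outer(x, y, "-")  # all n1 * n2 pairwise differences
  f <- mean(d > 0)       # proportion of favorable (positive) pairs
  u <- mean(d < 0)       # proportion of unfavorable (negative) pairs
  f - u
}

rank_biserial(c(3, 4, 5), c(0, 1, 2))  # every pair favorable: returns 1
```

Ties (pairwise differences of exactly zero) count toward neither proportion in this sketch, which is why the estimate lands at zero when the two samples are identical.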

The Fisher approximation is used to calculate the confidence intervals.

For paired samples the standard error is calculated as the following:

\[ SE_r = \sqrt{ \frac {(2 \cdot n_d^3 + 3 \cdot n_d^2 + n_d) / 6} {(n_d^2 + n_d) / 2} } \]

where \(n_d\) represents the number of paired differences that are not equal to zero.

For independent samples, the standard error is calculated as the following:

\[ SE_r = \sqrt{\frac {n_1 + n_2 + 1} {3 \cdot n_1 \cdot n_2}} \]

The confidence intervals can then be calculated by transforming the estimate.

\[ r_z = atanh(r_{biserial}) \]

Then the confidence interval can be calculated and back transformed.

\[ r_{CI} = tanh(r_z \pm Z_{(1 - \alpha / 2)} \cdot SE_r) \]
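As a quick check, the steps above can be sketched in a few lines of base R. Using the independent-samples standard error with the estimate reported in the output above (\(r = -0.49\), \(n_1 = n_2 = 10\), 90% confidence), this approximately reproduces the reported confidence limits; `rb_ci` is a hypothetical helper, not part of TOSTER:

```r
# Illustrative Fisher-transformed CI for the rank-biserial correlation,
# using the independent-samples standard error. (Hypothetical helper,
# not the TOSTER internals.)
rb_ci <- function(r, n1, n2, conf_level = 0.90) {
  se   <- sqrt((n1 + n2 + 1) / (3 * n1 * n2))  # SE on the z scale
  z    <- atanh(r)                             # Fisher transform
  crit <- qnorm(1 - (1 - conf_level) / 2)      # normal critical value
  tanh(z + c(-1, 1) * crit * se)               # back-transform bounds
}

rb_ci(-0.49, n1 = 10, n2 = 10)  # approx. (-0.749, -0.101), as reported above
```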

We hope a bootstrapped version of this confidence interval will be added in future updates.

The bootstrap is a simulation-based technique, derived from resampling with replacement, designed for statistical estimation and inference. Bootstrapping techniques are very useful because they are considered somewhat robust to violations of the assumptions of a simple t-test. Therefore, we added a bootstrap option, `boot_t_TOST`, to the package to provide another robust alternative to the `t_TOST` function.

In this function we provide a percentile bootstrap solution outlined by Efron and Tibshirani (1993) (see chapter 16, page 220). The bootstrapped p-values are derived from the "studentized" version of a test of mean differences outlined by Efron and Tibshirani (1993). Overall, the results should be similar to the results of `t_TOST`. **However**, for paired samples, the Cohen's d(rm) effect size *cannot* be calculated at this time.

- Form B bootstrap data sets from x\* and y\*, wherein x\* is sampled with replacement from \(\tilde x_1,\tilde x_2, ... \tilde x_n\) and y\* is sampled with replacement from \(\tilde y_1,\tilde y_2, ... \tilde y_n\).

- t is then evaluated on each sample, but the mean of each sample (x̄ or ȳ) and the overall average (z̄) are subtracted from each:

\[ t(z^{*b}) = \frac {(\bar x^* - \bar x - \bar z) - (\bar y^* - \bar y - \bar z)}{\sqrt {sd_x^{*2}/n_x + sd_y^{*2}/n_y}} \]

- An approximate p-value can then be calculated as the proportion of bootstrapped results greater than or equal to the observed t-statistic from the sample.

\[ p_{boot} = \frac {\#t(z^{*b}) \ge t_{sample}}{B} \]

The same process is completed for the one sample case but with the one sample solution for the equation outlined by \(t(z^{*b})\). The paired sample case in this bootstrap procedure is equivalent to the one sample solution because the test is based on the difference scores.
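The resampling scheme above can be sketched in base R for the two-sample case. This is an illustrative re-implementation of the studentized bootstrap test, with centering so the null hypothesis holds in the resamples; `boot_t_p` is a hypothetical helper and not the `boot_t_TOST` internals:

```r
# Illustrative studentized bootstrap p-value for a difference in means
# (two-sample case). Hypothetical helper, not the boot_t_TOST internals.
boot_t_p <- function(x, y, B = 1999) {
  nx <- length(x); ny <- length(y)
  t_obs <- (mean(x) - mean(y)) / sqrt(var(x) / nx + var(y) / ny)
  # Center both samples on the overall average so H0 holds in resamples
  z  <- mean(c(x, y))
  xt <- x - mean(x) + z
  yt <- y - mean(y) + z
  t_boot <- replicate(B, {
    xs <- sample(xt, nx, replace = TRUE)
    ys <- sample(yt, ny, replace = TRUE)
    (mean(xs) - mean(ys)) / sqrt(var(xs) / nx + var(ys) / ny)
  })
  mean(t_boot >= t_obs)  # proportion of t(z*b) >= observed t
}

set.seed(111)
boot_t_p(rnorm(20, mean = 2), rnorm(20, mean = 0))  # small p: means differ
```

In `boot_t_TOST` this one-sided comparison is performed against each equivalence bound, but the resampling logic is the same idea.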

Again, we can use the sleep data to see the bootstrapped results. Notice that the plots show how the resampling via bootstrapping indicates the instability of Hedges' g(z).

```
data('sleep')

test1 = boot_t_TOST(formula = extra ~ group,
                    data = sleep,
                    paired = TRUE,
                    low_eqbound = -.5,
                    high_eqbound = .5,
                    R = 999)
print(test1)
```

```
##
## Bootstrapped Paired t-test
## Hypothesis Tested: Equivalence
## Equivalence Bounds (raw):-0.500 & 0.500
## Alpha Level:0.05
## The equivalence test was non-significant, t(9) = -2.777, p = 1e+00
## The null hypothesis test was significant, t(9) = -4.062, p = 0e+00
## NHST: reject null significance hypothesis that the effect is equal to zero
## TOST: don't reject null equivalence hypothesis
##
## TOST Results
## t SE df p.value
## t-test -4.062128 0.3889587 9 0
## TOST Lower -2.776644 0.3889587 9 1
## TOST Upper -5.347611 0.3889587 9 0
##
## Effect Sizes
## estimate SE lower.ci upper.ci conf.level
## Raw -1.580000 0.3600943 -2.250000 -1.0600000 0.9
## Hedges' g(z) -1.230152 0.7235322 -2.815123 -0.9120299 0.9
##
## Note: percentile boostrap method utilized.
```

`plot(test1)`

Efron, Bradley, and Robert J. Tibshirani. 1993. *An Introduction to the Bootstrap*. Monographs on Statistics and Applied Probability 57. Boca Raton, Florida, USA: Chapman & Hall/CRC.

Kerby, Dave S. 2014. “The Simple Difference Formula: An Approach to Teaching Nonparametric Correlation.” *Comprehensive Psychology* 3 (January): 11.IT.3.1. https://doi.org/10.2466/11.it.3.1.