- Fix a potential `runtime error: null pointer passed as argument 1, which is declared to never be null` bug introduced in v1.4.0 that was detected by the UndefinedBehaviorSanitizer (UBSan) running on CRAN.

`rowSums2()` is now significantly faster for larger matrices.

None of the error messages use a trailing period.

Addressed changes in the C API of R-devel that resulted in compiler errors such as `error: implicit declaration of function 'Calloc'; did you mean 'calloc'? [-Wimplicit-function-declaration]`.

Addressed stricter compiler flags in R-devel that resulted in the compiler warning `embedding a directive within macro arguments has undefined behavior [-Wembedded-directive]`.

- Calling `colRanks()` and `rowRanks()` without explicitly specifying argument `ties.method` is deprecated since version 1.3.0. If not explicitly specified, a deprecation warning is now produced every 25th call not specifying the `ties.method` argument.

`validateIndices()` has been removed. It had been defunct since version 0.63.0 (2022-11-14).

- Fixed two PROTECT/UNPROTECT issues detected by the ‘rchk’ tool.

Calling `colRanks()` and `rowRanks()` without explicitly specifying argument `ties.method` will be deprecated when using R (>= 4.4.0). The reason is that the current default is `ties.method = "max"`, but we want to change that to `ties.method = "average"` to align it with `base::rank()`. In order to minimize the risk of sudden changes in results, we ask everyone to explicitly specify their intent. The first notice will be through deprecation warnings, which will only occur every 50th call to keep the noise level down. We will make it more noisy in future releases, and eventually escalate it to defunct errors.

Using a scalar value for argument `center` of `colSds()`, `rowSds()`, `colVars()`, `rowVars()`, `colMads()`, `rowMads()`, `colWeightedMads()`, and `rowWeightedMads()` is now defunct.
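As a minimal sketch of why the `ties.method` default for `colRanks()` and `rowRanks()` matters, the following base R snippet compares the two tie-handling rules using `base::rank()`, which the entry above names as the reference. The input vector is made up for illustration.

```r
# Hypothetical input with one tie (the two 3s)
x <- c(3, 1, 3, 2)

# Tied values all receive the largest rank in their tie group
r_max <- rank(x, ties.method = "max")

# Tied values all receive the mean of the ranks in their tie group
r_avg <- rank(x, ties.method = "average")

print(r_max)  # 4 1 4 2
print(r_avg)  # 3.5 1.0 3.5 2.0
```

With matrixStats, the intent would be stated explicitly in the same way, for example `rowRanks(X, ties.method = "max")` to keep the current behavior.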

- Error messages that report on large integers (> 2^31 - 1) would not render those integers correctly.

`useNames = NA` is defunct.

`useNames = NA` is defunct in R (>= 4.4.0). Remains deprecated in R (< 4.4.0) for now.

- The deprecation warning for using `useNames = NA` suggested using `useNames = TRUE` twice instead of also `useNames = FALSE`.

`useNames = TRUE` is the new default for all functions. For backward compatibility, it used to be `useNames = NA`.

`colQuantiles()` and `rowQuantiles()` gained argument `digits`, just like `stats::quantile()` gained that argument in R 4.1.0.

`colQuantiles()` and `rowQuantiles()` only set quantile percentage names when `useNames = TRUE`, to align with how argument `names` of `stats::quantile()` works in base R.

`colMeans2()` and `rowMeans2()` gained argument `refine`. If `refine = TRUE`, then the sample average for numeric matrices is calculated using a two-pass scan, resulting in higher precision. The default is `refine = TRUE` to align it with `colMeans()`, but also `mean2()` in this package. If the higher precision is not needed, using `refine = FALSE` will be almost twice as fast.

`colSds()`, `rowSds()`, `colVars()`, and `rowVars()` gained argument `refine`. If `refine = TRUE`, then the sample average for numeric matrices is calculated using a two-pass scan, resulting in higher precision for the estimate of the center and therefore also the variance.
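The two-pass idea behind `refine = TRUE` can be sketched in plain R. This only illustrates the technique and is not the package's native implementation; the function names `mean_one_pass()` and `mean_two_pass()` and the input values are hypothetical.

```r
# Single pass: one accumulated sum divided by the count
mean_one_pass <- function(x) {
  sum(x) / length(x)
}

# Two passes: refine the first estimate by the mean of the residuals
mean_two_pass <- function(x) {
  n <- length(x)
  m <- sum(x) / n        # first pass: initial estimate
  m + sum(x - m) / n     # second pass: correct for accumulated rounding error
}

# Made-up data mixing a large and several small values
x <- c(1e10, 1, 2, 3)
mean_one_pass(x)
mean_two_pass(x)
```

The second pass costs one more scan over the data, which is consistent with the note above that `refine = FALSE` can be almost twice as fast.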

Unnecessary checks for missing indices are eliminated, yielding better performance. This change does not affect user-facing API.

Made `colQuantiles()` and `rowQuantiles()` a bit faster for `type != 7L`, by making sure percentage names are only generated once, instead of once per column or row.

Contrary to other functions in the package, and how it works in base R, functions `colCumsums()`, `colCumprods()`, `colCummins()`, `colCummaxs()`, `colRanges()`, `colRanks()`, and `colDiffs()`, plus the corresponding row-based versions, did not drop the `names` attribute when both row and column names were `NULL`. Now these functions also behave the same as when neither row nor column names are set.

`colQuantiles()` and `rowQuantiles()` did not generate quantile percentage names exactly the same way as `stats::quantile()`, which would reveal itself for certain combinations of `probs` and `digits`.

`useNames = NA` is now deprecated. Use `useNames = TRUE` or `useNames = FALSE` instead.

Package compiles again with older compilers not supporting the C99 standard (e.g. GCC 4.8.5 (2015), which is the default on RHEL / CentOS 7.9). This was the case also for matrixStats (<= 0.54.0).

Added more information to the error message produced when argument `center` for `col-` and `rowVars()` holds an invalid value.

Fix two compilation warnings on `a function declaration without a prototype is deprecated in all versions of C [-Wstrict-prototypes]`.

`validateIndices()` is now defunct and will eventually be removed from the package API.

`colCummins()`, `colCummaxs()`, `rowCummins()`, and `rowCummaxs()` now also support logical input.

- Updated native code to use the C99 constant `DBL_MAX` instead of legacy S constant `DOUBLE_XMAX`, which is planned to be unsupported in R (>= 4.2.0).

- When argument `which` for `colOrderStats()` and `rowOrderStats()` is out of range, the error message now reports on the value of `which`. Similarly, when argument `probs` for `colQuantiles()` and `rowQuantiles()` is out of range, the error message reports on its value too.

`validateIndices()` is deprecated and will eventually be removed from the package API.

- The package test for benchmark reports failed because the **markdown** package was not declared as a suggested package.

Handling of the `useNames` argument is now done in the native code.

Passing `idxs`, `rows`, and `cols` arguments of type integer is now less efficient than it used to be, because the new code re-design (see below) requires an internal allocation of an equally long `R_xlen_t` vector that is populated by indices coerced from `R_len_t` to `R_xlen_t` integers.

- No longer using native-code implementations that are specific to the type of index that is passed for subsetting of vectors, rows, and columns. This was done to avoid the complex use of macros that was cumbersome to maintain and added an extra threshold for new contributors to overcome. Another advantage is faster compilation time when built from source and a smaller compiled library. In previous versions, `R CMD check` would produce a NOTE on the package installation size being large, which is no longer the case. The downside is the extra overhead when passing integer indices (see above comment).

- Contrary to other functions which gained new argument `useNames = NA` in the previous release, `colQuantiles()` and `rowQuantiles()` got `useNames = TRUE`.

- Add row and column names support to all row and column functions. To return row and column names, set argument `useNames = TRUE`. To drop them, set `useNames = FALSE`. To preserve the current, inconsistent behavior, set `useNames = NA`, which, for backward compatibility reasons, remains the default for now.

- Harmonized error messages.

- Some of the examples and package tests would allocate matrices with dimensions that did not match the number of elements in the input data.

- Dropped `meanOver()` and `sumOver()`, and argument `method` from `weightedVar()`, that have been defunct since January 2018.

`colVars()` and `rowVars()` with argument `center` now calculate the sample variance using the `n/(n-1)*avg((x-center)^2)` formula rather than the `n/(n-1)*(avg(x^2)-center^2)` formula that was used in the past. Both give the same result when `center` is the correct sample mean estimate. The main reason for this change is that, if an incorrect `center` is provided, in contrast to the old approach, the new approach is guaranteed to give at least non-negative results, despite being incorrect. BACKWARD COMPATIBILITY: Out of all 314 reverse dependencies on CRAN and Bioconductor, only four called these functions with argument `center`. All of them pass their package checks also after this update. To further protect against a negative impact in existing user scripts, `colVars()` and `rowVars()` will calculate both versions and assert that the result is the same. If not, an informative error is produced. To limit the performance impact, this validation is run only once every 50th call, a frequency that can be controlled by R option `matrixStats.vars.formula.freq`. Setting it to 0 or NULL will disable the validation. The default can also be controlled by environment variable `R_MATRIXSTATS_VARS_FORMULA_FREQ`. This validation framework will be removed in a future version of the package after it has been established that this change has no negative impact.
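The difference between the two variance formulas quoted above can be checked with a small base R sketch. The helper names `var_new()` and `var_old()` and the data are hypothetical; they only transcribe the formulas and are not the package's code.

```r
# New formula: n/(n-1) * avg((x - center)^2)
var_new <- function(x, center) {
  n <- length(x)
  n / (n - 1) * mean((x - center)^2)
}

# Old formula: n/(n-1) * (avg(x^2) - center^2)
var_old <- function(x, center) {
  n <- length(x)
  n / (n - 1) * (mean(x^2) - center^2)
}

x <- c(1, 2, 3, 4)

# With the correct center, both agree with stats::var()
c(var_new(x, mean(x)), var_old(x, mean(x)), var(x))

# With an incorrect center, only the old formula can turn negative
var_new(x, center = 5)
var_old(x, center = 5)
```

The new formula averages squared terms, so it cannot go below zero whatever `center` is, while the old one subtracts `center^2` and can.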

Now `colWeightedMads()` and `rowWeightedMads()` accept `center` of the same length as the number of columns and rows, respectively.

`colAvgsPerRowSet()` and `rowAvgsPerRowSet()` gained argument `na.rm`.

Now `weightedMean()` and `weightedMedian()` and the corresponding row- and column-based functions accept logical `x`, where FALSE is treated as integer 0 and TRUE as 1.

Now `x_OP_y()` and `t_tx_OP_y()` accept logical `x` and `y`, where FALSE is treated as integer 0 and TRUE as 1.

`colQuantiles()` and `rowQuantiles()` on a logical matrix should return a numeric vector for `type = 7`. However, when there were only missing values (= NA) in the matrix, then it would return a “logical” vector instead.

`colAvgsPerRowSet()` on a single-column matrix would produce an error on non-matching dimensions. Analogously, for `rowAvgsPerRowSet()` and single-row matrices.

`colVars(x)` and `rowVars(x)` with `x` being an array would give the wrong value if both arguments `dim.` and `center` were specified.

The documentation was unclear on what the `center` argument should be. The functions would not detect when an incorrect specification was used, notably when the length of `center` did not match the matrix dimensions. Now these functions give an informative error message when `center` is of the incorrect length.

- Using a scalar value for argument `center` of `colSds()`, `rowSds()`, `colVars()`, `rowVars()`, `colMads()`, `rowMads()`, `colWeightedMads()`, and `rowWeightedMads()` is now deprecated.

`colCumprods()` and `rowCumprods()` now also support logical input. Thanks to Constantin Ahlmann-Eltze at EMBL Heidelberg for the patch.

`colCollapse()` and `rowCollapse()` did not expand `idxs` argument before subsetting by `cols` and `rows`, respectively. Thanks to Constantin Ahlmann-Eltze for reporting on this.

`colAnys()`, `rowAnys()`, `anyValue()`, `colAlls()`, `rowAlls()`, and `allValue()` with `value=FALSE` and *numeric* input would incorrectly consider all values different from one as FALSE. Now it is only values that are zero that are considered FALSE. Thanks to Constantin Ahlmann-Eltze for the bug fix.

`colQuantiles()` and `rowQuantiles()` now support only integer, numeric and logical input. Previously, it was also possible to pass, for instance, `character` input, but that was a mistake. The restriction on input allows for further optimization of these functions.

The returned type of `colQuantiles()` and `rowQuantiles()` is now the same as for `stats::quantile()`, which depends on argument `type`.

`colQuantiles()` and `rowQuantiles()` with the default `type = 7L` and when there are no missing values are now significantly faster and use significantly fewer memory allocations.

`colDiffs()` and `rowDiffs()` gave an error if argument `dim.` was of type numeric rather than type integer.

`varDiff()`, `sdDiff()`, `madDiff()`, `iqrDiff()`, and the corresponding row- and column functions silently treated a `diff` less than zero as `diff = 0`. Now an error is produced.

Error messages on argument `dim.` referred to non-existing argument `dim`.

Error messages on negative values in argument `dim.` reported a garbage value instead of the negative value.

The Markdown reports produced by the internal benchmark report generator did not add a line between tables and the following text (a figure caption) causing the following text to be included in a cell on an extra row in the table (at least when rendered on GitHub Wiki pages).

`weightedVar()`, `weightedSd()`, `weightedMad()`, and their row- and column-specific counterparts now return a missing value if there are missing values in any of the weights `w` after possibly dropping (`x`, `w`) elements with missing values in `x` (`na.rm = TRUE`). Previously, `na.rm = TRUE` would also drop (`x`, `w`) elements where `w` was missing. With this change, we now have that for all functions in this package, `na.rm = TRUE` never applies to weights - only `x` values.

`colRanks()` and `rowRanks()` now support the same set of `ties.method` as `base::rank()` plus `"dense"` as defined by `data.table::frank()`. For backward-compatibility reasons, the default `ties.method` remains the same as in previous versions. Thanks to Brian Montgomery for contributing this.

`colCumsums()` and `rowCumsums()` now also support logical input.

`weightedVar()`, `weightedSd()`, `weightedMad()`, and their row- and column-specific counterparts would produce an error instead of returning a missing value when one of the weights is a missing value.

- Calling `indexByRow(x)` where `x` is a matrix is now defunct. Use `indexByRow(dim(x))` instead.

- SPEEDUP: No longer using `stopifnot()` for internal validation, because it comes with a great overhead. This was only used in `weightedMad()`, `col-` and `rowWeightedMads()`, as well as `col-` and `rowAvgsPerColSet()`.

Despite being an unlikely use case, `colLogSumExps(lx)` / `rowLogSumExps(lx)` now also accept integer `lx` values.

The error produced when using `indexByRow(dim)` with `prod(dim) >= 2^31` would report garbage dimensions instead of `dim`.

- Calling `indexByRow(x)`, where `x` is a matrix, is deprecated. Use `indexByRow(dim(x))` instead.

- Now `col-` / `rowSds()` explicitly replicate all arguments that are passed to `col-` / `rowVars()`.

- Added details on how `weightedMedian(x, interpolate = TRUE)` works.

`colLogSumExps(lx, cols)` / `rowLogSumExps(lx, rows)` gave an error if `lx` has rownames / colnames.

`col-` / `rowQuantiles()` would lose rownames of output in certain cases.

Functions `sum2(x)` and `mean2(x)` now also accept logical input `x`, which corresponds to using `as.integer(x)` but without the need for either coercion or internal extra copies. With `sum2(x, mode = "double")` it is possible to count the number of TRUE elements beyond 2^31-1, which `base::sum()` does not support.

Functions `col-` / `rowSums2()` and `col-` / `rowMeans2()` now also accept logical input `x`.

Function `binMeans(y, x, bx)` now accepts logical `y`, which corresponds to using `as.integer(y)`, but without the need for coercion to integer.

Functions `col-` / `rowTabulates(x)` now support logical input `x`.

Now `count()` can count beyond 2^31-1.

`allocVector()` can now allocate long vectors (longer than 2^31-1).

Now `sum2(x, mode = "integer")` generates a warning if `typeof(x) == "double"` asking if `as.integer(sum2(x))` was intended.

Inspired by `Hmisc::wtd.var()`, when `sum(w) <= 1`, `weightedVar(x, w)` now produces an informative warning that the estimate is invalid.

- Harmonized the ordering of the arguments of `colAvgsPerColSet()` with that of `rowAvgsPerColSet()`.

`col-` / `rowLogSumExp()` could core dump R for a “large” number of columns/rows. Thanks to Brandon Stewart at Princeton University for reporting on this.

`count()` beyond 2^31-1 would return invalid results.

Functions `col-` / `rowTabulates(x)` did not count missing values.

`indexByRow(dim, idxs)` would give nonsense results if `idxs` had indices greater than `prod(dim)` or non-positive indices; now it gives an error.

`indexByRow(dim)` would give nonsense results when `prod(dim) >= 2^31`; now it gives an informative error.

`col-` / `rowAvgsPerColSet()` would return a vector rather than a matrix if `nrow(X) <= 1`. Thanks to Peter Hickey (Johns Hopkins University) for troubleshooting and providing a fix.

Previously deprecated `meanOver()` and `sumOver()` are defunct. Use `mean2()` and `sum2()` instead.

Previously deprecated `weightedVar(x, w, method = "0.14.2")` is defunct.

Dropped previously defunct `weightedMedian(..., ties = "both")`.

Dropped previously defunct argument `centers` for `col-` / `rowMads()`. Use `center` instead.

Dropped previously defunct argument `flavor` of `colRanks()` and `rowRanks()`.

- Several of the row- and column-based functions would core dump R if the matrix was of a data type other than logical, integer, or numeric, e.g. character or complex. This is now detected and an informative error is produced instead. Similarly, some vector-based functions could potentially core dump R or silently return a nonsense result. Thank you Hervé Pagès, Bioconductor Core, for the report.

`rowVars(..., method = "0.14.2")` that was added for very unlikely needs of backward compatibility of an invalid degree-of-freedom term is deprecated.

- The package test on `matrixStats:::benchmark()` tried to run even if not all suggested packages were available.

Since `anyNA()` is a built-in function since R (>= 3.1.0), please use that instead of `anyMissing()`, which is part of this package. The latter will eventually be deprecated. For consistency with the `anyNA()` name, `colAnyNAs()` and `rowAnyNAs()` are now also available, replacing the identical `colAnyMissings()` and `rowAnyMissings()` functions, which will also be deprecated in a future release.

`meanOver()` was renamed to `mean2()` and `sumOver()` was renamed to `sum2()`.

Added `colSums2()` and `rowSums2()`, which work like `colSums()` and `rowSums()` of the **base** package but also support efficient subsetting via optional arguments `rows` and `cols`.

Added `colMeans2()` and `rowMeans2()`, which work like `colMeans()` and `rowMeans()` of the **base** package but also support efficient subsetting via optional arguments `rows` and `cols`.

Functions `colDiffs()` and `rowDiffs()` gained argument `dim.`.

Functions `colWeightedMads()` and `rowWeightedMads()` gained arguments `constant` and `center`. The current implementation only supports scalars for these arguments, which means that the same values are applied to all columns and rows, respectively. In previous versions, a hard-to-understand error would be produced if `center` was of length greater than one; now a more informative error message is given.

Package is now silent when loaded; it no longer displays a startup message.

Continuous-integration testing is now also done on macOS, in addition to Linux and Windows.

ROBUSTNESS: Package now registers the native API using also `R_useDynamicSymbols()`.

Cleaned up native low-level API and renamed native source code files to make it easier to navigate the native API.

Now using **roxygen2** for help and NAMESPACE (was `R.oo::Rdoc`).

`rowAnys(x)` on numeric matrices `x` would return `rowAnys(x == 1)` and not `rowAnys(x != 0)`. Same for `colAnys()`, `rowAlls()`, and `colAlls()`. Thanks to Richard Cotton for reporting on this.

`sumOver(x)` and `meanOver(x)` would incorrectly return -Inf or +Inf if the intermediate sum would have that value, even if one of the following elements would turn the intermediate sum into NaN or NA, e.g. with `x` as `c(-Inf, NaN)`, `c(-Inf, +Inf)`, or `c(+Inf, NA)`.

WORKAROUND: Benchmark reports generated by `matrixStats:::benchmark()` would use any custom R prompt that is currently set in the R session, which may not render very well. Now it forces the prompt to be the built-in `"> "` one.

The package API is only intended for matrices and vectors of type numeric, integer and logical. However, a few functions would still return if called with a data.frame. This was never intended to work and is now an error. Specifically, functions `colAlls()`, `colAnys()`, `colProds()`, `colQuantiles()`, `colIQRs()`, `colWeightedMeans()`, `colWeightedMedians()`, and `colCollapse()` now produce warnings if called with a data.frame. Same for the corresponding row- functions. The use of a data.frame will produce an error in future releases.

`meanOver()` and `sumOver()` are deprecated because they were renamed to `mean2()` and `sum2()`, respectively.

Previously deprecated (and ignored) argument `flavor` of `colRanks()` and `rowRanks()` is now defunct.

Previously deprecated support for passing non-vector, non-matrix objects to `rowAlls()`, `rowAnys()`, `rowCollapse()`, and the corresponding column-based versions is now defunct. Likewise, `rowProds()`, `rowQuantiles()`, `rowWeightedMeans()`, `rowWeightedMedians()`, and the corresponding column-based versions are also defunct. The rationale for this is to tighten up the identity of the **matrixStats** package and what types of input it accepts. This will also help optimize the code further.

SPEEDUP / CLEANUP: `rowMedians()` and `colMedians()` are now plain functions. They were previously S4 methods (due to a Bioconductor legacy). The package no longer imports the **methods** package.

SPEEDUP: Now the native API is formally registered, allowing for faster lookup of routines from R.

Package now installs on R (>= 2.12.0) as claimed. Thanks to Mikko Korpela at Aalto University School of Science, Finland, for troubleshooting and providing a fix.

`logSumExp(c(-Inf, -Inf, ...))` would return NaN rather than `-Inf`. Thanks to Jason Xu (University of Washington) for reporting and Brennan Vincent for troubleshooting and contributing a fix.
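How an all-`-Inf` input can turn into NaN is easy to reproduce with the usual max-shift formulation of log-sum-exp. The sketch below is written in plain R under that assumption; `lse_naive()` and `lse_guarded()` are hypothetical names, and this is not the package's native code.

```r
# Max-shift formulation: log(sum(exp(lx))) = m + log(sum(exp(lx - m)))
lse_naive <- function(lx) {
  m <- max(lx)
  m + log(sum(exp(lx - m)))   # (-Inf) - (-Inf) is NaN when all values are -Inf
}

# Same formula, but return early when the maximum is infinite
lse_guarded <- function(lx) {
  m <- max(lx)
  if (is.infinite(m)) return(m)
  m + log(sum(exp(lx - m)))
}

lx <- c(-Inf, -Inf)
lse_naive(lx)     # NaN
lse_guarded(lx)   # -Inf

# For finite input both agree with the direct computation
lse_guarded(log(c(1, 2, 3)))  # equals log(6) up to rounding
```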

- The Undefined Behavior Sanitizer (UBsan) reported on a `memcall(src, dest, 0)` call when `dest == null`. Thanks to Brian Ripley and the CRAN check tools for catching this. We could reproduce this with gcc 5.1.1 but not with gcc 4.9.2.

- MAJOR FEATURE UPDATE: Subsetting arguments `idxs`, `rows` and `cols` were added to all functions such that the calculations are performed on the requested subset while avoiding creating a subsetted copy, i.e. `rowVars(x, cols = 4:6)` is a much faster and more memory efficient version than `rowVars(x[, 4:6])` and even more efficient than `apply(x, MARGIN = 1L, FUN = var)`. These features were added by Dongcan Jiang, Peking University, with support from the Google Summer of Code program. A great thank you to Dongcan and to Google for making this possible.
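A minimal sketch of the `cols` subsetting argument for `rowVars()`, with a base R reference value computed on the explicitly subsetted copy. The matrix is made-up data, and the matrixStats call is guarded so the snippet still runs when the package is not installed.

```r
set.seed(1)
x <- matrix(rnorm(20 * 8), nrow = 20, ncol = 8)

# Base R reference: row variances over columns 4 to 6 of a subsetted copy
ref <- apply(x[, 4:6], MARGIN = 1L, FUN = var)

if (requireNamespace("matrixStats", quietly = TRUE)) {
  # Same calculation without first creating x[, 4:6]
  v <- matrixStats::rowVars(x, cols = 4:6)
  stopifnot(isTRUE(all.equal(v, ref)))
}
```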

- CONSISTENCY: Now all weight arguments (`w` and `W`) default to NULL, which corresponds to uniform weights.

- ROBUSTNESS: Importing **stats** functions in namespace.

`weightedVar(x, w)` used the wrong bias correction factor resulting in an estimate that was tau too large, where `tau = ((sum(w) - 1) / sum(w)) / ((length(w) - 1) / length(w))`. Thanks to Wolfgang Abele for reporting and troubleshooting on this.

`weightedVar(x)` with `length(x) = 1` returned 0 - not NA. Same for `weightedSd()`.

`weightedMedian(x, w = NA_real_)` returned `x` rather than `NA_real_`. This only happened for `length(w) = 1`.

`allocArray(dim)` failed for `prod(dim) >= .Machine$integer.max`.

CLEANUP: Defunct argument `centers` for `col-` / `rowMads()`; use `center`.

`weightedVar(x, w, method = "0.14.2")` is deprecated.

`x_OP_y()` and `t_tx_OP_y()` would return garbage on Solaris SPARC (and possibly other architectures as well) when input was integer and had missing values.

`product(x, na.rm = FALSE)` for integer `x` with both zeros and NAs returned zero rather than NA.

`weightedMean(x, w, na.rm = TRUE)` did not handle missing values in `x` properly, if it was an integer. It would also return NaN if there were weights `w` with missing values, whereas `stats::weighted.mean()` would skip such data points. Now `weightedMean()` does the same.

`(col|row)WeightedMedians()` did not handle infinite weights as `weightedMedian()` does.

`x_OP_y(x, y, OP, na.rm = FALSE)` returned garbage iff `x` or `y` had missing values of type integer.

`rowQuantiles()` and `rowIQRs()` did not work for single-row matrices. Analogously for the corresponding column functions.

`rowCumsums()`, `rowCumprods()`, `rowCummins()`, and `rowCummaxs()` accessed out-of-bound elements for Nx0 matrices where N > 0. The corresponding column methods had similar memory errors for 0xK matrices where K > 0.

`anyMissing(list(NULL))` returned NULL; now FALSE.

`rowCounts()` resulted in garbage if a previous column had NAs (because it forgot to update index kk in such cases).

`rowCumprods(x)` handled missing values and zeros incorrectly for integer `x` (not double); a zero would trump an existing missing value causing the following cumulative products to become zero. It was only a zero that trumped NAs; any other integer would work as expected. Note, this bug was not in `colCumprods()`.

`rowAnys(x, value, na.rm = FALSE)` did not handle missing values in a numeric `x` properly. Similarly, for non-numeric and non-logical `x`, row- and `colAnys()`, row- and `colAlls()`, `anyValue()` and `allValue()` did not handle when `value` was a missing value.

All of the above bugs were identified and fixed by Dongcan Jiang (Peking University, China), who also added corresponding unit tests.

- CLEANUP: `anyMissing()` is no longer an S4 generic. This was done as part of the migration of making all functions of **matrixStats** plain R functions, which minimizes calling overhead and it will also allow us to drop **methods** from the package dependencies. I’ve scanned all CRAN and Bioconductor packages depending on **matrixStats** and none of them relied on `anyMissing()` dispatching on class, so hopefully this move has little impact. The only remaining S4 methods are now `colMedians()` and `rowMedians()`.

CONSISTENCY: Renamed argument `centers` of `col-` / `rowMads()` to `center`. This is consistent with `col-` / `rowVars()`.

CONSISTENCY: `col-` / `rowVars()` now use `na.rm = FALSE` as the default (`na.rm = TRUE` was mistakenly introduced as the default in v0.9.7).

SPEEDUP: The check for user interrupts at the C level is now done less frequently. It is done every k-th iteration, where `k = 2^20`, which is tested for using (`iter % k == 0`). It turns out, at least with the default compiler optimization settings that I use, that this test is 3 times faster if `k = 2^n` where n is an integer. The following functions check for user interrupts: `logSumExp()`, `(col|row)LogSumExps()`, `(col|row)Medians()`, `(col|row)Mads()`, `(col|row)Vars()`, and `(col|row)Cum(Min|Max|prod|sum)s()`.

SPEEDUP: `logSumExp(x)` is now faster if `x` does not contain any missing values. It is also faster if all values are missing or the maximum value is +Inf - in both cases it can skip the actual summation step.

- ROBUSTNESS/TESTS: Package tests cover 96% of the code (was 91%).

- CLEANUP: Package no longer depends on **R.methodsS3**.

`all()` and `any()` flavored methods on non-numeric and non-logical (e.g. character) vectors and matrices with `na.rm = FALSE` did not give results consistent with `all()` and `any()` if there were missing values. For example, with `x <- c("a", NA, "b")` we have `all(x == "a") == FALSE` and `any(x == "a") == TRUE`, whereas our corresponding methods would return NA in those cases. The methods fixed are `allValue()`, `anyValue()`, `col-` / `rowAlls()`, and `col-` / `rowAnys()`. Added more package tests to cover these cases.

`logSumExp(x, na.rm = TRUE)` would return NA if all values were NA and `length(x) > 1`. Now it returns -Inf for all lengths of `x`.
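The base R behavior that the `all()` and `any()` flavored methods are meant to match can be checked directly. The snippet below uses only base R and the example vector from the entry above; the variable names are chosen here for illustration.

```r
x <- c("a", NA, "b")

cmp <- x == "a"            # TRUE NA FALSE
res_all <- all(cmp)        # FALSE: one definite FALSE decides the result
res_any <- any(cmp)        # TRUE: one definite TRUE decides the result

# NA is returned only when the missing value could change the answer
res_undecided <- all(c(TRUE, NA))   # NA

print(c(res_all, res_any, res_undecided))
```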

`diff2()` with `differences >= 3` would *read* spurious values beyond the allocated memory. This error, introduced in 0.13.0, was harmless in the sense that the returned value was unaffected and still correct. Thanks to Brian Ripley and the CRAN check tools for catching this. I could reproduce it locally with valgrind.

- SPEEDUP/CLEANUP: Turned several S3 and S4 methods into plain R functions, which decreases the overhead of calling the functions. After this there are no longer any S3 methods. Remaining S4 methods are `anyMissing()` and `rowMedians()`.

Added `weightedMean()`, which is ~10 times faster than `stats::weighted.mean()`.

Added `count(x, value)`, which is notably faster than `sum(x == value)`. This can also be used to count missing values etc.

Added `allValue()` and `anyValue()` for `all(x == value)` and `any(x == value)`.

Added `diff2()`, which is notably faster than `base::diff()` for vectors, which it is designed for.

Added `iqrDiff()` and `(col|row)IqrDiffs()`.

CONSISTENCY: Now `rowQuantiles(x, na.rm = TRUE)` returns all NAs for rows with missing values. Analogously for `colQuantiles()`, `colIQRs()`, `rowIQRs()` and `iqr()`. Previously, all these functions gave an error saying missing values are not allowed.

COMPLETENESS: Added corresponding “missing” vector functions for already existing column and row functions. Similarly, added “missing” column and row functions for already existing vector functions, e.g. added `iqr()` and `count()` to complement already existing `(col|row)IQRs()` and `(col|row)Counts()` functions.

ROBUSTNESS: Now column and row methods give slightly more informative error messages if a data.frame is passed instead of a matrix.

- Added vignette summarizing available functions.

SPEEDUP: `(col|row)Diffs()` are now implemented in native code and notably faster than `diff()` for matrices.

SPEEDUP: Made `binCounts()` and `binMeans()` a bit faster.

SPEEDUP: Implemented `weightedMedian()` in native code, which made it ~3-10 times faster. Dropped support for `ties = "both"`, because it would have to return two values in case of ties, which made the API unnecessarily complicated. If really needed, then call the function twice with `ties = "min"` and `ties = "max"`.

SPEEDUP: `(col|row)Anys()` and `(col|row)Alls()` are now notably faster compared to previous versions.

- CLEANUP: In the effort of migrating `anyMissing()` into a plain R function, the specific `anyMissing()` implementations for data.frame:s and list:s were dropped and are now handled by `anyMissing()` for `"ANY"`, which is the only S4 method remaining now. In a near future release, this remaining `"ANY"` method will be turned into a plain R function and the current S4 generic will be dropped. We know of no CRAN and Bioconductor packages that rely on it being a generic function. Note also that since R (>= 3.1.0) there is a `base::anyNA()` function that does the exact same thing making `anyMissing()` obsolete.

`weightedMedian(..., ties = "both")` would give an error if there was a tie. Added package test for this case.

`weightedMedian(..., ties = "both")` is now defunct.

- CODE FIX: The native code for `product()` on integer vector incorrectly used C-level `abs()` on intermediate values despite those being doubles requiring `fabs()`. Despite this, the calculated product would still be correct (at least when validated on several local setups as well as on the CRAN servers). Again, thanks to Brian Ripley for pointing out another invalid integer-double coercion at the C level.

`weightedMedian(..., interpolate = FALSE, ties = "both")` is defunct.

- ROBUSTNESS: Updated package tests to check methods in more scenarios, especially with both integer and numeric input data.

- `(col|row)Cumsums(x)` where `x` is integer would return garbage for columns (rows) containing missing values.

- `rowMads(x)` where `x` is numeric (not integer) would give incorrect results for rows that had an *odd* number of values (no ties). Analogous issues existed with `colMads()`. Added package tests for such cases too. Thanks to Brian Ripley and the CRAN check tools for (yet again) catching another coding mistake. Details: This was because the C-level calculation of the absolute value of residuals toward the median would use integer-based `abs()` rather than double-based `fabs()`. Now `fabs()` is used when the values are doubles and `abs()` when they are integers.
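
The `abs()` versus `fabs()` distinction described above can be illustrated with a minimal, standalone C sketch. This is not the package's native code; the function names `abs_residual_int()` and `abs_residual_double()` are hypothetical and exist only for this illustration.

```c
#include <math.h>    /* fabs() */
#include <stdlib.h>  /* abs() */

/* Buggy pattern: abs() takes an int, so the double residual is
   truncated toward zero before the absolute value is taken. The
   explicit cast mirrors the conversion the compiler would otherwise
   perform implicitly. */
double abs_residual_int(double x, double center) {
  return (double) abs((int) (x - center));
}

/* Correct pattern for doubles: fabs() keeps the fractional part. */
double abs_residual_double(double x, double center) {
  return fabs(x - center);
}
```

With `x = 1.5` and `center = 2.0`, the first variant yields 0 while the second yields 0.5, which is the kind of discrepancy that distorts a median absolute deviation.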

- Submitted to CRAN.

- Added `(col|row)Cumsums()`, `(col|row)Cumprods()`, `(col|row)Cummins()`, and `(col|row)Cummaxs()`.

- `(col|row)WeightedMeans()` with all zero weights gave mean estimates with values 0 instead of NaN.

- SPEEDUP: Implemented `(col|row)Mads()`, `(col|row)Sds()`, and `(col|row)Vars()` in native code.

- SPEEDUP: Made `(col|row)Quantiles(x)` faster for `x` without missing values (and default `type = 7L` quantiles). It should still be implemented in native code though.

- SPEEDUP: Made `rowWeightedMeans()` faster.

- `(col|row)Medians(x)` when `x` is integer would give invalid median values in case (a) it was calculated as the mean of two values (“ties”), and (b) the sum of those values was greater than `.Machine$integer.max`. Now such ties are calculated using floating-point precision. Added lots of package tests.
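
A minimal C sketch of the idea behind this fix, assuming a platform where `int` is 32 bits so that `INT_MAX` equals `.Machine$integer.max`. The helper name `tie_mean()` is hypothetical and the package's actual implementation may differ.

```c
/* Mean of the two middle values of an integer vector, computed in
   double precision. Adding the ints first (a + b) can overflow when
   the sum exceeds INT_MAX; converting each operand to double before
   adding avoids that. */
double tie_mean(int a, int b) {
  return ((double) a + (double) b) / 2.0;
}
```

Both operands and their sum are exactly representable as doubles, so the result is exact for any pair of 32-bit integers.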

- SPEEDUP: Now `(col|row)Mins()`, `(col|row)Maxs()`, and `(col|row)Ranges()` are implemented in native code providing a significant speedup.

- SPEEDUP: Now `colOrderStats()` also is implemented in native code, which indirectly makes `colMins()`, `colMaxs()` and `colRanges()` faster.

- SPEEDUP: `colTabulates(x)` no longer uses `rowTabulates(t(x))`.

- SPEEDUP: `colQuantiles(x)` no longer uses `rowQuantiles(t(x))`.

- CLEANUP: Argument `flavor` of `(col|row)Ranks()` is now ignored.

- `(col|row)Prods()` now uses default `method = "direct"` (was `"expSumLog"`).

- SPEEDUP: Now `colCollapse(x)` no longer utilizes `rowCollapse(t(x))`. Added package tests for `(col|row)Collapse()`.

- SPEEDUP: Now `colDiffs(x)` no longer uses `rowDiffs(t(x))`. Added package tests for `(col|row)Diffs()`.

- SPEEDUP: Package no longer utilizes `match.arg()` due to its overhead; methods `sumOver()`, `(col|row)Prods()` and `(col|row)Ranks()` were updated.

- Added support for vector input to several of the row- and column methods as long as the “intended” matrix dimension is specified via argument `dim`. For instance, `rowCounts(x, dim = c(nrow, ncol))` is the same as `rowCounts(matrix(x, nrow, ncol))`, but more efficient since it avoids creating/allocating a temporary matrix.

- SPEEDUP: Now `colCounts()` is implemented in native code. Moreover, `(col|row)Counts()` are now also implemented in native code for logical input (previously only for integer and double input). Added more package tests and benchmarks for these functions.

- Turned `sdDiff()`, `madDiff()`, `varDiff()`, `weightedSd()`, `weightedVar()` and `weightedMad()` into plain functions (were generic functions).

- Removed unnecessary usage of `::`.

- SPEEDUP: Implemented `indexByRow()` in native code and it is no longer a generic function, but a regular function, which is also faster to call. The first argument of `indexByRow()` has been changed to `dim` such that one should use `indexByRow(dim(X))` instead of `indexByRow(X)` as in the past. The latter form is still supported, but deprecated.

- Added `allocVector()`, `allocMatrix()`, and `allocArray()` for faster allocation of numeric vectors, matrices and arrays, particularly when filled with non-missing values.

- Calling `indexByRow(X)` with a matrix `X` is deprecated. Instead call it with `indexByRow(dim(X))`.

Better support for long vectors.

PRECISION: Using greater floating-point precision in more internal intermediate calculations, where possible.

- ROBUSTNESS: Although unlikely, with long vectors support for `binCounts()` and `binMeans()` it is possible that a bin gets a higher count than what can be represented by an R integer (`.Machine$integer.max = 2^31-1`). If that happens, an informative warning is generated and the bin count is set to `.Machine$integer.max`. If this happens for `binMeans()`, the corresponding mean is still properly calculated and valid.
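
The clamping behavior described above could be sketched in C roughly as follows. This is an illustration only, assuming a 32-bit `int`; the helper `increment_saturating()` is hypothetical and is not taken from the package source.

```c
#include <limits.h>

/* Increment a bin count, clamping at INT_MAX instead of overflowing.
   Returns 1 if the count was already saturated (the caller could then
   emit a warning), 0 otherwise. */
int increment_saturating(int *count) {
  if (*count == INT_MAX) return 1;
  (*count)++;
  return 0;
}
```

A caller would typically remember that saturation occurred and issue a single warning after the loop rather than one per data point.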

- CLEANUP: Cleaned up and harmonized the internal C API such that there are two well-defined API levels. The high-level API is called by R via `.Call()` and takes care of most of the argument validation and construction of the return value. This function dispatches to functions in the low-level API based on data type(s) and other arguments. The low-level API is written to work with basic C data types only.

- Package incorrectly redefined `R_xlen_t` on R (>= 3.0.0) systems where `LONG_VECTOR_SUPPORT` is not supported.

- Added `sumOver()` and `meanOver()`, which are notably faster versions of `sum(x[idxs])` and `mean(x[idxs])`. Moreover, instead of having to do `sum(as.numeric(x))` to avoid integer overflow when `x` is an integer vector, one can do `sumOver(x, mode = "numeric")`, which avoids the extra copy created when coercing to numeric (this numeric copy is also twice as large as the integer vector). Added package tests and benchmark reports for these functions.
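
The idea of summing an integer vector over a subset of indices without first coercing it to numeric can be sketched in C as below. This is a hedged illustration, not the package's code: `sum_over_int()` is a hypothetical name, and it uses 0-based indices, unlike R.

```c
#include <stddef.h>

/* Sum x[idxs[0]], ..., x[idxs[n-1]] (0-based indices) using a double
   accumulator, so the total may exceed INT_MAX without overflowing and
   without first copying x into a double array. */
double sum_over_int(const int *x, const size_t *idxs, size_t n) {
  double sum = 0.0;
  for (size_t i = 0; i < n; i++) {
    sum += (double) x[idxs[i]];
  }
  return sum;
}
```

Each element is widened only as it is read, so no temporary vector of doubles or of subsetted values is ever allocated.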

- SPEEDUP: Made `anyMissing()`, `logSumExp()`, `(col|row)Medians()`, and `(col|row)Counts()` slightly faster by making the native code assign the results directly to the native vector instead of to the R vector, e.g. `ansp[i] = v` where `ansp = REAL(ans)` instead of `REAL(ans)[i] = v`.

- Added benchmark reports for `anyMissing()` and `logSumExp()`.

- `binMeans()` returned 0.0 instead of `NA_real_` for empty bins.

- On some systems, the package failed to build on R (<= 2.15.3) with compilation error: `"redefinition of typedef 'R_xlen_t'"`.

- Added benchmark reports also for the non-**matrixStats** functions `col-`/`rowSums()` and `col-`/`rowMeans()`.

- Now all `colNnn()` and `rowNnn()` methods are benchmarked in a combined report making it possible to also compare `colNnn(x)` with `rowNnn(t(x))`.

- Relaxed some package tests such that they assert numerical correctness via `all.equal()` rather than `identical()`.

- Submitted to CRAN.

- The package tests for `product()` incorrectly assumed that the value of `prod(c(NaN, NA))` is uniquely defined. However, as documented in `help("is.nan")`, it may be NA or NaN depending on R system/platform.

- Introduced a bug in v0.9.5 causing `col-`/`rowVars()` and hence also `col-`/`rowSds()` to return garbage. Added package tests for these now.

- Submitted to CRAN.

- Added `signTabulate()` for tabulating the number of negatives, zeros, positives and missing values. For doubles, the number of negative and positive infinite values are also counted.

- SPEEDUP: Now `col-`/`rowProds()` utilizes new `product()` function.

- SPEEDUP: Added `product()` for calculating the product of a numeric vector via the logarithm.

- SPEEDUP: Made `weightedMedian()` a plain function (was an S3 method).

- CLEANUP: Now only exporting plain functions and generic functions.

- SPEEDUP: Turned more S4 methods into S3 methods, e.g. `rowCounts()`, `rowAlls()`, `rowAnys()`, `rowTabulates()` and `rowCollapse()`.

- Added argument `method` to `col-`/`rowProds()` for controlling how the product is calculated.

SPEEDUP: Package is now byte compiled.

- SPEEDUP: Made `rowProds()` and `rowTabulates()` notably faster.

- SPEEDUP: Now `rowCounts()`, `rowAnys()`, `rowAlls()` and corresponding column methods can search for any value in addition to the default TRUE. The search for a matching integer or double value is done in native code, which is notably faster (and more memory efficient because it avoids creating any new objects).

- SPEEDUP: Made `colVars()` and `colSds()` notably faster and `rowVars()` and `rowSds()` slightly faster.

- Added benchmark reports, e.g. `matrixStats:::benchmark("colMins")`.

- SPEEDUP: Turned several S4 methods into S3 methods, e.g. `indexByRow()`, `madDiff()`, `sdDiff()` and `varDiff()`.

- Added argument `trim` to `madDiff()`, `sdDiff()` and `varDiff()`.

- The native code of `binMeans(x, bx)` would try to access an out-of-bounds value of argument `y` iff `x` contained elements that are left of all bins in `bx`. This bug had no impact on the results and since no assignment was done it should also not crash/core dump R. This was discovered thanks to new memtests (ASAN and valgrind) provided by CRAN.

- `rowProds()` would throw `"Error in rowSums(isNeg) : 'x' must be an array of at least two dimensions"` on matrices where all rows contained at least one zero. Thanks to Roel Verbelen at KU Leuven for the report.

- Added `weightedVar()` and `weightedSd()`.

MEMORY: Updated all functions to do a better job of cleaning out temporarily allocated objects as soon as possible such that the garbage collector can remove them sooner, iff wanted. This increases the chance of a smaller memory footprint.

Submitted to CRAN.

- Added argument `right` to `binCounts()` and `binMeans()` to specify whether binning should be done by (u,v] or [u,v). Added system tests validating the correctness of the two cases.
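
The two interval conventions can be made concrete with a small C sketch. It is an illustration under stated assumptions, not the package's algorithm: `bin_counts()` is a hypothetical function that uses a plain linear scan instead of any presorting, and it simply ignores values that fall outside all bins.

```c
#include <stddef.h>

/* Count how many of the n values in x fall into each of the nbins bins
   delimited by the nbins + 1 increasing break points in bx.
   right == 0: bins are [bx[b], bx[b+1]); right != 0: (bx[b], bx[b+1]].
   Values outside all bins are ignored. */
void bin_counts(const double *x, size_t n, const double *bx, size_t nbins,
                int right, int *count) {
  for (size_t b = 0; b < nbins; b++) count[b] = 0;
  for (size_t i = 0; i < n; i++) {
    for (size_t b = 0; b < nbins; b++) {
      int in_bin = right ? (bx[b] < x[i] && x[i] <= bx[b + 1])
                         : (bx[b] <= x[i] && x[i] < bx[b + 1]);
      if (in_bin) {
        count[b]++;
        break;
      }
    }
  }
}
```

For the values 0, 1, 1, 2 and break points 0, 1, 2, the left-closed convention gives counts 1 and 2, whereas the right-closed convention gives 2 and 1, because values lying exactly on a break point move to the neighboring bin.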

- Bumped up package dependencies.

- SPEEDUP: Now utilizing `anyMissing()` everywhere possible.

- ROBUSTNESS: Now importing `loadMethod` from the **methods** package such that **matrixStats** S4-based methods also work when **methods** is not loaded, e.g. when `Rscript` is used, cf. Section ‘Default packages’ in ‘R Installation and Administration’.

- ROBUSTNESS: Updated package system tests such that they can run with only the **base** package loaded.

- CLEANUP: Now only importing two functions from the **methods** package.

- Bumped up package dependencies.

- CLEANUP: Now the package startup message acknowledges argument `quietly` of `library()`/`require()`.

- The dimension of the return value was swapped in `help("rowQuantiles")`.

- SPEEDUP: Made `(col|row)Mins()` and `(col|row)Maxs()` much faster.

- `rowRanges(x)` on an Nx0 matrix would give an error. Same for `colRanges(x)` on a 0xN matrix. Added system tests for these and other special cases.

- Bumped up package dependencies.

- Forgot to declare S3 methods `(col|row)WeightedMedians()`.

- Minor speedup of `(col|row)Tabulates()` by replacing `rm()` calls with NULL assignments.

- CRAN POLICY: Now all Rd `\usage{}` lines are at most 90 characters long.

- SPEEDUP: `binCounts()` and `binMeans()` now use Hoare’s Quicksort for presorting `x` before counting/averaging. They also no longer test in every iteration (i.e. for every data point) whether the last bin has been reached or not, but only after completing a bin.

- Minor corrections and updates to help pages.

- Native code of `logSumExp()` used an invalid check for missing value of an integer argument. Detected by Brian Ripley upon CRAN submission.

- Added `logSumExp(lx)` and `(col|row)LogSumExps(lx)` for accurately computing `log(sum(exp(lx)))` for standalone vectors, and row and column vectors of matrices. Thanks to Nakayama (Japan) for the suggestion and contributing a draft in R.

- Added argument `preserveShape` to `colRanks()`. For backward compatibility the default is `preserveShape = FALSE`, but it may change in the future.

- Since v0.6.4, `(col|row)Ranks()` gave incorrect results for integer matrices with missing values.

- Since v0.6.4, `(col|row)Medians()` for integers would calculate ties as `floor(tieAvg)`.

- Now `(col|row)Ranks()` supports `"max"` (default), `"min"` and `"average"` for argument `ties.method`. Added system tests validating these cases. Thanks to Peter Langfelder (UCLA) for contributing this.

- Added argument `ties.method` to `rowRanks()` and `colRanks()`, but still only with support for `"max"` (as before).

- ROBUSTNESS: Lots of cleanup of the internal/native code. Native code for integer and double cases have been harmonized and are now generated from a common code template. This was inspired by code contributions from Peter Langfelder (UCLA).

- Added `anyMissing()` for data type `raw`, which always returns FALSE.

- ROBUSTNESS: Added system test for `anyMissing()`.

- ROBUSTNESS: Now S3 methods are declared in the namespace.

- CRAN POLICY: Made `example(weightedMedian)` faster.

- In some cases `binCounts()` and `binMeans()` could try to go past the last bin, resulting in a core dump.

- `binCounts()` and `binMeans()` would return random/garbage values for bins that were beyond the last data point.

- Added `binMeans()` for fast sample-mean calculation in bins. Thanks to Martin Morgan at the Fred Hutchinson Cancer Research Center, Seattle, for contributing the core code for this.

- Added `binCounts()` for fast element counting in bins.

- CRAN POLICY: Replaced the `.Internal(psort(...))` call with a call to a new internal partial sorting function, which utilizes the native `rPsort()` part of the R internals.

- Updated package dependencies to match CRAN.

- GENERALIZATION: Now `(col|row)Prods()` handle missing values.

- Package now only imports the **methods** package.

- In certain cases, `(col|row)Prods()` would return NA instead of 0 for some elements. Added a redundancy test for the case. Thanks to Brenton Kenkel at University of Rochester for reporting on this.

- Added `weightedMad()` from **aroma.core** v2.5.0.

- Added `weightedMedian()` from **aroma.light** v1.25.2.

- This package no longer depends on the **aroma.light** package for any of its functions.

- Now this package only imports **R.methodsS3**, meaning it no longer loads **R.methodsS3** when it is loaded.

- Updated the default argument `centers` of `rowMads()`/`colMads()` to explicitly be `(col|row)Medians(x,...)`. The default behavior has not changed.

- ROBUSTNESS: Added system/redundancy tests for `rowMads()`/`colMads()`.

- CRAN: Made the system tests “lighter” by default, but full tests can still be run, cf. `tests/*.R` scripts.

- `colMads()` would return incorrect estimates. This bug was introduced in **matrixStats** v0.4.0 (2011-11-11).

- `rowMedians(..., na.rm = TRUE)` did not handle NaN (only NA). The reason for this was that the native code used `ISNA()` to test for NA and NaN, but it should have been `ISNAN()`, which is opposite to how `is.na()` and `is.nan()` at the R level work. Added system tests for this case.

- Added `rowAvgsPerColSet()` and `colAvgsPerRowSet()`.

- Added help pages with an example to `rowIQRs()` and `colIQRs()`.

- Added example to `rowQuantiles()`.

- `rowIQRs()` and `colIQRs()` would return the 25% and the 75% quantiles, not the difference between them. Thanks to Pierre Neuvial at CNRS, Evry, France for the report.

- Dropped the previously introduced expansion of `center` in `rowMads()` and `colMads()`. It added unnecessary overhead if not needed.

- Added `rowRanks()` and `colRanks()`. Thanks to Hector Corrada Bravo (University of Maryland) and Harris Jaffee (Johns Hopkins).

- SPEEDUP/LESS MEMORY: `colMedians(x)` no longer uses `rowMedians(t(x))`; instead there is now an optimized native-code implementation. Also, `colMads()` utilizes the new `colMedians()` directly. This improvement was kindly contributed by Harris Jaffee at Biostatistics of Johns Hopkins, USA.

- Added additional unit tests for `colMedians()` and `rowMedians()`.

- Now the result of `(col|row)Quantiles()` contains column names.

- Added a startup message when package is loaded.

- CLEANUP: Removed obsolete internal `.First.lib()` and `.Last.lib()`.

- Fixed some incorrect cross references.

- `(col|row)WeightedMeans(..., na.rm = TRUE)` would incorrectly treat missing values as zeros. Added corresponding redundancy tests (also for the median case). Thanks to Pierre Neuvial for reporting this.

- `colRanges(x)` would return a matrix of wrong dimension if `x` did not have any missing values. This would affect all functions relying on `colRanges()`, e.g. `colMins()` and `colMaxs()`. Added a redundancy test for this case. Thanks to Pierre Neuvial at UC Berkeley for reporting this.

- `(col|row)Ranges()` return a matrix with dimension names.

- WORKAROUND: Cannot use `"%#x"` in `rowTabulates()` when creating the column names of the result matrix. It gave an error on OSX with R v2.9.0 devel (2009-01-13 r47593b), which is currently on the OSX server at R-forge.

- Updated the help example for `rowWeightedMedians()` to run conditionally on **aroma.light**, which is only a suggested package, not a required one. This is to prevent `R CMD check` from failing on CRAN, which would prevent CRAN from building binaries (as currently happens on their OSX servers).

- For some errors in `rowOrderStats()`, the stack would not be UNPROTECTED before calling error.

- Added methods `(col|row)Weighted(Mean|Median)s()` for weighted averaging.

- Added help to more functions.

- Package passes `R CMD check` flawlessly.

- Added `(col|row)Tabulates()` for integer and raw matrices.

- `rowCollapse()` was broken and returned the wrong elements.

- Added `(col|row)Collapse()`.

- Added `varDiff()`, `sdDiff()`, and `madDiff()`.

- Added `indexByRow()`.

- Added `(col|row)OrderStats()`.

- Added `(col|row)Ranges()` and `(col|row)(Min|Max)s()`.

- Added `colMedians()`.

- Now `anyMissing()` supports most data types as structures.

- Imported the `rowNnn()` methods from **Biobase**.

- Created.