analogue Change Log
Version 0.13-6
* Plot3d: new name for the previous `plot3d.prcurve()` method.
Renamed to avoid having to export a method from the namespace,
which allows **rgl** to be relegated to Suggests.
* plot3d.prcurve: deprecated this S3 method and removed **rgl**
from the package dependencies. **rgl** is now in Suggests, which
means it is not needed to install the package. The hard dependency
on **rgl** seemed to be causing some problems for Mac OS users.
* evenSample: new function returns the number of samples per
gradient segment. Has a `plot()` method.
* Pollen, Biome, Location, Climate: data sets from the North
American Modern Pollen Database updated to version 1.7.3.
* Imports: lattice moved from Depends to Imports. analogue now
exports the generics densityplot, dotplot, and histogram imported
from the lattice namespace.
Version 0.13-5
* n2: new utility function to compute Hill's N2 for sites
or species.
* optima: added bootstrap estimates of species WA optima.
* roc: fix a bug in the computation of AUC from the U statistic.
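For readers unfamiliar with the statistic computed by `n2`, the
following is a minimal base R sketch of Hill's N2, the inverse of the
sum of squared proportions. `hillN2()` and the toy data are
hypothetical and for illustration only; this is not the package's
implementation and its arguments are assumptions.

```r
# Hill's N2: the "effective number" of occurrences of a species, or of
# taxa in a site. hillN2() is a hypothetical helper, NOT analogue's n2().
hillN2 <- function(x, which = c("species", "sites")) {
    which <- match.arg(which)
    x <- as.matrix(x)
    if (which == "sites") {
        x <- t(x)                # work column-wise in both cases
    }
    tot <- colSums(x)
    p <- sweep(x, 2, tot, "/")   # proportions within each column
    n2 <- 1 / colSums(p^2)
    n2[tot == 0] <- NA           # undefined where a column is all zeros
    n2
}

# Made-up abundances: 4 sites (rows) by 3 species (columns)
abund <- matrix(c(10, 10, 0, 0,
                   5,  5, 5, 5,
                  20,  0, 0, 0),
                nrow = 4,
                dimnames = list(paste0("site", 1:4),
                                c("spA", "spB", "spC")))

hillN2(abund, "species")   # spA = 2, spB = 4, spC = 1
hillN2(abund, "sites")
```

A species found at equal abundance in k sites has N2 = k, so spB
(present equally in all four sites) scores 4 while spC (a single
occurrence) scores 1.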
Version 0.13-4 13 Feb 2014
* crossval.pcr: Fixed a number of bugs in the method for PCR
related to k-fold cross validation, which were causing errors.
Fix the verbose printing of the progress bar, which would reset
between repeats in k-fold CV.
* predict.pcr: would set argument `ncomp` incorrectly (in the
wrong form) if not supplied.
* performance.crossval: A new method for objects of class
`"crossval"`.
* ChiSquare: Wasn't returning the list it created including
transformation parameters.
Version 0.13-3 Opened 11 Feb 2014
* timetrack: A number of additions and improvements:
o New `predict()` method allows additional passive points
to be located in the timetrack space.
o New `points()` method to allow drawing of points for
training or passive samples on an existing plot.
o The `plot()` method can now suppress plotting of all
points, for a clean canvas with axes/labelling ready to
accept additional plotting function calls.
These changes were made following a query by Andrew Medeiros.
Version 0.13-2 Opened 1 Jan 2014
* prcurve: uses `dev.hold()` & `dev.flush()` to smooth graphics
flicker during fitting with `plotit = TRUE`.
Version 0.13-1 Opened 24 December 2013
* plot3d.prcurve: was not using the `data` & `ordination` objects
stored within the fitted `prcurve` object.
Version 0.13-0 Development branch opened 16 December 2013
* predict.pcr: internal function was calling `fitPCR()` via
`analogue:::fitPCR()`, which is neither required nor intended.
Reported by Brian Ripley.
* predict.prcurve: new function to predict locations on the curve
for new observations on the same set of variables. Useful for
adding passive species.
* fitted.prcurve: new function to return the fitted locations on
the principal curve or the fitted values of the response.
Version 0.12-0
* Released to CRAN December 13th 2013
Version 0.11-99
* Preparing for the 0.12-0 release.
* NEWS: analogue now has an `inst/NEWS.Rd` file highlighting the major
changes in the upcoming 0.12-0 release.
Version 0.11-6
* distance3: removed - redundant attempt to improve `distance()`.
* distance: `newDistance()` -> `distance()`
`distance()` -> `oldDistance()`
This implements the change suggested in 0.11-5. `distance()` is
now using the compiled C versions of the dissimilarity code.
* oldDistance: (was `distance()`) fixed a bug in the x-only case
where `method = "kendall"`.
* Vignette: Updated some details regarding C versions of
dissimilarity coefs.
* performance: a tweak to the print method to zap values that
are effectively 0. Only affects vectors of performance statistics
not data frames of stats.
* predict.pcr: Applies the transformation function and performs
predictions for LOO, n k-fold, and bootstrap cross-validation.
* crossval.pcr: leave-one-out CV was incorrectly averaging over
components. Now does bootstrap and n k-fold CV.
Version 0.11-5
* newDistance: (yet another) new distance() replacement to
interface with fast C versions of the dissimilarity code. This
one *will* replace `distance()` in version 0.12-0, whereupon it
will be renamed to `distance()` and the current `distance()` will
be renamed to `oldDistance()`.
* Unit tests: started adding unit tests using the **testthat**
package. Hence **analogue** now `Suggests: testthat`.
Version 0.11-4
* logitreg: fitting is now possible using Firth's bias reduction
technique via the brglm package.
Changes to the labels in the output of the `summary` method.
* plot3d.prcurve: dynamic 3D plot of the data in PCA space with
the fitted principal curve superimposed.
* smoothGAM: new smooth function plugin for `prcurve`. Allows
fitting principal curves via individual GAM models using
`mgcv::gam` as the engine. The main advantage is that data sets
with non-Gaussian errors, such as count data, can now be handled
more appropriately.
`smoothGAM` is much slower than `smoothSpline` currently, although
there is potential for speeding this up via pre-computing some of
the GAM terms. As an example, the Abernethy Forest example in
`?prcurve` takes ~10 seconds on my 2-year old laptop.
* prcurve: improvements to the printed output during fitting (i.e.
with `trace = TRUE`). A progress bar is now displayed during initial
estimation of smooth complexities. Residual variance is printed to
fewer decimal places.
Now returns components `ordination` and `data`, the PCA ordination
(resulting from `rda()`) and the original data used to fit the
curve, respectively. This simplifies the `plot` and `lines`
methods for example.
* residuals.prcurve: new `residuals` method for principal curves
extracting or computing various types of residual for a fitted
curve.
* plot.prcurve, lines.prcurve: much improved following the
addition of new components in the object returned by `prcurve`.
No longer need to supply the original data used to fit the curve.
The scaling to use for the plot can now be specified via new
argument `scaling`. This makes the `lines` method more broadly
useful.
* chooseTaxa: would drop empty dimensions if conditions resulted
in just a single taxon being selected. Reported by Michael
Burstert.
The warning about `NA`s was also being issued even if
`na.rm = TRUE` was used. Now fixed.
* Streamlined some of the documentation to avoid running the same
code many times.
* plot.sppResponse: accepts a logical vector for argument `which`.
* wa: now warns if species with no information are removed from
the analysis, which proceeds as it always has.
Version 0.11-3
* chooseTaxa: new argument `value` controls whether the data for
the selected species or a logical vector indicating which columns
(species) met the selection criteria is returned.
New argument `na.rm = FALSE` to control whether or not `NA`s are
excluded from the calculation of abundance and occurrence. Suggested
by Michael Burstert.
* scores.prcurve: now preserves the rownames on the `lambda`
component.
* sppResponse: new generic function for species responses along
gradients. A method for objects of class "prcurve" is the only
method currently available. A `plot()` method for
`sppResponse()` objects is also provided.
* rankDC: new function to compute the rank correlation between
gradient distances (e.g. environmental variables) and distances
in species composition. Has both base and Lattice graphics plot
methods (the latter via `dotplot()`).
* Stratiplot: new arguments `labelAt` and `labelRot` allow the
placement and rotation of variable labels to be controlled, when not
using a strip.
Version 0.11-2
* timetrack: plot method now allows plotting of "lc" or "wa"
site scores for the base ordination. The latter is the default
to maintain backwards compatibility.
The `formula` argument was not well implemented; using it would
mean that `X`, the main species data, would not be transformed,
and you couldn't use direct variables as these would not be found.
Now `formula` takes a one-sided formula describing the constraints.
Variables will be looked up inside the object passed to `env`. As
such, `env` needs to be a data frame or an object accepted as the
`data` argument in `model.frame()`.
The `fitted` method has changed slightly. The `type` argument
has been renamed `which`.
* scores: new methods for objects of class "timetrack" and
class "prcurve".
* analog: gains a method for objects of class "distance".
* distance: gains a new attribute ('type') which contains an
indicator of whether the distance matrix is symmetric (computed
on a single matrix) or asymmetric (dissimilarities between samples
of two matrices).
* distanceX: experimental replacement for distance() which uses
fast C code for computing dissimilarities via a .Call interface
based on base::dist().
Currently only the single matrix code has an R interface and
`method = "mixed"` is not hooked up at the moment. It is intended
that the next version of analogue (0.12-x) will have this version
of the code replace the old mainly R-based distance().
* predict.wa: deshrinking step sometimes produced a 1 column
matrix, which would result in an error. This empty dimension is
now dropped.
Version 0.11-1
* scores.prcurve: new function to extract "axis" scores for
samples on the fitted principal curve.
* lines.prcurve: new lower-level plotting function allows
a fitted principal curve to be added to an existing PCA
plot.
* prcurve, smoothSpline: `prcurve` now returns a component
`smooths`, a list containing the fitted smoothers, one per
variable. As a result `smoothSpline` now also returns the
fitted `smooth.spline` model.
* gradientDist: the "prcurve" method was ordering the samples
such that they were smooth. No need for the `order` argument
now either.
* Namespace: simplified the import from lattice. Added more
imports from grid as this package is no longer in Dependencies.
* Dependencies: grid moved to Imports:
* distx.c: Uninitialised variable use reported on MacOS X under
the clang compiler. Reported by Brian Ripley.
* Maintainer: Updated the maintainer email address.
Version 0.11-0
* timetrack: fitted method gains argument `choices` with
default `1:2` for extracting the ordinary or passive
samples scores on `choices` axes. `plot` method can now draw
the passive samples as a line, points or both. The `plot`
method also gains an argument `order` that can be used to
reorder the passive samples into correct temporal ordering.
* Dependencies: MASS and mgcv moved from Depends: to Imports:
reflecting the relatively light use of functions from these
packages in analogue. The functions needed are imported in
analogue's NAMESPACE file.
* Tests: code in wa() generates slightly different results
under R 3.0.0 (to be). Differences are in the 8th or 9th
decimal place so irrelevant, but the reference output used to
check the examples was compared to that level of precision.
For tests only, the Example in ?wa uses options(digits = 5).
Version 0.10-0
* Release: Version 0.9-11 plus a minor documentation fix
was released to CRAN 2 Jan 2013.
Version 0.9-11
* Was loading compiled code both via the package namespace
and a .onAttach() call. The latter was removed. Reported by
Kurt Hornik.
Version 0.9-10
* stdError: Several changes and enhancements:
Calculation assumed weights summed to 1. New formula as
described in Simpson (2012) is now used. (Reported by Steve
Juggins)
Now have a choice whether to use the weighted SD or not. For
predictions based on the mean of the k-closest analogues it
would be odd to then use a weighted SD to compute the standard
error.
Gained a print method.
* caterpillarPlot: can now be called by the shorter name
caterpillar().
* plot.logitreg: bug would cause plotting to fail if you plotted
a single model.
Version 0.9-9
* prcurve: added a print method.
* Stratiplot: y-axis padded by 4% of range as per base graphics
default behaviour.
* plot.wa: don't rerun example(wa) just to plot.
* caterpillarPlot: tolerance ranges for taxa drawn with lwd = 2.
Version 0.9-8
* panel.Stratiplot: `type = "h"` now drawn using `lwd = 3` and with
`lineend = "butt"`. The line width can be controlled by new argument
`lwd.h`.
Version 0.9-7
* Stratiplot: In 0.9-5 a change was made to stop stripping NAs
when using the formula interface. This means a way of dealing with
NAs in the default plotting method is required especially when
plotting using lines or polygons, as Lattice will honour the NA
and draw lines and polygons with gaps.
Stratiplot.default gains argument `na.action` which defaults to
`"na.omit"`. Note this is different to the formula method where
we *want* NAs to propagate through to the default method for
plotting.
This change allows for datasets collected on different sediment
intervals from the same core to be combined in a single diagram.
Version 0.9-6
* caterpillarPlot: now no longer draws a box round the plot.
Bug fixed so that the data.frame method correctly identifies the
`env` argument to label the plot with.
Updated help file.
Version 0.9-5
* Stratiplot: the formula interface was stripping rows with NAs.
This wasn't intended but was the result of not implementing all
of the standard non-standard evaluation idiom used by many of R's
modelling functions.
This is now fixed and the default for the `na.action` argument
has been changed to `"na.pass"`.
The default `Stratiplot()` method was working as expected.
* Vignette: typo fixed (reported by Marta Rufio).
Version 0.9-4
* logitreg: the returned object has changed. The list of logistic
regression models is now returned as component `models`.
New methods `fitted()` and `predict()` provided for objects of
class "logitreg" compute the fitted probabilities for the
training set samples and for new (e.g. fossil) samples
respectively. The probabilities are in respect to the analogue-
ness of samples to the groups in the training set (e.g.
vegetation biomes in the case of pollen data).
These changes allow an analysis similar in spirit to that of
Gavin et al (2003, Quaternary Research 60; 356--367) in their
Figure 8. Here though logistic regression fits are used rather
than the ROC method they employ. Similar methods could be
provided for objects fitted by `roc()` but require a little more
thought about how to model the likelihood ratios derived from
the ROC curves.
Version 0.9-3
* wa: deshrinking via a monotonic cubic regression spline
is now available via `deshrink = "monotonic"`. This uses
functions from the *mgcv* package of Simon Wood and as a
result, *analogue* now Depends on that package too. The
exact nature of the dependency may change before 0.10 is
released.
This idea goes back to ter Braak & Juggins (1993; Hydrobiologia
*269/270*, 485--502) and Marchetto (1994; Journal of
Paleolimnology *12*, 155--162), but the implementation here
uses monotonic constraints after Wood (1994; SIAM Journal on
Scientific Computing *15*(5), 1126--1133) and follows Steve
Juggins' implementation borrowing code from `?pcls` in *mgcv*.
* predict.wa: example was enclosed in \dontrun{} without
reason. This example is now run.
Version 0.9-2
* wa: small tolerances can now be replaced by the mean
tolerance of the set of tolerances that are not small.
* splitSample: several bug fixes and sanity checks.
Version 0.9-1
* splitSample: new function to sample a test set from across
an environmental gradient by breaking gradient into a series
of chunks and sampling approximately equally from within each
chunk.
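The idea described above can be sketched in base R as follows.
`chunkSample()`, its arguments, and the simulated gradient are
hypothetical; this is an illustration of the approach under those
assumptions, not the code used by `splitSample`.

```r
# chunkSample() is a hypothetical helper, NOT analogue's splitSample().
# It cuts a gradient into equal-width chunks and draws up to
# `perChunk` samples from within each chunk.
chunkSample <- function(env, chunks = 5, perChunk = 2) {
    seg <- cut(env, breaks = chunks)                 # equal-width segments
    idx <- split(seq_along(env), seg, drop = TRUE)   # indices per segment
    picked <- lapply(idx, function(i) {
        n <- min(perChunk, length(i))   # sparse segments may hold fewer
        i[sample.int(length(i), n)]
    })
    sort(unlist(picked, use.names = FALSE))
}

set.seed(1)
ph <- runif(100, min = 4.5, max = 8.5)   # made-up gradient values
test <- chunkSample(ph, chunks = 5, perChunk = 2)

# how many test samples were drawn from each chunk of the gradient
table(cut(ph, breaks = 5)[test])
```

Sampling within chunks rather than from the whole data set keeps the
test set spread across the gradient even when the training samples are
clustered at one end.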
Version 0.9-0
* caterpillarPlot: new function that draws a caterpillar plot
of species WA optima and tolerances. Methods for data frames
and wa() fits are available alongside the default method.
Version 0.8-2
* Dependencies: analogue now requires R >= 2.15.0
* Replaced remaining instances of `.Internal`; now use
`.colSums` and `.rowSums` from R 2.15.0
* Deleted jss.bst from inst/doc
* Vignettes: these are now in vignettes not inst/docs
Version 0.8-1
* cma: if cutoff meant that all analogues were returned for all
sites, code would return an array instead of the usual list.
This is now fixed in all methods.
* Replaced instances of `.Internal(sample(....))` with
`sample.int(....)` at request of Brian Ripley.
Version 0.8-0
* Updated Example test checks and packaged for release to CRAN
Jan 11, 2012.
Version 0.7-7
* mat: new argument `kmax` can be used to limit the number of
analogues considered as models when fitting MAT transfer
functions. By default, `mat()` considers models with 1 through
to n-1 analogues (n = number of sites). `kmax` can control this
upper limit which will speed up fitting models, especially for
large training sets. Invariably one wouldn't want to average
over entire training sets to produce predictions, or even over
large numbers of analogues. As such I may set an upper limit for
the default value of `kmax` before this is released to CRAN.
* cumWmean, cummean: as a result of the above addition of `kmax`,
these two functions now take a `kmax` argument also. The default
behaviour is unchanged however.
* chooseTaxa: `type = "OR"` was not working due to a typo. It
returned the same as `type = "AND"`.
Version 0.7-6
* Stratiplot: Handling of absolute data types was broken. Fix
applied that should allow this to work if there are only
absolute scale variables or a mix of relative and absolute
data. All relative data should be unaffected.
* panel.Stratiplot: gains arguments `gridh` and `gridv` which
control the number of horizontal and vertical grid lines used
on each panel. These correspond to the `h` and `v` arguments of
`panel.grid` in the Lattice package. The default is `-1` for
both, which attempts to align the grid lines with the tick marks.
Version 0.7-5
* weightedCor: implements one of the tests from Telford & Birks
(2011, QSR) based on the weighted correlation of WA optima and
constrained ordination species scores. Has a plot method.
* rdaFit: Non-user (currently) function that implements RDA
without all of the overhead of vegan::rda. As such it doesn't
compute PCA axes and does not return all the components described
by ?cca.object in package vegan. This function is used principally
in weightedCor(). Has a scores() method. rdaFit() is not
documented as the exact details of the function and its
capabilities remain to be determined.
Version 0.7-4
* gradientDist: new function to extract locations along an
ordination axis. Methods for prcurve() and cca().
* varExpl: new function to extract the amount of variance
explained by ordination axes. Currently methods for prcurve() and
cca() are available.
* Namespace: analogue now has an explicit name space in
preparation for R 2.14.0-to-be. Hence analogue now depends on
Vegan >= 1.17-12.
Version 0.7-3
* pcr: coef(), fitted(), residuals(), eigenvals(), performance(),
and screeplot() methods added.
Version 0.7-2
* pcr: new function pcr() performs principal components
regression. Designed to allow transformations in the spirit of
Legendre & Gallagher (2001) that allow PCA to be usefully
applied to species data.
Version 0.7-1
* crossval: new function to perform leave-one-out, k-fold,
n k-fold, and bootstrap cross-validation on transfer function
models. A method for wa() models is provided.
* tests: package now has a test that the examples continue to
return correct output.
Version 0.7-0
* timetrack: new function to passively project sediment core
samples within an ordination of training or reference set
samples. Both unconstrained and constrained ordinations are
supported using the Vegan package. 'fitted' and 'plot' methods
are available.
* prcurve: new function to fit principal curves to sediment
core samples. A 'plot' method is also provided. The function uses
functionality from the princurve package, which is now a
dependency.
Several support functions are also provided; 'smoothSpline' is
a wrapper to 'smooth.spline' for fitting splines to individual
species in order to fit the principal curve. 'initCurve'
implements several methods for initialising the principal curve.
* Stratiplot: if 'zones' are supplied, a legend on the right-hand
side of the diagram can be drawn by setting argument 'drawLegend'
to TRUE (the default). Currently, only simple blocks that
demarcate the zone boundaries are drawn and labelled using
argument 'zoneNames'.
First attempt to allow both relative (percentages or proportions)
and absolute variables, or mixtures thereof, in a single plot. The
user is free to specify which variables should be treated as relative
or absolute, and variables marked as absolute will be drawn with
fixed-width panels, the size of which can be controlled via argument
'absoluteSize' (default is 0.5 * largest panel width). Consider
this functionality unstable at the moment.
* residLen: was not 'join'-ing the training set and passive data
correctly and would fail if species were found in one but not the
other data set.
* tran: improvements to the underlying code.
* distance: resilience to NA in "gower", "alt.gower", "mixed".
* cma: added methods for 'mat' and 'predict.mat' objects. These
allow you to retrieve the k-closest analogues for training set
and prediction data respectively.
* dissimilarities: new method for 'mat' objects.
* datasets: package datasets have been resaved with optimal
compression determined via resaveRdaFiles(). This has reduced
the package tarball size considerably. As a result, however,
analogue now requires R version 2.10.0 or later.
* predict.wa: bug in bootstrap and k-fold CV methods when
tolerance down-weighting was used.
* fixUpTol: erroneous error criterion would cause CV of WA models
with tolerance down-weighting to stop with an error.
* waFit: new function that encapsulates the main WA computations.
This is currently used by wa() and with the intention of being
used in all functions that compute WA transfer function models.
* Examples: Streamlined some further examples to use Imbrie &
Kipp data set, and to not re-run the same code again. Improves
package check times by a second or two on my PC.
Version 0.6-26
* abernethy: New data set containing the classic Abernethy Forest
data of Birks and Mathewes (1978)
* Stratiplot: Preserves the names component as far as is
possible, even to the extent of processing the names after the
manipulations arising from the formula interface.
Bug in padding of the y-axis now fixed; default is to add 1% of
the y-axis range to the y-axis limits specified.
Bug in computing length of variable labels when 'strip = FALSE'
now fixed.
* panel.Stratiplot: Add capability to draw zones on stratigraphic
plots via new argument 'zones' which takes the numeric levels of
the zone boundaries on the scale of the plot y-axis. How the
zone markers are drawn can be controlled via several graphical
parameters. See ?panel.Stratiplot.
* chooseTaxa: Explicitly preserves row and column names.
* DESCRIPTION: prematurely added princurve as a dependency in
previous version.
Version 0.6-25
* chooseTaxa: new function to select species on basis of number
of occurrences and maximum abundance. Function is an S3 generic
with a default method.
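A minimal base R sketch of this kind of selection is given below.
`pickTaxa()` and its arguments are hypothetical stand-ins chosen for
illustration, and the "AND"/"OR" switch mirrors the `type` argument
mentioned elsewhere in this log; it should not be read as the
package's own code.

```r
# pickTaxa() is a hypothetical helper, NOT analogue's chooseTaxa().
# Keeps species (columns) by number of occurrences and maximum abundance.
pickTaxa <- function(x, n.occ = 1, max.abun = 0, type = c("AND", "OR")) {
    type <- match.arg(type)
    x <- as.matrix(x)
    occ  <- colSums(x > 0) >= n.occ        # present in enough samples?
    abun <- apply(x, 2, max) >= max.abun   # abundant enough somewhere?
    keep <- if (type == "AND") occ & abun else occ | abun
    x[, keep, drop = FALSE]                # stay a matrix, even for 1 taxon
}

# Made-up percentages: 4 samples (rows) by 4 species (columns)
spp <- cbind(spA = c(40, 35, 0, 10),
             spB = c(1, 0, 0, 0),
             spC = c(2, 3, 1, 2),
             spD = c(0, 0, 30, 0))

colnames(pickTaxa(spp, n.occ = 2, max.abun = 5, type = "AND"))  # "spA"
colnames(pickTaxa(spp, n.occ = 2, max.abun = 5, type = "OR"))   # "spA" "spC" "spD"
```

Using `drop = FALSE` keeps the result two-dimensional when only one
taxon passes both criteria.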
Version 0.6-24
* Dependencies: package now depends on package 'grid'.
* Stratiplot: gains ability to draw variable labels above the
plot panels so that the plots conform to common standards. If
you prefer the 'strips' of Lattice plots, set 'strip = TRUE'
to get the old behaviour.
Stratiplot was fixing the min(ylim) value at 0 and contained
redundant calls to set the y-axis limits. The behaviour has been
rationalised and a new 'ylim' argument added. The default
behaviour uses the range of the y-data for 'ylim'.
* panel.Stratiplot: fix warning messages (from Grid) due to
inappropriate colour specification. Reference lines in
Stratiplot now plot correctly again.
* plot.roc: was resetting the plotting region at the end of
plotting even when there was no need to do so.
* residuals: Residuals were defined as \hat{x}_i - x_i to match
fitted vs. observed scatterplots. Definition of residuals in wa()
and related functions has been changed to the more common
definition of x_i - \hat{x}_i. Reported by Andreas Plank and Steve
Juggins.
* plot.wa: Following changed definition of residuals, plot.wa()
now plots observed values on the y-axis and fitted values on the
x-axis for 'which = 1'.
* summary.predict.mat: print method was incorrectly extracting
the model estimates for training set samples.
* predict.wa: fix minor bug with CV when tolerance DW was used.
* Package: reduced package check time in examples, by using
the Imbrie & Kipp data.
Version 0.6-23
* tran: 'rootroot' transformation was same as 'cuberoot' from
changes made in r140. Now fixed.
* wa: 'formula' method was not passing tolerance-related arguments
to default method. As such, the newer code to handle small
tolerance values was not being invoked when using the formula
interface to wa().
Code was also tidied a bit.
* fixUpTol: was inappropriately matching one of its arguments.
* roc: In some circumstances, the generation of the points at which
the ROC curve was evaluated resulted in more points than the other
statistics. Fixed to use the points established by 'table', used to
generate these other statistics. This affected the plot method.
* plot.roc: Allow user to specify the line types used to draw the
plots. Two line types can be specified for plots comparing analogue
with non-analogue statistics.
New argument 'abline.lty', which defaults to "dashed", controls
plotting of the optimal ROC dissimilarity threshold.
* plot.minDC: argument 'lty.quantile' was not being used by the
graphical function that drew the quantiles of the pairwise D[ij]s.
* plot.bayesF: New argument 'abline.lty', which defaults to "dashed",
controls plotting of the optimal ROC dissimilarity threshold.
Version 0.6-22
* tran.formula: bug in my implementation of the standard
non-standard evaluation technique used within the formula method
for 'tran'. Diagnosis and fix by Prof. Brian Ripley.
Version 0.6-21
* distx.c, distxy.c: Warning due to non ISO C-compliant 'mistake'
in experimental code for Gower's Mixed coefficient.
Version 0.6-20
* Stratiplot: was not respecting the sort variable under certain
conditions.
* panel.Stratiplot: typo in Rd file was causing warnings in R
version 2.10.0-beta.
Version 0.6-19
* join: gains ability to return the inner join of the supplied
data frames. This is the intersection of the sets of variables
in the supplied data frames, i.e. the variables common to all of
them.
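The difference between the "outer", "left", and "inner" joins
mentioned in this log can be sketched for two data frames in base R.
`joinVars()` is a hypothetical helper written for this illustration
and its `value` fill argument is an assumption; it shows only which
variables each join type keeps and is not the package's `join()`.

```r
# joinVars() is a hypothetical helper, NOT analogue's join().
joinVars <- function(x, y, type = c("outer", "left", "inner"), value = 0) {
    type <- match.arg(type)
    vars <- switch(type,
                   outer = union(names(x), names(y)),      # all variables
                   left  = names(x),                       # those in x only
                   inner = intersect(names(x), names(y)))  # common to both
    pad <- function(d) {
        for (v in setdiff(vars, names(d))) {
            d[[v]] <- value           # fill variables absent from d
        }
        d[, vars, drop = FALSE]       # same columns, same order
    }
    list(x = pad(x), y = pad(y))
}

# Made-up training set and core data sharing only species spB
train <- data.frame(spA = c(10, 20), spB = c(5, 0))
core  <- data.frame(spB = c(1, 2, 3), spC = c(7, 8, 9))

names(joinVars(train, core, "outer")$y)   # "spA" "spB" "spC"
names(joinVars(train, core, "left")$y)    # "spA" "spB"
names(joinVars(train, core, "inner")$y)   # "spB"
```

In the left join the core data gain a zero-filled `spA` column and
lose `spC`, which is what makes them conformable with an ordination
fitted to the training set alone.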
Version 0.6-18
* join: new arguments 'type' and 'value'.
'type' controls which join is performed. Options are (currently)
"outer" (default) and "left". The left join is used to prepare two
or more data sets for ordinating the first and subsequently
passively projecting the other data sets into this ordination.
The outer join is used to prepare data for transfer functions such
as MAT and WA.
'value' allows the user to supply a numeric value to be used to
replace 'NA's.
Version 0.6-17
* predict.wa: deshrinking method was not being honoured when
expanding predictions.
* Stratiplot: gains option ('rev') to reverse the y-axis limits.
Can now also sort/order the columns of 'x' (the species or
variables) as weighted averages of 'y' (to emphasise the change
in composition along 'y'), or using a supplied variable. The
latter is useful if you want to sort the variables by their optima
with an environmental variable.
Now also provides a guess as to the y-axis label if none is supplied.
Version 0.6-16
* roc: For large problems the calculation of AUC and its standard
error could overflow the largest number R currently handles. roc()
now has two new arguments, 'thin' and 'max.len', which allow the
number of points on the ROC curve to be thinned to a smaller number,
which should allow the computations to be performed. The original
problem was reported by Diana Stralberg.
Version 0.6-15
* tran: new 'formula' method allows simple selection or exclusion
of variables from the set to be 'tran'sformed.
Version 0.6-14
* tran: new transformation and standardization methods for the power
and 4th root transformation, the log ratio transformation for
compositional data, plus row (sample) centring.
Version 0.6-13
* predict.wa: Now handles WA with tolerance down-weighting for
bootstrap CV and benefits from the changes introduced in previous
version.
Version 0.6-12
* deshrink, deshrinkPred: New utility functions for deshrinking WA
estimates. These replace the '*.deshrink' and 'deshrink.pred' internal
functions used to this end to date. This provides a more extensible
solution.
Version 0.6-11
* tran: now converts input data to matrix using 'data.matrix' which
deals with factor variables appropriately.
* predict.wa, wa, WATpred: Now use faster C code for computing
predictions from WA models with tolerance down-weighting.
Version 0.6-10
* predict.wa: Predictions with tolerance DW now works for CV = "LOO"
* wa.formula: Argument list updated to match wa.default.
* fixUpTol: New internal utility function that encapsulates code to
modify working tolerances within WA model fitting.
* w.tol: Internal function w.tol now uses a faster C version of the
code.
Version 0.6-9 (Closed Sun 7 June 2009)
* predict.wa: Predictions without CV can now be made for WA models
fitted using tolerance DW.
* wa: Now returns the options for tweaking tolerances as part of
model object in component 'options.tol', which is a named list.
* Utility functions: WApred() and WATpred() internal functions for
predictions using WA or WA with tolerance DW.
Version 0.6-8 (Closed Mon 5 May 2009)
* residLen: new function to compute squared residual length
diagnostic for passive samples in a constrained ordination. Used as
a test of whether core samples are well fitted in a transfer function
model.
Several utility functions to compute fitted values from an ordination
and corresponding residual lengths are provided, which may be useful
for authors of other functions.
'plot' and 'hist' methods produce density plots and histograms for
'residLen' objects using base graphics.
'densityplot' and 'histogram' methods are provided for 'residLen'
objects using Lattice graphics.
* stdError: new function to compute the weighted standard deviation
of the environment values for the k closest analogues in MAT models.
This can be used as an uncertainty measure for MAT fitted values or
transfer function predictions.
Methods are available for 'mat' and 'predict.mat'.
* predict.mat: now returns the dissimilarity matrix between the
training set samples and samples in 'newdata'.
* getK: new method for 'predict.mat'.
* CITATION: updated as per request from Kurt Hornik and CRAN.
Version 0.6-7 (Closed Mon 13 Apr 2009)
* optima, tolerance: new methods to coerce objects of these classes
to data frames.
* distance: method = "kendall" was incorrectly computing the min of
the x and y components in the dissimilarity.
Version 0.6-6 (Released to CRAN: Wed 25th Feb 2009)
* optima, tolerance: New print methods for both functions. Returned
objects now have additional attributes.
* join: new methods for 'head' and 'tail' to return the first/last
few rows from each of the joined data sets. Handles cases where
'split = FALSE' by calling the 'data.frame' method.
Version 0.6-5
* wa: now computes tolerances and can perform tolerance downweighting
in WA transfer functions. Contains several options to manage working
tolerances used in the WA computations, including how to deal with
species that have very small (narrow) tolerances.
The actual tolerances and working values are returned from wa().
* optima, tolerance: two new user visible functions to compute weighted
average optima and tolerance ranges from species abundances and
associated environmental data.
Version 0.6-4
* Datasets: Version 1.7 of the North American Modern Pollen
Database has been added to 'analogue'. The data are contained
in four datasets: Pollen, Biome, Climate and Location, containing
the pollen counts on 134 taxa, vegetation classification, 32
climatic variables and location (latitude/longitude) respectively
on 4833 sampling locations in North America and Greenland.
* plot.logitreg: adjusted the correction to the degrees of freedom
in the calculation of the confidence intervals.
* roc: now returns the observed prior probability that two samples
are analogues for each group. Also returns the index of the point
along the ROC curve where the slope of the curve is maximal (the
point corresponding to the optimal dissimilarity).
* bayesF: now returns the posterior probabilities as well as
posterior odds of true analogue and true non-analogues for points
along the ROC curve.
Documentation of the object returned from 'bayesF' has been updated
to match the changes introduced in version 0.6-0.
* wa: documentation for wa did not state that the 'tol.dw' argument
was currently ignored. Tolerance down weighting is not currently
implemented in wa and the documentation now states this clearly.
Reported by Andreas Plank (R-Forge Bug ID 287).
Version 0.6-3
* logitreg: new function to evaluate the probability that two
samples are analogues conditional upon the dissimilarity between
the two samples. Essentially fits logistic regression models to
the data used to produce the statistics drawn on a ROC curve.
Methods for 'summary' and 'plot' are currently available.
* analog: was converting 'x' and 'y' objects to matrices before
calling distance(). This broke handling of factor variables in
'method = "mixed"' with distance().
* distance: objects created by distance() now have an explicit
class "distance", and inherit from class "matrix".
* roc: component 'statistics' has reordered columns.
* plot.roc: superficial changes to ordering of plot components.
* Depends: Package now depends on MASS. No longer need dependency
on brglm.
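The idea behind the logitreg entry above can be sketched with a plain logistic regression in base R. This is only an illustration under stated assumptions, not the logitreg() code: the simulated dissimilarities, the variable names `d` and `is.analogue`, and the direct use of glm() are all hypothetical.

```r
## Hypothetical sketch: probability that two samples are analogues,
## conditional on their dissimilarity, via ordinary logistic regression.
set.seed(1)
d <- c(runif(50, 0, 0.5),     # simulated dissimilarities for analogue pairs
       runif(50, 0.3, 1))     # simulated dissimilarities for non-analogue pairs
is.analogue <- rep(c(1, 0), each = 50)

fit <- glm(is.analogue ~ d, family = binomial)

## Predicted probability of being analogues at a small and a large dissimilarity
p <- predict(fit, newdata = data.frame(d = c(0.1, 0.9)), type = "response")
```

With data like these the fitted probability falls as the dissimilarity increases.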
Version 0.6-2
* Stratiplot: new graphics function for plotting stratigraphic
diagrams, with 'default' and 'formula' methods. Uses the lattice
package for plotting.
* panel.Stratiplot: lattice panel function for drawing stratigraphic
diagrams.
* panel.Loess: modified version of standard lattice panel function
'panel.loess' for drawing LOESS smooths on stratigraphic diagrams.
* Documentation: fixes and tweaks to several Rd files to fix parse
errors caught with the new Rd parser coming in R 2.9.0.
Version 0.6-1
* ImbrieKipp: made the training set environment and sediment core
data set names easier to manage. The three environmental variables
are now in separate data sets ('SumSST', 'WinSST', and 'Salinity')
as named, numeric vectors of the same name as the data sets.
* mat: example now uses the ImbrieKipp data, resulting in a large
speed-up.
* Requires: package now depends on package 'brglm' for use in
modelling probability of analogue or not. For future 'logitReg()'
function.
Version 0.6-0
* roc: new version of roc, which correctly computes the no-analogue
part of the ROC analysis. Now roc returns information on individual
groups as well as an overall or combined ROC curve for the data.
The number of close analogues to use in computing the ROC curve
can now also be specified.
These changes have altered bayesF() and the plot methods for bayesF
and roc. bayesF now computes Bayes factors for all groups as well as
for the overall ROC analysis.
The plot method for bayesF will now plot the Bayes factors for all
groups or for a single, named group. plot.roc has been updated to
work with the new roc object, and by default, the plots refer to the
overall ROC curve. Which group is plotted is controlled by new
argument 'group'.
There is now a summary method for roc that displays summary data for
the individual ROC curves.
* fuse: new function to fuse (combine) two or more
dissimilarity objects.
* ImbrieKipp: New data sets containing the classic Imbrie and Kipp
(1971) training set.
* tran: tran was clobbering dimnames. These are now preserved.
* .first.lib: package startup now uses packageStartupMessage()
to display the startup message.
* distance: speed up in calculating range and maximum statistics
for those dissimilarity coefficients that incorporate these terms.
distance() now also returns the dissimilarity coefficient used as
attribute "method".
* mat: was converting 'x' to matrix too early, which upset some of
the DC methods.
mat also now passes arguments in '...' on to distance. This allows
additional options required for some dissimilarity coefficients to
be provided.
* print.mat, summary.mat: quantiles of dissimilarities are now much
more efficiently calculated.
* plot.mcarlo: now works correctly for both types of plot, and
computes ranges so that histogram and density estimates fit into
plotting region.
Version 0.5-3
* tran: new function to apply common transformations and
standardizations applicable to palaeoecological data.
* predict.wa: added k-fold ("nfold") cross-validation.
Version 0.5-2
* wa: classical deshrinking did not work, but returned the
original 'env' variable. The current implementation is a bit
inelegant.
* wa: implemented deshrink = "none" just for comparison and for
connoisseurs.
* wa: deshrink = "expanded" is now public and user-callable.
* join: now checks for inherits(foo, "data.frame") to confirm
if all objects to join are (or inherit from) data frames. This
allows join to work on objects of class "join" when split = FALSE
is used.
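The inherits() check in the join entry above matters because an object can carry more than one class. A minimal base R sketch, using a hand-built object rather than real join() output:

```r
## Hand-built stand-in for a joined object; not produced by join() itself
x <- structure(data.frame(a = 1:2), class = c("join", "data.frame"))

is.df <- inherits(x, "data.frame")           # TRUE: "data.frame" is among the classes
exact <- identical(class(x), "data.frame")   # FALSE: the class vector has two entries
```

A test on the exact class vector would reject such an object, whereas inherits() accepts it.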
Version 0.5-1
* New developer: Jari Oksanen has joined the analogue team!
* predict.wa: was not returning some attributes of the WA model
fitted. This was causing some print and other methods to fail.
* expand.deshrink: implemented simple expansion of variances as a
deshrinking method a bit like in vegan:::wascores. The function
has a similar API to other deshrinking functions: it takes only WA and
obs values as input, and returns expanded scores and two linear
coefficients to perform the deshrinking. Slope is given by the
expansion ratio and intercept is defined so that the line goes
through mean(x), mean(y) point. The vegan function equalizes
weighted variances, but this function only uses simple variances:
incorporating weights would mean changing the call API. At the moment
the function is not yet used anywhere, but just sits there waiting
for possible use.
* wa, mat models: residuals are now calculated as predicted -
observed. This reverses the sign from the previous version. There
was inconsistency in the way residuals were being calculated in
MAT models and help functions. Now resolved.
* plot.mat: now plots the absolute value of the average or maximum
bias statistics, rather than the actual value. This ensures that
the "optimal" model is the one with the lowest value on the plot.
* Internal: The way deshrinking was handled internally has been
substantially streamlined, via the *.deshrink and deshrink.pred
internal functions.
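The expand.deshrink entry above describes a linear rescaling of the WA estimates. The sketch below shows one way such a rescaling could look in base R. It assumes the expansion ratio is the ratio of the two standard deviations, which the entry does not state explicitly, and the function name `expand_sketch` and the toy values are hypothetical.

```r
## Hypothetical sketch, not the package's expand.deshrink():
## rescale WA estimates so that their variance matches the observed values.
expand_sketch <- function(env, wa.env) {
    slope <- sd(env) / sd(wa.env)                  # assumed expansion ratio
    intercept <- mean(env) - slope * mean(wa.env)  # line passes through the two means
    list(coefficients = c(intercept, slope),
         env = intercept + slope * wa.env)         # expanded scores
}

## Toy values (made up): WA estimates are shrunken towards the mean
obs <- c(5.1, 5.8, 6.4, 7.0, 7.9)
wa.est <- c(5.9, 6.1, 6.4, 6.6, 7.2)
res <- expand_sketch(obs, wa.est)
```

After the rescaling, the expanded scores have the same mean and simple (unweighted) variance as the observed values.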
Version 0.5-0
* wa: new function wa() with default and formula interfaces
for fitting Weighted Averaging transfer function models. plot,
fitted, residuals, coef, minDC, performance (see below),
predict and bootstrap methods are provided.
* performance: new extractor function to retrieve model
performance statistics. Currently, methods provided for
wa, predict.wa, and bootstrap.wa objects.
* reconPlot: new method for predict.wa objects.
* RMSEP: new method for bootstrap.wa objects.
* Vignette: analogue now has a vignette covering the analogue
methods implemented in the package. This is based on the paper
Simpson G.L. (2007) Analogue Methods in Palaeoecology: Using
the analogue Package. Journal of Statistical Software, 22(2),
1--29.
* plot.minDC: Bug in drawing the axis for the quantiles.
Version 0.4-4
* Updated the Version: field in DESCRIPTION to meet new
standards introduced in R 2.6.0 for licence files.
Reported by Kurt Hornik.
* join() now returns an object of class "join" or
c("join", "data.frame") depending on argument split.
* distance() is now generic and has a new method for
objects that inherit from class "join".
Version 0.4-3
* distance() would work even if factors in x and y had
different levels. This would result in incorrect
dissimilarities for method = "mixed". distance() now
issues an error if one or more factors have different
levels in x and y. Use join() to get correct factors
and levels. Reported by Birgit Lemcke.
* join() was not correctly merging data frames with factors.
Factors were converted to internal values, not levels via
sapply(). Now uses data.frame(lapply(...)) to maintain
factors intact.
* distance() was not setting the row / column names in the
case where both x and y were supplied.
* distance() was incorrectly trying to set row / column
names in the case where a single dissimilarity was being
calculated.
* Documentation fixes.
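The join() factor fix listed above rests on a general difference between sapply() and lapply() in base R, which a small stand-alone example can show. The data frame here is made up, and the code is not taken from join().

```r
## Made-up data frame with one factor and one numeric column
df <- data.frame(site = factor(c("lake", "bog", "lake")),
                 count = c(3, 1, 4))

## sapply() simplifies to a matrix, so the factor collapses to its integer codes
codes <- sapply(df, function(x) x)

## lapply() returns a list, so rebuilding the data frame keeps the factor intact
kept <- data.frame(lapply(df, function(x) x))
```

Here `codes` is a numeric matrix whose 'site' column holds the internal codes, while `kept$site` is still a factor with levels "bog" and "lake".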
Version 0.4-2
* New fitted method for bootstrap.mat. Returns the bootstrap
fitted values for the training set.
* getK<- changed to setK<- as this makes much more sense.
The extractor function getK remains the same.
* Fixed a couple of bugs in residuals.bootstrap.mat and
print.residuals.bootstrap.mat that affected how the
results were printed. Now does what it was supposed to do.
* Fixed minor bug in the code that updated the call in
analog.
* Added automagical printing of version number on loading
of the package.
* Numerous documentation tweaks and updates have been
applied, which simplify package checking and which
provide better documentation of certain complex returned
objects.
Version 0.4-1
* Fixed silly bug in RMSEP.bootstrap.mat.
Version 0.4-0
* Changed the components of returned objects from mat,
bootstrap.mat, predict.mat. This has knock-on
effects for several other functions. These have been
updated to work with the new objects/components.
* Speeded up bootstrap and predict.mat considerably.
* Speeded up distance for some coefficients and where 'y'
is missing, by using dist() and vegdist() from package
'vegan'. Dependency now on 'vegan'.
* k() and k()<- renamed to getK() and getK()<-.
* getK.bootstrap.mat is now able to extract the k
for the model or the predictions. In either case, the
bootstrap or the model k can be selected. See ?getK.
* New argument 'split' in join(), defaults to TRUE. join
can now unsplit the merged data sets back into
individual data frames, though now with common columns
(i.e. species).
* Bug in cummean() and cumWmean() meant a site could be
selected as an analogue for itself. Now fixed.
* Bug in mcarlo.mat and mcarlo.analog meant it was not
reading the stored dissimilarity method correctly.
* maxBias() speeded up through use of tapply() instead of
aggregate(). Results in speed ups for mat() and
bootstrap().
* Screeplot() renamed screeplot(). Now works off the
screeplot generic function in R >= 2.5.0.
* screeplot method for bootstrapped models now draws
lines in different colours.
* As a result of the adoption of screeplot(), analogue
now depends on R >= 2.5.0.
* cma.analog and its print and summary methods changed
so that they return an object even if all samples have
no close modern analogues.
* New RMSEP method for mat objects. Returns the LOO CV
RMSEP for a MAT model.
* Fixed minor bug in analog.default and how it recorded
the call.
Version 0.3-4
* New roc method for "analog" objects.
* New mcarlo method for "analog" objects.
* mat() now has a formula method and interface.
* cma() is now more efficient, but does not return the
same object components as before. $distances and $samples
have been replaced by $close, a list of the close modern
analogues for each fossil sample, with each component a
named vector of close modern analogues and their
distances.
* A much changed reconPlot(), with a now-working default
method that is used by other reconPlot methods.
reconPlot.predict.mat updated to reflect changes.
* Reverted the class of bootstrap() to "bootstrap.mat".
* Removed Encoding: UTF-8 from package DESCRIPTION file.
* Cut down some of the examples as they now take a while
to run with the larger data sets and because a vignette
is in the works they no longer need to be so
comprehensive.
Version 0.3-3
* Updated the example data sets to more complete
versions. See ?rlgh, ?swapdiat and ?swappH for
more details.
* Changes to predict.mat to return minimum DC's and
quantiles of training set DC's.
* Minor tweaks to plot.mat - now display a bit more info
such as 'k' for chosen model and whether it is weighted
or not.
* New function minDC() with print and plot methods, for
extracting and plotting minimum dissimilarity for
fossil samples. A default method and methods for classes
"predict.mat" and "analog" are provided.
* New function RMSEP for extracting or calculating RMSEP
for transfer functions.
* Modified output from print.analog, print.cma to be more
compact (former) and more descriptive (latter).
* cma() now returns the number of analogues per sample as
close as or closer than argument "cutoff". cma() also now
automatically determines "cutoff" if none supplied.
* plot.cma() was plotting quantile lines for all x$quants
whether they were greater than x$cutoff or not. Fixed
to plot only x$quants <= x$cutoff. A check is made to
determine if any(x$quants <= x$cutoff), and plotting
of the quantile lines is suppressed if FALSE.
* If 'y' was missing from distance() it was checking for
and deleting any species (columns) that were all zero.
* plot.mat() was not using the stored value of k in its
plots. Now that k() can change the stored value plot.mat
should use this rather than calculate its own k.
* plot.roc() was not drawing 'which = 3' correctly.
* Fixed up the citation file.
Version 0.3-2
* New method for Screeplot for objects of class
"bootstrap". Plots apparent and bootstrap statistics
in screeplot format.
* Begun to generalise bootstrap. bootstrap.mat now returns
an object of class "bootstrap". print, summary, residuals
and print.summary methods for "bootstrap.mat" have been
changed to methods for "bootstrap". This is all in
preparation for adding other transfer function models to
analogue in later versions, for which bootstrapping is
also used.
WARNING: the object returned from bootstrap.mat has
changed subtly and will change periodically as new
transfer function models are added to allow for
differences between models. The ultimate aim is to have
a reasonable generic object "bootstrap" regardless of
the transfer function model used.
Version 0.3-1 - The New Year edition
* Added 'stats' and 'graphics' to Depends: in the
DESCRIPTION. Requested by the CRAN Maintainers.
* New generic functions 'k' and 'k<-' for extracting and
replacing the number of analogues stored in models.
Currently for 'mat' objects only.
* New dissimilarity coefficient in distance(), for Gower's
general coefficient of similarity (expressed as a
distance/dissimilarity) for mixed mode data, including
factors. Use method = "mixed".
* Realised that there were a number of different variants
on Gower's coefficient out there. To be consistent with
package 'vegan', method = "gower" now computes the same
coefficient as vegan. The alternative formulation used
in Version 0.3-0 and earlier is now available as
method = "alt.gower".
* distance() now works with missing values for methods
"gower", "alt.gower" and "mixed" only.
* Renamed ToDo file to TODO, and updated the information
enclosed.
* Add acknowledgments file THANKS.
* Numerous documentation fixes.
Version 0.3-0
* First version released to CRAN.
* Minor documentation fixes prior to release.
* Fixed CITATION file, which had the old package name. A
hangover from version 0.1-5.
Version 0.2-7
* Added new function bayesF() to calculate Bayes factors,
or likelihood ratios from the results of roc().
Includes simple print and plot methods, the latter
being used in plot.roc to provide a 5th plot of
roc results.
* Added a new plot to plot.roc() - showing the probability
of analogue (A+). This is now the 4th plot drawn
by default, replacing the likelihood ratio plots, which
are harder to interpret.
* Documentation tweaks to many functions.
* Removed attributes from returned objects of functions
analog(), cma(), mat(). Former attributes are returned
as part of the returned object now. Updated all
functions that made use of these attributes.
* The analog method of cma() has new argument "prob"; a
vector of probabilities with values in [0,1], for which
quantiles of the distribution of training set
dissimilarities will be calculated.
* plot.cma() has new arguments; "draw.quant", "col.quant"
and "lty.quant". These determine whether quantile lines
are drawn on the stripchart, and the colour and line
type used if they are drawn.
* Restored dimnames to some elements of the returned object
from bootstrap().
* Streamlined print.summary.cma(), which now uses
print.cma() instead of duplicating code.
* Fixed print.summary.predict.mat to return the training
set assessment.
* Fixed print.predict.mat - wasn't displaying the
bootstrap k.
* Altered summary.analog and its print method. Summary no
longer uses attributes to store information that is
subsequently printed.
* Added a package overview help page - access using:
package?analogue
Version 0.2-6
* Added new dissimilarity method "gower", for Gower's
coefficient. Note this version does not implement the
mixed version of Gower's coefficient. A future version
of distance() will include method "gowerMixed" for the
mixed data version (i.e. for mixed +/-, factor and
quantitative data).
Version 0.2-5
* Completely rewrote the mat method for roc(). Based on
Programmer's Niche article by T. Lumley in R News
(Vol. 4(1) 33--36). Uses the optimisations in the
article to calculate the ROC curve itself. Now much
faster, and produces a more compact return object than
before.
* Added a 4th plot to plot.roc(), which draws two
definitions of the slope of the ROC curve as
likelihood ratios.
* Added documentation for plot method of roc(),
including descriptions of what each plot shows.
* New function reconPlot with default and predict.mat
methods. Draws stratigraphic plots of reconstructions,
with or without error bars.
* mcarlo() and its 'default' and 'mat' methods have
been largely re-written to make them more efficient.
mcarlo.mat() now accesses data from the 'mat' object
and calls mcarlo.default(), so only one set of
calculations now needs to be maintained.
* New arguments "diag" and "is.dcmat" for mcarlo().
* Added new dissimilarity methods "manhattan", and
"kendall" to calculate the Manhattan metric and
Kendall's coefficient, respectively, in distance().
* 'method = "information"' was not working correctly
if p_{ij} or p_{ik} were zero.
* Minor fix to distance(), allows 'method = "chi.distance"'
to work now. Minor tweaks to documentation to add
equation for chi^2 distance metric. Still some
equations need adding in correct notation.
* Minor updates to documentation and code for analog(),
mat() and mcarlo() to reflect additional dissimilarity
coefficients now available in distance().
* Fixed some formatting issues in bootstrap.Rd and updated
the documentation of the returned object to match code
changes in previous versions.
* predict.mat was defaulting to doing bootstrap
predictions, which can be time consuming. Default is
now to return normal predictions. Updates to the example
for predict.mat to reflect this change.
* Updated the documentation for predict.mat of the
returned object to match code changes in previous
versions.
* General update of all documentation pages.
Version 0.2-4
* Reverted the changes to fitted.mat and residuals.mat
as these functions no longer worked like similar
methods for other classes in R.
* Altered plot.mat to use fitted and residuals methods
for mat. Simplified extractions to generate one of the
plots considerably. Also reverted changes imposed by
fiddling with predict/fitted earlier.
* Minor tweak to distance() to allow it to calculate
dissimilarity between two individual samples only. For
use in mcarlo() for simulation/permutation of
dissimilarities.
* New function mcarlo(), with default and "mat" methods.
Experimental functions for simulating dissimilarities
in order to determine critical values for various
coefficients for use in identifying analogues.
* New function roc(), with default and "mat" methods. Fits
Receiver Operator Characteristic (ROC) curves following
the framework of Wahl (2005) to identify the critical
values of dissimilarity values. Also has a plot method
for drawing the actual ROC curves.
Version 0.2-3
* some issues with predict.mat() and print method
associated with fixes for 0.2-2 ironed out. Others
remain to be fixed - especially when not
bootstrapping; need a consistent object representation.
* fitted.mat now returns fitted values for all possible
k-closest analogues. The kth model that minimises the
RMSE (Apparent) is returned if a user-supplied k is not
given.
* residuals.mat now returns residuals for all possible
k-closest analogues. The kth model that minimises the
RMSE (Apparent) is returned if a user-supplied k is not
given.
* predict.mat and its print and summary methods now work
again properly after changes made in 0.2-2.
* summary.mat updated to work with new extractor
functions.
* plot.mat updated to work with new extractor functions.
Version 0.2-2
* bootstrap.mat(), predict.mat() and print and summary
methods now fixed to return stats for all k-closest
models. Needs docs for bootstrap.mat() updating;
currently the reconstructions are commented out.
* join() was dropping the rownames of the joined
objects. FIXED
Version 0.2-1
* New function plot.cma() to plot results of a call to
cma(). Uses stripchart() currently. Needs to be made
more robust and adaptable to larger sample sizes.
Version 0.2-0
* Minor documentation tweaks. Release 0.2-0 ready.
Version 0.1-9
* Added new function residuals.bootstrap.mat() and print
method.
* predict.mat() now doesn't set k to be the model with
lowest RMSE. If missing(k) in predict.mat(), k is set
to NULL and bootstrap.mat will choose k giving lowest
RMSEP assessed by bootstrap. If not using bootstrap
resampling in predict.mat(), k is still set to the
model with lowest RMSE if not supplied.
Version 0.1-8
* Fixed a little bug in predictions for new samples in
bootstrap.mat() - was dropping the closest analogue.
Uses the newly fixed cumWmean() and cummean() functions
and argument "drop = FALSE".
* Fixed up bootstrap.mat() to have a cleaner return object
that is easier to maintain and IMHO use.
* bootstrap.mat() now uses new code to evaluate predictions
for new samples for all k, to match the previous changes
to bootstrap.mat(). Removed extraneous code from previous
versions.
* summary.bootstrap.mat() and summary.predict.mat() updated
to refer to the new returned object from bootstrap.mat().
* Updated documentation for bootstrap() and predict.mat()
and fixed up examples.
* Removed old file analogy-internal.Rd - hangover from
older package.
Version 0.1-7
* bootstrap.mat now uses the new code to return all
values. The swap example is taking c. 18 secs to run
on my laptop (1.8 GHz P3m), with 1000 bootstraps. Not
too bad. Final code tidy required then release as
Version 0.2-0.
Version 0.1-6
* Prepared ground work for bootstrap.mat to bootstrap
for all k, not just user supplied k. Allows you to
choose size of MAT model based on bootstrap RMSEP and
other stats. Code works in bootstrap.mat() with
argument 'boot.train = TRUE', just needs resulting
returned object simplifying and removal of old code
that duplicates one set of calcs, and methods written
to display/plot the results of bootstrap on the
training set.
* cumWmean() and cummean() adapted for use in
bootstrap.mat() for choosing k. New argument
'drop = TRUE'; controls whether spurious zero distance
is ignored or not in calculating cumulative stats.
Needed for bootstrapping training set for all k.
Version 0.1-5
* Changed package name to analogue
Version 0.1-4
* Added new distance/dissimilarity coefficient to
calculate Chi squared distance, sensu Lebart & Fenelon
(1971) [Statistique et informatique appliquees. Dunod,
Paris, 426 pp], the distance preserved in
correspondence analysis. To use this, use:
method = "chi.distance".
Version 0.1-3
* Data set rlgh was incorrectly saved.
Version 0.1-2
* Fixed a serious bug in join(), where rows were getting
dropped if they had exactly the same counts in them.
Solution provided by Sundar Dorai-Raj - see source for
join() for further details.
* join() now accepts any number of data frames as input,
not just two as originally. This is as a result of the
fix to join() above.
* Updated all examples using join() to match new
arguments of join().
Version 0.1-1
* First Development Release