Package topicmodels now depends on package methods instead of importing it.
Package SnowballC is now suggested instead of Snowball.
A check was added to ensure that no empty documents are in the data. Thanks to Terry Therneau for pointing the problem out.
The first argument in the functions printf_vector and printf_matrix defined in the C code for the CTM was corrected to be const char *. Thanks to Murray Stokely for providing the patch.
A bug in function posterior() was fixed where the row names of the wrong object were used. Thanks to Benjamin S. Porter for pointing the problem out.
Dependency structure changed such that some packages are now only imported.
The information printed during the VEM algorithm when verbose is larger than 0 was improved.
The code in the vignette for removing HTML markup was modified due to changes in package XML.
A memory leak in the code of the fit function for LDA with method "VEM" was corrected. Thanks to Ramis Yamilov for pointing the problem out.
The included dataset AssociatedPress had row names which were of type integer and not of type character. The object was re-saved omitting the row names.
Vignettes moved from /inst/doc to /vignettes.
The source code for fitting the model using Gibbs sampling was modified because the code did not compile on Solaris. Thanks to Prof. Brian D. Ripley for pointing the problem out.
dtm2ldaformat() was modified to ensure that the resulting matrices for the documents contain integers. In addition, ldaformat2dtm() was changed to also work for document-term matrices containing empty documents, and an argument was introduced to indicate whether empty documents should be removed. Thanks to Eu Jin Lok for pointing the problems out.
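A minimal sketch of the kind of conversion involved, written in Python for illustration rather than as the package's R code. It assumes the lda package's layout: each document is a two-row integer matrix whose first row holds 0-based term indices and whose second row holds the counts. The function name `dtm_to_lda_documents` and its `omit_empty` flag are hypothetical. Input is a list of (document, term, count) triplets with 0-based indices, similar to a simple triplet matrix.

```python
from typing import Dict, List, Tuple


def dtm_to_lda_documents(
    triplets: List[Tuple[int, int, int]],
    n_docs: int,
    omit_empty: bool = True,
) -> List[List[List[int]]]:
    """Convert (doc, term, count) triplets into lda-style documents.

    Each document becomes [[term indices], [counts]] with integer entries.
    Hypothetical helper; the 0-based layout is an assumption about lda.
    """
    docs: Dict[int, Tuple[List[int], List[int]]] = {
        d: ([], []) for d in range(n_docs)
    }
    for doc, term, count in sorted(triplets):
        # Counts must be whole numbers, mirroring the "integers only" fix.
        if count != int(count):
            raise ValueError("counts must be integers")
        terms, counts = docs[doc]
        terms.append(int(term))
        counts.append(int(count))

    result: List[List[List[int]]] = []
    for d in range(n_docs):
        terms, counts = docs[d]
        if not terms and omit_empty:
            continue  # drop documents that contain no terms at all
        result.append([terms, counts])
    return result
```

With `omit_empty=False`, an empty document is kept as a pair of empty rows, so that the document positions still line up with the original matrix.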
Missing 'Suggests' entries added to the DESCRIPTION file. Thanks to Prof. Brian D. Ripley for pointing the problem out.
Name tags in Rd files changed so that they no longer contain slashes. Thanks to Prof. Brian D. Ripley for pointing the problem out (see bug PR14707).
A small bug fixed when saving interim results while fitting an LDA model using Gibbs sampling. Thanks to Nicholas Switanek for pointing the problem out.
Makevars.win changed to reflect changes in how CRAN builds libgsl for Windows. Thanks to Prof. Brian D. Ripley for pointing that out.
The package vignette has been published in the Journal of Statistical Software, Volume 40, Issue 13 (http://www.jstatsoft.org/v40/i13), and the paper should be used when citing the package; run citation("topicmodels") for details.
C code changed to allow the package to compile on Solaris systems. Thanks to Prof. Brian D. Ripley for pointing the problems out and recommending suitable changes.
C code changed to avoid warnings about unused variables.
The slots for document and term names are no longer restricted to class "vector", so that document-term matrices without row and/or column names are allowed.
perplexity() added for model validation and selection.
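A minimal sketch of the quantity usually reported as perplexity. It assumes the common definition: the exponential of the negative total log-likelihood divided by the total number of tokens. The package's exact computation may differ in detail. This is stand-alone Python, and `perplexity` here is a hypothetical function taking per-document log-likelihoods and document lengths. It is not the package's R method.

```python
import math
from typing import Sequence


def perplexity(doc_loglik: Sequence[float], doc_lengths: Sequence[int]) -> float:
    """exp(-(sum of document log-likelihoods) / (total number of tokens)).

    Hypothetical stand-alone helper; lower values indicate that the model
    assigns higher probability to the (ideally held-out) documents.
    """
    if len(doc_loglik) != len(doc_lengths):
        raise ValueError("need one log-likelihood per document")
    n_tokens = sum(doc_lengths)
    if n_tokens <= 0:
        raise ValueError("corpus must contain at least one token")
    return math.exp(-sum(doc_loglik) / n_tokens)
```

As a sanity check, a model that is uniform over 4 words gives every token probability 1/4, so its perplexity is 4 regardless of document length. Comparing this value across models on held-out data is the typical use for model selection.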
The input data for CTM() can now be either a "DocumentTermMatrix" with term-frequency weighting or an object coercible to a "simple_triplet_matrix" with integer entries.
A bug in the random number generation of the C++ Gibbs sampling code was fixed. Thanks to Uwe Ligges for pointing out the problem, which he noted when checking the package on the Windows platform.
New control arguments added for keeping intermediate log-likelihood values during estimation and for performing repeated runs with random initialization. In addition, the number of iterations performed is now saved with the fitted model.
dtm2ldaformat() added to transform data from the lda package into a "DocumentTermMatrix" object and vice versa.
Bug fixed in rctm.c where for estimate.beta = FALSE one EM step
The control for topic models now also has an argument to ensure reproducibility of results and an estimate.beta argument, which can be used to keep the term distribution over topics fixed after initialization.
The control for Gibbs sampling makes it possible to return repeated draws in a list using arguments
In slot beta of class "TopicModel", the log parameters are now stored to achieve higher accuracy in the VEM code when parameter values are close to zero.
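A generic illustration of why storing parameters on the log scale helps when values are close to zero. Multiplying many small probabilities underflows to 0.0 in double precision, while the sum of their logarithms stays representable. The sketch also shows log-sum-exp normalization, one standard way to stay in log space. This is Python for illustration and not the package's C code, and `log_normalize` is a hypothetical helper.

```python
import math
from typing import List


def log_normalize(log_weights: List[float]) -> List[float]:
    """Normalize weights given on the log scale without leaving log space.

    Log-sum-exp trick: subtracting the maximum before exponentiating keeps
    the largest term at exp(0) = 1, so the sum cannot underflow to zero.
    """
    m = max(log_weights)
    log_total = m + math.log(sum(math.exp(w - m) for w in log_weights))
    return [w - log_total for w in log_weights]


# Multiplying 100 probabilities of 1e-5 directly underflows to 0.0 ...
p = 1e-5
direct = 1.0
for _ in range(100):
    direct *= p

# ... while the same quantity on the log scale is an ordinary finite number.
log_direct = 100 * math.log(p)
```

Normalizing `[-1000.0, -1001.0, -1002.0]` naively would divide by zero, because `math.exp(-1000.0)` is 0.0. `log_normalize` returns log-probabilities that exponentiate to values summing to 1.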
Call to assert removed in C code to avoid termination of R.
Class "TopicModel" now has a slot loglikelihood. For models fitted using Gibbs sampling, this contains the log-likelihood of the corpus; for models fitted using VEM, it contains the vector of log-likelihoods for each document separately.
Memory bug fixed in
save is added to the control objects to specify whether intermediate results are saved and with which step size.
Header files changed in utilities.cpp following advice from Prof. Brian D. Ripley.
Code for installing the package corpus.JSS.papers in the vignette improved.
dir.create() is now called with showWarnings = FALSE.
Bug fixed in get_most_likely() for the maximum possible k.
First version released on CRAN: 0.0-3.