Performance improvements via various optimizations including porting some components to C++
Various new experimental features including K=0
Improved documentation including a new version of the vignette.
Better error messages in several places
Experimental options for random projections with spectral initializations
Fixes a problem in make.heldout where a document could be completely emptied by the procedure. Hat tip to Jesse Rhodes for the bug report.
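The guard described in this entry can be sketched as follows. This is an illustrative Python sketch, not the package's R implementation: the helper name hold_out_tokens, the representation of a document as a list of token ids, and the rule of always keeping at least one token are assumptions made for the example.

```python
import random


def hold_out_tokens(doc, proportion, rng):
    """Split one document (a list of token ids) into (kept, heldout).

    Hypothetical helper: the held-out count is capped so that at least
    one token always stays in the training portion of the document.
    """
    n_heldout = int(len(doc) * proportion)
    # Never remove every token: leave at least one behind.
    n_heldout = min(n_heldout, max(len(doc) - 1, 0))
    heldout_positions = set(rng.sample(range(len(doc)), n_heldout))
    kept = [tok for i, tok in enumerate(doc) if i not in heldout_positions]
    heldout = [tok for i, tok in enumerate(doc) if i in heldout_positions]
    return kept, heldout


rng = random.Random(1)
# Even when asked to hold out everything, one token is kept.
kept, heldout = hold_out_tokens([4, 4, 9], proportion=1.0, rng=rng)
```

Capping the count rather than resampling keeps the procedure deterministic for a given seed.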
gamma.prior="L1" now coerces the mu object back to a matrix class object. This should fix a speed hit introduced in 1.0.10 for this case.
Prevalence covariates can now use sparse matrices which will result in better performance for large factors.
textProcessor() and prepDocuments() now do a better job of preserving labels and keeping track of dropped elements. Special thanks to GitHub users gtuckerkellog and elbamos for pull requests.
Fixed an edge case in init.type="Spectral" where words appearing only in documents by themselves would throw an error. The error was correct but hard to address in certain cases, so stm now temporarily removes these words and reintroduces them before starting inference, spreading a tiny bit of mass evenly across the topics. Hat tip to Nathan Sanders for bringing this to our attention.
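Reintroducing removed words with a small, even share of mass can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the package's code: the function name reintroduce_words, the list-of-rows matrix layout, and the value of eps are all hypothetical.

```python
def reintroduce_words(beta, kept_index, vocab_size, eps=1e-10):
    """Expand a K x len(kept_index) topic-word matrix to K x vocab_size.

    beta: list of rows, each a probability distribution over kept words.
    kept_index: original vocabulary position of each kept column.
    Removed words receive the same small mass `eps` (an assumed value)
    in every topic, and each row is renormalized to sum to 1.
    """
    full = []
    for row in beta:
        expanded = [eps] * vocab_size
        for col, prob in zip(kept_index, row):
            expanded[col] = prob
        total = sum(expanded)
        full.append([p / total for p in expanded])
    return full


# Two topics estimated on a vocabulary where word 1 was temporarily removed.
beta = [[0.5, 0.5], [0.9, 0.1]]
full = reintroduce_words(beta, kept_index=[0, 2], vocab_size=3, eps=1e-4)
```

Because every topic gets the same small mass for the reintroduced word, the initialization does not favor any topic for it, and inference is free to move the mass afterwards.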
New function findTopic() which helps locate topics containing particular words or phrases.
New function topicLasso() helps build predictive models with topics.
Fixed a minor bug in prepDocuments() that arose when some vocab elements did not appear in the data.
Fixed a minor bug in frex calculation that caused some models not to label.
Fixed a minor bug in searchK that caused heldout results to report incorrectly.
Rewrite of plot.estimateEffect(), which fixed a bug in some interaction models. It also now returns results invisibly for creating custom plots.
Increased the stability of the spectral methods for stm initialization.
Complete rewrite of plotRemoved() which makes it much faster for larger datasets.
A minor patch to deal with textProcessor() in older versions of R.
Large changes, many of which are not backwards compatible.
Numerous speed improvements to the core algorithm.
Introduction of several new options for the core stm function including spectral initialization, memoized inference, and model restarts.
Content covariate models are now estimated using the distributed multinomial formulation which is dramatically faster. Default prior also changed to L1.
Handling of document-level convergence was changed to ensure positive definiteness in the document-level covariance matrices.
Fixed bug in binary/binary interactions.
Numerous new diagnostic and summary functions
Expanded the console printing of many of the preprocessing functions.
Fixed an error with vignettes building on Linux machines.
sageLabels exported but not documented
factorCheck diagnostic function exported
Bug fix in the semanticCoherence() function that affected content covariate models.
Bug fix to plot.STM(): for content covariate models with only a subset of topics requested, the labels would show up as mostly NA. Thanks to Jetson Leder-Luis for pointing this out.
Bug fix for the readCorpus() function with txtorg vocab. Thanks to Justin Farrell for pointing this out.
Added some diagnostics to notify the user when words have been dropped in preprocessing.
Automatically coerce dates to numeric in spline function.
Very minor change to textProcessor() to accommodate an API change in tm version 0.6.
New option for plot.STM() which plots the distribution of theta values. Thanks to Antonio Coppola for coauthoring this component.
Deprecated option "custom" in "labeltype" of plot.STM(). Now you can simply specify the labels. Added additional functionality to specify custom topic names rather than the default "Topic #:"
Bug fixes to various portions of plot.STM() that would cause labels to not print.
Added numerous error messages.
Added permutationTest() function and associated plot capabilities
Updates to the vignette.
Added functionality to a few plotting functions.
When using summary() and labelTopics(), content covariate models now have labels thresholded by a small value. Thus one may see no labels or very few labels, particularly for topic-covariate interactions, which indicates that there are no sizable positive deviations from the baseline.
Added an S3 method for findThoughts() and the ability to threshold by theta.
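Thresholding by theta when picking representative documents can be sketched as follows. This is an illustrative Python sketch, not the package's implementation: the function name top_documents, its parameter names, and the choice of an inclusive threshold are assumptions.

```python
def top_documents(theta, topic, n=3, thresh=0.0):
    """Return indices of up to n documents ranked by theta[doc][topic].

    theta: list of per-document topic proportions (one row per document).
    Only documents whose proportion for `topic` is at least `thresh`
    (an assumed inclusive cutoff) are eligible.
    """
    eligible = [
        (row[topic], i) for i, row in enumerate(theta) if row[topic] >= thresh
    ]
    # Highest proportion first; ties broken by document order.
    eligible.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for _, i in eligible[:n]]


theta = [[0.7, 0.3], [0.2, 0.8], [0.55, 0.45], [0.9, 0.1]]
# Documents 0 and 3 clear the 0.6 cutoff for topic 0; document 2 does not.
picked = top_documents(theta, topic=0, n=3, thresh=0.6)
```

With a threshold in place, fewer than n documents may be returned, which signals that few documents are strongly associated with the topic.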
Allow estimateEffect() to receive a data frame. (Thanks to Baoqiang Cao for pointing this out)
Major updates to the vignette
Minor updates to several plotting functions.
Fixed an error where labelTopics() would mislabel when passed topic numbers out of order (Thanks to Jetson Leder-Luis for pointing this out)
Introduction of the termitewriter function.
Version for submission to CRAN (2/28/2014)
Introduced new dataset poliblog5k and shrunk the footprint of the package
Numerous alternate options changed and some slight syntax changes to stm to finalize the API.
New build 2/14/2014
Fixing a small bug introduced in the last version which kept defaults of manyTopics() from working.
Updated version posted to GitHub (2/13/2014)
Various improvements to plotting functions.
Setting the seed in selectModel() threw an error. This is now corrected. Thanks to Mark Bell for pointing this out.
First public version released on GitHub (2/5/2014)
This is a beta release and we may change some of the API before submission to CRAN.