quanteda 0.9.9

Changes since v0.9.9-24

New features

Behaviour changes

Bug fixes

Changes since v0.9.9-17

New features

Bug fixes

Changes since v0.9.9-3

Bug fixes

New features

This release has some major changes to the API, described below.

Data objects

Renamed data objects

new name              original name  notes
data_char_sampletext  exampleString
data_char_mobydick    mobydickText
data_dfm_LBGexample   LBGexample

Renamed internal data objects

The following objects have been renamed, but will not affect user-level functionality because they are primarily internal. Their man pages have been moved to a common ?data-internal man page, hidden from the index, but linked from some of the functions that use them.

new name             original name     notes
data_int_syllables   englishSyllables  (used by textcount_syllables())
data_char_wordlists  wordlists         (used by readability())
data_char_stopwords  .stopwords        (used by stopwords())

Deprecated data objects

In v0.9.9, the old names remain available but are deprecated.

new name                     original name  notes
data_char_ukimmig2010        ukimmigTexts
data_corpus_irishbudget2010  ie2010Corpus
data_char_inaugural          inaugTexts
data_corpus_inaugural        inaugCorpus
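
As a minimal sketch of the renaming in practice, assuming quanteda v0.9.9 or later is installed: the snippet below uses only the new-style name data_corpus_inaugural from the table above. The variable name inaug is illustrative.

```r
library(quanteda)

# New-style name follows the data_<class>_<description> pattern
inaug <- data_corpus_inaugural

# The object is a corpus, so the usual corpus accessors apply
is.corpus(inaug)
ndoc(inaug)
head(docnames(inaug), 3)
```

The deprecated name inaugCorpus refers to the same object in v0.9.9, so switching to the new name should not change results.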

Deprecated functions

The following functions will still work, but issue a deprecation warning:

new function          deprecated function                     constructs
tokens                tokenize()                              tokens class object
corpus_subset         subset.corpus                           corpus class object
corpus_reshape        changeunits                             corpus class object
corpus_sample         sample                                  corpus class object
corpus_segment        segment                                 corpus class object
dfm_compress          compress                                dfm class object
dfm_lookup            applyDictionary                         dfm class object
dfm_remove            removeFeatures.dfm                      dfm class object
dfm_sample            sample.dfm                              dfm class object
dfm_select            selectFeatures.dfm                      dfm class object
dfm_smooth            smoother                                dfm class object
dfm_sort              sort.dfm                                dfm class object
dfm_trim              trim.dfm                                dfm class object
dfm_weight            weight                                  dfm class object
textplot_wordcloud    plot.dfm                                (plot)
textplot_xray         plot.kwic                               (plot)
textstat_readability  readability                             data.frame
textstat_lexdiv       lexdiv                                  data.frame
textstat_simil        similarity                              dist
textstat_dist         similarity                              dist
featnames             features                                character
nsyllable             syllables                               (named) integer
nscrabble             scrabble                                (named) integer
tokens_ngrams         ngrams                                  tokens class object
tokens_skipgrams      skipgrams                               tokens class object
tokens_toupper        toUpper.tokens, toUpper.tokenizedTexts  tokens, tokenizedTexts
tokens_tolower        toLower.tokens, toLower.tokenizedTexts  tokens, tokenizedTexts
char_toupper          toUpper.character                       character
char_tolower          toLower.character                       character
tokens_compound       joinTokens, phrasetotoken               tokens class object
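
The mapping above can be illustrated with a short migration sketch. It assumes quanteda v0.9.9 or later; the object names (txt, toks, mydfm, and so on) are illustrative only, and the comments name the deprecated function each call replaces.

```r
library(quanteda)

txt <- c(d1 = "The quick brown fox", d2 = "Jumps over the lazy dog")

toks <- tokens(txt)                         # replaces tokenize()
toks <- tokens_tolower(toks)                # replaces toLower() on tokens
bigrams <- tokens_ngrams(toks, n = 2)       # replaces ngrams()

mydfm <- dfm(toks)
feats <- featnames(mydfm)                   # replaces features()
common <- dfm_trim(mydfm, min_docfreq = 2)  # replaces trim()
```

Here dfm_trim() keeps only the features that occur in at least two documents.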

New functions

The following are new to v0.9.9 (and not associated with deprecated functions):

new function  description                                         output class
fcm()         constructor for a feature co-occurrence matrix      fcm
fcm_select    selects features from an fcm                        fcm
fcm_remove    removes features from an fcm                        fcm
fcm_sort      sorts an fcm in alphabetical order of its features  fcm
fcm_compress  compacts an fcm                                     fcm
fcm_tolower   lowercases the features of an fcm and compacts      fcm
fcm_toupper   uppercases the features of an fcm and compacts      fcm
dfm_tolower   lowercases the features of a dfm and compacts       dfm
dfm_toupper   uppercases the features of a dfm and compacts       dfm
sequences     experimental collocation detection                  sequences
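
A brief sketch of how the fcm functions listed above might be combined, assuming quanteda v0.9.9 or later. The toy texts and object names are illustrative, and the window size of 2 is an arbitrary choice.

```r
library(quanteda)

toks <- tokens(c(d1 = "A b c A b", d2 = "a c B"))

# Count co-occurrences within a window of 2 tokens on either side
myfcm <- fcm(toks, context = "window", window = 2)

# Lowercase the features and merge those that become identical
myfcm_lower <- fcm_tolower(myfcm)

# Sort the features alphabetically, then drop the feature "c"
myfcm_sorted <- fcm_sort(myfcm_lower)
myfcm_small <- fcm_remove(myfcm_sorted, "c")
```

Lowercasing after construction, rather than before tokenizing, shows the compacting step: "A" and "a" start as separate features and are merged by fcm_tolower().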

Deleted functions and data objects

name                  reason
encodedTextFiles.zip  moved to the readtext package
describeTexts         deprecated several versions ago in favor of summary.character
textfile              moved to the readtext package
encodedTexts          moved to the readtext package, as data_char_encodedtexts
findSequences         replaced by sequences

Other new features

quanteda 0.9.8

New Features

Bug fixes


quanteda 0.9.6

Bug fixes >= 0.9.6-3

Bug fixes

quanteda 0.9.4

Bug fixes

quanteda 0.9.2

Bug fixes

quanteda 0.9.0

Bug Fixes

quanteda 0.8.6

Bug fixes

quanteda 0.8.4

Bug fixes

quanteda 0.8.2

Bug Fixes


API changes

Imminent Changes

quanteda 0.8.0

Syntax changes and workflow streamlining

The workflow is now more logical and more streamlined, with a new workflow vignette as well as a design vignette explaining the principles behind the workflow and the commands that encourage it. The design vignette also details the development plans and the work remaining on the project.

Encoding detection and conversion

The newly rewritten command encoding() detects the encoding of character, corpus, and corpusSource objects (the latter created by textfile). When creating a corpus using corpus(), conversion to UTF-8 is automatic if an encoding other than UTF-8, ASCII, or ISO-8859-1 is detected.
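
The conversion rule described above can be sketched with the stringi package directly. This is not quanteda's implementation: the helper names needs_conversion() and to_utf8_if_needed() are hypothetical, and only stri_enc_detect() and stri_encode() are real stringi functions.

```r
library(stringi)

# Encodings that are left untouched, per the rule described above
keep_as_is <- c("UTF-8", "ASCII", "ISO-8859-1")

# Hypothetical helper: does a detected encoding call for conversion?
needs_conversion <- function(enc) {
  !is.na(enc) && !(enc %in% keep_as_is)
}

# Hypothetical helper: guess the encoding of one string and convert it
# to UTF-8 only when the rule above requires it
to_utf8_if_needed <- function(x) {
  guess <- stri_enc_detect(x)[[1]]  # data frame of candidates, best first
  enc <- guess$Encoding[1]          # NA if no candidate was found
  if (needs_conversion(enc)) stri_encode(x, from = enc, to = "UTF-8") else x
}
```

Detection is a statistical guess and can be unreliable for short strings, so the top candidate should be treated as a hint rather than a certainty.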

Major infrastructural changes

The tokenization, cleaning, lower-casing, and dfm construction functions now use the stringi package, based on the ICU library. This brings substantial speed improvements and more correct handling of Unicode characters and strings.
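
To illustrate the kind of Unicode-aware processing stringi provides, here is a small sketch that calls stringi directly. It is not quanteda's internal code; the sample string and variable names are illustrative.

```r
library(stringi)

# Sample text with accented characters, written with Unicode escapes
x <- "Na\u00efve CAF\u00c9 culture, d\u00e9j\u00e0 vu!"

# Unicode-aware lower-casing (handles accented capitals such as U+00C9)
x_lower <- stri_trans_tolower(x)

# Split on ICU word boundaries, dropping whitespace and punctuation
words <- stri_split_boundaries(x_lower, type = "word",
                               skip_word_none = TRUE)[[1]]
```

Accented letters stay inside their words because the boundaries come from ICU's word-break rules rather than from ASCII-only character classes.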

Other changes

Bug fixes

quanteda 0.7.3

quanteda 0.7.2

quanteda 0.7.1

Many major changes to the syntax in this version.

quanteda 0.7.0

quanteda 0.6.6

quanteda 0.6.5

quanteda 0.6.4

quanteda 0.6.3

quanteda 0.6.2

quanteda 0.6.1

quanteda 0.6.0

quanteda 0.5.8

Classification and scaling methods

quanteda 0.5.7

New arguments for dfm()

quanteda 0.5.6

quanteda 0.5.5

quanteda 0.5.4

quanteda 0.5.3

quanteda 0.5.2

quanteda 0.5.1

quanteda 0.5.0

Lots of new functions

Old functions vastly improved

Better object and class design

More complete documentation