Changes for R-package ndl

Changes in version 0.1.6 (Nov 27th, 2012) (since version 0.1.1)

Implementer: Antti Arppe

* General: general modifications and fixes to several functions to check the integrity of input (mainly argument 'cuesOutcomes'), and otherwise to deal with diverse input formats with fewer hiccups; new function 'predict.ndlClassify'; user-specifiable scalability for dealing with really large input files when using the C code method for the calculation of Cue-Cue and Cue-Outcome co-occurrences; maintainer e-mail address changed to a generic one.

* acts2probs.R (acts2probs): code modified internally so that a minimum activation value of 'acts.minimum.correction=1e-10' is substituted whenever an activation value in input 'acts' is exactly equal to 0 (due to lack of any previously observed cues in input to e.g. 'estimateActivations').

* coocMatrices.c (cooc): C code fixed so that it works with space characters ' ' in Cues and Outcomes (as input 'cuesOutcomes' to 'estimateWeights', 'cooccurrencesCues' and/or 'cooccurrencesCuesOutcomes'); due to this fix the 'Outcomes', 'Cues' and 'Frequency' columns are now separated by single tab characters '\t' (instead of spaces ' ') in the temporary text file 'ndl_410025912.txt', through which data is passed to the C code from the 'estimateWeights', 'cooccurrencesCues' and 'cooccurrencesCuesOutcomes' functions; C code modified so that the sizes of Cues and Outcomes (and their combinations) can be specified flexibly by the user in 'estimateWeights', 'cooccurrencesCues' and/or 'cooccurrencesCuesOutcomes' beyond the default values (max.cues=20000, max.characters=20000, max.lines=500000; communicated via the temporary parameter file 'ndl_par_410025912.txt'), which allows the C code alternative (with 'method="C"') to be used with greater memory allocation for very large data sets on servers with sufficient capacity; C code modified so that upon encountering errors (not finding the parameter or data text files, or the data file exceeding 'max.lines'), which will normally all be detected by the calling R functions but can occur if the C code is called directly, an appropriate error message is written to 'ndl_err_410025912.txt' and execution ends gracefully with 'return' (instead of 'exit').

* cooccurrencesCues.R (cooccurrencesCues): new arguments 'max.cues', 'max.characters' and 'max.lines' added and code modified so that the sizes of Cues and Outcomes in input 'cuesOutcomes' can be specified by the user beyond the default values (i.e. max.cues=20000, max.characters=20000, max.lines=500000; communicated via the temporary parameter file 'ndl_par_410025912.txt' to the auxiliary C code alternative); global option "ndl.estimateWeights", which indicates a call from 'estimateWeights', is now checked so that the auxiliary functions 'cooccurrencesCues' and 'cooccurrencesCuesOutcomes' will not rewrite the input text file 'ndl_410025912.txt'.

* cooccurrenceCuesOutcomes.R (cooccurrencesCuesOutcomes): new arguments 'max.cues', 'max.characters' and 'max.lines' added and code modified so that the sizes of Cues and Outcomes in input 'cuesOutcomes' can be specified by the user beyond the default values (i.e. max.cues=20000, max.characters=20000, max.lines=500000; communicated via the temporary parameter file 'ndl_par_410025912.txt' to the auxiliary C code alternative); global option "ndl.estimateWeights", which indicates a call from 'estimateWeights', is now checked so that the auxiliary functions 'cooccurrencesCues' and 'cooccurrencesCuesOutcomes' will not rewrite the input text file 'ndl_410025912.txt'.
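
Illustration (not part of the package sources): the coocMatrices.c entry above says that the columns of the temporary data file are now separated by single tab characters so that Cues and Outcomes may contain spaces. A minimal sketch of what such tab-based splitting could look like follows. The function name 'split_tab_line' and its return convention are hypothetical, and the actual parsing code in coocMatrices.c may differ.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (an assumption, not taken from coocMatrices.c):
   split one line of a tab-separated data file into exactly three
   fields. The line is modified in place. Because only '\t' is treated
   as a separator, spaces inside a field are preserved.
   Returns 0 on success and -1 if the line does not hold three fields. */
int split_tab_line(char *line, char *fields[3])
{
    int n = 0;
    char *p = line;

    /* Drop a trailing newline (and carriage return), if present. */
    line[strcspn(line, "\r\n")] = '\0';

    while (n < 3) {
        fields[n++] = p;
        p = strchr(p, '\t');
        if (p == NULL)
            break;          /* no further separator on this line */
        *p++ = '\0';        /* terminate the field, move past the tab */
    }

    /* Success only if three fields were read and no tab was left over. */
    return (n == 3 && p == NULL) ? 0 : -1;
}
```

With this sketch, a line such as "walk ed<TAB>w_a_l k<TAB>3" yields the three fields "walk ed", "w_a_l k" and "3", whereas a space-based split would have broken the first two fields apart.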
* estimateActivations.R (estimateActivations): code and output values changed so that previously unseen Cues and Outcomes in input 'cuesOutcomes' are returned in the output, in addition to 'activationMatrix', as 'newCues' and 'newOutcomes', instead of being printed to standard output; a warning message is now output when such unseen Cues or Outcomes occur; code modified so that a warning message is output if potential accidental 'NA' strings (resulting from 'as.character(NA)') are detected among the input Cues and Outcomes in 'cuesOutcomes'; code will stop with an error message if any actual NAs are encountered among Cues and Outcomes in 'cuesOutcomes'; code modified so that a previously undocumented internal reference to 'WordForm' in input 'cuesOutcomes' is removed.

* estimateWeights.R (estimateWeights): new arguments 'max.cues', 'max.characters' and 'max.lines' added and code modified so that the sizes of Cues and Outcomes in input 'cuesOutcomes' can be specified by the user beyond the default values (i.e. max.cues=20000, max.characters=20000, max.lines=500000; communicated via the temporary parameter file 'ndl_par_410025912.txt' to the auxiliary C code alternative); code modified so that the function will stop with an error message if accidental empty cues (string-initial or string-final underscores, or multiple consecutive string-medial underscores '_') or NAs are detected in Cues or Outcomes in 'cuesOutcomes'; a warning message is output if potential accidental 'NA' strings (resulting from 'as.character(NA)') are detected among the input Cues and Outcomes in 'cuesOutcomes'; code modified internally so that Cues and Outcomes in 'cuesOutcomes' are coerced to character strings, in case they are input as factors (which may happen by default when reading data with 'read.table' or 'read.csv' into R from an external plain text or CSV file as a data frame); global option "ndl.estimateWeights" set to TRUE so that the auxiliary functions 'cooccurrencesCues' and 'cooccurrencesCuesOutcomes' will not rewrite the input text file 'ndl_410025912.txt'; after successful execution, "ndl.estimateWeights" is reset to NULL.

* ndlClassify.R (ndlClassify): code modified internally so that the character length of the response (Outcome) variable in the 'formula' argument is not restricted.

* ndlClassify.R (print.ndlClassify): code modified so that setting 'max.print=NA' will print the entire 'weightMatrix' matrix.

* ndlCrossvalidate.R (ndlCrossvalidate): code modified internally so that the checks of the 'formula' and 'frequency' arguments do not produce superfluous warnings.

* ndlCuesOutcomes.R (ndlCuesOutcomes): code modified internally so that the character length of the response (Outcome) variable in the 'formula' argument is not restricted.

* predict.ndlClassify.R (predict.ndlClassify): new function that allows for the estimation of activation-based probabilities and/or the prediction of Outcomes with new, unseen data, using the weights in 'weightMatrix' in an object previously fitted with 'ndlClassify'.
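
Illustration (not part of the package sources): the estimateWeights entry above describes a check for accidental empty cues, i.e. an underscore at the start or end of a cue string, or several underscores in a row inside it. In the package this check is done in R. The sketch below restates the same rule in C purely for illustration. The function name 'has_empty_cue' is hypothetical, and treating an entirely empty string as an empty cue is an assumption not stated in the entry.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical check (an assumption, not the package's R code):
   returns 1 if the underscore-separated cue string contains an
   empty cue, and 0 otherwise. */
int has_empty_cue(const char *cues)
{
    size_t len = strlen(cues);

    if (len == 0)
        return 1;                       /* assumption: "" counts as empty */
    if (cues[0] == '_' || cues[len - 1] == '_')
        return 1;                       /* string-initial or string-final */
    return strstr(cues, "__") != NULL;  /* consecutive medial underscores */
}
```

Under this rule "a_b_c" passes, while "_a_b", "a_b_" and "a__b" would each be rejected.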
* summary.ndlClassify.R (summary.ndlClassify): code modified so that setting 'max.print=NA' will print the entire 'weights' matrix.

* Documents: manual page contents modified to correspond to the changes in the R code; references in the manual pages updated.