General:
+ Corrected several functions so that they accept both tibbles and data.frames.
Distance Calculation:
+ Added new fitting procedures to the "gmm" method of findThreshold() that allow users to choose a mixture of two univariate density distribution functions among four available combinations: "norm-norm", "norm-gamma", "gamma-norm", or "gamma-gamma".
+ Added the ability to choose the threshold selection criterion in the "gmm" method of findThreshold(): the best average sensitivity and specificity, the curve intersection, or a user-defined sensitivity or specificity.
+ Renamed the cutEdge argument of findThreshold() to edge.
Mutation Profiling:
+ collapseClones(), adding various deterministic and stochastic methods to obtain effective clonal sequences, support for including ambiguous IUPAC characters in output, as well as extensive documentation. Removed calcClonalConsensus() from exported functions.
+ observedMutations() and calcObservedMutations().
+ calcObservedMutations() for sequences with non-triplet overhang at the tail.
+ OBSERVED) and expected mutations (previously EXPECTED) returned by observedMutations() and expectedMutations() to MU_COUNT and MU_EXPECTED respectively.
Selection Analysis:
+ calcBaseline() no longer calls collapseClones() automatically if a CLONE column is present. As indicated by the documentation for calcBaseline(), users are advised to obtain effective clonal sequences (for example, by calling collapseClones()) before running calcBaseline().
+ calcBaseline().
Mutation Profiling:
+ Fixed a bug in collapseClones() that prevented it from running when nproc is greater than 1.
General:
Mutation Profiling:
+ Fixed a bug in collapseClones() that resulted in erroneous CLONAL_SEQUENCE and CLONAL_GERMLINE being returned.
+ observedMutations was running.
General:
Selection Analysis:
+ summarizeBaseline(). The returned p-value can now be either positive or negative. Its magnitude (without the sign) should be interpreted as usual. Its sign indicates the direction of the selection detected. A positive p-value indicates positive selection, whereas a negative p-value indicates negative selection.
+ Added editBaseline() to exported functions, and a corresponding section in the vignette.
+ calcBaseline().
Targeting Models:
+ Added numMutationsOnly argument to createSubstitutionMatrix(), enabling parameter tuning for minNumMutations.
+ Added functions minNumMutationsTune() and minNumSeqMutationsTune() to tune the parameters minNumMutations and minNumSeqMutations in createSubstitutionMatrix() and createMutabilityMatrix(), respectively. Also added function plotTune(), which helps visualize parameter tuning using these two new functions.
+ HKL_S5F).
+ Renamed HS5FModel as HH_S5F, MRS5NFModel as MK_RS5NF, and U5NModel as U5N.
+ HH_S1F), human kappa and lambda light chain, silent, 1-mer, functional substitution model (HKL_S1F), and mouse kappa light chain, replacement and silent, 1-mer, non-functional substitution model (MK_RS1NF).
+ Added makeDegenerate5merSub and makeDegenerate5merMut, which make degenerate 5-mer substitution and mutability models, respectively, based on the 1-mer models. Also added makeAverage1merSub and makeAverage1merMut, which make 1-mer substitution and mutability models, respectively, by averaging over the 5-mer models.
Mutation Profiling:
+ Added returnRaw argument to calcObservedMutations(), which, if TRUE, returns the positions of point mutations and their corresponding mutation types, as opposed to counts of mutations (hence "raw").
+ Added slideWindowSeq() and slideWindowDb(), which implement a sliding window approach to filtering a single sequence, or sequences in a data.frame, that contain at least a given number of mutations in a given number of consecutive nucleotides.
+ Added slideWindowTune(), which allows for parameter tuning when using slideWindowSeq() and slideWindowDb().
+ Added slideWindowTunePlot(), which visualizes parameter tuning by slideWindowTune().
Distance Calculation:
+ Fixed a bug in distToNearest wherein normalize="length" for 5-mer models was resulting in distances normalized by junction length squared instead of raw junction length.
+ Fixed a bug in distToNearest wherein symmetry="min" was calculating the minimum of the total distance between two sequences instead of the minimum distance at each mutated position.
+ Added findThreshold function to infer clonal distance threshold from nearest neighbor distances returned by distToNearest.
+ Renamed the length option for the normalize argument of distToNearest to len so it matches Change-O.
+ Deprecated the HS1FDistance and M1NDistance distance models, which have been renamed to hs1f_compat and m1n_compat in the model argument of distToNearest. These deprecated models should be used for compatibility with DefineClones in Change-O v0.3.3. They have been replaced by hh_s1f and mk_rs1nf, which are supported by Change-O v0.3.4.
+ Renamed the hs5f model in distToNearest to hh_s5f.
+ MK_RS5NF models to distToNearest.
+ calcTargetingDistance() to enable calculation of a symmetric distance matrix given a 1-mer substitution matrix normalized by row, such as HH_S1F.
+ findThreshold. The previous smoothed density method is available via the method="density" argument and the new GMM method is available via method="gmm".
+ Added plotGmmThreshold and plotDensityThreshold to plot the threshold detection results from findThreshold for the "gmm" and "density" methods, respectively.
Region Definition:
+ IMGT_V_NO_CDR3 and IMGT_V_BY_REGIONS_NO_CDR3. Updated IMGT_V and IMGT_V_BY_REGIONS so that neither includes CDR3 now.
Selection Analysis:
Targeting Models:
+ Added numSeqMutationsOnly argument to createMutabilityMatrix(), enabling parameter tuning for minNumSeqMutations.
General:
+ InfluenzaDb data object, in favor of the updated ExampleDb provided in alakazam 0.2.4.
Distance Calculation:
+ Added cross argument to distToNearest(), which allows restriction of distances to only distances across samples (i.e., excludes within-sample distances).
+ Added mst flag to distToNearest(), which will return all distances to neighboring nodes in a minimum spanning tree.
+ aa model of distToNearest().
+ aa model of distToNearest().
Mutation Profiling:
+ MutationDefinition VOLUME_MUTATIONS.
+ Added shmulateSeq() and shmulateTree() to simulate mutations on sequences and lineage trees, respectively, using a 5-mer targeting model.
+ Renamed collapseByClone, calcDbExpectedMutations and calcDbObservedMutations to collapseClones, expectedMutations, and observedMutations, respectively.
Selection Analysis:
+ Baseline object through groupBaseline() multiple times resulted in incorrect normalization.
+ Added title options to plotBaselineSummary() and plotBaselineDensity().
+ plotBaselineSummary() and plotBaselineDensity().
+ Added testBaseline() function to test the significance of differences between two selection distributions.
General:
+ InfluenzaDb.
+ dplyr::tbl_df object instead of a data.frame.
Distance Calculation:
+ Fixed a bug wherein distToNearest() did not return the nearest neighbor with a non-zero distance.
Targeting Models:
+ createSubstitutionMatrix(), createMutabilityMatrix(), and plotMutability().
+ plotMutability().
Mutation Profiling:
+ MutationDefinition objects MUTATIONS_CHARGE, MUTATIONS_HYDROPATHY, MUTATIONS_POLARITY providing alternate approaches to defining replacement and silent annotations to mutations when calling calcDBObservedMutations() and calcDBExpectedMutations().
+ regionDefinition=NULL consistent for all mutation profiling functions. Now the entire sequence is used as the region and calculations are made accordingly.
+ calcDBObservedMutations() now returns R and S mutations even when regionDefinition=NULL. Older versions reported the sum of R and S mutations. The function will add the columns OBSERVED_SEQ_R and OBSERVED_SEQ_S when frequency=FALSE, and MU_FREQ_SEQ_R and MU_FREQ_SEQ_S when frequency=TRUE.
General:
Distance Calculation:
+ Added symmetry parameter to distToNearest to change how asymmetric distances (A->B != B->A) are combined to get the distance between A and B.
Mutation Profiling:
Selection Analysis:
Targeting Models:
+ Added minNumMutations parameter to createSubstitutionMatrix. This is the minimum number of observed 5-mers required for the substitution model. The substitution rate of 5-mers with fewer observed mutations will be inferred from other 5-mers.
+ Added minNumSeqMutations parameter to createMutabilityMatrix. This is the minimum number of mutations required in sequences containing the 5-mers of interest. The mutability of 5-mers with fewer observed mutations in the sequences will be inferred.
+ Added returnModel parameter to createSubstitutionMatrix. This gives the user the option to return a 1-mer or 5-mer model.
+ Added returnSource parameter to createMutabilityMatrix. If TRUE, the code will return a data frame indicating whether each 5-mer mutability is observed or inferred.
Initial public release.
General:
+ Fixed a bug wherein the Influenza.tab file did not load on Mac OS X.
+ citation("shazam") command.
Distance Calculation:
+ HS1FDistance, based on the Yaari et al, 2013 data.
+ hs1f as the default distance model for distToNearest().
+ distToNearest().
Mutation Profiling:
+ Fixed calcDBClonalConsensus() so that the function now works correctly when called with the argument collapseByClone=FALSE.
+ Added frequency argument to calcObservedMutations() and calcDBObservedMutations(), which enables return of mutation frequencies rather than the default of mutation counts.
Targeting Models:
+ M3NModel and all options for using said model.
+ Fixed a bug in createSubstitutionMatrix() and createMutabilityMatrix() where IMGT gaps were not being handled.
General:
Targeting Models:
Targeting Models:
+ Added U5NModel, which is a uniform 5-mer model.
+ plotMutability() output.
Prerelease for review.