2022-05-24 Version 1.2-9
* Modifications to the package to conform with current CRAN requirements.
Also updated URLs.
2015-05-08 Version 1.2-8
* Modifications to the package so that it works with upcoming R changes to
nchar(). Also updated email addresses and URLs.
2014-08-25 Version 1.2-7
* Moved vignettes to directory vignettes, as required by CRAN
2012-09-20 Version 1.2-6
* Changes to the package so that only public R functions are now used.
2012-04-05 Version 1.2-5
2010-07-05 Version 1.2-4
* hapassoc() now handles the case of all haplotypes being included in
the model formula. If 'baseline' haplotype is given, it will be the
reference group. If not, the most frequent haplotype will be the
reference group.
2009-11-19 Version 1.2-3
* fixed bug: hapassoc() failed when using the design="cc" option with no
non-genetic variables in the dataset.
2008-10-03 Version 1.2-2
This is a bug-fix release. Version 1.2-1 of the package was missing
one of the C source files and would crash for case-control data
(design="cc"). For prospective or cross-sectional data, the Poisson
and Gamma log-likelihood functions used an undefined indexing variable;
this bug has now been fixed.
2008-07-19 Version 1.2-1
Hapassoc was originally developed to provide likelihood inference for
prospectively collected (cohort) or cross-sectional data. We have
made some provisions to accommodate case-control data (with future
plans for more) by including an implementation of association and haplotype
frequency estimators arising from the modified prospective score equations
(MPSE) of Spinka et al. (2005). The implementation is due to Chen (2006).
A summary of changes to the package follows.
Documentation
-------------
- Reduced emphasis on likelihood inference because hapassoc now
implements association estimators for case-control data from an
unbiased estimating equation approach (Spinka et al. 2005).
- Added documentation of two new arguments to hapassoc:
* design can either be "cohort" (default) for cohort or cross-sectional
data or "cc" for case-control
* disease.prob specifies the marginal probability of disease for
the case-control approach. See the Details section of the hapassoc
documentation for more.
- Revised the "Details" section of the hapassoc help file. Added the
following:
"When the study design is case-control, i.e. genotypes and
non-genetic attributes have been sampled retrospectively given disease
status, naive application of prospective maximum likelihood methods
can yield biased inference (Spinka et al., 2005, Chen, 2006).
Therefore, when \code{design="cc"}, the algorithm solves the modified
prospective score equations or MPSE (Spinka et al. 2005) for regression
and haplotype frequency parameters. The implementation in \pkg{hapassoc}
is due to Chen (2006). In general, the MPSE approach requires that
the marginal probability of disease, P(D=1), be known.
An exception is when the disease is rare; hence, when
\code{disease.prob=NULL} (the default) a rare disease is assumed.
The variance-covariance matrix of the regression parameter and
haplotype frequency estimators is approximated as described
in Chen (2006). Limited simulations indicate that the resulting
standard errors for regression parameters perform well, but not the
standard errors for haplotype frequencies, which should be ignored.
For case-control data, we hope to implement the variance-covariance
estimator of Spinka et al. (2005) in a future version of \pkg{hapassoc}."
Source files
------------
In DESCRIPTION:
- Noted Zhijian Chen's contribution
In R/hapassoc.R:
- Added error check to make sure family=binomial() if design="cc"
- Changed code that handles formulae of the form "y ~ ." Previously,
when hapassoc was passed such a formula it dropped the baseline haplotype
from the design matrix, which has the effect of using the baseline haplotype
as the baseline in all calls to glm. However, the MPSE code needs all
haplotypes to be present in the data frame. The modified code paste()s
together the appropriate formula; for example, if there are 2 SNPs and
h00 is the baseline haplotype, and there are no non-genetic
covariates, paste together the formula y ~ h01 + h10 + h11.
None of the columns in the design matrix are dropped.
- Where there used to be just a single while loop to do the EM algorithm
there is now an if-else. If design="cc", do the MPSE code, else
do the regular hapassoc code. In the "if", there are a lot of
preliminary calculations that need to be done (about 75 lines of code)
before the while loop.
- The output item "dispersionML" has been renamed "dispersion". This required
some changes in the log-likelihood code too.
- The output item "loglik" is set to NA when design="cc"; the MPSE are
estimating equations rather than a likelihood, so no log-likelihood is
available.
- Added utility functions r.Omega() and get.diplofreq() to the hapassoc.R
source file. These are used in the MPSE code.
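As an illustration of the "y ~ ." change noted above, the following is a
minimal sketch of building the expanded formula with paste(). The haplotype
names and the variable names (haplos, baseline, rhs, form) are assumptions
taken from the 2-SNP example in the text; this is not the package's actual
code.

```r
# Hypothetical haplotype column names for 2 SNPs; h00 is the baseline.
haplos <- c("h00", "h01", "h10", "h11")
baseline <- "h00"

# Every non-baseline haplotype becomes an explicit model term, so the
# baseline column can stay in the data frame without entering the model.
rhs <- setdiff(haplos, baseline)
form <- as.formula(paste("y ~", paste(rhs, collapse = " + ")))
print(form)
```

With non-genetic covariates present, their names would presumably be
appended to rhs in the same way.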
In src/tapply_sum.c:
- New file that contains the function tapply_sum written by S. Blay to
speed up the MPSE code.
Future work
-----------
- The Spinka et al. (2005) variance calculation for case-control data has
not yet been implemented. Currently, an approximation shown by
Chen (2006) to have reasonably good properties is used.
References
----------
Chen, Z. (2006): Approximate likelihood inference for haplotype risks
in case-control studies of a rare disease, Master's thesis, Statistics
and Actuarial Science, Simon Fraser University, available at
\url{http://www.stat.sfu.ca/people/alumni/Theses/Chen-2006.pdf}.
Spinka, C., Carroll, R. J. & Chatterjee, N. (2005). Analysis of
case-control studies of genetic and environmental factors with missing
genetic information and haplotype-phase ambiguity.
Genetic Epidemiology, \bold{29}, 108-127.
2006-07-20 Version 1.1
* fixed bug to enable formula = dependent ~ 1 in hapassoc()
and summary.hapassoc()
* hapassoc() now also returns the function call
* summary.hapassoc() now also returns the hapassoc function call,
the number of subjects used in the analysis, the name of the
family and the log-likelihood. The returned object is printed nicely.
2006-04-28 Version 1.0-1
* added the following second citation:
Burkett K, Graham J and McNeney B (2006). hapassoc: Software for
Likelihood Inference of Trait Associations with SNP Haplotypes and Other
Attributes. Journal of Statistical Software, 16(2), 1-19
2006-04-04 Version 1.0
* hapassoc(): the "baseline" argument is documented to have default equal to
the most common haplotype, but the code to implement this default was
lost and needed to be replaced.
* hapassoc(): added a "verbose" flag. Default is verbose=FALSE. If TRUE users
see the iteration number and value of the convergence criterion at each
iteration of the EM algorithm.
* pre.hapassoc(): added a "verbose" flag. Default is verbose=TRUE. If TRUE
users see a list of the SNP genotypes used to form haplotypes and a list
of the other "non-haplotype" variables
* Package vignette "hapassoc" added. After loading the package, type
vignette("hapassoc") to view.
2006-03-22
* Overall addition of the log-likelihood functions
* hapassoc(): function now returns log-likelihood and model
* logLik.hapassoc(): New function to extract the log-likelihood
from a hapassoc object
* anova.hapassoc(): New function to perform likelihood ratio test on
two hapassoc objects.
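For readers unfamiliar with the test, a likelihood ratio test of two nested
models can be sketched as below. This is a generic illustration, not the
code of anova.hapassoc(); the helper name lrt and its arguments are
hypothetical, and the log-likelihood values in the usage line are made up.

```r
# Generic likelihood ratio test for two nested models.
# loglik_small: log-likelihood of the smaller (null) model
# loglik_big:   log-likelihood of the larger model
# df_diff:      difference in the number of free parameters
lrt <- function(loglik_small, loglik_big, df_diff) {
  stat <- 2 * (loglik_big - loglik_small)
  # Upper-tail chi-squared probability gives the p-value.
  p <- pchisq(stat, df = df_diff, lower.tail = FALSE)
  list(statistic = stat, df = df_diff, p.value = p)
}

res <- lrt(loglik_small = -105, loglik_big = -100, df_diff = 2)
```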
2006-02-02 Minor changes:
* EMvar(): fixed a bug occurring when all haplotype phases are known.
* RecodeHaplos(): fixed a bug where a single column of non-haplotype data
in a non-allelic data set was losing its name.
* hapassoc(): Changed "..." argument of hapassoc to "start". Previously the
only intended use of "..." was to allow the user to pass in "start" for
starting values to the glm function, rather than to allow the user to pass
in other optional arguments to glm. We have now made this more explicit by
making this argument more specific.
2005-07-13 Version 0.7-1
* handleMissings(), pre.hapassoc(): instead of casting to a
data.frame use indexing argument drop=FALSE.
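The difference that drop=FALSE makes can be seen in a small example. The
data frame and column names below are invented for illustration and do not
come from the package.

```r
# Toy data frame with one SNP column and one non-SNP column.
dat <- data.frame(snp1 = c(0, 1, 2), age = c(40, 52, 61))

# Default indexing of a single column returns a bare vector,
# so the column name is lost.
one_col_vec <- dat[, "age"]

# With drop = FALSE the result is still a data frame named "age".
one_col_df <- dat[, "age", drop = FALSE]
```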
2005-06-30 Version 0.7
* hapassoc(): use initial weights in the glm to get initial parameter estimates.
* hapassoc(), pre.hapassoc(): replaced weights calculation with a C function.
This speeds up computation for large data sets.
2005-05-31 Version 0.6-2
* handleMissings(): fixed a bug for SNPs with a rare allele that is in no
instance in a homozygous state.
2005-05-09 Version 0.6-1
* Added this ChangeLog file
* Added inst/CITATION file
2005-04-06 Version 0.6
* RecodeHaplos(): Allow input SNP data as alphabetic alleles (e.g. A, G, C, T),
and for genotypes to be input either in a single two-character column
("genotypic format"), or as a pair of columns (the original "allelic
format" from earlier versions of hapassoc)
* pre.hapassoc(): Added allelic argument to indicate whether SNP data are
input in genotypic format or allelic format
* RecodeHaplos(): Check for the number of alleles at each locus. If the
check finds loci with >2 alleles, stop execution and print an error
message that tells the user that only diallelic loci are allowed.
* RecodeHaplos(): Convert all missing data in "" format to NA.
* hapassoc(): The convergence criterion for the EM algorithm has been
tightened. We now require both absolute and relative changes in the
parameter estimates from one iteration to the next to be below the
user-specified tolerance.
* summary.hapassoc(): Added a check for converged FALSE, and now print a
warning (used to just give a cryptic error).
* hapassoc(): Changed the name of the variable 'gamma' to 'freq' and the
name of the variable 'initGamma' to 'initFreq'
* pre.hapassoc(): Changed the name of the returned variable 'initGamma' to
'initFreq'
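The tightened convergence rule described above (both absolute and relative
changes below the tolerance) might look roughly like the sketch below. The
function name em_converged, its arguments, and the default tolerance are
assumptions for illustration; the package's own check may differ in detail.

```r
# Illustrative helper, not the package's code: TRUE only when both the
# largest absolute change and the largest relative change are below tol.
# Assumes no element of `old` is exactly zero.
em_converged <- function(old, new, tol = 0.001) {
  abs_change <- max(abs(new - old))
  rel_change <- max(abs(new - old) / abs(old))
  abs_change < tol && rel_change < tol
}

# Small absolute change but large relative change: not converged.
em_converged(old = c(1, 0.0001), new = c(1.0001, 0.0002))
```

Requiring both conditions guards against stopping early when a parameter is
very small (where absolute changes are always tiny) or very large (where
relative changes are always tiny).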
pre.hapassoc man page
* pre.hapassoc(): Added a new example of how to use pre.hapassoc with SNPs
in the new genotypic format
* Added documentation to describe how single-locus genotypes may now be
specified as a single two-character column ("genotypic format") in the
input data frame
* Added a Note to alert users to ignore the possible warnings related to
row.names being duplicated when there are missing genotypes on some of the
loci for an individual
* Updated the reference to Burkett et al. to give journal volume and page
numbers
hapassoc man page
* Added a Note to alert users to the warning they'll see when fitting
logistic regression models (non-integer #successes...)
* Added more comments to the examples to make the coding of columns in
haploDM more obvious, and added an example of a non-multiplicative
logistic regression model
* Updated the reference to Burkett et al.
summary.hapassoc man page
* Updated the reference to Burkett et al.
2004-11-03 Version 0.5-1
* handleMissings() assignment to nonSNPdat changed to accommodate rbind of
data frames which contain factors.
* hapassoc(): Assigned response<-regr$y
We used to use model.response to extract the response variable, but this
led to problems in calculating the residuals in the pYgivenX function.
It is better to fit the model with the glm function (see code) and then
extract the response from the fitted model object
2004-09-29 Version 0.5
* Changed file name from PreEM.R to PreHap.R
* Changed function name from EM to hapassoc
* Changed function name from PreEM to pre.hapassoc
* Changed function name from summary.EM to summary.hapassoc