Some Distributions Related to the Wishart

Geoffrey Thompson

2018-03-20

The stats implementation of rWishart is in C and is very fast. It is often the case that we do not want a sample from the Wishart distribution, but rather from the inverse of it or from the Cholesky decomposition of a sample from the Wishart distribution. Or even from the inverse of the Cholesky decomposition of a draw from the Wishart distribution. Funnily enough (if you have a weird sense of humor), when you inspect the source code for the rWishart distribution (R Core Team (2017)), it generates the Cholesky decomposition and then multiplies it out. Meanwhile, drawing from the rWishart and then inverting or doing a Cholesky decomposition or whatever in R is just slow – comparatively.

This suggests some obvious efficiencies: perhaps, if we would rather have the Cholesky decomposition of the Wishart random matrix, we could tell the function to stop right there.

library('CholWishart')
set.seed(20180220)
A <- stats::rWishart(1,10,5*diag(4))[,,1]
set.seed(20180220)
B <- rInvWishart(1,10,.2*diag(4))[,,1]
set.seed(20180220)
C <- rCholWishart(1,10,5*diag(4))[,,1]
set.seed(20180220)
D <- rInvCholWishart(1,10,.2*diag(4))[,,1]

How the Wishart distribution works

Suppose \(X_i \sim MVN(0, \Sigma)\) are independent \(p\)-variate normal random variables, \(i = 1, 2, \ldots n\) with \(n > p-1\). Then \(S = \sum X_i^T X_i\), called the “scatter matrix”, is almost surely positive definite if \(\Sigma\) is positive definite. The random variable \(S\) is said to be distributed as a Wishart random variable: \(S \sim W_p(n, \Sigma)\), see Gupta and Nagar (1999).

How does rWishart(n, df, Sigma) work (supposing Sigma is a \(p \times p\) matrix)? First, it generates a sample from the Cholesky decomposition of a Wishart distribution with \(\Sigma = \mathbf{I}_p\). How this is done: on the \(i^{th}\) element of the main diagonal, draw from \(\sqrt{\chi_{p-i+1}^2}\). On the upper triangle of the matrix, sample from an independent \(N(0,1)\) for each entry in the matrix. Then, this can be multiplied by the Cholesky decomposition of the provided Sigma to obtain the Cholesky factor of the desired sample from the Wishart random variable (this construction is due to Bartlett and is also known as the Bartlett Decomposition) (see Anderson (1984)). The rWishart function multiplies this out. Therefore, if the Cholesky decomposition is desired, one only needs to stop there.

The Inverse Wishart distribution

If \(X \sim \textrm{W}_p(\nu,\Sigma)\), then we define the Inverse Wishart as \(X^{-1} = Y \sim \textrm{IW}_p(\nu , \Sigma^{-1})\). There are other parameterizations of the distribution, mostly coming down to different ways of writing the \(\nu\) parameter - be aware of this when using any package drawing from the Inverse Wishart distribution (see Dawid (1981) for an alternative; this presentation follows Gupta and Nagar (1999)). This comes up directly in Bayesian statistics. We are also interested in the Cholesky decomposition of this, as it is required in the generation of the matrix variate \(t\)-distribution. In this package it is done by taking the covariance matrix, inverting it, computing the Cholesky decomposition of the inverted covariance matrix, drawing the Cholesky factor of a Wishart matrix using that, and then inverting based on that (as finding \(\Psi^{-1}\) given the Cholesky factorization of \(\Psi\) is relatively fast). This can then be converted into the Cholesky factor of the Inverse Wishart if that is what is desired. This would be slow to do in R, but in C it is not so bad.

Here is what happens with the results of the above:

A %*% B 
##               [,1]          [,2]          [,3]          [,4]
## [1,]  1.000000e+00 -2.775558e-17 -1.387779e-16 -5.551115e-17
## [2,] -4.718448e-16  1.000000e+00  1.387779e-17  0.000000e+00
## [3,] -1.249001e-16  1.387779e-17  1.000000e+00 -5.551115e-17
## [4,]  1.110223e-16 -1.110223e-16  0.000000e+00  1.000000e+00
crossprod(C) %*% crossprod(D) # note: we do not expect C = D^-1, we expect this! 
##               [,1]          [,2]          [,3]          [,4]
## [1,]  1.000000e+00 -2.775558e-17 -2.081668e-16 -1.110223e-16
## [2,] -4.718448e-16  1.000000e+00  1.387779e-16  0.000000e+00
## [3,] -1.249001e-16  1.110223e-16  1.000000e+00 -1.110223e-16
## [4,]  1.110223e-16 -8.326673e-17 -8.326673e-17  1.000000e+00
crossprod(D) %*% A
##               [,1]          [,2]          [,3]          [,4]
## [1,]  1.000000e+00 -4.718448e-16 -1.249001e-16  1.110223e-16
## [2,] -2.775558e-17  1.000000e+00  1.110223e-16 -8.326673e-17
## [3,] -2.081668e-16  1.387779e-16  1.000000e+00 -8.326673e-17
## [4,] -1.110223e-16  0.000000e+00 -1.110223e-16  1.000000e+00
crossprod(C) %*% B
##               [,1]          [,2]          [,3]          [,4]
## [1,]  1.000000e+00 -2.775558e-17 -1.387779e-16 -5.551115e-17
## [2,] -4.718448e-16  1.000000e+00  1.387779e-17  0.000000e+00
## [3,] -1.249001e-16  1.387779e-17  1.000000e+00 -5.551115e-17
## [4,]  1.110223e-16 -1.110223e-16  0.000000e+00  1.000000e+00

There is some roundoff error.

Density computation

This package also has functions for density computations with the Wishart distribution. Densities are only defined for positive-definite input matrices and \(\nu\) parameters larger than the dimension \(p\).

The return value is on the log scale but it can be specified otherwise.

library('CholWishart')
dWishart(diag(3), df = 5, 5*diag(3))
## [1] -19.45038

dInvWishart(diag(3), df = 5, .2*diag(3))
## [1] -19.45038

Note that, in general, these will not agree even if their covariance matrix parameters are inverses of each other. One of the reasons this works is that the determinant of \(\mathbf{X}\) is \(1\).

The density functions can take 3-D array input indexed on the third dimension and will output a vector of densities.

set.seed(20180311)
A <- rWishart(n = 3, df = 3, Sigma = diag(3))
dWishart(A, df = 3, Sigma = diag(3))
## [1] -13.070275  -8.879220  -8.555529

Other functions

The multivariate gamma (\(\Gamma_p\)) and digamma (\(\psi_p\)) functions are extensions of the univariate gamma (\(\Gamma\)) and digamma (\(\psi\)) functions (Mardia, Bibby, and Kent (1982)). They are useful in calculating the densities above. They come up in other distributions as well. The digamma is the first derivative of the gamma function. When the dimension \(p = 1\), they coincide with the usual definitions of the digamma and gamma functions.

The multivariate gamma also comes in a logarithmic form (lmvgamma).

lmvgamma(1:4,1) # note how they agree when p = 1
## [1] 0.0000000 0.0000000 0.6931472 1.7917595
lgamma(1:4)
## [1] 0.0000000 0.0000000 0.6931472 1.7917595

References

Anderson, T. W., ed. 1984. An Introduction to Multivariate Statistical Analysis. Wiley.

Dawid, A. P. 1981. “Some Matrix-Variate Distribution Theory: Notational Considerations and a Bayesian Application.” Biometrika 68 (1). [Oxford University Press, Biometrika Trust]: 265–74. http://www.jstor.org/stable/2335827.

Gupta, A.K., and D.K. Nagar. 1999. Matrix Variate Distributions. Monographs and Surveys in Pure and Applied Mathematics. Taylor & Francis. https://books.google.com/books?id=PQOYnT7P1loC.

Mardia, K.V., J.M. Bibby, and J.T. Kent. 1982. Multivariate Analysis. Probability and Mathematical Statistics. Acad. Press. https://books.google.com/books?id=1nLonQEACAAJ.

R Core Team. 2017. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.