MaximinInfer is an R package that implements the sampling and aggregation method for the covariate-shift maximin effect proposed in <arXiv:2011.07568>. It constructs confidence intervals for any linear combination of the high-dimensional maximin effect.
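You can install the released version of MaximinInfer from CRAN with install.packages("MaximinInfer").
The development version is available from the package's GitHub repository, for example via devtools::install_github() pointed at that repository.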
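The basic example below simulates two source groups and a target population, then runs maximin inference for a linear contrast of the coefficients:
The data are heterogeneous across groups, and the covariate distribution shifts between the source and target data.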
library(MaximinInfer)
set.seed(0)
## number of groups
L=2
## dimension
p=100
## mean vector for source
mean.source = rep(0, p)
## covariance matrix for source
A1gen <- function(rho,p){
A1=matrix(0,p,p)
for(i in 1:p){
for(j in 1:p){
A1[i,j]<-rho^(abs(i-j))
}
}
return(A1)
}
cov.source = A1gen(0.6, p)
## 1st group's source data
n1 = 100
X1 = MASS::mvrnorm(n1, mu=mean.source, Sigma=cov.source)
# true coef for 1st group
b1 = rep(0, p)
b1[1:5] = seq(1,5)/20
b1[98:100] = c(0.5, -0.5, -0.5)
Y1 = X1%*%b1 + rnorm(n1)
## 2nd group's source data
n2 = 100
X2 = MASS::mvrnorm(n2, mu=mean.source, Sigma=cov.source)
# true coef for 2nd group
b2 = rep(0, p)
b2[6:10] = seq(1,5)/20
b2[98:100] = 0.5*c(0.5, -0.5, -0.5)
Y2 = X2%*%b2 + rnorm(n2)
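## Target data, covariate shift
n.target = 100
mean.target = rep(0, p)
cov.target = cov.source
## inflate every target variance from 1 to 1.5
diag(cov.target) = 1.5
## raise the correlation among covariates 1 to 5 to 0.9
for(i in 1:5){
for(j in 1:5){
if(i!=j) cov.target[i, j] = 0.9
}
}
## raise the correlation between covariates 99 and 100 to 0.9
for(i in 99:100){
for(j in 99:100){
if(i!=j) cov.target[i, j] = 0.9
}
}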
X.target = MASS::mvrnorm(n.target, mu=mean.target, Sigma=cov.target)
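The source covariance built by the double loop in A1gen has entries rho^|i-j|, so it is a symmetric Toeplitz matrix. As a minimal sketch, the same matrix can be produced in one line with base R's toeplitz(); the name ar1_cov is a hypothetical helper introduced here for illustration and is not part of MaximinInfer.

```r
# Hypothetical helper (not part of MaximinInfer): AR(1)-style covariance
# whose (i, j) entry is rho^|i - j|, built from its first row.
ar1_cov <- function(rho, p) {
  stopifnot(p >= 1)
  toeplitz(rho^(0:(p - 1)))
}

# Small illustration: a 3 x 3 covariance with rho = 0.6
small.cov <- ar1_cov(0.6, 3)
print(small.cov)
```

With this helper, ar1_cov(0.6, p) could stand in for A1gen(0.6, p) in the example above.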
set.seed(0)
## loading
loading = rep(0, p) # one entry per covariate (p = 100)
loading[98:100] = 1
## call - use wrapper function
# mmInfer <- MaximinInfer(list(X1, X2), list(Y1, Y2), loading, X.target, covariate.shift = TRUE)
## call - separate steps
mm <- Maximin(list(X1, X2), list(Y1, Y2), loading, X.target, covariate.shift = TRUE)
mmInfer <- infer(mm)
The fitted object reports three results: the weights for the groups, the point estimator for the linear contrast, and the confidence interval for that contrast.
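The sketch below shows one way to pull these three results out of the fitted object. It assumes that infer() returns a list and that its components are named weight, mminfer and CI; those names are an assumption to check against the package reference, and extract_results is a hypothetical helper, not a MaximinInfer function.

```r
# Hypothetical helper (not part of MaximinInfer): return whichever of the
# assumed result components are present in a fitted list, and warn about
# any that are missing.
extract_results <- function(fit, fields = c("weight", "mminfer", "CI")) {
  stopifnot(is.list(fit))
  found <- intersect(fields, names(fit))
  absent <- setdiff(fields, names(fit))
  if (length(absent) > 0) {
    warning("components not found: ", paste(absent, collapse = ", "))
  }
  fit[found]
}

# Usage with the object from the example above (requires the package):
# extract_results(mmInfer)
```

If the component names differ in your installed version, names(mmInfer) lists the ones that are actually available.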
The default ridge penalty is 0. For a more stable estimator, we recommend adding a data-dependent penalty. The package can check whether a zero penalty suffices to yield a stable estimator; if it does not, it returns a suggested penalty level.
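For intuition about what the penalty does, here is a minimal oracle sketch for two groups. It assumes, following the referenced paper, that the group weights minimise gamma' (Gamma + delta I) gamma over the simplex, where Gamma[l, k] = b_l' Sigma b_k and delta is the ridge penalty. The function name maximin_weights2 and the toy inputs are hypothetical, the true coefficients are treated as known, and this is not the package's estimation procedure.

```r
# Hypothetical oracle sketch (not the package's estimator): two-group
# maximin weights with ridge penalty delta, using known coefficients.
maximin_weights2 <- function(b1, b2, Sigma, delta = 0) {
  B <- cbind(b1, b2)
  # Gamma[l, k] = t(b_l) %*% Sigma %*% b_k, plus the ridge term on the diagonal
  Gamma <- t(B) %*% Sigma %*% B + delta * diag(2)
  denom <- Gamma[1, 1] + Gamma[2, 2] - 2 * Gamma[1, 2]
  # denom == 0 means the objective is flat in the weight; return equal weights
  if (denom <= 0) return(c(0.5, 0.5))
  # Unconstrained minimiser of the one-dimensional convex quadratic,
  # then clipped to [0, 1] to stay on the simplex
  g1 <- (Gamma[2, 2] - Gamma[1, 2]) / denom
  g1 <- min(max(g1, 0), 1)
  c(g1, 1 - g1)
}

# Toy inputs, chosen only for illustration
b.a <- c(1, 0)
b.b <- c(0, 2)
Sigma.toy <- diag(2)
w0 <- maximin_weights2(b.a, b.b, Sigma.toy, delta = 0)
w1 <- maximin_weights2(b.a, b.b, Sigma.toy, delta = 1)
w100 <- maximin_weights2(b.a, b.b, Sigma.toy, delta = 100)
print(rbind(w0, w1, w100))
```

In this toy case the weights move toward equal weighting as delta grows, which is the sense in which a larger penalty makes the weights less sensitive to Gamma.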
We can also measure the instability for a specific ridge penalty.