This is the `R` package accompanying the paper *Proximal Methods for Sparse Optimal Scoring and Discriminant Analysis*.

This package is under continuous development, and most of the basic functionality is already available. You can now apply sparse discriminant analysis with `accSDA`; see the tutorials below to get started.

Do you have a data set with a **lot of variables and few
samples**? Do you have **labels** for the data?

Then you might be trying to solve a *p >> n* classification task, where the number of variables *p* far exceeds the number of samples *n*.
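One reason this setting needs special care: with fewer samples than variables, the covariance estimate that classical linear discriminant analysis has to invert is rank-deficient. The base-R sketch below illustrates this with the ordinary sample covariance (rank at most *n - 1*); the sizes are made up for illustration and nothing here uses `accSDA`.

```r
# Illustrative sizes (assumed, not from the package): fewer samples than variables
set.seed(1)
n <- 20   # number of samples
p <- 100  # number of variables
X <- matrix(rnorm(n * p), nrow = n, ncol = p)

S <- cov(X)          # p x p sample covariance matrix
r <- qr(S)$rank      # numerical rank via a QR decomposition
cat("Covariance is", p, "x", p, "but has rank", r, "\n")
# The rank is at most n - 1 because centering removes one degree of freedom,
# so S cannot be inverted when p >= n.
```

Sparse methods sidestep this by regularizing the problem and restricting the classifier to a small set of variables.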

This package includes functions that allow you to train such a
classifier in a sparse manner. In this context, *sparse* means that
only a subset of the most informative variables is used by the final
classifier. This also makes the output interpretable: you can use it to
identify the variables that matter for your classification task. The
current functions also handle cross-validation for tuning the sparsity;
see the documentation for further description and examples.
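Because the discriminant vectors are sparse, the selected variables are simply those with at least one nonzero coefficient. The sketch below uses a hypothetical helper, `selected_variables()` (not part of `accSDA`), on a made-up coefficient matrix laid out like the `beta` element used in the examples below: one row per variable, one column per discriminant vector.

```r
# Hypothetical helper (not part of accSDA): indices of the rows of a
# coefficient matrix with at least one entry larger than `tol` in absolute
# value. Row names, if present, are kept as names of the result.
selected_variables <- function(beta, tol = 0) {
  beta <- as.matrix(beta)
  keep <- rowSums(abs(beta) > tol) > 0
  which(keep)
}

# Made-up coefficients for illustration only (4 variables, 2 vectors)
beta <- matrix(c(0.8, 0, 0,   -0.3,   # first discriminant vector
                 0,   0, 0.5,  0),    # second discriminant vector
               nrow = 4,
               dimnames = list(c("Sepal.Length", "Sepal.Width",
                                 "Petal.Length", "Petal.Width"),
                               c("DV1", "DV2")))
sel <- selected_variables(beta)
print(sel)
```

With a fitted model from the examples below, the same idea applies to `res$beta`.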

You can install the stable release from CRAN, or the development version directly from GitHub.

To install from CRAN, simply type the following into your R console:

`install.packages("accSDA")`

This should be enough for most users!

You can also install the development version of the package from GitHub:

```r
if (!"devtools" %in% installed.packages()[,1]) {
  install.packages("devtools")
}
devtools::install_github("gumeo/accSDA")
```

The following is a simple example of how to use the package on
Fisher’s Iris dataset. I chose the Iris dataset because most people are
familiar with it. Also check out the *p >> n* example below!

```r
library(accSDA)

# Prepare training and test set
train <- c(1:40, 51:90, 101:140)
Xtrain <- iris[train, 1:4]
# normalize is a function in the package
nX <- normalize(Xtrain)
Xtrain <- nX$Xc
Ytrain <- iris[train, 5]
Xtest <- iris[-train, 1:4]
# Apply the normalization estimated on the training set to the test set
Xtest <- normalizetest(Xtest, nX)
Ytest <- iris[-train, 5]

# Define parameters for SDAD, i.e. ADMM optimization method
# Also try the SDAP and SDAAP methods, look at the documentation
# to read more about the parameters!
Om <- diag(4) + 0.1*matrix(1, 4, 4) # elastic net coefficient matrix
gam <- 0.01
lam <- 0.01      # sparsity parameter
method <- "SDAD"
q <- 2           # number of discriminant vectors (3 classes - 1)
control <- list(PGsteps = 100,
                PGtol = c(1e-5, 1e-5),
                mu = 1,
                maxits = 100,
                tol = 1e-3,
                quiet = FALSE)
# Run the algorithm
res <- ASDA(Xt = Xtrain,
            Yt = Ytrain,
            Om = Om,
            gam = gam,
            lam = lam,
            q = q,
            method = method,
            control = control)
# Can also just use the defaults:
# Default optimization method is SDAAP, accelerated proximal gradient.
resDef <- ASDA(Xtrain, Ytrain)
```

Now that you have some results, you will want to evaluate the
performance on the test set. The `ASDA` function returns an S3 object
of class `ASDA`, and the package provides a `predict` method for it, so
you can classify new data:

```r
preds <- predict(res, newdata = Xtest)
```
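To summarize test-set performance, you can cross-tabulate the predicted classes against the true labels. The sketch below uses a hypothetical helper, `confusion_summary()` (not part of `accSDA`), shown on made-up labels; with the Iris fit above you would call `confusion_summary(preds$class, Ytest)`, assuming `preds$class` holds the predicted classes as in the simulated example below.

```r
# Hypothetical helper (not part of accSDA): confusion matrix and accuracy
# for predicted labels against true labels.
confusion_summary <- function(predicted, actual) {
  # Use a common set of levels so the table is square
  lev <- union(levels(factor(actual)), levels(factor(predicted)))
  predicted <- factor(predicted, levels = lev)
  actual <- factor(actual, levels = lev)
  tab <- table(Predicted = predicted, Actual = actual)
  list(table = tab, accuracy = sum(diag(tab)) / sum(tab))
}

# Made-up labels for illustration: one versicolor misclassified as virginica
actual    <- factor(c("setosa", "setosa", "versicolor",
                      "versicolor", "virginica", "virginica"))
predicted <- factor(c("setosa", "setosa", "versicolor",
                      "virginica", "virginica", "virginica"))
cs <- confusion_summary(predicted, actual)
print(cs$table)
cs$accuracy
```

The off-diagonal cells show which classes are confused with each other, which a single accuracy number hides.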

The Iris data is not very convincing on its own, so let’s look at a simulated example where sparsity plays a bigger role.

The goal here is to demonstrate how to set up the data, normalize it, train the classifier, predict, plot the projected data and assess accuracy.

Try running the following code in an R session with `accSDA` loaded.

```r
# You can play around with these parameters
P <- 300 # Number of variables
N <- 100 # Number of samples per class
K <- 5   # Number of classes

# Mean for classes, they are zero everywhere except that coordinate i has
# value 5 for class i, each column of the means matrix represents a mean
# for a specific class.
means <- matrix(0, nrow = P, ncol = K)
for (i in 1:K) {
  means[i, i] <- 5
}

# Sample dummy data
Xtrain <- matrix(0, nrow = 0, ncol = P)
Xtest <- matrix(0, nrow = 0, ncol = P)
for (i in 1:K) {
  Xtrain <- rbind(Xtrain, MASS::mvrnorm(n = N, mu = means[, i], Sigma = diag(P)))
  Xtest <- rbind(Xtest, MASS::mvrnorm(n = N, mu = means[, i], Sigma = diag(P)))
}

# Generate the labels
Ytrain <- factor(rep(1:K, each = N))
Ytest <- Ytrain

# Normalize the data
Xt <- accSDA::normalize(Xtrain)
Xtrain <- Xt$Xc # Use the centered and scaled data
Xtest <- accSDA::normalizetest(Xtest, Xt)

# Train the classifier and increase the sparsity parameter from the default
# so we penalize non-sparse solutions more.
res <- accSDA::ASDA(Xtrain, Ytrain, lam = 0.01)

# Project the training data onto the discriminant vectors. The number of
# discriminant vectors is the number of classes minus 1, so with K = 5 we
# can plot the first two directions. For a binary classification task we
# project to a single dimension and can't produce a plot like this.
XtrainProjected <- Xtrain %*% res$beta

# Plot using the first two discriminant directions
plot(XtrainProjected[, 1], XtrainProjected[, 2], col = Ytrain,
     main = 'Training data projected with discriminant vectors')
```
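The normalization step matters because the test data must be transformed with statistics estimated on the training data, not with its own. Below is a minimal base-R sketch of that pattern using hypothetical helpers `fit_scaling()` and `apply_scaling()` (not part of `accSDA`). It centers each column and scales it to unit Euclidean length; the exact scaling used by `normalize()` may differ, so check `?normalize` before relying on it.

```r
# Hypothetical helpers (not part of accSDA) illustrating train/test
# normalization: all statistics come from the training data only.
fit_scaling <- function(X) {
  X <- as.matrix(X)
  mx <- colMeans(X)              # column means of the training data
  Xc <- sweep(X, 2, mx)          # center each column
  vx <- sqrt(colSums(Xc^2))      # column lengths after centering
  vx[vx == 0] <- 1               # leave constant columns unscaled
  list(Xc = sweep(Xc, 2, vx, "/"), mx = mx, vx = vx)
}

apply_scaling <- function(Xnew, fit) {
  Xnew <- as.matrix(Xnew)
  # Reuse the training means and lengths; do not recompute them on Xnew
  sweep(sweep(Xnew, 2, fit$mx), 2, fit$vx, "/")
}

set.seed(2)
Xtr <- matrix(rnorm(30 * 4, mean = 10), nrow = 30)
Xte <- matrix(rnorm(10 * 4, mean = 10), nrow = 10)
fit <- fit_scaling(Xtr)
Xte_scaled <- apply_scaling(Xte, fit)
colMeans(fit$Xc)          # approximately zero for every training column
sqrt(colSums(fit$Xc^2))   # approximately one for every training column
```

The test columns will generally not have exactly zero mean after scaling; that is expected, since they are shifted by the training means.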

```r
# Predict on the test data
preds <- predict(res, newdata = Xtest)

# Plot projected test data with original and predicted labels
XtestProjected <- Xtest %*% res$beta
plot(XtestProjected[, 1], XtestProjected[, 2], col = Ytest,
     main = "Projected test data with original labels")
plot(XtestProjected[, 1], XtestProjected[, 2], col = preds$class,
     main = "Projected test data with predicted labels")

# Calculate accuracy: we have N samples per class, so K*N in total
sum(preds$class == Ytest) / (K*N)
# Inspect res$beta to see that the discriminant vectors are sparse
```
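In this simulation only the first `K` coordinates carry class information (the class means differ only there), so a good sparse fit should select variables among `1:K`. The sketch below uses a hypothetical helper, `variable_recovery()` (not part of `accSDA`), that compares the nonzero rows of a coefficient matrix with a known set of informative variables; it is shown on a made-up matrix, and with the fit above you would call `variable_recovery(res$beta, informative = 1:K)`.

```r
# Hypothetical helper (not part of accSDA): compare the variables selected by
# a sparse coefficient matrix (its nonzero rows) with the truly informative ones.
variable_recovery <- function(beta, informative, tol = 0) {
  selected <- which(rowSums(abs(as.matrix(beta)) > tol) > 0)
  list(selected        = selected,
       true_positives  = intersect(selected, informative),
       false_positives = setdiff(selected, informative),
       missed          = setdiff(informative, selected))
}

# Made-up 8 x 2 coefficient matrix: rows 1, 2 and 6 are nonzero
beta <- matrix(0, nrow = 8, ncol = 2)
beta[1, 1] <- 0.7
beta[2, 2] <- -0.4
beta[6, 1] <- 0.05
rec <- variable_recovery(beta, informative = 1:3)
rec$false_positives   # selected but not informative
rec$missed            # informative but not selected
```

Raising `lam` typically trades false positives for missed variables, so this kind of check is a quick way to see the effect of the sparsity parameter on simulated data.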

Upcoming releases will include more plotting and printing functionality
for `ASDA` objects. A C++ backend is also in the pipeline, along with
further extensions to handle different types of data.