stcpR6 is an R package built to run nonparametric sequential tests and online change point detection algorithms in SRR 21’ and SRR 23’. This package supports APIs of nonparametric sequential test and online change-point detection for streams of univariate sub-Gaussian, binary, and bounded random variables.
Disclaimer: This R package is a personal project and is not affiliated with my company. It is provided “as is” without warranty of any kind, express or implied. The author does not assume any liability for any errors or omissions in this package.
You can install the CRAN version of stcpR6 with:
install.packages("stcpR6")
You can also install the development version of stcpR6 from GitHub with:
# install.packages("devtools")
::install_github("shinjaehyeok/stcpR6") devtools
We will use E-detector for bounded random variables with the following setup
<- 20
kCap <- 1000
kARL
<- function(y) {
normalize_obs <- (y + kCap) / (2 * kCap)
obs > 1] <- 1
obs[obs < 0] <- 0
obs[obs return(obs)
}
<- stcpR6::Stcp$new(
e_detector method = "SR",
family = "Bounded",
alternative = "two.sided",
threshold = log(kARL),
m_pre = normalize_obs(0),
delta_lower = 1 / kCap / 2
)
# Set random seed for reproducibility
set.seed(100)
# No change
<- 10
sig <- 200
v
<- seq(1,v) + rnorm(v, 0, sig)
y_history
# Simple regression predictor
<- function(i) {
predict_y if (i <= 2) return (0)
<- i-1
j <- lm.fit(x = cbind(rep(1, j),1:j), y = y_history[1:j])$coefficients
coeff return(coeff[1] + coeff[2] * i)
}
<- sapply(seq_along(y_history), predict_y)
y_hat
plot(seq_along(y_history), y_history, type = "l",
xlab = "t", ylab = "Y_t", main = "Sample history (Black) and Prediction (Red)")
lines(seq_along(y_hat), y_hat, col = 2)
plot(seq_along(y_history), y_history-y_hat, type = "l",
xlab = "t", ylab = "Residual", main = "Prediction error")
Apply E-detector on normalized residuals
<- normalize_obs(as.numeric(y_history-y_hat))
res # In real applications,
# we should use e_detector$updateLogValues(res_i) to update e-deetector
# for each newly observed residual res_i.
# In this example, however, we use a vectorized update for simplicity.
# Initialize the model.
$reset()
e_detector# Compute the log values of e-detectors.
<- e_detector$updateAndReturnHistories(res)
log_values
# Print a summary of updates
print(e_detector)
#> stcp Model:
#> - Method: SR
#> - Family: Bounded
#> - Alternative: two.sided
#> - Alpha: 0.001
#> - m_pre: 0.5
#> - Num. of mixing components: 338
#> - Obs. have been passed: 200
#> - Current log value: 4.673126
#> - Is stopped before: FALSE
#> - Stopped time: 0
# Plot the log values and stopping threshold
plot(seq_along(log_values), log_values, type = "l",
xlab = "t", ylab = "log value", ylim = c(0, 3 * e_detector$getThreshold()))
abline(h = e_detector$getThreshold(), col = 2)
Note that the log values of e-detector have remained below the threshold. E-detector didn’t trigger any alert.
# Set random seed for reproducibility
set.seed(100)
# Change point: 100
<- 100
v
# Pre-change history
<- seq(1,v) + rnorm(v, 0, sig)
y_pre
# Post-change history
<- seq(v+1, 2*v) + 0.01 * seq(1,v)^2 + rnorm(v, 0, sig)
y_post <- c(y_pre, y_post)
y_history
# Sample history
<- c(y_pre, y_post)
y_history
<- sapply(seq_along(y_history), predict_y)
y_hat
plot(seq_along(y_history), y_history, type = "l",
xlab = "t", ylab = "Y_t", main = "Sample history (Black) and Prediction (Red)")
lines(seq_along(y_hat), y_hat, col = 2)
abline(v = v, lty = 2)
plot(seq_along(y_history), y_history-y_hat, type = "l",
xlab = "t", ylab = "Residual", main = "Prediction error")
abline(v = v, lty = 2)
Apply E-detector on normalized residuals
<- normalize_obs(as.numeric(y_history-y_hat))
res # In real applications,
# we should use e_detector$updateLogValues(res_i) to update e-deetector
# for each newly observed residual res_i.
# In this example, however, we use a vectorized update for simplicity.
# Initialize the model.
$reset()
e_detector# Compute the log values of e-detectors.
<- e_detector$updateAndReturnHistories(res)
log_values
# Print a summary of updates
print(e_detector)
#> stcp Model:
#> - Method: SR
#> - Family: Bounded
#> - Alternative: two.sided
#> - Alpha: 0.001
#> - m_pre: 0.5
#> - Num. of mixing components: 338
#> - Obs. have been passed: 200
#> - Current log value: 44.92438
#> - Is stopped before: TRUE
#> - Stopped time: 138
# Plot the log values and stopping threshold
plot(seq_along(log_values), log_values, type = "l",
xlab = "t", ylab = "log value", ylim = c(0, 3 * e_detector$getThreshold()))
abline(h = e_detector$getThreshold() , col = 2)
abline(v = v, lty = 2)
abline(v = e_detector$getStoppedTime(), col = 2, lty = 2)
In this case, the log values have crossed the threshold at time 138 and triggered alert. Detection delay was 138 - 100 = 38.
Note that triggering alert at time 138 would be non-trivial if we only check the residual trend as below.
plot(seq_along(y_history), y_history-y_hat, type = "l",
xlab = "t", ylab = "Residual", main = "Prediction error")
abline(v = v, lty = 2)
abline(v = e_detector$getStoppedTime(), col = 2, lty = 2)