Title: | NPLS Regression with L1 Penalization |
---|---|
Description: | Tools for performing variable selection in three-way data using N-PLS in combination with L1 penalization, Selectivity Ratio and VIP scores. The N-PLS model (Rasmus Bro, 1996 <DOI:10.1002/(SICI)1099-128X(199601)10:1%3C47::AID-CEM400%3E3.0.CO;2-C>) is the natural extension of PLS (Partial Least Squares) to N-way structures, and tries to maximize the covariance between X and Y data arrays. The package also adds variable selection through L1 penalization, Selectivity Ratio and VIP scores. |
Authors: | David Hervas |
Maintainer: | David Hervas <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.0.40 |
Built: | 2024-10-25 02:59:59 UTC |
Source: | https://github.com/david-hervas/snpls |
AUC for a sNPLS-DA model
auroc(object)
object |
A sNPLS object |
The area under the ROC curve for the model
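As a rough illustration of the quantity returned here, the area under the ROC curve for a 0/1 response can be computed from ranks (the Mann-Whitney formulation). The helper `auc_rank` below is a hypothetical name written for this sketch in base R; it is not part of the package and is not necessarily how `auroc` computes its result.

```r
# Rank-based (Mann-Whitney) estimate of the area under the ROC curve.
# `labels` is a 0/1 vector and `scores` holds the predicted values.
# Hypothetical helper for illustration only, not the package implementation.
auc_rank <- function(labels, scores) {
  stopifnot(length(labels) == length(scores), all(labels %in% c(0, 1)))
  n_pos <- sum(labels == 1)
  n_neg <- sum(labels == 0)
  r <- rank(scores)  # ties receive average ranks
  (sum(r[labels == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

labels <- c(0, 0, 1, 1)
scores <- c(0.1, 0.4, 0.35, 0.8)
auc_rank(labels, scores)  # 0.75: three of the four positive/negative pairs are ordered correctly
```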
Evaluation of ten breads with respect to eleven attributes by eight judges (Xbread). The outcome is the salt content of each bread (Ybread).
data(bread)
An object of class list of length 2.
Bro, R. (1998). Multi-way Analysis in the Food Industry: Models, Algorithms, and Applications. PhD thesis, University of Amsterdam (NL) & Royal Veterinary and Agricultural University (DK).
Extract coefficients from a sNPLS model
## S3 method for class 'sNPLS'
coef(object, as.matrix = FALSE, ...)
object |
A sNPLS model fit |
as.matrix |
Should the coefficients be presented as matrix or vector? |
... |
Further arguments passed to |
A matrix (or vector) of coefficients
cv_snpls
Internal function for cv_snpls
cv_fit( xtrain, ytrain, xval, yval, ncomp, threshold_j = NULL, threshold_k = NULL, keepJ = NULL, keepK = NULL, method, metric, ... )
xtrain |
A three-way training array |
ytrain |
A response training matrix |
xval |
A three-way test array |
yval |
A response test matrix |
ncomp |
Number of components for the sNPLS model |
threshold_j |
Threshold value on Wj. Scaled between [0, 1) |
threshold_k |
Threshold value on Wk. Scaled between [0, 1) |
keepJ |
Number of variables to keep for each component, ignored if threshold_j is provided |
keepK |
Number of 'times' to keep for each component, ignored if threshold_k is provided |
method |
Select between sNPLS, sNPLS-SR or sNPLS-VIP |
metric |
Performance metric (RMSE or AUC) |
... |
Further arguments passed to sNPLS |
Returns the CV root mean squared error or AUC
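The train/validation pattern behind a single cross-validation fit can be sketched in base R. The example below uses an ordinary linear model as a stand-in for the sNPLS fit, so it only illustrates how an RMSE on held-out data is obtained; the variable names and the use of `lm` are assumptions made for this sketch, not the internals of `cv_fit`.

```r
# Hold out part of the data, fit on the rest, and score the held-out part.
# lm() is a stand-in model here; cv_fit itself fits a sNPLS model.
set.seed(42)
n <- 50
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n, sd = 0.5)

val <- sample(n, 10)  # indices held out for validation
fit <- lm(y ~ x, data = data.frame(x = x[-val], y = y[-val]))
pred <- predict(fit, newdata = data.frame(x = x[val]))

# Root mean squared error on the validation observations
rmse <- sqrt(mean((y[val] - pred)^2))
rmse
```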
Performs cross-validation for a sNPLS model
cv_snpls( X_npls, Y_npls, ncomp = 1:3, samples = 20, threshold_j = c(0, 1), threshold_k = c(0, 1), keepJ = NULL, keepK = NULL, nfold = 10, parallel = TRUE, method = "sNPLS", metric = "RMSE", ... )
X_npls |
A three-way array containing the predictors. |
Y_npls |
A matrix containing the response. |
ncomp |
A vector with the different number of components to test |
samples |
Number of samples for performing random search in continuous thresholding |
threshold_j |
Vector with threshold min and max values on Wj. Scaled between [0, 1) |
threshold_k |
Vector with threshold min and max values on Wk. Scaled between [0, 1) |
keepJ |
A vector with the different number of selected variables to test for discrete thresholding |
keepK |
A vector with the different number of selected 'times' to test for discrete thresholding |
nfold |
Number of folds for the cross-validation |
parallel |
Should the computations be performed in parallel? Set up strategy first with |
method |
Select between sNPLS, sNPLS-SR or sNPLS-VIP |
metric |
Select between RMSE or AUC (for 0/1 response) |
... |
Further arguments passed to sNPLS |
A list with the best parameters for the model and the CV error
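The two search modes described by the arguments differ in how candidate settings are generated: discrete thresholding enumerates every combination of `ncomp`, `keepJ` and `keepK`, while continuous thresholding draws `samples` random threshold values from the given ranges. The base R sketch below shows one plausible way to build both candidate sets; how `cv_snpls` actually combines `ncomp` with the sampled thresholds is an assumption here.

```r
# Candidate hyperparameter sets for the two search modes (illustrative only).
set.seed(123)
samples <- 20
ncomp <- 1:3
threshold_j <- c(0, 1)  # min and max for the Wj threshold
threshold_k <- c(0, 1)  # min and max for the Wk threshold

# Random search: each row is one candidate with continuous thresholds
random_candidates <- data.frame(
  ncomp = sample(ncomp, samples, replace = TRUE),
  threshold_j = runif(samples, threshold_j[1], threshold_j[2]),
  threshold_k = runif(samples, threshold_k[1], threshold_k[2])
)

# Grid search: every combination of the discrete settings
grid_candidates <- expand.grid(ncomp = 1:2, keepJ = 1:3, keepK = 1:2)

nrow(random_candidates)  # 20
nrow(grid_candidates)    # 12
```

Each candidate row would then be evaluated by cross-validation and the row with the best metric kept.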
## Not run:
X_npls <- array(rpois(7500, 10), dim = c(50, 50, 3))
Y_npls <- matrix(2 + 0.4*X_npls[,5,1] + 0.7*X_npls[,10,1] - 0.9*X_npls[,15,1] +
                 0.6*X_npls[,20,1] - 0.5*X_npls[,25,1] + rnorm(50), ncol = 1)
# Grid search for discrete thresholding
cv1 <- cv_snpls(X_npls, Y_npls, ncomp = 1:2, keepJ = 1:3, keepK = 1:2, parallel = FALSE)
# Random search for continuous thresholding
cv2 <- cv_snpls(X_npls, Y_npls, ncomp = 1:2, samples = 20, parallel = FALSE)
## End(Not run)
Fitted method for sNPLS models
## S3 method for class 'sNPLS'
fitted(object, ...)
object |
A sNPLS model fit |
... |
Further arguments passed to |
Fitted values for the sNPLS model
Runs a genetic algorithm to select the best combination of hyperparameter values
ga_snpls( X, Y, ncomp = c(1, 3), threshold_j = c(0, 1), threshold_k = c(0, 1), maxiter = 20, popSize = 50, parallel = TRUE, replicates = 10, metric = "RMSE", method = "sNPLS", ... )
X |
A three-way array containing the predictors. |
Y |
A matrix containing the response. |
ncomp |
A vector with the minimum and maximum number of components to assess |
threshold_j |
Vector with threshold min and max values on Wj. Scaled between [0, 1) |
threshold_k |
Vector with threshold min and max values on Wk. Scaled between [0, 1) |
maxiter |
Maximum number of iterations (generations) of the genetic algorithm |
popSize |
Population size (see |
parallel |
Should the computations be performed in parallel? (see |
replicates |
Number of replicates for the cross-validation performed in the fitness function of the genetic algorithm |
metric |
Select between RMSE or AUC (for 0/1 response) |
method |
Select between sNPLS, sNPLS-SR or sNPLS-VIP |
... |
Further arguments passed to |
A summary of the genetic algorithm results
plot.sNPLS
Internal function for plot.sNPLS
plot_T(x, comps, labels, group = NULL)
x |
A sNPLS model fit |
comps |
A vector of length two with the components to plot |
labels |
Should rownames be added as labels to the plot? |
group |
Vector with categorical variable defining groups |
A plot of the T matrix of a sNPLS model fit
plot.sNPLS
Internal function for plot.sNPLS
plot_time(x, comps)
x |
A sNPLS model fit |
comps |
A vector with the components to plot |
A plot of Wk coefficients for each component
plot.sNPLS
Internal function for plot.sNPLS
plot_U(x, comps, labels, group = NULL)
x |
A sNPLS model fit |
comps |
A vector of length two with the components to plot |
labels |
Should rownames be added as labels to the plot? |
group |
Vector with categorical variable defining groups |
A plot of the U matrix of a sNPLS model fit
plot.sNPLS
Internal function for plot.sNPLS
plot_variables(x, comps)
x |
A sNPLS model fit |
comps |
A vector with the components to plot |
A plot of Wj coefficients for each component
plot.sNPLS
Internal function for plot.sNPLS
plot_Wj(x, comps, labels)
x |
A sNPLS model fit |
comps |
A vector of length two with the components to plot |
labels |
Should rownames be added as labels to the plot? |
A plot of Wj coefficients
plot.sNPLS
Internal function for plot.sNPLS
plot_Wk(x, comps, labels)
x |
A sNPLS model fit |
comps |
A vector of length two with the components to plot |
labels |
Should rownames be added as labels to the plot? |
A plot of the Wk coefficients
Plot function for visualization of cross validation results for sNPLS models
## S3 method for class 'cvsNPLS'
plot(x, ...)
x |
A cv_sNPLS object |
... |
Not used |
A facet plot with the results of the cross validation
Plots a grid of slices from the 3-D kernel density estimates of the repeat_cv function
## S3 method for class 'repeatcv'
plot(x, ...)
x |
A repeatcv object |
... |
Further arguments passed to plot |
A grid of slices from a 3-D density plot of the results of the repeated cross-validation
Different plots for sNPLS model fits
## S3 method for class 'sNPLS'
plot(x, type = "T", comps = c(1, 2), labels = TRUE, group = NULL, ...)
x |
A sNPLS model fit |
type |
The type of plot. One of those: "T", "U", "Wj", "Wk", "time" or "variables" |
comps |
Vector with the components to plot. It can be of length |
labels |
Should rownames be added as labels to the plot? |
group |
Vector with categorical variable defining groups (optional) |
... |
Not used |
A plot of the type specified in the type parameter
Predict function for sNPLS models
## S3 method for class 'sNPLS'
predict(object, newX, rescale = TRUE, ...)
object |
A sNPLS model fit |
newX |
A three-way array containing the new data |
rescale |
Should the prediction be rescaled to the original scale? |
... |
Further arguments passed to |
A matrix with the predictions
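A minimal usage sketch, assuming the package is installed and using only the documented signatures of `sNPLS` and `predict`: the model is fitted on the first 40 observations of a simulated array and predictions are requested for the remaining 10. The simulated data and the train/test split are choices made for this illustration.

```r
# Guarded so the snippet is a no-op when the package is not available.
if (requireNamespace("sNPLS", quietly = TRUE)) {
  library(sNPLS)
  set.seed(1)
  X_npls <- array(rpois(7500, 10), dim = c(50, 50, 3))
  Y_npls <- matrix(2 + 0.4 * X_npls[, 5, 1] + 0.7 * X_npls[, 10, 1] + rnorm(50),
                   ncol = 1)

  train <- 1:40
  # Discrete thresholding: two variables and one 'time' kept per component
  fit <- sNPLS(X_npls[train, , ], Y_npls[train, , drop = FALSE],
               ncomp = 2, keepJ = rep(2, 2), keepK = rep(1, 2))

  # newX must be a three-way array with the same second and third dimensions
  pred <- predict(fit, newX = X_npls[-train, , ])
  dim(pred)
}
```

With `rescale = TRUE` (the default) the predictions are returned on the original scale of the response, so they can be compared directly with `Y_npls[-train, ]`.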
Performs repeated cross-validation and represents results in a plot
repeat_cv( X_npls, Y_npls, ncomp = 1:3, samples = 20, keepJ = NULL, keepK = NULL, threshold_j = c(0, 1), threshold_k = c(0, 1), nfold = 10, times = 30, parallel = TRUE, method = "sNPLS", metric = "RMSE", ... )
X_npls |
A three-way array containing the predictors. |
Y_npls |
A matrix containing the response. |
ncomp |
A vector with the different number of components to test |
samples |
Number of samples for performing random search in continuous thresholding |
keepJ |
A vector with the different number of selected variables to test in discrete thresholding |
keepK |
A vector with the different number of selected 'times' to test in discrete thresholding |
threshold_j |
Vector with threshold min and max values on Wj. Scaled between [0, 1) |
threshold_k |
Vector with threshold min and max values on Wk. Scaled between [0, 1) |
nfold |
Number of folds for the cross-validation |
times |
Number of repetitions of the cross-validation |
parallel |
Should the computations be performed in parallel? Set up strategy first with |
method |
Select between sNPLS, sNPLS-SR or sNPLS-VIP |
metric |
Select between RMSE or AUC (for 0/1 response) |
... |
Further arguments passed to cv_snpls |
A density plot with the results of the cross-validation and an (invisible) data.frame with these results
Builds the R-matrix from a sNPLS model fit
Rmatrix(x)
x |
A sNPLS model obtained from |
Returns the R-matrix of the model, needed to compute the coefficients
Fits an N-PLS regression model imposing sparsity on wj and wk matrices
sNPLS( XN, Y, ncomp = 2, threshold_j = 0.5, threshold_k = 0.5, keepJ = NULL, keepK = NULL, scale.X = TRUE, center.X = TRUE, scale.Y = TRUE, center.Y = TRUE, conver = 1e-16, max.iteration = 10000, silent = F, method = "sNPLS" )
XN |
A three-way array containing the predictors. |
Y |
A matrix containing the response. |
ncomp |
Number of components in the projection |
threshold_j |
Threshold value on Wj. Scaled between [0, 1) |
threshold_k |
Threshold value on Wk. Scaled between [0, 1) |
keepJ |
Number of variables to keep for each component, ignored if threshold_j is provided |
keepK |
Number of 'times' to keep for each component, ignored if threshold_k is provided |
scale.X |
Perform unit variance scaling on X? |
center.X |
Perform mean centering on X? |
scale.Y |
Perform unit variance scaling on Y? |
center.Y |
Perform mean centering on Y? |
conver |
Convergence criterion |
max.iteration |
Maximum number of iterations |
silent |
Show output? |
method |
Select between L1 penalization (sNPLS), variable selection with Selectivity Ratio (sNPLS-SR) or variable selection with VIP (sNPLS-VIP) |
A fitted sNPLS model
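The two ways of imposing sparsity on a weight vector can be sketched in base R. The sketch assumes that a continuous threshold is expressed as a fraction of the largest absolute weight and applied by soft-thresholding (the usual form of an L1 penalty), and that `keepJ`/`keepK` retain the largest weights in absolute value. Both helpers are hypothetical names for this illustration; the package may differ in details such as renormalizing the weights afterwards.

```r
# Continuous thresholding: soft-threshold at a fraction of max |w|.
# Because threshold < 1, the largest weight always survives.
soft_threshold <- function(w, threshold) {
  stopifnot(threshold >= 0, threshold < 1)
  cutoff <- threshold * max(abs(w))
  sign(w) * pmax(abs(w) - cutoff, 0)
}

# Discrete thresholding: keep only the `keep` largest weights in absolute value.
keep_largest <- function(w, keep) {
  out <- numeric(length(w))
  idx <- order(abs(w), decreasing = TRUE)[seq_len(keep)]
  out[idx] <- w[idx]
  out
}

w <- c(0.9, -0.5, 0.2, -0.05)
soft_threshold(w, 0.5)  # cutoff is 0.45, so only the first two weights stay non-zero
keep_largest(w, 2)      # the first two weights are kept unchanged, the rest set to 0
```

Soft-thresholding shrinks the surviving weights as well as zeroing the small ones, whereas the discrete rule leaves the retained weights untouched.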
Andersson, C. A., & Bro, R. (2000). The N-way Toolbox for MATLAB. Chemometrics and Intelligent Laboratory Systems, 52(1), 1-4.
Hervas, D., Prats-Montalban, J. M., Garcia-Cañaveras, J. C., Lahoz, A., & Ferrer, A. (2019). Sparse N-way partial least squares by L1-penalization. Chemometrics and Intelligent Laboratory Systems, 185, 85-91.
X_npls <- array(rpois(7500, 10), dim = c(50, 50, 3))
Y_npls <- matrix(2 + 0.4*X_npls[,5,1] + 0.7*X_npls[,10,1] - 0.9*X_npls[,15,1] +
                 0.6*X_npls[,20,1] - 0.5*X_npls[,25,1] + rnorm(50), ncol = 1)
# Discrete thresholding
fit <- sNPLS(X_npls, Y_npls, ncomp = 3, keepJ = rep(2, 3), keepK = rep(1, 3))
# Continuous thresholding
fit2 <- sNPLS(X_npls, Y_npls, ncomp = 3, threshold_j = 0.5, threshold_k = 0.5)
# Use sNPLS-SR method
fit3 <- sNPLS(X_npls, Y_npls, ncomp = 3, threshold_j = 0.5, threshold_k = 0.5, method = "sNPLS-SR")
Estimates Selectivity Ratio for the different components of a sNPLS model fit
SR(model)
model |
A sNPLS model |
A list of data.frames, each of them including the computed Selectivity Ratios for each variable
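For intuition, the selectivity ratio in its common two-way form is the ratio of explained to residual variance of each variable after projecting X onto the direction of the regression vector (target projection). The base R sketch below implements that two-way definition with a hypothetical helper, `selectivity_ratio`, and an ordinary least-squares coefficient vector as a stand-in; `SR()` operates on the components of a three-way sNPLS fit and may compute the ratios differently.

```r
# Two-way selectivity ratio via target projection (illustrative only).
# X: n x p predictor matrix; b: regression vector of length p.
selectivity_ratio <- function(X, b) {
  X <- scale(X, center = TRUE, scale = FALSE)
  t_tp <- X %*% b / sqrt(sum(b^2))          # target-projected scores
  p_tp <- crossprod(X, t_tp) / sum(t_tp^2)  # target-projected loadings
  X_tp <- t_tp %*% t(p_tp)                  # part of X explained by the projection
  E <- X - X_tp                             # residual part
  colSums(X_tp^2) / colSums(E^2)
}

set.seed(1)
X <- matrix(rnorm(200), nrow = 50, ncol = 4)
y <- 2 * X[, 1] + rnorm(50, sd = 0.1)       # only the first variable matters
b <- unname(coef(lm(y ~ X))[-1])            # drop the intercept

sr <- selectivity_ratio(X, b)
which.max(sr)  # the first variable has by far the largest ratio
```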
Summary of a sNPLS model fit
## S3 method for class 'sNPLS'
summary(object, ...)
object |
A sNPLS object |
... |
Further arguments passed to summary.default |
A summary including number of components, squared error and coefficients of the fitted model
Unfolds a three-way array into a matrix
unfold3w(x)
x |
A three-way array |
Returns a matrix with dimensions dim(x)[1] x dim(x)[2]*dim(x)[3]
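The unfolding can be sketched in base R by resetting the dimensions of the array, which places the slices along the third mode side by side. The helper `unfold_sketch` is a hypothetical name for this illustration, and the column ordering it produces (second mode varying fastest within each third-mode slice) is an assumption about `unfold3w` rather than something stated above.

```r
# Unfold an I x J x K array into an I x (J*K) matrix.
# Hypothetical helper for illustration; column ordering is an assumption.
unfold_sketch <- function(x) {
  d <- dim(x)
  stopifnot(length(d) == 3)
  dim(x) <- c(d[1], d[2] * d[3])
  x
}

x <- array(1:24, dim = c(2, 3, 4))
m <- unfold_sketch(x)
dim(m)  # 2 12

# Element (i, j, k) of the array ends up in column j + (k - 1) * J
m[1, 2 + (3 - 1) * 3] == x[1, 2, 3]  # TRUE
```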