Title: | Consensus OPLS for Multi-Block Data Fusion |
---|---|
Description: | Merging data from multiple sources is a useful approach for comprehensively evaluating complex systems. However, the problems inherent in analyzing single tables are amplified in multi-block datasets, and finding relationships between data layers of increasing complexity is challenging. To address this, a generic methodology is proposed that combines the strengths of established data analysis strategies, i.e. multi-block approaches and the Orthogonal Partial Least Squares (OPLS) framework, to provide an efficient tool for fusing data obtained from multiple sources. The package enables quick and efficient implementation of the consensus OPLS model for any horizontal multi-block data structure (observation-based matching). It also offers a range of metrics and graphics to help determine the optimal number of components and to check the validity of the model through permutation tests. Interpretation tools include score and loading plots, Variable Importance in Projection (VIP), a predict function that can be used for SHAP computation, and performance coefficients such as R2, Q2, and DQ2. J. Boccard and D.N. Rutledge (2013) <doi:10.1016/j.aca.2013.01.022>. |
Authors: | Celine Bougel [aut] |
Maintainer: | Van Du T. Tran <[email protected]> |
License: | GPL (>= 3) |
Version: | 1.1.0 |
Built: | 2025-02-27 09:52:53 UTC |
Source: | https://github.com/cran/ConsensusOPLS |
This package uses functions from the K-OPLS package, developed by Max Bylesjo, University of Umea, Judy Fonville and Mattias Rantalainen, Imperial College.
Copyright (c) 2007-2010 Max Bylesjo, Judy Fonville and Mattias Rantalainen
This code has been extended and adapted under the terms of the GNU General Public License version 2 as published by the Free Software Foundation.
Maintainer: Van Du T. Tran
Authors:
Celine Bougel
Julien Boccard
Florence Mehl
Other contributors:
Marie Tremblay-Franco [funder]
Mark Ibberson [funder]
Constructs the consensus OPLS model with the optimal number of orthogonal components for the given data blocks and response, and evaluates the model quality with respect to other models built with randomly permuted responses.
ConsensusOPLS(
  data,
  Y,
  maxPcomp = 1,
  maxOcomp = 5,
  modelType = "da",
  nperm = 100,
  cvType = "nfold",
  nfold = 5,
  nMC = 100,
  cvFrac = 4/5,
  kernelParams = list(type = "p", params = c(order = 1)),
  mc.cores = 1,
  verbose = FALSE
)
data |
A list of data blocks. Each element of the list must be a matrix. Rows and columns may be named, in which case the names are retained during the analysis. Any pre-processing of the data (e.g. scaling) must be carried out before building the model. |
Y |
A vector, factor, dummy matrix or numeric matrix for the response. The type of response given determines the model to be used: a numeric vector for linear regression, a factor or dummy matrix for logistic regression or a discriminant model. |
maxPcomp |
Maximum number of Y-predictive components used to build the optimal model. Default, 1. |
maxOcomp |
Maximum number of Y-orthogonal components used to build the optimal model. Default, 5. |
modelType |
String for type of OPLS regression model, either |
nperm |
Number of random permutations of the response Y. Default, 100. |
cvType |
String for type of cross-validation used. Either |
nfold |
Number of folds performed in n-fold cross-validation. This can be set to the number of samples to perform Leave-One-Out cross validation. Default, 5. |
nMC |
An integer indicating the number of rounds performed when
|
cvFrac |
A numeric value indicating the fraction of observations from
|
kernelParams |
List of parameters for the kernel. Either |
mc.cores |
Number of cores for parallel computing. Default, 1. |
verbose |
A logical indicating if detailed information (cross validation) will be shown. Default, FALSE. |
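The cross-validation arguments above describe different ways of partitioning the observations. As an illustration only, the base-R sketch below shows n-fold partitioning and a Monte Carlo style resampling. It is not the package's internal code: the helper names `make_nfold` and `make_mc` are hypothetical, and the reading of `nMC` as the number of Monte Carlo rounds and of `cvFrac` as the fraction of observations kept for training is an assumption.

```r
# Illustrative base-R sketch of two cross-validation partitioning schemes.
# NOT the package's implementation; helper names are hypothetical.

# n-fold: assign each of n observations to exactly one of nfold test folds.
make_nfold <- function(n, nfold = 5) {
    fold_id <- sample(rep(seq_len(nfold), length.out = n))
    split(seq_len(n), fold_id)   # list of test-set indices, one element per fold
}

# Monte Carlo: draw nMC random training sets, each holding a fraction cvFrac
# of the observations (assumed meaning of cvFrac); the rest form the test set.
make_mc <- function(n, nMC = 100, cvFrac = 4/5) {
    lapply(seq_len(nMC), function(i) {
        train <- sort(sample(n, size = floor(cvFrac * n)))
        list(train = train, test = setdiff(seq_len(n), train))
    })
}

set.seed(1)
folds  <- make_nfold(n = 14, nfold = 5)
rounds <- make_mc(n = 14, nMC = 3, cvFrac = 4/5)

# Setting nfold to the number of samples yields leave-one-out partitions.
loo <- make_nfold(n = 14, nfold = 14)
```

In the n-fold scheme every observation is tested exactly once, whereas in the Monte Carlo scheme an observation may appear in several test sets or in none.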
An object of class ConsensusOPLS representing the consensus OPLS model fit.
library(ConsensusOPLS)
data(demo_3_Omics)
# Scale each of the three omics blocks before building the model
datablocks <- lapply(
    demo_3_Omics[c("MetaboData", "MicroData", "ProteoData")], scale)
# Fit a discriminant analysis model, assessed with 5 permutations
res <- ConsensusOPLS(data=datablocks, Y=demo_3_Omics$Y, maxPcomp=1, maxOcomp=2, modelType='da', nperm=5)
res
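The example above relies on the bundled demo data. For other data, the argument descriptions imply that each block is a matrix with the same observations in its rows, pre-processed beforehand, and that a class response may be given as a factor or a dummy matrix. A minimal base-R sketch of such a preparation follows; all values are simulated, and the block names, dimensions and class labels are made up for illustration.

```r
# Simulated stand-in for real measurements: two blocks, same 10 observations.
set.seed(42)
samples <- paste0("s", 1:10)
blockA <- matrix(rnorm(10 * 4), nrow = 10,
                 dimnames = list(samples, paste0("a", 1:4)))
blockB <- matrix(rnorm(10 * 6), nrow = 10,
                 dimnames = list(samples, paste0("b", 1:6)))

# Pre-processing happens before model building: centre and scale each block.
blocks <- lapply(list(A = blockA, B = blockB), scale)

# Horizontal (observation-based) matching: row names must agree across blocks.
stopifnot(all(vapply(blocks, function(b) identical(rownames(b), samples),
                     logical(1))))

# Two-class response as a factor, and the equivalent dummy (indicator) matrix.
classes <- factor(rep(c("colon", "ovary"), each = 5))
Ydummy  <- model.matrix(~ classes - 1)
colnames(Ydummy) <- levels(classes)
rownames(Ydummy) <- samples
```

Here `blocks` would play the role of `data`, and either `classes` or `Ydummy` the role of `Y`.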
ConsensusOPLS S4 class
An object returned by the ConsensusOPLS function, of class ConsensusOPLS, and representing a fitted Consensus OPLS model.
modelType
The type of requested OPLS regression model.
response
The provided response variable (Y).
nPcomp
Number of Y-predictive components (latent variables) of the optimal model.
nOcomp
Number of Y-orthogonal components (latent variables) of the optimal model.
blockContribution
Relative contribution of each block (normalized lambda values) to the latent variables.
scores
Representation of the samples in the latent variables of the optimal model.
loadings
Contribution of each block's variables to the latent variables of the optimal model.
VIP
Variable importance in projection (VIP) for each block of data, assessing the relevance of the variables in explaining the variation in the response.
R2X
Proportion of variation in data blocks explained by the optimal model.
R2Y
Proportion of variation in the response explained by the optimal model.
Q2
Predictive ability of the optimal model.
DQ2
Predictive ability of the optimal discriminant model.
permStats
Assessment of models with permuted response.
model
The optimal model.
cv
Cross-validation result towards the optimal model. Contains AllYhat (all predicted Y values as a concatenated matrix), cvTestIndex (indexes for the test set observations during the cross-validation rounds), DQ2Yhat (total discriminant Q-square result for all Y-orthogonal components), nOcompOpt (optimal number of Y-orthogonal components (latent variables) for the optimal model), and Q2Yhat (total Q-square result for all Y-orthogonal components).
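Since the fitted model is an S4 object, the slots listed above can be read with the `@` operator. The sketch below refits the demo model from the package example and inspects a few slots; it is wrapped in a `requireNamespace()` guard so that it is simply skipped when the package is not installed. The slot names are the ones documented above; the structure of their contents is not assumed.

```r
# Guarded so the snippet is skipped when ConsensusOPLS is not installed.
if (requireNamespace("ConsensusOPLS", quietly = TRUE)) {
    library(ConsensusOPLS)
    data(demo_3_Omics)
    datablocks <- lapply(
        demo_3_Omics[c("MetaboData", "MicroData", "ProteoData")], scale)
    res <- ConsensusOPLS(data = datablocks, Y = demo_3_Omics$Y,
                         maxPcomp = 1, maxOcomp = 2,
                         modelType = "da", nperm = 5)

    slotNames(res)               # names of all slots of the fitted object
    res@nPcomp                   # Y-predictive components in the optimal model
    res@nOcomp                   # Y-orthogonal components in the optimal model
    res@blockContribution        # relative contribution of each block
    res@Q2                       # predictive ability
    str(res@cv, max.level = 1)   # overview of the cross-validation results
}
```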
A demonstration case study available from a public repository of the National Cancer Institute, namely the NCI-60 data set, was used to illustrate the method's potential for omics data fusion. A subset of NCI-60 data (transcriptomics, proteomics and metabolomics) involving experimental data from 14 cancer cell lines from two tissue origins, i.e. colon and ovary, was used. The object provided in this package is a list containing all the information needed to build a model: the three data blocks, a list of observation names (samples) and the binary response matrix Y.
Boccard & Rutledge
J. Boccard and D.N. Rutledge. A consensus OPLS-DA strategy for multiblock Omics data fusion. Analytica Chimica Acta, 769, 30-39, 2013.
Plot of relative contribution of each data block in the optimal model.
plotContribution(object, col = NULL, ...)

## S4 method for signature 'ConsensusOPLS'
plotContribution(object, col = NULL, ...)
object |
An object of class |
col |
A vector of color codes or names, one for each block. Default, NULL, 2 to number of blocks + 1. |
... |
|
No return value, called for side effects.
Plot of DQ2 of models with permuted response.
plotDQ2(
  object,
  breaks = 10,
  xlab = "DQ2",
  main = "DQ2 in models with permuted response",
  col = "blue",
  lty = "dashed",
  ...
)

## S4 method for signature 'ConsensusOPLS'
plotDQ2(
  object,
  breaks = 10,
  xlab = "DQ2",
  main = "DQ2 in models with permuted response",
  col = "blue",
  lty = "dashed",
  ...
)
object |
An object of class |
breaks |
See |
xlab |
See |
main |
See |
col |
A color code or name for DQ2 in the optimal model. Default, "blue".
See |
lty |
A line type code or name for DQ2 in the optimal model. Default, "dashed".
See |
... |
|
No return value, called for side effects.
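To make the difference between Q2 and DQ2 concrete, the base-R sketch below computes both from a vector of cross-validated predictions. It follows the commonly used definitions (Q2 as one minus the ratio of the prediction error sum of squares to the total sum of squares, and the discriminant variant that ignores residuals of predictions lying beyond their own class label); the package's exact computation may differ. The function names `q2` and `dq2` and the prediction values are made up for illustration.

```r
# Q2 = 1 - PRESS / TSS, from cross-validated predictions yhat of response y.
q2 <- function(y, yhat) {
    1 - sum((y - yhat)^2) / sum((y - mean(y))^2)
}

# Discriminant Q2 for a 0/1 class label: a residual is set to zero when the
# prediction overshoots its own class label (below 0 for class 0, above 1 for
# class 1), because such a prediction still classifies the sample correctly.
dq2 <- function(y, yhat) {
    r <- y - yhat
    r[(y == 0 & yhat < 0) | (y == 1 & yhat > 1)] <- 0
    1 - sum(r^2) / sum((y - mean(y))^2)
}

y    <- c(0, 0, 0, 1, 1, 1)
yhat <- c(-0.2, 0.1, 0.3, 0.8, 1.3, 0.9)   # made-up cross-validated predictions

q2_value  <- q2(y, yhat)
dq2_value <- dq2(y, yhat)
```

Because DQ2 only ever removes residuals from the error sum, it is never smaller than Q2 for the same predictions.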
Plot of variable loadings in the optimal model.
plotLoadings(
  object,
  comp1 = "p_1",
  comp2 = "o_1",
  blockId = NULL,
  col = NULL,
  pch = NULL,
  ...
)

## S4 method for signature 'ConsensusOPLS'
plotLoadings(
  object,
  comp1 = "p_1",
  comp2 = "o_1",
  blockId = NULL,
  col = NULL,
  pch = NULL,
  ...
)
object |
An object of class |
comp1 |
Latent variable for X-axis. Default, the first predictive
component, |
comp2 |
Latent variable for Y-axis. Default, the first orthogonal
component, |
blockId |
The positions or names of the blocks for the plot. Default, NULL, all. |
col |
A vector of color codes or names, one for each block. Default,
NULL, 2 to |
pch |
A vector of graphic symbols, one for each block. Default, NULL,
1 to |
... |
|
No return value, called for side effects.
Plot of Q2 of models with permuted response.
plotQ2(
  object,
  breaks = 10,
  xlab = "Q2",
  main = "Q2 in models with permuted response",
  col = "blue",
  lty = "dashed",
  ...
)

## S4 method for signature 'ConsensusOPLS'
plotQ2(
  object,
  breaks = 10,
  xlab = "Q2",
  main = "Q2 in models with permuted response",
  col = "blue",
  lty = "dashed",
  ...
)
object |
An object of class |
breaks |
See |
xlab |
See |
main |
See |
col |
A color code or name for Q2 in the optimal model. Default, "blue".
See |
lty |
A line type code or name for Q2 in the optimal model. Default, "dashed".
See |
... |
|
No return value, called for side effects.
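Judging from the `breaks`, `col` and `lty` arguments, the permutation plots presumably show the distribution of a statistic over the permuted-response models with the optimal model marked by a line. The base-R sketch below reproduces that idea with simulated numbers; it does not call the package, the Q2 values are invented, and the empirical p-value at the end is an addition for illustration rather than something the plotting function is documented to return.

```r
# Simulated stand-ins: Q2 of 100 permuted-response models and of the optimal model.
set.seed(7)
q2_perm    <- rnorm(100, mean = -0.1, sd = 0.15)
q2_optimal <- 0.6

# Histogram of the permuted values, with the optimal model marked by a line.
# pdf(NULL) opens a null device so the sketch also runs without a display.
pdf(NULL)
h <- hist(q2_perm, breaks = 10, xlab = "Q2",
          main = "Q2 in models with permuted response",
          xlim = range(c(q2_perm, q2_optimal)))
abline(v = q2_optimal, col = "blue", lty = "dashed")
dev.off()

# Empirical p-value: share of permuted models doing at least as well as the
# optimal one, with the +1 correction that counts the observed model itself.
p_emp <- (sum(q2_perm >= q2_optimal) + 1) / (length(q2_perm) + 1)
```

A model whose statistic sits well to the right of the permuted distribution is unlikely to owe its performance to chance.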
Plot of R2 of models with permuted response.
plotR2(
  object,
  breaks = 10,
  xlab = "R2",
  main = "R2 in models with permuted response",
  col = "blue",
  lty = "dashed",
  ...
)

## S4 method for signature 'ConsensusOPLS'
plotR2(
  object,
  breaks = 10,
  xlab = "R2",
  main = "R2 in models with permuted response",
  col = "blue",
  lty = "dashed",
  ...
)
object |
An object of class |
breaks |
See |
xlab |
See |
main |
See |
col |
A color code or name for R2 in the optimal model. Default, "blue".
See |
lty |
A line type code or name for R2 in the optimal model. Default, "dashed".
See |
... |
|
No return value, called for side effects.
Plot of samples in the space of latent variables of the optimal model.
plotScores(object, comp1 = "p_1", comp2 = "o_1", col = NULL, pch = 19, ...)

## S4 method for signature 'ConsensusOPLS'
plotScores(object, comp1 = "p_1", comp2 = "o_1", col = NULL, pch = 19, ...)
object |
An object of class |
comp1 |
Latent variable for abscissa. Default, the first predictive
component, |
comp2 |
Latent variable for ordinate. Default, the first orthogonal
component, |
col |
A vector of color codes or names. Default, NULL, generated
following the |
pch |
Graphic symbol. Default, 19. |
... |
|
No return value, called for side effects.
Plot of VIP versus variable loadings in the optimal model.
plotVIP(
  object,
  comp1 = "p_1",
  comp2 = "p",
  blockId = NULL,
  col = NULL,
  pch = NULL,
  xlab = NULL,
  ylab = NULL,
  ...
)

## S4 method for signature 'ConsensusOPLS'
plotVIP(
  object,
  comp1 = "p_1",
  comp2 = "p",
  blockId = NULL,
  col = NULL,
  pch = NULL,
  xlab = NULL,
  ylab = NULL,
  ...
)
object |
An object of class |
comp1 |
Latent variable for loadings on Y-axis. Default, the first
predictive component, |
comp2 |
Latent variable for VIPs on X-axis. Default, the predictive
component, |
blockId |
The positions or names of the blocks for the plot. Default, NULL, all. |
col |
A vector of color codes or names, one for each block. Default,
NULL, 2 to |
pch |
A vector of graphic symbols, one for each block. Default, NULL,
1 to |
xlab |
X-axis label. Default, NULL, Loading on |
ylab |
Y-axis label. Default, NULL, VIP on |
... |
|
No return value, called for side effects.
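The plotting methods documented above all take the fitted object as their first argument. As a hedged usage sketch, the block below refits the demo model from the package example and calls each method with the default component names shown in the usage sections. The calls that refer to the orthogonal component `"o_1"` are guarded by a check on the `nOcomp` slot, since it is an assumption that the optimal model for the demo data retains at least one orthogonal component.

```r
# Guarded so the snippet is skipped when ConsensusOPLS is not installed.
if (requireNamespace("ConsensusOPLS", quietly = TRUE)) {
    library(ConsensusOPLS)
    data(demo_3_Omics)
    datablocks <- lapply(
        demo_3_Omics[c("MetaboData", "MicroData", "ProteoData")], scale)
    res <- ConsensusOPLS(data = datablocks, Y = demo_3_Omics$Y,
                         maxPcomp = 1, maxOcomp = 2,
                         modelType = "da", nperm = 5)

    pdf(NULL)                                  # null device: no display needed
    plotContribution(res)                      # contribution of each block
    plotVIP(res, comp1 = "p_1", comp2 = "p")   # VIP versus loadings
    plotQ2(res)                                # Q2 under permuted responses

    # "o_1" only exists if at least one orthogonal component was retained.
    if (res@nOcomp >= 1) {
        plotScores(res, comp1 = "p_1", comp2 = "o_1")     # samples
        plotLoadings(res, comp1 = "p_1", comp2 = "o_1")   # variables
    }
    dev.off()
}
```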
Predicts the response on new data with a fitted model.
predict(object, newdata = NULL, nOcomp = NULL)

## S4 method for signature 'ConsensusOPLS'
predict(object, newdata = NULL, nOcomp = NULL)
object |
An object of class |
newdata |
A list of data frames of new data to predict. If omitted, the
fitted values in |
nOcomp |
Number of Y-orthogonal components to consider. Default, NULL, the number of Y-orthogonal components in the optimal model. |
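A minimal usage sketch for the method, using only the documented defaults: with `newdata` left as NULL the call works on the fitted values of the model. The structure of the returned object is not described above, so the sketch only inspects it with `str()` rather than assuming particular element names. The block is skipped when the package is not installed.

```r
# Guarded so the snippet is skipped when ConsensusOPLS is not installed.
if (requireNamespace("ConsensusOPLS", quietly = TRUE)) {
    library(ConsensusOPLS)
    data(demo_3_Omics)
    datablocks <- lapply(
        demo_3_Omics[c("MetaboData", "MicroData", "ProteoData")], scale)
    res <- ConsensusOPLS(data = datablocks, Y = demo_3_Omics$Y,
                         maxPcomp = 1, maxOcomp = 2,
                         modelType = "da", nperm = 5)

    # newdata = NULL and nOcomp = NULL are the documented defaults.
    pred <- predict(res)
    str(pred, max.level = 1)   # inspect the result without assuming its layout
}
```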