Package 'ConsensusOPLS' reference manual

Package 'ConsensusOPLS'

Title:	Consensus OPLS for Multi-Block Data Fusion
Description:	Merging data from multiple sources is a relevant approach for comprehensively evaluating complex systems. However, the inherent problems encountered when analyzing single tables are amplified with the generation of multi-block datasets, and finding the relationships between data layers of increasing complexity constitutes a challenging task. For that purpose, a generic methodology is proposed by combining the strengths of established data analysis strategies, i.e. multi-block approaches and the Orthogonal Partial Least Squares (OPLS) framework, to provide an efficient tool for the fusion of data obtained from multiple sources. The package enables quick and efficient implementation of the consensus OPLS model for any horizontal multi-block data structure (observation-based matching). Moreover, it offers a range of metrics and graphics to help determine the optimal number of components and check the validity of the model through permutation tests. Interpretation tools include score and loading plots, Variable Importance in Projection (VIP), a `predict` method that can be used for SHAP computation, and performance coefficients such as R2, Q2, and DQ2. J. Boccard and D.N. Rutledge (2013) <doi:10.1016/j.aca.2013.01.022>.
Authors:	Celine Bougel [aut], Julien Boccard [aut], Florence Mehl [aut], Marie Tremblay-Franco [fnd], Mark Ibberson [fnd], Van Du T. Tran [aut, cre]
Maintainer:	Van Du T. Tran <[email protected]>
License:	GPL (>= 3)
Version:	1.1.0
Built:	2025-03-29 07:04:46 UTC
Source:	https://github.com/cran/ConsensusOPLS

Help Index

Consensus OPLS for Multi-Block Data Fusion

Description

Merging data from multiple sources is a relevant approach for comprehensively evaluating complex systems. However, the inherent problems encountered when analyzing single tables are amplified with the generation of multi-block datasets, and finding the relationships between data layers of increasing complexity constitutes a challenging task. For that purpose, a generic methodology is proposed by combining the strengths of established data analysis strategies, i.e. multi-block approaches and the OPLS framework to provide an efficient tool for the fusion of data obtained from multiple sources. The package enables quick and efficient implementation of the consensus OPLS model for any horizontal multi-block data structure (observation-based matching). Moreover, it offers an interesting range of metrics and graphics to help to determine the optimal number of components and check the validity of the model through permutation tests. Interpretation tools include scores and loadings plots, as well as Variable Importance in Projection (VIP), and performance coefficients such as R2, Q2 and DQ2 coefficients.

This package uses functions from the K-OPLS package, developed by Max Bylesjo, University of Umea, Judy Fonville and Mattias Rantalainen, Imperial College.

This code has been extended and adapted under the terms of the GNU General Public License version 2 as published by the Free Software Foundation.

Author(s)

Maintainer: Van Du T. Tran [email protected] (ORCID)

Authors:

Celine Bougel [email protected] (ORCID)
Julien Boccard [email protected] (ORCID)
Florence Mehl [email protected] (ORCID)

Other contributors:

Marie Tremblay-Franco [email protected] (ORCID) [funder]
Mark Ibberson [email protected] (ORCID) [funder]

ConsensusOPLS

Description

Constructs the consensus OPLS model with the optimal number of orthogonal components for the given data blocks and response, and evaluates the model quality against models built with randomly permuted responses.

Usage

ConsensusOPLS(
  data,
  Y,
  maxPcomp = 1,
  maxOcomp = 5,
  modelType = "da",
  nperm = 100,
  cvType = "nfold",
  nfold = 5,
  nMC = 100,
  cvFrac = 4/5,
  kernelParams = list(type = "p", params = c(order = 1)),
  mc.cores = 1,
  verbose = FALSE
)

Arguments

`data`	A list of data blocks. Each element of the list must be a matrix. Rows and columns may be named, in which case the names are retained during analysis. Any pre-processing of the data (e.g. scaling) must be carried out before building the model.
`Y`	A vector, factor, dummy matrix or numeric matrix for the response. The type of response determines the model to be used: a numeric vector for linear regression, a factor or dummy matrix for logistic regression or a discriminant model.
`maxPcomp`	Maximum number of Y-predictive components used to build the optimal model. Default, 1.
`maxOcomp`	Maximum number of Y-orthogonal components used to build the optimal model. Default, 5.
`modelType`	String for type of OPLS regression model, either `reg` for regression or `da` for discriminant analysis. Default, `da`.
`nperm`	Number of random permutations desired in response Y. Default, 100.
`cvType`	String for type of cross-validation used. Either `nfold` for n-fold cross-validation, in which case `nfold` is used, or `mccv` for Monte Carlo cross-validation, or `mccvb` for Monte Carlo class-balanced cross-validation, in which cases `nMC` and `cvFrac` are used. Default, `nfold`, i.e. `nMC` and `cvFrac` are ignored.
`nfold`	Number of folds performed in n-fold cross-validation. This can be set to the number of samples to perform Leave-One-Out cross validation. Default, 5.
`nMC`	An integer indicating the number of rounds performed when `cvType` is `mccv` or `mccvb`. Default, 100.
`cvFrac`	A numeric value indicating the fraction of observations from `data` used in the training set for `mccv` or `mccvb` cross-validation. Default, 4/5 = 0.8.
`kernelParams`	List of parameters for the kernel. Either `p` for polynomial kernel, which implies specifying the order of the polynomial by the `order` parameter, or `g` for Gaussian kernel. Default, `list(type='p', params = c(order=1.0))`.
`mc.cores`	Number of cores for parallel computing. Default, 1.
`verbose`	A logical indicating if detailed information (cross validation) will be shown. Default, FALSE.
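
The difference between the cross-validation schemes selected by `cvType` can be illustrated with a minimal base R sketch. It only shows how observation indices could be partitioned under `nfold`, `nMC` and `cvFrac`; this manual does not show how the package draws its partitions internally, so the variable names and the sampling code below are assumptions made for illustration.

```r
set.seed(1)
n <- 14                      # number of observations (illustrative)

# n-fold: each observation is assigned to exactly one of `nfold` test folds.
nfold <- 5
fold_id <- sample(rep(seq_len(nfold), length.out = n))
nfold_test <- split(seq_len(n), fold_id)    # test indices, one entry per fold

# Leave-one-out is the special case nfold = n: one observation per fold.
loo_test <- split(seq_len(n), seq_len(n))

# Monte Carlo CV: nMC independent random splits; a fraction cvFrac of the
# observations is used for training, the remainder for testing.
nMC <- 100
cvFrac <- 4/5
n_train <- floor(cvFrac * n)
mccv_train <- replicate(nMC, sort(sample(n, n_train)), simplify = FALSE)
mccv_test <- lapply(mccv_train, function(tr) setdiff(seq_len(n), tr))
```

In the n-fold scheme every observation is tested exactly once, whereas in the Monte Carlo scheme an observation can appear in the test set of several rounds or of none. The class-balanced variant (`mccvb`) is not sketched here.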

Value

An object of class ConsensusOPLS representing the consensus OPLS model fit.

Examples

data(demo_3_Omics)
datablocks <- lapply(
    demo_3_Omics[c("MetaboData", "MicroData", "ProteoData")], scale)
res <- ConsensusOPLS(data=datablocks, 
                     Y=demo_3_Omics$Y,
                     maxPcomp=1, maxOcomp=2, 
                     modelType='da',
                     nperm=5)
res
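
For data other than the demo set, the inputs have to be assembled by hand. The following is a minimal sketch of that preparation, assuming two simulated blocks (`blockA`, `blockB`) and a two-class grouping; all object names and the simulated values are hypothetical. It builds a named list of scaled matrices for `data` and shows a factor and its equivalent dummy matrix for `Y`.

```r
set.seed(42)
n <- 12
samples <- paste0("S", seq_len(n))

# Hypothetical blocks: two simulated tables measured on the same 12 samples.
blockA <- matrix(rnorm(n * 6), nrow = n,
                 dimnames = list(samples, paste0("a", 1:6)))
blockB <- matrix(rnorm(n * 4), nrow = n,
                 dimnames = list(samples, paste0("b", 1:4)))

# Pre-processing is left to the caller: here, centering and unit-variance
# scaling of each block. scale() returns a matrix and keeps the dimnames.
blocks <- lapply(list(A = blockA, B = blockB), scale)

# Two-class response as a factor, and the equivalent dummy (indicator) matrix.
group <- factor(rep(c("ctrl", "case"), each = n / 2),
                levels = c("ctrl", "case"))
Ydummy <- sapply(levels(group), function(l) as.numeric(group == l))
rownames(Ydummy) <- samples

# Sanity checks: every block is a matrix with one row per sample.
stopifnot(all(vapply(blocks, is.matrix, logical(1))),
          all(vapply(blocks, nrow, integer(1)) == n))
```

`blocks` can then be passed as `data`, and either `group` or `Ydummy` as `Y`, in a call shaped like the example above.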

`ConsensusOPLS` S4 class

Description

An object returned by the ConsensusOPLS function, of class ConsensusOPLS, and representing a fitted Consensus OPLS model.

Slots

modelType: The type of requested OPLS regression model.
response: The provided response variable (Y).
nPcomp: Number of Y-predictive components (latent variables) of the optimal model.
nOcomp: Number of Y-orthogonal components (latent variables) of the optimal model.
blockContribution: Relative contribution of each block (normalized lambda values) to the latent variables.
scores: Representation of the samples in the latent variables of the optimal model.
loadings: Contribution of each block's variables to the latent variables of the optimal model.
VIP: Variable importance in projection (VIP) for each block of data, assessing the relevance of the variables in explaining the variation in the response.
R2X: Proportion of variation in data blocks explained by the optimal model.
R2Y: Proportion of variation in the response explained by the optimal model.
Q2: Predictive ability of the optimal model.
DQ2: Predictive ability of the optimal discriminant model.
permStats: Assessment of models with permuted response.
model: The optimal model.
cv: Cross-validation result towards the optimal model. Contains AllYhat (all predicted Y values as a concatenated matrix), cvTestIndex (indexes for the test set observations during the cross-validation rounds), DQ2Yhat (total discriminant Q-square result for all Y-orthogonal components), nOcompOpt (optimal number of Y-orthogonal components (latent variables) for the optimal model), and Q2Yhat (total Q-square result for all Y-orthogonal components).
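
Slots of an S4 object are read with `@` or with `slot()`. The sketch below uses a hypothetical stand-in class, `ToyFit`, that borrows two of the slot names listed above so that it runs without fitting a model; the values and the helper `get_slots` are made up for illustration. The same access pattern should apply to an object returned by `ConsensusOPLS`.

```r
library(methods)

# Hypothetical stand-in class carrying two of the documented slot names.
setClass("ToyFit", representation(nOcomp = "numeric", Q2 = "numeric"))
toy <- new("ToyFit", nOcomp = 1, Q2 = c(p = 0.41, po1 = 0.63))

slotNames(toy)          # names of all slots of the object
toy@Q2                  # direct access with `@`
slot(toy, "nOcomp")     # programmatic access by slot name

# Hypothetical helper: collect several slots into a named list.
get_slots <- function(object, slots) {
  stopifnot(isS4(object), all(slots %in% slotNames(object)))
  setNames(lapply(slots, function(s) slot(object, s)), slots)
}
perf <- get_slots(toy, c("nOcomp", "Q2"))
```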

Three-block omics data

Description

A demonstration case study available from a public repository of the National Cancer Institute, namely the NCI-60 data set, was used to illustrate the method's potential for omics data fusion. A subset of NCI-60 data (transcriptomics, proteomics and metabolomics) involving experimental data from 14 cancer cell lines from two tissue origins, i.e. colon and ovary, was used. The object proposed in this package contains, in a list, all the information needed to make a model: the three data blocks, a list of observation names (samples) and the binary response matrix Y.

Author(s)

Boccard & Rutledge

References

J. Boccard and D.N. Rutledge. A consensus OPLS-DA strategy for multiblock Omics data fusion. Analytica Chimica Acta, 769, 30-39, 2013.

Block contribution plot

Description

Plot of relative contribution of each data block in the optimal model.

Usage

plotContribution(object, col = NULL, ...)

## S4 method for signature 'ConsensusOPLS'
plotContribution(object, col = NULL, ...)

Arguments

`object`	An object of class `ConsensusOPLS`.
`col`	A vector of color codes or names, one for each block. Default, NULL, in which case color codes 2 to (number of blocks + 1) are used.
`...`	`barplot` arguments.

Value

No return value, called for side effects.

DQ2 plot

Description

Plot of DQ2 of models with permuted response.

Usage

plotDQ2(
  object,
  breaks = 10,
  xlab = "DQ2",
  main = "DQ2 in models with permuted response",
  col = "blue",
  lty = "dashed",
  ...
)

## S4 method for signature 'ConsensusOPLS'
plotDQ2(
  object,
  breaks = 10,
  xlab = "DQ2",
  main = "DQ2 in models with permuted response",
  col = "blue",
  lty = "dashed",
  ...
)

Arguments

`object`	An object of class `ConsensusOPLS`.
`breaks`	See `hist`.
`xlab`	See `hist`.
`main`	See `hist`.
`col`	A color code or name for DQ2 in the optimal model. Default, `"blue"`. See `abline`.
`lty`	A line type code or name for DQ2 in the optimal model. Default, `"dashed"` (line type 2). See `abline`.
`...`	`hist` arguments.

Value

No return value, called for side effects.
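
This manual does not state the formulas behind Q2 and DQ2. As a rough guide, the sketch below computes both under their commonly used definitions for a 0/1 coded response: Q2 compares the prediction error sum of squares with the total sum of squares, and the discriminant variant does not penalize predictions that overshoot the class label on the correct side. Whether the package follows exactly these definitions is an assumption, and the values of `y` and `yhat` are made up.

```r
# y: 0/1 class labels; yhat: cross-validated predictions (made-up values).
y    <- c(0, 0, 0, 1, 1, 1)
yhat <- c(-0.2, 0.1, 0.4, 0.7, 1.3, 0.9)

# Total sum of squares of the response around its mean.
tss <- sum((y - mean(y))^2)

# Classical Q2: every deviation from the class label is counted.
press <- sum((y - yhat)^2)
Q2 <- 1 - press / tss

# Discriminant variant: predictions beyond the label on the correct side
# (below 0 for class 0, above 1 for class 1) contribute no error.
resid <- y - yhat
resid[y == 0 & yhat < 0] <- 0
resid[y == 1 & yhat > 1] <- 0
pressd <- sum(resid^2)
DQ2 <- 1 - pressd / tss
```

Under these definitions DQ2 is never smaller than Q2 on the same predictions, because some residuals are set to zero and none are increased.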

Loading plot

Description

Plot of variable loadings in the optimal model.

Usage

plotLoadings(
  object,
  comp1 = "p_1",
  comp2 = "o_1",
  blockId = NULL,
  col = NULL,
  pch = NULL,
  ...
)

## S4 method for signature 'ConsensusOPLS'
plotLoadings(
  object,
  comp1 = "p_1",
  comp2 = "o_1",
  blockId = NULL,
  col = NULL,
  pch = NULL,
  ...
)

Arguments

`object`	An object of class `ConsensusOPLS`.
`comp1`	Latent variable for X-axis. Default, the first predictive component, `p_1`.
`comp2`	Latent variable for Y-axis. Default, the first orthogonal component, `o_1`.
`blockId`	The positions or names of the blocks for the plot. Default, NULL, all.
`col`	A vector of color codes or names, one for each block. Default, NULL, 2 to `length(blockId)+1`.
`pch`	A vector of graphic symbols, one for each block. Default, NULL, 1 to `length(blockId)`.
`...`	`plot` arguments.

Value

No return value, called for side effects.

Q2 plot

Description

Plot of Q2 of models with permuted response.

Usage

plotQ2(
  object,
  breaks = 10,
  xlab = "Q2",
  main = "Q2 in models with permuted response",
  col = "blue",
  lty = "dashed",
  ...
)

## S4 method for signature 'ConsensusOPLS'
plotQ2(
  object,
  breaks = 10,
  xlab = "Q2",
  main = "Q2 in models with permuted response",
  col = "blue",
  lty = "dashed",
  ...
)

Arguments

`object`	An object of class `ConsensusOPLS`.
`breaks`	See `hist`.
`xlab`	See `hist`.
`main`	See `hist`.
`col`	A color code or name for Q2 in the optimal model. Default, `"blue"`. See `abline`.
`lty`	A line type code or name for Q2 in the optimal model. Default, `"dashed"` (line type 2). See `abline`.
`...`	`hist` arguments.

Value

No return value, called for side effects.
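
The argument list suggests a histogram of the permuted-model values (`hist`) with a vertical line marking the optimal model (`abline`). The base R sketch below reproduces that kind of display on simulated numbers and adds an empirical p-value, which is one common way to summarize a permutation test. The simulated values, the variable names and the p-value computation are illustrative assumptions, not output or code of the package.

```r
set.seed(7)
q2_perm <- rnorm(100, mean = 0, sd = 0.15)   # made-up Q2 of permuted models
q2_opt  <- 0.6                               # made-up Q2 of the optimal model

# Empirical p-value: share of permuted models that do at least as well as
# the optimal model, with a +1 correction so it is never exactly zero.
p_emp <- (sum(q2_perm >= q2_opt) + 1) / (length(q2_perm) + 1)

pdf(NULL)   # null graphics device, so the sketch also runs without a display
h <- hist(q2_perm, breaks = 10, xlab = "Q2",
          main = "Q2 in models with permuted response",
          xlim = range(c(q2_perm, q2_opt)))
abline(v = q2_opt, col = "blue", lty = "dashed")   # optimal model
invisible(dev.off())
```

A model whose Q2 lies well to the right of the permuted distribution is unlikely to owe its predictive ability to chance.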

R2 plot

Description

Plot of R2 of models with permuted response.

Usage

plotR2(
  object,
  breaks = 10,
  xlab = "R2",
  main = "R2 in models with permuted response",
  col = "blue",
  lty = "dashed",
  ...
)

## S4 method for signature 'ConsensusOPLS'
plotR2(
  object,
  breaks = 10,
  xlab = "R2",
  main = "R2 in models with permuted response",
  col = "blue",
  lty = "dashed",
  ...
)

Arguments

`object`	An object of class `ConsensusOPLS`.
`breaks`	See `hist`.
`xlab`	See `hist`.
`main`	See `hist`.
`col`	A color code or name for R2 in the optimal model. Default, `"blue"`. See `abline`.
`lty`	A line type code or name for R2 in the optimal model. Default, `"dashed"` (line type 2). See `abline`.
`...`	`hist` arguments.

Value

No return value, called for side effects.

Score plot

Description

Plot of samples in the space of latent variables of the optimal model.

Usage

plotScores(object, comp1 = "p_1", comp2 = "o_1", col = NULL, pch = 19, ...)

## S4 method for signature 'ConsensusOPLS'
plotScores(object, comp1 = "p_1", comp2 = "o_1", col = NULL, pch = 19, ...)

Arguments

`object`	An object of class `ConsensusOPLS`.
`comp1`	Latent variable for abscissa. Default, the first predictive component, `p_1`.
`comp2`	Latent variable for ordinate. Default, the first orthogonal component, `o_1`.
`col`	A vector of color codes or names. Default, NULL, in which case colors are derived from the `response`.
`pch`	Graphic symbol. Default, 19.
`...`	`plot` arguments.

Value

No return value, called for side effects.

VIP plot

Description

Plot of VIP versus variable loadings in the optimal model.

Usage

plotVIP(
  object,
  comp1 = "p_1",
  comp2 = "p",
  blockId = NULL,
  col = NULL,
  pch = NULL,
  xlab = NULL,
  ylab = NULL,
  ...
)

## S4 method for signature 'ConsensusOPLS'
plotVIP(
  object,
  comp1 = "p_1",
  comp2 = "p",
  blockId = NULL,
  col = NULL,
  pch = NULL,
  xlab = NULL,
  ylab = NULL,
  ...
)

Arguments

`object`	An object of class `ConsensusOPLS`.
`comp1`	Latent variable for loadings on X-axis. Default, the first predictive component, `p_1`.
`comp2`	Latent variable for VIPs on Y-axis. Default, the predictive component, `p`.
`blockId`	The positions or names of the blocks for the plot. Default, NULL, all.
`col`	A vector of color codes or names, one for each block. Default, NULL, 2 to `length(blockId)+1`.
`pch`	A vector of graphic symbols, one for each block. Default, NULL, 1 to `length(blockId)`.
`xlab`	X-axis label. Default, NULL, Loading on `comp1`.
`ylab`	Y-axis label. Default, NULL, VIP on `comp2`.
`...`	`plot` arguments.

Value

No return value, called for side effects.

Model prediction

Description

Predicts the response on new data with a fitted model.

Usage

predict(object, newdata = NULL, nOcomp = NULL)

## S4 method for signature 'ConsensusOPLS'
predict(object, newdata = NULL, nOcomp = NULL)

Arguments

`object`	An object of class `ConsensusOPLS`.
`newdata`	A list of data frames of new data to predict. If omitted, the fitted values in `object` are returned.
`nOcomp`	Number of Y-orthogonal components to consider. Default, NULL, the number of Y-orthogonal components in the optimal model.
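
A short usage sketch, reusing the demo fit from the `ConsensusOPLS` example, is given below. It is guarded so that it only runs when the package is installed. Because the structure of the returned value is not documented in this section, the sketch inspects it with `str` rather than assuming a particular shape.

```r
if (requireNamespace("ConsensusOPLS", quietly = TRUE)) {
  library(ConsensusOPLS)
  data(demo_3_Omics)
  datablocks <- lapply(
      demo_3_Omics[c("MetaboData", "MicroData", "ProteoData")], scale)
  res <- ConsensusOPLS(data = datablocks,
                       Y = demo_3_Omics$Y,
                       maxPcomp = 1, maxOcomp = 2,
                       modelType = "da",
                       nperm = 5)

  # With `newdata` omitted, the fitted values stored in the model are returned.
  fitted_vals <- predict(res)
  str(fitted_vals)
}
```

To predict new samples, `newdata` would be a list with one element per block, arranged like the blocks used for fitting and pre-processed in the same way.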
