| Title: | Quality Control–based Robust LOESS Signal Correction |
|---|---|
| Description: | An R implementation of quality control–based robust LOESS (local polynomial regression fitting) signal correction for metabolomics data analysis, described in Dunn, W., Broadhurst, D., Begley, P. et al. (2011) <doi:10.1038/nprot.2011.335>. Optimisation of the LOESS span parameter using generalized cross-validation (GCV) is provided as an option. In addition to signal correction, 'qcrlscR' includes utility functions such as batch shifting and data filtering. |
| Authors: | Wanchang Lin [aut, cre], Warwick Dunn [aut] |
| Maintainer: | Wanchang Lin <[email protected]> |
| License: | GPL (>= 2) |
| Version: | 0.1.3 |
| Built: | 2026-05-17 07:04:55 UTC |
| Source: | https://github.com/wanchanglin/qcrlscr |
Remove batch effects within each block.
```r
batch.shift(x, y, method = "mean", overall_average = TRUE)
```
| Argument | Description |
|---|---|
| `x` | a data matrix. |
| `y` | a categorical vector (e.g. a factor) of batch/block labels. |
| `method` | method for shifting (default `"mean"`). |
| `overall_average` | a logical value indicating whether or not an overall average is added after shifting. |
a shifted data matrix.
Wagner, S. et al. Tools in Metabonomics: An Integrated Validation Approach for LC-MS Metabolic Profiling of Mercapturic Acids in Human Urine. Anal. Chem., 2007, 79 (7), pp. 2918-2926. DOI: 10.1021/ac062153w
```r
library(qcrlscR)
names(man_qc)
data <- man_qc$data
meta <- man_qc$meta
## batch shifting
cls.bl <- factor(meta$batch)
res <- batch.shift(data, cls.bl, overall_average = TRUE)
```
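To make mean-based shifting concrete, here is a minimal base-R sketch. It assumes that `method = "mean"` subtracts each batch's mean from every feature and that `overall_average = TRUE` adds the feature's overall mean back. `batch_shift_sketch()` is a hypothetical helper for illustration, not the package's implementation.

```r
# Hypothetical sketch of mean-based batch shifting (not qcrlscR's code).
# Assumption: each feature is centred on its per-batch mean, and the
# feature's overall mean is optionally added back afterwards.
batch_shift_sketch <- function(x, y, overall_average = TRUE) {
  x <- as.matrix(x)
  y <- factor(y)
  out <- x
  for (j in seq_len(ncol(x))) {
    # per-batch mean of feature j, repeated for every row, ignoring NAs
    batch_mean <- ave(x[, j], y, FUN = function(v) mean(v, na.rm = TRUE))
    out[, j] <- x[, j] - batch_mean
    if (overall_average) {
      out[, j] <- out[, j] + mean(x[, j], na.rm = TRUE)
    }
  }
  out
}

# Two batches with an offset of 10 between them
x <- matrix(c(1, 2, 3, 11, 12, 13), ncol = 1)
y <- c("b1", "b1", "b1", "b2", "b2", "b2")
batch_shift_sketch(x, y)
```

With these inputs both batches end up centred on the overall mean of 7, which removes the offset between them.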
This HPLC data set includes 4 batches with missing values.
```r
man_qc
```
A list with two components:
- `data`: a data frame with 462 replicates (rows) and 656 features (columns).
- `meta`: a data frame with 2 columns:
  - `batch`: 4 batches
  - `sample_type`: QC and Sample
```r
library(qcrlscR)
man_qc
t(sapply(man_qc, dim))
## select data matrix and meta data
data <- man_qc$data
meta <- man_qc$meta
## select batches and data types
cls.qc <- factor(meta$sample_type)
cls.bl <- factor(meta$batch)
```
This function calculates the percentage of missing values for each feature and keeps the features whose missing-value percentage is below the given threshold.
```r
mv.filter(x, thres = 0.3)
```
| Argument | Description |
|---|---|
| `x` | a data matrix. The columns are features. |
| `thres` | threshold for missing values, between 0 and 1. Features whose missing-value proportion is below this threshold are kept. |
A list with components:
- `dat`: the filtered data matrix.
- `idx`: a logical vector indicating which features are kept.
Other missing value processing:
mv.filter.qc(),
mv.perc()
```r
library(qcrlscR)
names(man_qc)
data <- man_qc$data
meta <- man_qc$meta
## check missing value rates
tail(sort(mv.perc(data)), 20)
## missing values filtering
tmp <- mv.filter(data, thres = 0.15)
data_f <- tmp$dat
## compare
dim(data_f)
dim(data)
```
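The filtering rule can be sketched in base R. This assumes the threshold is compared against the proportion (0 to 1) of missing values in each column and that a feature is kept only when that proportion is strictly below `thres`. `mv_filter_sketch()` is a hypothetical helper, not the package's code.

```r
# Hypothetical sketch of missing-value filtering (not qcrlscR's code).
mv_filter_sketch <- function(x, thres = 0.3) {
  x <- as.data.frame(x)
  # proportion of missing values in each feature (column)
  perc <- colMeans(is.na(x))
  # keep features whose missing proportion is below the threshold
  idx <- perc < thres
  list(dat = x[, idx, drop = FALSE], idx = idx)
}

x <- data.frame(a = c(1, NA, 3, 4),   # 25% missing
                b = c(NA, NA, 3, 4),  # 50% missing
                c = c(1, 2, 3, 4))    # no missing values
res <- mv_filter_sketch(x, thres = 0.3)
names(res$dat)
```

Here feature `b` is dropped and `a` and `c` are kept.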
Filter data based on missing values in the QC samples.
```r
mv.filter.qc(x, y, thres = 0.3)
```
| Argument | Description |
|---|---|
| `x` | a data matrix. |
| `y` | a vector of sample types, with values such as "sample", "qc" and "blank". |
| `thres` | threshold for missing values. Features whose missing-value proportion is below this threshold are kept. |
A list with components:
- `dat`: the filtered data matrix.
- `idx`: a logical vector indicating which features are kept.
Other missing value processing:
mv.filter(),
mv.perc()
```r
library(qcrlscR)
names(man_qc)
data <- man_qc$data
meta <- man_qc$meta
## check missing value rates
tail(sort(mv.perc(data)), 20)
## missing values filtering based on QC
cls.qc <- factor(meta$sample_type)
tmp <- mv.filter.qc(data, cls.qc, thres = 0.15)
data_f <- tmp$dat
## compare
dim(data_f)
dim(data)
```
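A QC-based variant differs only in which rows are counted. The base-R sketch below assumes the missing-value proportion is computed on QC rows alone and that QC rows are those labelled "qc" (matched case-insensitively, since the example data use "QC"). `mv_filter_qc_sketch()` is hypothetical, not the package's code.

```r
# Hypothetical sketch of QC-based missing-value filtering (not qcrlscR's code).
# Assumption: QC rows are those labelled "qc", matched case-insensitively.
mv_filter_qc_sketch <- function(x, y, thres = 0.3) {
  x <- as.data.frame(x)
  is_qc <- tolower(as.character(y)) == "qc"
  # missing-value proportion computed on QC rows only
  perc <- colMeans(is.na(x[is_qc, , drop = FALSE]))
  idx <- perc < thres
  list(dat = x[, idx, drop = FALSE], idx = idx)
}

y <- c("Sample", "QC", "QC", "Sample")
x <- data.frame(a = c(NA, 1, 2, NA),  # 50% missing overall, 0% in QC
                b = c(1, NA, 3, 4))   # 25% missing overall, 50% in QC
res <- mv_filter_qc_sketch(x, y, thres = 0.3)
names(res$dat)
```

Feature `a` is kept and `b` is dropped, the opposite of what an overall 30% filter would do on the same data.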
Calculate missing value percentage.
```r
mv.perc(x)
```
| Argument | Description |
|---|---|
| `x` | a vector, matrix or data frame. |
The missing-value percentage; for a matrix or data frame, one value per column (feature).
Other missing value processing:
mv.filter(),
mv.filter.qc()
```r
library(qcrlscR)
names(man_qc)
data <- man_qc$data
meta <- man_qc$meta
## check missing value rates
tail(sort(mv.perc(data)), 20)
```
Perform univariate outlier detection.
```r
outl.det.u(x, method = c("percentile", "median"))
```
| Argument | Description |
|---|---|
| `x` | a numeric vector. |
| `method` | method for univariate outlier detection. Only `"percentile"` and `"median"` are supported. |

- `median`: an observation is an outlier if its absolute difference from the sample median is larger than 2 times the Median Absolute Deviation (MAD) divided by 0.6745.
- `percentile`: an observation is an outlier if it is smaller than the 1st quartile minus 1.5 times the IQR, or larger than the 3rd quartile plus 1.5 times the IQR.
A logical vector indicating which observations are outliers.
Wilcox, R. R. Fundamentals of Modern Statistical Methods: Substantially Improving Power and Accuracy. 2nd edition, Springer, 2010, pp. 31-35.
```r
library(qcrlscR)
x <- c(2, 3, 4, 5, 6, 7, NA, 9, 50, 50)
outl.det.u(x, "percentile")
```
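Both rules can be written directly in base R. This is a sketch, not the package's code: it assumes R's default quartile definition (`quantile()` type 7) and returns `NA` for missing observations. `outlier_sketch()` is a hypothetical name.

```r
# Hypothetical sketch of the two univariate outlier rules (not qcrlscR's code).
# Missing observations yield NA in the result.
outlier_sketch <- function(x, method = c("percentile", "median")) {
  method <- match.arg(method)
  if (method == "median") {
    m <- median(x, na.rm = TRUE)
    # raw MAD (constant = 1), then rescaled by 0.6745 as in the rule above
    s <- mad(x, center = m, constant = 1, na.rm = TRUE) / 0.6745
    abs(x - m) > 2 * s
  } else {
    # quartiles use R's default quantile type (type 7), an assumption
    q <- unname(quantile(x, c(0.25, 0.75), na.rm = TRUE))
    iqr <- q[2] - q[1]
    x < q[1] - 1.5 * iqr | x > q[2] + 1.5 * iqr
  }
}

x <- c(2, 3, 4, 5, 6, 7, NA, 9, 50, 50)
which(outlier_sketch(x, "percentile"))
which(outlier_sketch(x, "median"))
```

In this example both rules flag the two values of 50 (positions 9 and 10).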
QC-based robust LOESS (locally estimated scatterplot smoothing) signal correction (QC-RLSC)
```r
qc.rlsc(x, y, method = c("subtract", "divide"), opti = TRUE, ...)
```
| Argument | Description |
|---|---|
| `x` | A data frame with samples (rows) and variables (columns). |
| `y` | A vector of sample types with values "qc" and "sample". |
| `method` | Data scaling method: "subtract" or "divide". |
| `opti` | A logical value indicating whether or not to optimise 'span'. |
| `...` | Other parameters passed to 'loess'. |
This function uses only sample-type information (QC or Sample) for signal correction; it does not require batch information. Users may remove batch effects after signal correction with a batch elimination routine such as batch.shift() in this package, or with other tools.
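The core correction step can be sketched in base R: for each feature, fit a robust LOESS curve to the QC intensities against run order, evaluate it for every sample, and remove the trend. This is a simplified sketch, not the package's implementation. It assumes the rows are already in injection order, that "subtract" removes the fitted trend while "divide" divides by it, and that the QC median is restored afterwards so values stay on their original scale. `qc_rlsc_sketch()` is hypothetical; `family = "symmetric"` is base R's robust LOESS option.

```r
# Hypothetical sketch of QC-based robust LOESS correction (not qcrlscR's code).
# Assumptions: rows are in injection order; QC rows are labelled "qc"
# (case-insensitive); the QC median is restored after correction.
qc_rlsc_sketch <- function(x, y, method = c("subtract", "divide"), span = 0.75) {
  method <- match.arg(method)
  x <- as.matrix(x)
  is_qc <- tolower(as.character(y)) == "qc"
  ord <- seq_len(nrow(x))
  out <- x
  for (j in seq_len(ncol(x))) {
    qc <- data.frame(ord = ord[is_qc], val = x[is_qc, j])
    # robust LOESS (biweight re-weighting) fitted to QC samples only;
    # surface = "direct" lets predict() extrapolate beyond the first/last QC
    fit <- loess(val ~ ord, data = qc, span = span, degree = 2,
                 family = "symmetric",
                 control = loess.control(surface = "direct"))
    trend <- predict(fit, newdata = data.frame(ord = ord))
    qc_med <- median(qc$val, na.rm = TRUE)
    out[, j] <- if (method == "subtract") {
      x[, j] - trend + qc_med
    } else {
      x[, j] / trend * qc_med
    }
  }
  out
}

# One feature with a linear drift over 20 injections; QC every other run
ord <- 1:20
x <- matrix(100 + 2 * ord + 0.5 * sin(ord), ncol = 1)
y <- rep(c("QC", "Sample"), 10)
res <- qc_rlsc_sketch(x, y, method = "subtract")
c(before = sd(x[, 1]), after = sd(res[, 1]))
```

In the toy data the drift dominates the spread before correction, so `after` should come out far smaller than `before`.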
If the data matrix has missing values, filter the features by missing-value percentage first; missing-value imputation is not needed.
An option is also provided to optimise LOESS's span within the range 0.05 to 0.95. The R code is modified from https://bit.ly/3zBo3Qn.
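As an illustration of span selection by generalized cross-validation, the sketch below scores a LOESS fit with the common criterion GCV(span) = n * RSS / (n - tr(L))^2, where tr(L) is the trace of the smoother matrix (`trace.hat` in base R's `loess` object), and minimises it over 0.05 to 0.95 with `optimize()`. This formulation is an assumption for illustration; the package's own criterion may differ, for example in constant factors. `gcv_span_sketch()` is hypothetical.

```r
# Hypothetical sketch of GCV-based span selection for LOESS (not qcrlscR's code).
gcv_span_sketch <- function(ord, val, span_range = c(0.05, 0.95)) {
  gcv <- function(span) {
    # very small spans can trigger numerical warnings from loess
    fit <- suppressWarnings(loess(val ~ ord, span = span, degree = 2))
    n <- fit$n
    rss <- sum(residuals(fit)^2)
    n * rss / (n - fit$trace.hat)^2   # GCV criterion
  }
  # one-dimensional minimisation of the GCV score over the span range
  optimize(gcv, interval = span_range)$minimum
}

ord <- 1:100
val <- sin(ord / 15) + 0.1 * cos(7 * ord)
gcv_span_sketch(ord, val)
```

`optimize()` performs a golden-section search, so it returns a local minimum of the GCV score inside the interval rather than a guaranteed global one.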
A corrected data frame.
Dunn et al. Procedures for large-scale metabolic profiling of serum and plasma using gas chromatography and liquid chromatography coupled to mass spectrometry. Nature Protocols 6, 1060–1083 (2011)
Other QC-RLSC function:
qc.rlsc.wrap()
```r
library(qcrlscR)
names(man_qc)
data <- man_qc$data
meta <- man_qc$meta
cls.qc <- factor(meta$sample_type)
cls.bl <- factor(meta$batch)
## apply QC-RLSC with optimisation of 'span'
res_1 <- qc.rlsc(data, cls.qc, method = "subtract", opti = TRUE)
## apply QC-RLSC without optimisation of 'span'
res_2 <- qc.rlsc(data, cls.qc, method = "subtract", opti = FALSE)
```
Wrapper function for QC-RLSC
```r
qc.rlsc.wrap(
  dat, cls.qc, cls.bl,
  method = c("subtract", "divide"),
  intra = FALSE, opti = TRUE, log10 = TRUE,
  outl = TRUE, shift = TRUE, ...
)
```
| Argument | Description |
|---|---|
| `dat` | A data frame with samples (rows) and variables (columns). |
| `cls.qc` | A vector of sample types with values "qc" and "sample". |
| `cls.bl` | A vector of batch labels. |
| `method` | Data scaling method: "subtract" or "divide". |
| `intra` | A logical value indicating whether signal correction is performed within each batch ("intra-batch") or across all batches ("inter-batch"). |
| `opti` | A logical value indicating whether or not the 'span' parameters are optimised. |
| `log10` | A logical value indicating whether or not to log10-transform the data set. If the transformation is applied, it is reversed after correction. |
| `outl` | A logical value indicating whether or not QC outlier detection is applied. If TRUE, QC outliers are replaced by the QC median. |
| `shift` | A logical value indicating whether or not batch shifting is applied after signal correction. |
| `...` | Other parameters passed to 'loess'. |
A corrected data frame.
Other QC-RLSC function:
qc.rlsc()
```r
library(qcrlscR)
names(man_qc)
data <- man_qc$data
meta <- man_qc$meta
cls.qc <- factor(meta$sample_type)
cls.bl <- factor(meta$batch)
## apply QC-RLSC wrapper function
method <- "divide" # or "subtract"
intra <- TRUE
opti <- TRUE
log10 <- TRUE
outl <- TRUE
shift <- TRUE
res <- qc.rlsc.wrap(data, cls.qc, cls.bl, method = method, intra = intra,
                    opti = opti, log10 = log10, outl = outl, shift = shift)
```
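The `outl = TRUE` step, which replaces QC outliers with the QC median, can be sketched per feature in base R. The outlier rule the wrapper applies is not stated here, so this sketch assumes the IQR ("percentile") rule described for outl.det.u() and R's default quartiles. `replace_qc_outliers_sketch()` is hypothetical, not the package's code.

```r
# Hypothetical sketch of QC outlier replacement (not qcrlscR's code).
# Assumption: QC outliers are detected with the 1.5 * IQR rule and
# replaced by the median of that feature's QC values.
replace_qc_outliers_sketch <- function(x, y) {
  x <- as.matrix(x)
  is_qc <- tolower(as.character(y)) == "qc"
  for (j in seq_len(ncol(x))) {
    v <- x[is_qc, j]
    q <- unname(quantile(v, c(0.25, 0.75), na.rm = TRUE))
    iqr <- q[2] - q[1]
    # missing QC values are never treated as outliers
    out <- !is.na(v) & (v < q[1] - 1.5 * iqr | v > q[2] + 1.5 * iqr)
    v[out] <- median(v, na.rm = TRUE)
    x[is_qc, j] <- v
  }
  x
}

x <- matrix(c(10, 11, 12, 100, 50, 13, 9), ncol = 1)
y <- c("QC", "QC", "QC", "QC", "Sample", "QC", "QC")
replace_qc_outliers_sketch(x, y)
```

Only the QC value of 100 is replaced (by the QC median, 11.5); the sample value of 50 is left untouched because non-QC rows are not examined.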