Title: | Adaptive Generalized PCA |
---|---|
Description: | Implements adaptive gPCA, as described in: Fukuyama, J. (2017) <arXiv:1702.00501>. The package also includes functionality for applying the method to 'phyloseq' objects so that the method can be easily applied to microbiome data and a 'shiny' app for interactive visualization. |
Authors: | Julia Fukuyama [aut, cre] |
Maintainer: | Julia Fukuyama <[email protected]> |
License: | AGPL-3 |
Version: | 0.1.3 |
Built: | 2025-03-04 06:20:32 UTC |
Source: | https://github.com/jfukuyama/adaptivegpca |
This package implements the methods for structured dimensionality reduction described in Fukuyama, J. (2017). The general idea is to obtain a low-dimensional representation of the data, similar to that given by PCA, which incorporates side information about the relationships between the variables. The output is similar to a PCA biplot, but the variable loadings are regularized so that similar variables are encouraged to have similar loadings on the principal axes.
There are two main ways of using this package. The function
adaptivegpca
will choose how much to regularize the
variables according to the similarities between them, while the
function gpcaFullFamily
produces analogous output for
a range of regularization parameters. With this function, the
results for the different regularization parameters are inspected
with the visualizeFullFamily
function, and the
desired parameter is chosen manually.
The package also contains functionality to integrate with phyloseq:
the function processPhyloseq
takes a
phyloseq
object and creates the inputs
necessary to perform adaptive gPCA on a microbiome dataset
including information about the phylogenetic relationships between
the bacteria.
Performs adaptive generalized PCA, a dimensionality-reduction method which takes into account similarities between the variables. See Fukuyama, J. (2017) for more details.
adaptivegpca(X, Q, k = 2, weights = rep(1, nrow(X)))
adaptivegpca(X, Q, k = 2, weights = rep(1, nrow(X)))
X |
A |
Q |
A |
k |
The number of components to return. |
weights |
A vector of length |
A list containing the row/sample scores (U
), the
variable loadings (QV
), the proportion of variance explained
by each of the principal components (vars
), the value of
that was used (
r
).
data(AntibioticSmall) out.agpca = adaptivegpca(AntibioticSmall$X, AntibioticSmall$Q, k = 2)
data(AntibioticSmall) out.agpca = adaptivegpca(AntibioticSmall$X, AntibioticSmall$Q, k = 2)
A phyloseq object describing a time course experiment in which three people two courses of cipro and had their gut microbiomes sampled. See Dethlefsen and Relman, PNAS (2010), at https://www.ncbi.nlm.nih.gov/pubmed/20847294 for more details.
A phyloseq object.
This is a smaller version of the AntibioticPhyloseq
dataset,
for use in the examples so that the running time isn't so long. It
has the same samples and a randomly selected set of 200 of the
taxa. It is stored as a list with three components: the normalized
OTU abundances (X
), the similarity matrix for the taxa
(Q
), and the diagonal weight matrix (D
, the identity
matrix).
A list with three components.
Estimates the values of and
in a model
.
estimateComponents(X, Q, Qeig = NULL)
estimateComponents(X, Q, Qeig = NULL)
X |
An |
Q |
A |
Qeig |
If the eigendecomposition of |
A list with and
.
data(AntibioticSmall) estimateComponents(AntibioticSmall$X, AntibioticSmall$Q)
data(AntibioticSmall) estimateComponents(AntibioticSmall$X, AntibioticSmall$Q)
Performs standard gPCA with k
components on a data matrix X
with
row inner product Q
and weights D
.
gpca(X, Q, D = rep(1, nrow(X)), k)
gpca(X, Q, D = rep(1, nrow(X)), k)
X |
A data matrix of size |
Q |
An inner product matrix for the rows, either as a |
D |
Sample weights, a vector of length |
k |
The number of components to return. |
A list with variable loadings on the principal axes
(QV
), sample/row scores (U
), the fraction of the
variance explained by each of the axes (vars
).
data(AntibioticSmall) out.gpca = gpca(AntibioticSmall$X, AntibioticSmall$Q, k = 2)
data(AntibioticSmall) out.gpca = gpca(AntibioticSmall$X, AntibioticSmall$Q, k = 2)
Creates a sequence of gPCA data representations. One end of the
sequence () doesn't do any regularization according to
the variable structure (and so is just standard PCA), and the other
(
) does a maximal amount of regularization according to
the variable structure.
gpcaFullFamily(X, Q, weights = rep(1, nrow(X)), k = 2, rvec = (0:100)/100, findReflections = TRUE, returnLong = FALSE, sampledata = NULL, variabledata = NULL)
gpcaFullFamily(X, Q, weights = rep(1, nrow(X)), k = 2, rvec = (0:100)/100, findReflections = TRUE, returnLong = FALSE, sampledata = NULL, variabledata = NULL)
X |
A data matrix of size |
Q |
A |
weights |
A vector of weights for the rows of |
k |
The number of components to compute for each ordination. |
rvec |
The values of |
findReflections |
Whether or not flip the axes so as to make
neighboring ordinations as close as possible. If |
returnLong |
Return a long data frame with the samples/variables instead of a list of data frames. |
sampledata |
Extra sample data to be included along with the sample scores. |
variabledata |
Extra variable data to be included along with the variable loadings. |
A list containing elements for the sample points
(locationList
), the species points (speciesList
), and
the variance fractions (varsList
). Each element is itself a
list of data frames (location/species points) or of vectors (for
the variances).
data(AntibioticSmall) out.ff = gpcaFullFamily(AntibioticSmall$X, AntibioticSmall$Q, k = 2)
data(AntibioticSmall) out.ff = gpcaFullFamily(AntibioticSmall$X, AntibioticSmall$Q, k = 2)
Shiny gadget that allows users to visualize the scores of the taxa on the agpca axes, their positions on the phylogenetic tree, and their taxonomic assignments.
inspectTaxonomy(agpcafit, physeq, axes = c(1, 2), br.length = FALSE, height = 600)
inspectTaxonomy(agpcafit, physeq, axes = c(1, 2), br.length = FALSE, height = 600)
agpcafit |
An agpca object, created either by the function
|
physeq |
A phyloseq object with a tree and a taxonomy table. |
axes |
The axes to plot, must be a vector of two whole numbers. |
br.length |
Plot the tree with the branch lengths? |
height |
The height, in pixels, of the plotting region. |
The function will open a browser window showing the tree and the locations of the taxa on the selected agpca axes. "Brushing" over the plot will highlight the positions of the selected taxa on the tree and list their taxonomic assignments. Clicking the "done" button will exit the app and return a data frame containing the positions of the selected taxa on the agpca axes, the taxonomic assignments of the selected taxa, and their names.
## Not run: data(AntibioticPhyloseq) pp = processPhyloseq(AntibioticPhyloseq) out.agpca = adaptivegpca(pp$X, pp$Q, k = 2) treeInspect(out.agpca, AntibioticPhyloseq) ## End(Not run)
## Not run: data(AntibioticPhyloseq) pp = processPhyloseq(AntibioticPhyloseq) out.agpca = adaptivegpca(pp$X, pp$Q, k = 2) treeInspect(out.agpca, AntibioticPhyloseq) ## End(Not run)
Plots the output from adaptivegpca
, either a scree
plot, the samples, or the variables.
## S3 method for class 'adaptivegpca' plot(x, type = c("scree", "samples", "variables"), axes = c(1, 2), ...)
## S3 method for class 'adaptivegpca' plot(x, type = c("scree", "samples", "variables"), axes = c(1, 2), ...)
x |
An object of class |
type |
What type of plot to make. |
axes |
Which axes to plot. |
... |
Not used. |
data(AntibioticSmall) out.agpca = adaptivegpca(AntibioticSmall$X, AntibioticSmall$Q, k = 2) plot(out.agpca) plot(out.agpca, type = "samples") plot(out.agpca, type = "variables")
data(AntibioticSmall) out.agpca = adaptivegpca(AntibioticSmall$X, AntibioticSmall$Q, k = 2) plot(out.agpca) plot(out.agpca, type = "samples") plot(out.agpca, type = "variables")
Print an adaptivegpca object
## S3 method for class 'adaptivegpca' print(x, ...)
## S3 method for class 'adaptivegpca' print(x, ...)
x |
|
... |
Not used. |
Takes a phyloseq object and creates the matrices necessary to do adaptive gPCA.
processPhyloseq(physeq, ca = FALSE)
processPhyloseq(physeq, ca = FALSE)
physeq |
A |
ca |
If TRUE, do the normalization as for correspondence analysis (transform counts to relative abundances, compute sample weights, center the relative abundances according to the sample weights). Otherwise, simply center the data. |
A list of the matrix to perform adaptive gPCA on
(X
), the species similarity matrix (Q
), and the
sample weights (weights
).
data(AntibioticPhyloseq) pp = processPhyloseq(AntibioticPhyloseq)
data(AntibioticPhyloseq) pp = processPhyloseq(AntibioticPhyloseq)
Project the sample points stored in the rows of X
along the
eigenvectors of Q
and find the variance along each of the
projections.
varianceOnEvecs(X, Q)
varianceOnEvecs(X, Q)
X |
An |
Q |
A |
A vector containing the variance of the samples along each
of the eigenvectors of Q
.
data(AntibioticSmall) voe = varianceOnEvecs(AntibioticSmall$X, AntibioticSmall$Q)
data(AntibioticSmall) voe = varianceOnEvecs(AntibioticSmall$X, AntibioticSmall$Q)
Shiny gadget that shows the ordinations from an entire family of gPCAs and returns a gPCA object with the one selected by the user.
visualizeFullFamily(fullFamily, sample_data = NULL, sample_mapping = aes_string(x = "Axis1", y = "Axis2"), sample_facet = NULL, var_data = NULL, var_mapping = aes_string(x = "Axis1", y = "Axis2"), layout = c(2, 6))
visualizeFullFamily(fullFamily, sample_data = NULL, sample_mapping = aes_string(x = "Axis1", y = "Axis2"), sample_facet = NULL, var_data = NULL, var_mapping = aes_string(x = "Axis1", y = "Axis2"), layout = c(2, 6))
fullFamily |
The output from |
sample_data |
Optional data used for plotting the samples |
sample_mapping |
An aesthetic mapping to be passed to
|
sample_facet |
A |
var_data |
Optional data used for plotting the variables |
var_mapping |
An aesthetic mapping to be passed to
|
layout |
A vector of length 2. The first number gives the number of columns (out of 12) for the sidebar, the second number gives the number of columns (out of 12) for the sample plot in the main panel. |
This function will open a 'shiny' app in a browser
window. You can investigate the results for different values of
with this app. Once you press the 'done' button, the app
will close and the function will return an R object containing the
results for the value of
(the regularization parameter)
that was chosen in the app. The returned object is a list
containing the variable loadings on the principal axes (
QV
),
the sample/row scores (U
), and the fraction of the variance
explained by each of the axes (vars
).
## Not run: data(AntibioticPhyloseq) pp = processPhyloseq(AntibioticPhyloseq) out.ff = gpcaFullFamily(pp$X, Q = pp$Q, D = pp$D, k = 2) out.agpca = visualizeFullFamily(out.ff, sample_data = sample_data(AntibioticPhyloseq), sample_mapping = aes(x = Axis1, y = Axis2, color = condition), var_data = tax_table(AntibioticPhyloseq), var_mapping = aes(x = Axis1, y = Axis2, color = Phylum)) ## End(Not run)
## Not run: data(AntibioticPhyloseq) pp = processPhyloseq(AntibioticPhyloseq) out.ff = gpcaFullFamily(pp$X, Q = pp$Q, D = pp$D, k = 2) out.agpca = visualizeFullFamily(out.ff, sample_data = sample_data(AntibioticPhyloseq), sample_mapping = aes(x = Axis1, y = Axis2, color = condition), var_data = tax_table(AntibioticPhyloseq), var_mapping = aes(x = Axis1, y = Axis2, color = Phylum)) ## End(Not run)