Predict feature vectors from enhanced PCs.

enhanceFeatures(
  sce.enhanced,
  sce.ref,
  feature_names = NULL,
  model = c("xgboost", "dirichlet", "lm"),
  use.dimred = "PCA",
  assay.type = "logcounts",
  altExp.type = NULL,
  feature.matrix = NULL,
  nrounds = 0,
  train.n = round(ncol(sce.ref) * 2/3)
)

Arguments

sce.enhanced

SingleCellExperiment object with enhanced PCs.

sce.ref

SingleCellExperiment object with original PCs and expression.

feature_names

List of genes/features to predict expression/values for.

model

Model used to predict enhanced values: one of "xgboost" (listed first, and therefore the default), "dirichlet", or "lm".

use.dimred

Name of dimension reduction to use.

assay.type

Expression matrix in assays(sce.ref) to predict.

altExp.type

Expression matrix in altExps(sce.ref) to predict. Overrides assay.type if specified.

feature.matrix

Expression/feature matrix to predict, if not directly attached to sce.ref. Must have columns corresponding to the spots in sce.ref. Overrides assay.type and altExp.type if specified.

nrounds

Nonnegative integer to set the nrounds parameter (max number of boosting iterations) for xgboost. nrounds = 100 works reasonably well in most cases. If nrounds is set to 0, the parameter will be tuned using a train-test split. We recommend tuning nrounds for improved feature prediction, but note this will increase runtime.

train.n

Number of spots to use in the training dataset when tuning nrounds. By default, two-thirds of the spots in sce.ref are used.
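The train-test tuning described for nrounds and train.n can be illustrated with a small base R sketch. It is not the package's implementation and does not call xgboost: a simple componentwise L2 boosting loop stands in for the boosted model, and the simulated data, the learning rate `eta`, and the cap `max_rounds` are assumptions made for illustration. The point is only the mechanics: hold out a third of the spots, track held-out RMSE after each boosting round, and keep the round count that minimizes it.

```r
set.seed(42)

# Simulated stand-ins: 96 spots, 15 PCs, one gene driven by the first 3 PCs
n <- 96
d <- 15
X <- matrix(rnorm(n * d), n, d)
y <- as.vector(X[, 1:3] %*% c(2, -1, 0.5) + rnorm(n, sd = 0.5))

# Same default split size as train.n in the usage above
train.n <- round(n * 2/3)
train_idx <- sample(n, train.n)
X_train <- X[train_idx, ]
y_train <- y[train_idx]
X_test <- X[-train_idx, ]
y_test <- y[-train_idx]

# Componentwise L2 boosting (a stand-in for xgboost, assumed for illustration)
max_rounds <- 200   # hypothetical upper bound on boosting rounds
eta <- 0.1          # hypothetical learning rate
coefs <- numeric(d)
intercept <- mean(y_train)
col_ss <- colSums(X_train^2)
test_rmse <- numeric(max_rounds)

for (r in seq_len(max_rounds)) {
  resid <- y_train - intercept - as.vector(X_train %*% coefs)
  # Least-squares slope of the current residuals on each single predictor
  slopes <- as.vector(crossprod(X_train, resid)) / col_ss
  # Update only the predictor that reduces the residual sum of squares most
  best <- which.max(slopes^2 * col_ss)
  coefs[best] <- coefs[best] + eta * slopes[best]
  # Evaluate on the held-out spots after this round
  pred <- intercept + as.vector(X_test %*% coefs)
  test_rmse[r] <- sqrt(mean((y_test - pred)^2))
}

# Number of rounds with the lowest held-out RMSE
best_nrounds <- which.min(test_rmse)
```

Fixing nrounds (for example, nrounds = 100) skips this search entirely, which is why tuning costs additional runtime.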

Value

If assay.type or altExp.type is specified, the enhanced features are stored in the corresponding slot of sce.enhanced and the modified SingleCellExperiment object is returned.

If feature.matrix is specified, or if a subset of features is requested, the enhanced features are returned directly as a matrix.

Details

Enhanced features are computed by fitting a predictive model to a low-dimensional representation of the original expression vectors. With model = "lm", a linear model is fit for each gene using the top 15 principal components from each spot, i.e. lm(gene ~ PCs), and the fitted model is used to predict the enhanced expression for each gene from the subspots' principal components. The other models follow the same pattern: they are trained on the spot-level PCs and applied to the subspot-level PCs.
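The linear-model case can be sketched in base R as follows. This is a minimal illustration on simulated data, not the package's code: the matrices `pcs_ref` and `pcs_enh` are hypothetical stand-ins for the spot-level and subspot-level principal components, and `expr_ref` is a simulated expression matrix.

```r
set.seed(1)
n_spots <- 50
n_subspots <- 300
n_pcs <- 15
n_genes <- 3
pc_names <- paste0("PC", seq_len(n_pcs))

# Hypothetical stand-ins for the PCs of sce.ref (spots) and sce.enhanced (subspots)
pcs_ref <- matrix(rnorm(n_spots * n_pcs), n_spots, n_pcs,
                  dimnames = list(NULL, pc_names))
pcs_enh <- matrix(rnorm(n_subspots * n_pcs), n_subspots, n_pcs,
                  dimnames = list(NULL, pc_names))

# Simulated p x n expression matrix (genes in rows, spots in columns)
loadings <- matrix(rnorm(n_genes * n_pcs), n_genes, n_pcs)
expr_ref <- loadings %*% t(pcs_ref) +
  matrix(rnorm(n_genes * n_spots, sd = 0.1), n_genes, n_spots)
rownames(expr_ref) <- paste0("gene_", seq_len(n_genes))

train <- as.data.frame(pcs_ref)
newdata <- as.data.frame(pcs_enh)

# Fit one linear model per gene on the spot-level PCs: lm(gene ~ PCs)
fits <- lapply(rownames(expr_ref), function(g) {
  lm(y ~ ., data = data.frame(y = expr_ref[g, ], train))
})
names(fits) <- rownames(expr_ref)

# Predict each gene at subspot resolution; the result is p x (number of subspots)
expr_enh <- t(vapply(fits, function(f) predict(f, newdata = newdata),
                     numeric(n_subspots)))

# Per-gene diagnostic, analogous to the R.squared measure mentioned below
r_squared <- vapply(fits, function(f) summary(f)$r.squared, numeric(1))
```

Because each gene is modeled independently, the per-gene fit quality can be inspected to decide which enhanced features are trustworthy.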

Diagnostic measures, such as RMSE for xgboost or R.squared for linear regression, are added to the `rowData` of the enhanced experiment if the features are an assay of the original experiment. Otherwise they are stored as an attribute of the returned matrix/altExp.

Note that feature matrices are both expected as input and returned as \(p \times n\) matrices: each column is the \(p\)-dimensional feature vector of one of the \(n\) spots, so features are in rows and spots are in columns.
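As a small illustration of this orientation, the sketch below builds a feature matrix from a per-spot table. The table `per_spot`, its column names, and the spot identifiers are hypothetical; the only point is that a one-row-per-spot table must be transposed before being passed as feature.matrix.

```r
set.seed(7)
n_spots <- 96
spot_ids <- paste0("spot_", seq_len(n_spots))

# Hypothetical per-spot table with one row per spot and one column per feature
per_spot <- data.frame(score_a = runif(n_spots),
                       score_b = runif(n_spots),
                       row.names = spot_ids)

# Transpose so that features are rows and spots are columns (p x n)
feature.matrix <- t(as.matrix(per_spot))

# The number of columns must match the number of spots in sce.ref
stopifnot(ncol(feature.matrix) == n_spots)

# With the objects from the Examples section, the call would then look like:
# enhanceFeatures(enhanced, sce, feature.matrix = feature.matrix)
```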

Examples

set.seed(149)
sce <- exampleSCE()
sce <- spatialCluster(sce, 7, nrep=100, burn.in=10)
#> Neighbors were identified for 96 out of 96 spots.
#> Fitting model...
#> Calculating labels using iterations 10 through 100.
enhanced <- spatialEnhance(sce, 7, init=sce$spatial.cluster, nrep=100, burn.in=10)
#> Calculating labels using iterations 0 through 100.
enhanced <- enhanceFeatures(enhanced, sce, feature_names=c("gene_1", "gene_2"))