Predict feature vectors from enhanced PCs.

enhanceFeatures(
  sce.enhanced,
  sce.ref,
  feature_names = NULL,
  model = c("xgboost", "dirichlet", "lm"),
  use.dimred = "PCA",
  assay.type = "logcounts",
  altExp.type = NULL,
  feature.matrix = NULL,
  nrounds = 0,
  train.n = round(ncol(sce.ref) * 2/3)
)

Arguments

sce.enhanced	SingleCellExperiment object with enhanced PCs.
sce.ref	SingleCellExperiment object with original PCs and expression.
feature_names	List of genes/features to predict expression/values for.
model	Model used to predict enhanced values. One of `"xgboost"`, `"dirichlet"`, or `"lm"`.
use.dimred	Name of dimension reduction to use.
assay.type	Expression matrix in `assays(sce.ref)` to predict.
altExp.type	Expression matrix in `altExps(sce.ref)` to predict. Overrides `assay.type` if specified.
feature.matrix	Expression/feature matrix to predict, if not directly attached to `sce.ref`. Must have columns corresponding to the spots in `sce.ref`. Overrides `assay.type` and `altExp.type` if specified.
nrounds	Nonnegative integer to set the `nrounds` parameter (max number of boosting iterations) for xgboost. `nrounds = 100` works reasonably well in most cases. If `nrounds` is set to 0, the parameter will be tuned using a train-test split. We recommend tuning `nrounds` for improved feature prediction, but note this will increase runtime.
train.n	Number of spots to use in the training dataset for tuning `nrounds`. By default, two-thirds of the total number of spots are used.
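
As a rough illustration of tuning a complexity parameter with a train-test split, the sketch below uses base R on simulated data. It is not the package's implementation: the polynomial degree of a linear model stands in for `nrounds`, and all object names (`candidates`, `test.rmse`, `best`) are hypothetical. Only the default split size, `round(n * 2/3)`, is taken from the usage above.

```r
# Illustrative only: simulated data, with polynomial degree as a stand-in
# for nrounds. This is NOT how enhanceFeatures() tunes xgboost internally.
set.seed(3)
n <- 96                        # number of spots, as in the example data
train.n <- round(n * 2/3)      # default size of the training set

x <- runif(n, -2, 2)
y <- sin(x) + rnorm(n, sd = 0.2)

# Randomly assign train.n spots to training; the rest are held out.
train.idx <- sample(n, train.n)
train <- data.frame(x = x[train.idx], y = y[train.idx])
test <- data.frame(x = x[-train.idx], y = y[-train.idx])

# Fit one model per candidate value and score it on the held-out spots.
candidates <- 1:6
test.rmse <- vapply(candidates, function(d) {
  fit <- lm(y ~ poly(x, d), data = train)
  sqrt(mean((test$y - predict(fit, newdata = test))^2))
}, numeric(1))

# Keep the candidate with the lowest held-out RMSE.
best <- candidates[which.min(test.rmse)]
```

The extra model fits are what make tuning slower than passing a fixed value such as `nrounds = 100`.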

Value

If `assay.type` or `altExp.type` is specified, the enhanced features are stored in the corresponding slot of `sce.enhanced` and the modified SingleCellExperiment object is returned.

If `feature.matrix` is specified, or if a subset of features is requested, the enhanced features are returned directly as a matrix.

Details

Enhanced features are computed by fitting a predictive model to a low-dimensional representation of the original expression vectors. For example, with `model = "lm"`, a linear model is fit for each gene using the top 15 principal components from each spot, i.e. `lm(gene ~ PCs)`, and the fitted model is then used to predict the enhanced expression for that gene from the subspots' principal components.
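
The linear-model case described above can be sketched in base R as follows. This is a minimal illustration on simulated data, not the package's internal code: the matrices `pcs.ref` and `pcs.enhanced`, the sizes, and the simulated gene are all assumptions made for the example, and five PCs are used instead of 15 to keep it short.

```r
# Illustrative only: simulated PCs and expression, base R, one gene.
set.seed(1)
n.spots <- 60       # hypothetical number of original spots
n.subspots <- 20    # hypothetical number of subspots
n.pcs <- 5          # the text describes the top 15 PCs; fewer here for brevity

# Spot-level PCs (spots x PCs) and one gene's expression over the spots.
pcs.ref <- matrix(rnorm(n.spots * n.pcs), nrow = n.spots,
                  dimnames = list(NULL, paste0("PC", seq_len(n.pcs))))
gene <- drop(pcs.ref %*% c(2, -1, 0.5, 0, 0)) + rnorm(n.spots, sd = 0.1)

# Fit gene ~ PCs on the original spots.
train <- data.frame(gene = gene, pcs.ref)
fit <- lm(gene ~ ., data = train)

# Predict the gene at subspot resolution from the enhanced PCs.
pcs.enhanced <- matrix(rnorm(n.subspots * n.pcs), nrow = n.subspots,
                       dimnames = list(NULL, colnames(pcs.ref)))
gene.enhanced <- predict(fit, newdata = data.frame(pcs.enhanced))

# A per-gene diagnostic of the kind mentioned for linear regression.
r.squared <- summary(fit)$r.squared
```

The model is trained only on the original spots; the subspot PCs are used purely as new inputs at prediction time.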

Diagnostic measures, such as RMSE for xgboost or R.squared for linear regression, are added to the `rowData` of the enhanced experiment if the features are an assay of the original experiment. Otherwise they are stored as an attribute of the returned matrix/altExp.

Note that feature matrices are returned, and are expected as input, as \(p \times n\) matrices: each column is the \(p\)-dimensional feature vector of one of the \(n\) spots.
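
A small base R sketch of this orientation, assuming a hypothetical table `by.spot` that stores one row per spot and therefore has to be transposed before use (the spot and feature names are made up for illustration):

```r
# Illustrative only: a made-up spots x features table.
set.seed(2)
n.spots <- 6
p <- 3
by.spot <- matrix(runif(n.spots * p), nrow = n.spots,
                  dimnames = list(paste0("spot_", seq_len(n.spots)),
                                  paste0("feature_", seq_len(p))))

# Transpose so that rows are features and columns are spots (p x n).
feature.matrix <- t(by.spot)

# The number of columns should match the number of spots.
stopifnot(ncol(feature.matrix) == n.spots)
```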

Examples

set.seed(149)
sce <- exampleSCE()
sce <- spatialCluster(sce, 7, nrep=100, burn.in=10)
#> Neighbors were identified for 96 out of 96 spots.
#> Fitting model...
#> Calculating labels using iterations 10 through 100.
enhanced <- spatialEnhance(sce, 7, init=sce$spatial.cluster, nrep=100, burn.in=10)
#> Calculating labels using iterations 0 through 100.
enhanced <- enhanceFeatures(enhanced, sce, feature_names=c("gene_1", "gene_2"))