Normalization using VST

STEP9 Data processing

The first step in data processing is normalization. Here we use the "LogNormalize" method: the feature counts for each cell are divided by the total counts for that cell, multiplied by a scale factor of 10,000, and then natural-log transformed.

# Log-normalize the data
pbmc <- NormalizeData(pbmc, normalization.method = "LogNormalize", scale.factor = 10000)
#OUTPUT
Performing log-normalization
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
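
To make the normalization step concrete, here is a minimal base-R sketch of the arithmetic behind log-normalization. The toy count matrix and the variable names (counts, scale_factor, log_norm) are made up for illustration; this is not Seurat's internal code, only the same calculation on a small dense matrix.

```r
# Toy count matrix (made-up values): genes in rows, cells in columns.
counts <- matrix(c(0, 3, 1,
                   2, 0, 4,
                   8, 7, 5),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("geneA", "geneB", "geneC"),
                                 c("cell1", "cell2", "cell3")))

scale_factor <- 10000

# Total counts per cell (column sums).
lib_size <- colSums(counts)

# Divide each cell by its total counts, multiply by the scale factor,
# then apply log1p, i.e. the natural log of (1 + x).
log_norm <- log1p(sweep(counts, 2, lib_size, "/") * scale_factor)

log_norm
```

Because log1p(0) is 0, zero counts remain zero after the transformation, and undoing the log with expm1() gives back values that sum to the scale factor in every cell.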

Next, we select the most variable features in the dataset, in this instance the top 2,000, using the "vst" selection method.

# select highly variable features
pbmc <- FindVariableFeatures(pbmc, selection.method = "vst", nfeatures = 2000)
#OUTPUT
Calculating gene variances
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Calculating feature variances of standardized and clipped values
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
# Identify the 10 most highly variable genes
top10 <- head(VariableFeatures(pbmc), 10)
top10
#OUTPUT
[1] "HBB"      "HBA2"     "HBA1"     "APOBEC3B" "CCL4"     "CCL7"     "CCL3"     "IGLL5"    "CXCL1"    "PTGDS"
# plot variable features with and without labels
plot1 <- VariableFeaturePlot(pbmc, assay = "RNA")
plot2 <- LabelPoints(plot = plot1, points = top10, repel = TRUE, xnudge = 0, ynudge = 0)
plot1 + plot2
#OUTPUT
Warning messages:
1: Transformation introduced infinite values in continuous x-axis 
2: Transformation introduced infinite values in continuous x-axis
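
The two progress messages printed by FindVariableFeatures ("Calculating gene variances", then "Calculating feature variances of standardized and clipped values") hint at how the "vst" method works. The following is a simplified base-R sketch of that idea, assuming the procedure described in the Seurat documentation: fit a loess curve to log10(variance) against log10(mean), standardize each gene using its observed mean and the expected variance from the curve, clip large values, and rank genes by the variance of the standardized values. The simulated counts and all variable names are hypothetical, and the span of 0.3 and the clip at the square root of the number of cells are taken here as assumed defaults. It is not Seurat's actual implementation.

```r
set.seed(1)
n_genes <- 200
n_cells <- 100

# Simulated counts (made up): each gene has its own mean, genes in rows.
mu <- rgamma(n_genes, shape = 2, rate = 0.5)
counts <- matrix(rnbinom(n_genes * n_cells, mu = rep(mu, n_cells), size = 2),
                 nrow = n_genes)

# Step 1: per-gene mean and variance of the raw counts.
gene_mean <- rowMeans(counts)
gene_var  <- apply(counts, 1, var)
not_const <- gene_var > 0   # constant genes cannot be log-transformed

# Step 2: loess fit of log10(variance) on log10(mean).
df  <- data.frame(lm = log10(gene_mean[not_const]),
                  lv = log10(gene_var[not_const]))
fit <- loess(lv ~ lm, data = df, span = 0.3)

# Expected variance for each gene, read off the fitted curve.
expected_var <- numeric(n_genes)
expected_var[not_const] <- 10^fit$fitted

# Step 3: standardize with the observed mean and expected standard
# deviation, then clip values above sqrt(number of cells).
clip_max <- sqrt(n_cells)
z <- (counts - gene_mean) / sqrt(expected_var)
z[!not_const, ] <- 0
z[z > clip_max] <- clip_max

# Step 4: variance of the standardized, clipped values, used for ranking.
std_var <- rowSums(z^2) / (n_cells - 1)
top10_idx <- head(order(std_var, decreasing = TRUE), 10)
top10_idx
```

Genes whose variance is much larger than the curve predicts for their mean end up with a high standardized variance, which is why they are ranked as highly variable.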

Next, we scale and center the data, so that each feature has a mean expression of 0 and a variance of 1 across cells.

# Scale data
all.genes <- rownames(pbmc)
pbmc <- ScaleData(pbmc, features = all.genes, verbose = TRUE)
#OUTPUT
Centering and scaling data matrix
  |===========================================================================================================| 100%
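
As a rough illustration of what centering and scaling does to each gene, here is a minimal base-R sketch on a made-up matrix of log-normalized values. The matrix and variable names are hypothetical, and the cap of 10 on scaled values is assumed to match ScaleData's default scale.max; Seurat's own implementation differs in its details.

```r
# Toy log-normalized expression values (made up): genes in rows, cells in columns.
expr <- matrix(c(0.0, 1.2, 2.4, 0.6,
                 3.1, 2.9, 3.0, 3.2,
                 0.0, 0.0, 4.5, 0.0),
               nrow = 3, byrow = TRUE,
               dimnames = list(c("geneA", "geneB", "geneC"),
                               paste0("cell", 1:4)))

# scale() centers and scales columns, so transpose to work gene by gene,
# then transpose back to the genes x cells layout.
scaled <- t(scale(t(expr), center = TRUE, scale = TRUE))

# Cap extreme values (assumed default maximum of 10).
scaled[scaled > 10] <- 10

# Each gene now has mean 0 and standard deviation 1 across cells.
round(rowMeans(scaled), 10)
apply(scaled, 1, sd)
```

After this step every gene contributes on the same scale, so highly expressed genes do not dominate downstream analyses simply because of their magnitude.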