Differential testing framework

STEP13 Differential gene testing

Differential gene testing is a critical component of single-cell analysis. In the past people have used single-cell testing algorithms to look at differentially expressed genes, however, newer computational studies argue that pseduobulking counts across cell types and donors and then performing gene testing is perhaps more appropriate when discovering differentially expressed genes. To read more about this, review this paper by Squair et al., 2021:Nature Communications. However based on the design of the experiment and the cutoffs, both can yield reproducible data, at least for highly differential genes, for genes with high variance, perhaps pseudobulk methods are more reliable.

STEP13A: Single-cell gene testing

# In order to fine DE genes its important to annotate cells,
# this is where metadata comes to be important
# Our first method employs single cell differential gene expression using LR
# First lets save celltype info from active.idents into a metadata slot called celltype
pbmc$celltype <- pbmc@active.ident

# Lets check the clusters
table(pbmc$celltype)

# Now lets make a new metadata slot containing both peices of information, celltype and stimulation with interferon
# In order to do this we need to take data from two slots 1) pbmc$celltype and 2) pbmc$stim
# We will store this data in a new metadata slot called celltype.stim
pbmc$celltype.stim <- paste(pbmc$celltype, pbmc$stim, sep = "_")

# Now lets load this metadata in the active directory of seurat, so that seurat will pull this metadata by default
Idents(pbmc) <- "celltype.stim"

# Now lets run differential analysis
# HINT: Note that the ident.1 and ident.2 terms reside in pbmc$celltype.stim. maybe run table(pbmc$celltype.stim) to see a list of terms?
# If you cant complete all the steps to get here, thats okay! I have a seurat object called PBMC saved in a shared folder in a sub-folder called: \seuratobjects
# Just download that object and use qread or readRDS to load your object into R
b.interferon.response <- FindMarkers(pbmc, ident.1 = "B_STIM", ident.2 = "B_CTRL", # These are the two cell types/clusters that you want to run a DE test across. ident.2 is your control
                                     slot = 'data',
                                     test.use = "LR",
                                     min.pct = 0.1,
                                     latent.vars = c("donor"),
                                     logfc.threshold = 0.5849, #~1.5FC
                                     only.pos = TRUE,
                                     verbose = FALSE)
# Got the DE test to run for your project? WOOHOO GREAT JOB!!! I knew you could do it!!
# If you couldn't DON'T WORRY, create an issue on GitHub and I will get back to you asap!!
# Output
Warning message:
glm.fit: fitted probabilities numerically 0 or 1 occurred 
# Lets check head of data
head(b.interferon.response, n = 15)

We are now looking at the output of differentially expressed genes using the single cell statistical method

#OUTPUT
                p_val avg_log2FC pct.1 pct.2     p_val_adj
ISG15   1.886870e-251  4.5777825 0.998 0.244 2.651618e-247
ISG20   6.775871e-233  2.9286888 1.000 0.674 9.522132e-229
IFIT3   2.249518e-226  4.5012181 0.963 0.052 3.161247e-222
IFI6    7.675338e-222  4.2569268 0.965 0.077 1.078615e-217
IFIT1   1.235784e-197  4.1209722 0.908 0.032 1.736647e-193
MX1     1.447618e-160  3.2484413 0.908 0.116 2.034337e-156
TNFSF10 1.295544e-148  3.7724262 0.781 0.022 1.820627e-144
LY6E    1.309127e-148  3.1014142 0.894 0.151 1.839716e-144
IFIT2   4.554090e-142  3.6335959 0.786 0.037 6.399863e-138
B2M     2.530957e-119  0.6158652 1.000 1.000 3.556754e-115
CXCL10  4.799024e-113  5.3217240 0.644 0.010 6.744069e-109
PLSCR1  2.659608e-112  2.8015323 0.788 0.119 3.737548e-108
IRF7    3.667981e-108  2.5623663 0.837 0.193 5.154613e-104
HERC5    6.244788e-98  2.8136999 0.611 0.022  8.775800e-94
UBE2L6   7.032047e-91  2.1202128 0.857 0.301  9.882136e-87

Let’s look at the gene expression of the top 2 candidate proteins, we can see that they are interferon response genes.

b.interferon.response <- dplyr::filter(b.interferon.response, p_val_adj < 5e-2)
plots <- VlnPlot(pbmc, features = c("ISG15", "ISG20"), split.by = "stim", group.by = "celltype",
                 pt.size = 0, combine = FALSE)
#OUTPUT
The default behaviour of split.by has changed.
Separate violin plots are now plotted side-by-side.
To restore the old behaviour of a single split violin,
set split.plot = TRUE.
      
This message will be shown once per session.
# Run plotting function
wrap_plots(plots = plots, ncol = 1)
#OUTPUT

STEP13B: Pseudobulk testing

Pseudobulking is powerful. It allows one to remove the influence of library composition bias, as one aggregated read now represents the expression of one celltype across different donors. This analytical philosophy albeit more conservative, yields superior results with regards to reproducibility.

DefaultAssay(pbmc) <- "RNA"
pbmc$celltype.stim.donor <- paste(pbmc$celltype, pbmc$stim, pbmc$donor, sep = "_")
Idents(pbmc) <- "celltype.stim.donor"
table(pbmc[["celltype.stim.donor"]])
celltype.stim.donor
#OUTPUT
     B Activated_CTRL_d1      B Activated_CTRL_d2      B Activated_CTRL_d3      B Activated_CTRL_d4 
                      41                       46                       50                       46 
     B Activated_STIM_d1      B Activated_STIM_d2      B Activated_STIM_d3      B Activated_STIM_d4 
                      50                       66                       44                       47 
               B_CTRL_d1                B_CTRL_d2                B_CTRL_d3                B_CTRL_d4 
                     110                      109                       93                       93 
               B_STIM_d1                B_STIM_d2                B_STIM_d3                B_STIM_d4 
                     136                      141                      145                      143 
       CD14 Mono_CTRL_d1        CD14 Mono_CTRL_d2        CD14 Mono_CTRL_d3        CD14 Mono_CTRL_d4 
                     534                      582                      520                      573 
       CD14 Mono_STIM_d1        CD14 Mono_STIM_d2        CD14 Mono_STIM_d3        CD14 Mono_STIM_d4 
                     551                      535                      504                      524 
       CD16 Mono_CTRL_d1        CD16 Mono_CTRL_d2        CD16 Mono_CTRL_d3        CD16 Mono_CTRL_d4 
                     125                      124                      126                      143 
       CD16 Mono_STIM_d1        CD16 Mono_STIM_d2        CD16 Mono_STIM_d3        CD16 Mono_STIM_d4 
                     142                      141                      127                      136 
    CD4 Memory T_CTRL_d1     CD4 Memory T_CTRL_d2     CD4 Memory T_CTRL_d3     CD4 Memory T_CTRL_d4 
                     219                      200                      208                      186 
    CD4 Memory T_STIM_d1     CD4 Memory T_STIM_d2     CD4 Memory T_STIM_d3     CD4 Memory T_STIM_d4 
                     229                      197                      238                      239 
     CD4 Naive T_CTRL_d1      CD4 Naive T_CTRL_d2      CD4 Naive T_CTRL_d3      CD4 Naive T_CTRL_d4 
                     264                      230                      253                      256 
     CD4 Naive T_STIM_d1      CD4 Naive T_STIM_d2      CD4 Naive T_STIM_d3      CD4 Naive T_STIM_d4 
                     379                      367                      346                      383 
           CD8 T_CTRL_d1            CD8 T_CTRL_d2            CD8 T_CTRL_d3            CD8 T_CTRL_d4 
                      71                       72                       82                       95 
           CD8 T_STIM_d1            CD8 T_STIM_d2            CD8 T_STIM_d3            CD8 T_STIM_d4 
                     129                      119                      101                      117 
              DC_CTRL_d1               DC_CTRL_d2               DC_CTRL_d3               DC_CTRL_d4 
                      65                       61                       52                       48 
              DC_STIM_d1               DC_STIM_d2               DC_STIM_d3               DC_STIM_d4 
                      48                       39                       50                       57 
           Eryth_CTRL_d1            Eryth_CTRL_d2            Eryth_CTRL_d3            Eryth_CTRL_d4 
                       5                        5                        6                        6 
           Eryth_STIM_d1            Eryth_STIM_d2            Eryth_STIM_d3            Eryth_STIM_d4 
                       7                        9                       10                        7 
              Mk_CTRL_d1               Mk_CTRL_d2               Mk_CTRL_d3               Mk_CTRL_d4 
                      21                       30                       26                       21 
              Mk_STIM_d1               Mk_STIM_d2               Mk_STIM_d3               Mk_STIM_d4 
                      29                       28                       37                       28 
Mono/Mk Doublets_CTRL_d1 Mono/Mk Doublets_CTRL_d2 Mono/Mk Doublets_CTRL_d3 Mono/Mk Doublets_CTRL_d4 
                      17                        5                        9                       11 
Mono/Mk Doublets_STIM_d1 Mono/Mk Doublets_STIM_d2 Mono/Mk Doublets_STIM_d3 Mono/Mk Doublets_STIM_d4 
                       4                        7                       11                        6 
              NK_CTRL_d1               NK_CTRL_d2               NK_CTRL_d3               NK_CTRL_d4 
                      82                       63                       77                       90 
              NK_STIM_d1               NK_STIM_d2               NK_STIM_d3               NK_STIM_d4 
                      75                       85                       86                       87 
             pDC_CTRL_d1              pDC_CTRL_d2              pDC_CTRL_d3              pDC_CTRL_d4 
                      10                       15                        9                       17 
             pDC_STIM_d1              pDC_STIM_d2              pDC_STIM_d3              pDC_STIM_d4 
                      13                       23                       16                       25 
     T activated_CTRL_d1      T activated_CTRL_d2      T activated_CTRL_d3      T activated_CTRL_d4 
                      83                       79                       84                       69 
     T activated_STIM_d1      T activated_STIM_d2      T activated_STIM_d3      T activated_STIM_d4 
                      92                       91                       83                       77
combined_pbmc <- AggregateExpression(pbmc, 
                                     assays = c("RNA"), 
                                     features = NULL, return.seurat = TRUE,  
                                     group.by = "celltype.stim.donor",
                                     slot = "counts", verbose = FALSE)

combined_pbmc$celltype.stim.donor <- Cells(combined_pbmc)

# Metadata organization and addition to aggregated object
{
  Idents(combined_pbmc) <- 'celltype.stim.donor'
  combined_pbmc$celltype <- combined_pbmc@meta.data[["orig.ident"]]
  metadat <- combined_pbmc@meta.data
  metadat$celltype <- metadat[c('celltype')] <- str_split_i(metadat$celltype.stim.donor, "_", -3)
  metadat$stim <- metadat[c('stim')] <- str_split_i(metadat$celltype.stim.donor, '_', -2)
  metadat$donor <- metadat[c('donor')] <- str_split_i(metadat$celltype.stim.donor, '_', -1)
  combined_pbmc@meta.data = metadat
}

table(combined_pbmc@meta.data[["celltype"]])
#OUTPUT
               B      B Activated        CD14 Mono        CD16 Mono     CD4 Memory T      CD4 Naive T 
               8                8                8                8                8                8 
           CD8 T               DC            Eryth               Mk Mono/Mk Doublets               NK 
               8                8                8                8                8                8 
             pDC      T activated 
               8                8
table(combined_pbmc@meta.data[["donor"]])
#OUTPUT
d1 d2 d3 d4 
28 28 28 28
table(combined_pbmc@meta.data[["celltype.stim.donor"]])
#OUTPUT
     B Activated_CTRL_d1      B Activated_CTRL_d2      B Activated_CTRL_d3      B Activated_CTRL_d4 
                       1                        1                        1                        1 
     B Activated_STIM_d1      B Activated_STIM_d2      B Activated_STIM_d3      B Activated_STIM_d4 
                       1                        1                        1                        1 
               B_CTRL_d1                B_CTRL_d2                B_CTRL_d3                B_CTRL_d4 
                       1                        1                        1                        1 
               B_STIM_d1                B_STIM_d2                B_STIM_d3                B_STIM_d4 
                       1                        1                        1                        1 
       CD14 Mono_CTRL_d1        CD14 Mono_CTRL_d2        CD14 Mono_CTRL_d3        CD14 Mono_CTRL_d4 
                       1                        1                        1                        1 
       CD14 Mono_STIM_d1        CD14 Mono_STIM_d2        CD14 Mono_STIM_d3        CD14 Mono_STIM_d4 
                       1                        1                        1                        1 
       CD16 Mono_CTRL_d1        CD16 Mono_CTRL_d2        CD16 Mono_CTRL_d3        CD16 Mono_CTRL_d4 
                       1                        1                        1                        1 
       CD16 Mono_STIM_d1        CD16 Mono_STIM_d2        CD16 Mono_STIM_d3        CD16 Mono_STIM_d4 
                       1                        1                        1                        1 
    CD4 Memory T_CTRL_d1     CD4 Memory T_CTRL_d2     CD4 Memory T_CTRL_d3     CD4 Memory T_CTRL_d4 
                       1                        1                        1                        1 
    CD4 Memory T_STIM_d1     CD4 Memory T_STIM_d2     CD4 Memory T_STIM_d3     CD4 Memory T_STIM_d4 
                       1                        1                        1                        1 
     CD4 Naive T_CTRL_d1      CD4 Naive T_CTRL_d2      CD4 Naive T_CTRL_d3      CD4 Naive T_CTRL_d4 
                       1                        1                        1                        1 
     CD4 Naive T_STIM_d1      CD4 Naive T_STIM_d2      CD4 Naive T_STIM_d3      CD4 Naive T_STIM_d4 
                       1                        1                        1                        1 
           CD8 T_CTRL_d1            CD8 T_CTRL_d2            CD8 T_CTRL_d3            CD8 T_CTRL_d4 
                       1                        1                        1                        1 
           CD8 T_STIM_d1            CD8 T_STIM_d2            CD8 T_STIM_d3            CD8 T_STIM_d4 
                       1                        1                        1                        1 
              DC_CTRL_d1               DC_CTRL_d2               DC_CTRL_d3               DC_CTRL_d4 
                       1                        1                        1                        1 
              DC_STIM_d1               DC_STIM_d2               DC_STIM_d3               DC_STIM_d4 
                       1                        1                        1                        1 
           Eryth_CTRL_d1            Eryth_CTRL_d2            Eryth_CTRL_d3            Eryth_CTRL_d4 
                       1                        1                        1                        1 
           Eryth_STIM_d1            Eryth_STIM_d2            Eryth_STIM_d3            Eryth_STIM_d4 
                       1                        1                        1                        1 
              Mk_CTRL_d1               Mk_CTRL_d2               Mk_CTRL_d3               Mk_CTRL_d4 
                       1                        1                        1                        1 
              Mk_STIM_d1               Mk_STIM_d2               Mk_STIM_d3               Mk_STIM_d4 
                       1                        1                        1                        1 
Mono/Mk Doublets_CTRL_d1 Mono/Mk Doublets_CTRL_d2 Mono/Mk Doublets_CTRL_d3 Mono/Mk Doublets_CTRL_d4 
                       1                        1                        1                        1 
Mono/Mk Doublets_STIM_d1 Mono/Mk Doublets_STIM_d2 Mono/Mk Doublets_STIM_d3 Mono/Mk Doublets_STIM_d4 
                       1                        1                        1                        1 
              NK_CTRL_d1               NK_CTRL_d2               NK_CTRL_d3               NK_CTRL_d4 
                       1                        1                        1                        1 
              NK_STIM_d1               NK_STIM_d2               NK_STIM_d3               NK_STIM_d4 
                       1                        1                        1                        1 
             pDC_CTRL_d1              pDC_CTRL_d2              pDC_CTRL_d3              pDC_CTRL_d4 
                       1                        1                        1                        1 
             pDC_STIM_d1              pDC_STIM_d2              pDC_STIM_d3              pDC_STIM_d4 
                       1                        1                        1                        1 
     T activated_CTRL_d1      T activated_CTRL_d2      T activated_CTRL_d3      T activated_CTRL_d4 
                       1                        1                        1                        1 
     T activated_STIM_d1      T activated_STIM_d2      T activated_STIM_d3      T activated_STIM_d4 
                       1                        1                        1                        1
combined_pbmc$celltype.stim <- paste(combined_pbmc$celltype, combined_pbmc$stim, sep = "_")
table(combined_pbmc@meta.data[["celltype.stim"]])
#OUTPUT
     B Activated_CTRL      B Activated_STIM                B_CTRL                B_STIM        CD14 Mono_CTRL 
                    4                     4                     4                     4                     4 
       CD14 Mono_STIM        CD16 Mono_CTRL        CD16 Mono_STIM     CD4 Memory T_CTRL     CD4 Memory T_STIM 
                    4                     4                     4                     4                     4 
     CD4 Naive T_CTRL      CD4 Naive T_STIM            CD8 T_CTRL            CD8 T_STIM               DC_CTRL 
                    4                     4                     4                     4                     4 
              DC_STIM            Eryth_CTRL            Eryth_STIM               Mk_CTRL               Mk_STIM 
                    4                     4                     4                     4                     4 
Mono/Mk Doublets_CTRL Mono/Mk Doublets_STIM               NK_CTRL               NK_STIM              pDC_CTRL 
                    4                     4                     4                     4                     4 
             pDC_STIM      T activated_CTRL      T activated_STIM 
                    4                     4                     4
Idents(combined_pbmc) <- "celltype.stim"
b.interferon.response.aggr <- FindMarkers(combined_pbmc, ident.1 = "B_STIM", ident.2 = "B_CTRL",
                                          slot = 'data',
                                          test.use = "DESeq2",
                                          min.pct = 0.1,
                                          latent.vars = c("donor"),
                                          logfc.threshold = 0.5849, #~1.5FC
                                          only.pos = TRUE,
                                          verbose = FALSE)
Warning: 'latent.vars' is only used for the following tests: negbinom, poisson, MAST, LR
converting counts to integer mode
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates

head(b.interferon.response.aggr, n = 15)
#OUTPUT
               p_val avg_log2FC pct.1 pct.2     p_val_adj
ISG15   0.000000e+00   5.458484     1   1.0  0.000000e+00
ISG20   0.000000e+00   3.548164     1   1.0  0.000000e+00
IFI6   1.318852e-111   5.919998     1   1.0 1.853383e-107
IFIT3   1.038933e-98   6.333155     1   1.0  1.460013e-94
LY6E    6.543563e-98   4.196682     1   1.0  9.195669e-94
MX1     1.404512e-97   4.453682     1   1.0  1.973761e-93
B2M     1.505428e-79   1.096096     1   1.0  2.115578e-75
IFIT1   7.894356e-69   6.282988     1   1.0  1.109394e-64
IRF7    3.917788e-68   3.564378     1   1.0  5.505668e-64
CXCL10  6.061481e-61   8.071050     1   0.5  8.518199e-57
IFIT2   6.308882e-61   5.546609     1   1.0  8.865872e-57
PLSCR1  2.569268e-60   4.125013     1   1.0  3.610593e-56
UBE2L6  8.745682e-60   2.935460     1   1.0  1.229031e-55
IFITM2  1.077820e-55   3.460921     1   1.0  1.514660e-51
SAT1    1.858432e-55   2.519988     1   1.0  2.611655e-51

Notice how some adj pvalues are 0? Thats because the value is lower than the smallest calculable value of R, we can see what is the smallest value R can calculate using the following function. Remember this will change based on the hardware you are using.

# Why is my adj pval 0?
.Machine$double.xmin
#OUTPUT
[1] 2.225074e-308
# Replace 0 with low pval, subset for 0.05 FDR and FC > 1.5
b.interferon.response.aggr$p_val_adj[b.interferon.response.aggr$p_val_adj == 0] <- 2e-302
b.interferon.response.aggr <- dplyr::filter(b.interferon.response.aggr, p_val_adj < 5e-2)
b.interferon.response.aggr <- dplyr::filter(b.interferon.response.aggr, avg_log2FC > 0.5849)

intersect(rownames(b.interferon.response), rownames(b.interferon.response.aggr))
#OUTPUT
  [1] "ISG15"     "ISG20"     "IFIT3"     "IFI6"      "IFIT1"     "MX1"       "TNFSF10"   "LY6E"      "IFIT2"    
 [10] "B2M"       "CXCL10"    "PLSCR1"    "IRF7"      "HERC5"     "UBE2L6"    "IFI44L"    "EPSTI1"    "OAS1"     
 [19] "GBP1"      "IFITM2"    "SAMD9L"    "NT5C3A"    "IFI35"     "PSMB9"     "MX2"       "DYNLT1"    "BST2"     
 [28] "IFITM3"    "CMPK2"     "SAT1"      "EIF2AK2"   "PPM1K"     "GBP4"      "DDX58"     "PSMA2.1"   "LAP3"     
 [37] "SAMD9"     "XAF1"      "IFI16"     "COX5A"     "SOCS1"     "MYL12A"    "SP110"     "PARP14"    "PSME2"    
 [46] "TMSB10"    "CHST12"    "FBXO6"     "MT2A"      "PLAC8"     "TRIM22"    "DRAP1"     "SUB1"      "TNFSF13B" 
 [55] "NMI"       "XRN1"      "NEXN"      "RBCK1"     "CLEC2D"    "MNDA"      "RNF213"    "IFI44"     "GBP5"     
 [64] "NPC2"      "STAT1"     "WARS"      "OAS2"      "SELL"      "TAP1"      "DDX60L"    "IRF8"      "OAS3"     
 [73] "RTCB"      "IFITM1"    "KIAA0040"  "CXCL11"    "CARD16"    "PSMA4"     "DNAJA1"    "IFIH1"     "TYMP"     
 [82] "HLA-E"     "LGALS9"    "NUB1"      "C19orf66"  "GBP2"      "PSMB8"     "GNG5"      "HAPLN3"    "PMAIP1"   
 [91] "IFIT5"     "PARP9"     "CD38"      "GMPR"      "C5orf56"   "EAF2"      "HERC6"     "CD48"      "RTP4"     
[100] "RABGAP1L"  "USP30-AS1" "TREX1"     "IGFBP4"    "INPP1"     "CCL8"      "CREM"      "CD164"     "CLIC1"    
[109] "APOL6"     "CCL2"      "SMCHD1"    "ODF2L"     "EHD4"      "NAPA"      "SP100"     "PHF11"     "FAM46A"   
[118] "PNPT1"     "ADAR"      "POMP"      "UNC93B1"   "DCK"       "IRF1"      "CCR7"      "TMEM123"   "CASP4"    
[127] "HSH2D"     "SLFN5"     "CFLAR"     "CASP1"     "VAMP5"     "PSME1"     "XBP1"      "MRPL44"