6 Tables

Table 1

A. Deep Learning algorithms reviewed in the paper

Table 6.1: Deep Learning algorithms reviewed in the paper
App Algorithm Models Evaluation Environment Codes Refs
Imputation
DCA AE DREMI Keras, Tensorflow, scanpy https://github.com/theislab/dca (Arisdakessian et al. 2019)
SAVER-X AE+TL t-SNE, ARI R/sctransfer https://github.com/jingshuw/SAVERX (Borgwardt et al. 2006)
DeepImpute DNN MSE, Pearson’s correlation Keras/Tensorflow https://github.com/lanagarmire/DeepImpute (Petegrosso, Li, and Kuang 2020)
LATE AE MSE Tensorflow https://github.com/audreyqyfu/LATE (Buttner et al. 2019)
scGMAI AE NMI, ARI, HS, and CS Tensorflow https://github.com/QUST-AIBBDRC/scGMAI/ (Cover 1999)
scIGANs GAN ARI, ACC, AUC, and F-score PyTorch https://github.com/xuyungang/scIGANs (Tran et al. 2020)
Batch correction
BERMUDA AE+TL kBET, entropy of Mixing, SI PyTorch https://github.com/txWang/BERMUDA (Badsha et al. 2020)
DESC AE ARI, KL Tensorflow https://github.com/eleozzr/desc (T. Wang et al. 2019)
iMAP AE+GAN kBET, LISI PyTorch https://github.com/Svvord/iMAP (X. Li et al. 2020)
Clustering, latent representation, dimension reduction, and data augmentation
Dhaka VAE ARI, Spearman Correlation Keras/Tensorflow https://github.com/MicrosoftGenomics/Dhaka (Hie, Bryson, and Berger 2019)
scvis VAE KNN preservation, log-likelihood Tensorflow https://bitbucket.org/jerry00/scvis-dev/src/master/ (Fowlkes and Mallows 1983)
scVAE VAE ARI Tensorflow https://github.com/scvae/scvae (Rashid et al. 2019)
VASC VAE NMI, ARI, HS, and CS H5py, Keras https://github.com/wang-research/VASC (Tirosh, Izar, et al. 2016)
scDeepCluster AE ARI, NMI, clustering accuracy Keras, Scanpy https://github.com/ttgump/scDeepCluster (Ding, Condon, and Shah 2018)
cscGAN GAN t-SNE, marker genes, MMD, AUC Scipy, Tensorflow https://github.com/imsb-uke/scGAN (D. Wang and Gu 2018)
Multi-functional models (IM: imputation, BC: batch correction, CL: clustering)
scVI VAE IM: L1 distance; CL: ARI, NMI, SI; BC: Entropy of Mixing PyTorch, Anndata https://github.com/YosefLab/scvi-tools (Y. Xu et al. 2020)
LDVAE VAE Reconstruction errors Part of scVI https://github.com/YosefLab/scvi-tools (Xie, Girshick, and Farhadi, n.d.)
SAUCIE AE IM: R2 statistics; CL: SI; BC: modified kBET; Visualization: Precision/Recall Tensorflow https://github.com/KrishnaswamyLab/SAUCIE/ (Amodio et al. 2019)
scScope AE IM:Reconstruction errors; BC: Entropy of mixing; CL: ARI Tensorflow, Scikit-learn https://github.com/AltschulerWu-Lab/scScope (Lindenbaum and Krishnaswamy 2018)
Cell type Identification
DigitalDLSorter DNN Pearson correlation R/Python/Keras https://github.com/cartof/digitalDLSorter (Svensson et al. 2020)
scCapsNet CapsNet Cell-type Prediction accuracy Keras, Tensorflow https://github.com/wanglf19/scCaps (Wolock, Lopez, and Klein 2019)
netAE VAE Cell-type Prediction accuracy, t-SNE for visualization PyTorch https://github.com/LeoZDong/netAE (H. Li et al. 2017)
scDGN DANN Prediction accuracy PyTorch https://github.com/SongweiGe/scDGN (Racle et al. 2017)
Function analysis
CNNC CNN AUROC, AUPRC, and accuracy Keras, Tensorflow https://github.com/xiaoyeye/CNNC (N. D. Patel, Nguang, and Coghill 2007)
scGen VAE Correlation, visualization Tensorflow https://github.com/theislab/scgen (Yuan and Bar-Joseph 2019)
DL Model keywords: AE: autoencoder, AE+TL: autoencoder with transfer learning, VAE: variational autoencoder, GAN: generative adversarial network, CNN: convolutional neural network, DNN: deep neural network, DANN: domain adversarial neural network, CapsNet: capsule neural network
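Several of the imputation tools above (DCA, LATE, scGMAI) are built around an autoencoder that compresses each cell's expression profile through a low-dimensional bottleneck and reconstructs a denoised profile. The sketch below is a minimal Keras illustration of that idea; the layer sizes, MSE loss, and toy data are assumptions for illustration only and not the configuration of any specific published tool (DCA, for example, replaces the MSE loss with a ZINB likelihood).

```python
# Minimal autoencoder sketch for scRNA-seq denoising/imputation (illustrative only;
# layer sizes, MSE loss, and toy data are assumptions, not any published tool's settings).
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

n_cells, n_genes = 500, 2000
# Toy log-transformed count matrix standing in for a real dataset
x = np.log1p(np.random.poisson(1.0, size=(n_cells, n_genes))).astype("float32")

inputs = keras.Input(shape=(n_genes,))
h = layers.Dense(128, activation="relu")(inputs)           # encoder
z = layers.Dense(32, activation="relu")(h)                 # low-dimensional bottleneck
h_dec = layers.Dense(128, activation="relu")(z)            # decoder
outputs = layers.Dense(n_genes, activation="relu")(h_dec)  # reconstructed expression

autoencoder = keras.Model(inputs, outputs)
autoencoder.compile(optimizer="adam", loss="mse")          # DCA instead optimizes a (ZI)NB likelihood
autoencoder.fit(x, x, epochs=5, batch_size=64, verbose=0)

imputed = autoencoder.predict(x)                           # denoised/imputed expression matrix
```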

B. Comparison of Deep Learning algorithms reviewed in the paper

Table 6.2: Comparison of Deep Learning algorithms reviewed in the paper
App Algorithm Feature Application Notes
Imputation
DCA Strongest recovery of the top 500 genes AE integrated into the Scanpy framework
Choices of noise models, including NB and ZINB High scalability of AE, up to millions of cells
Outperforms other existing methods in capturing cell population structure This method was compared to SAVER, scImpute, and MAGIC
SAVER-X Pretraining from existing data sets (transfer learning) SAVER-X pretraining on PBMCs outperformed other denoising methods, including DCA, scVI, scImpute, and MAGIC
Decomposes the variation into three components SAVER-X was also applied for cross-species analysis
DeepImpute Divide-and-conquer approach, using a bank of DNN models DeepImpute had the highest overall accuracy and offered shorter computation time than other methods like MAGIC, DrImpute, scImpute, SAVER, VIPER, and DCA
Reduced complexity by learning smaller sub-network DeepImpute showed benefits in improving clustering results and identifying significantly differentially expressed genes
Minimized overfitting by removing target genes from input Scalable and faster training time
LATE Takes the log-transformed expression as input LATE outperforms other existing methods like MAGIC, SAVER, DCA, scVI, particularly when the ground truth contains only a few or no zeros
No explicit distribution assumption on input data Better scalability than DCA and scVI up to 1.3 million cells with 10K genes
scGMAI A model designed for clustering that includes an AE Significantly improved the clustering performance in eight of seventeen selected scRNA-seq datasets
Uses the fast independent component analysis algorithm FastICA scGMAI's scalability needs to be improved
scIGANs Trains a GAN model to generate samples with imputed expressions This framework forces the model to reconstruct the real samples and discriminate between real and generated samples.
Best reported performance in clustering compared to DCA, DeepImpute, SAVER, scImpute, MAGIC
Scalable over 100K cells, also robust to small datasets
Batch correction
BERMUDA Preserves batch-specific biological signals through transfer learning Preserves batch-specific cell populations Outperforms other methods like mnnCorrect, BBKNN, Seurat, and scVI
Removes batch effects even when the cell population compositions across different batches are vastly different
Scalable by using mini-batch gradient descent algorithm during training
DESC Removes batch effect through clustering with the hypothesis that batch differences in expressions are smaller than true biological variations DESC is effective in removing the batch effect, whereas CCA, MNN, Seurat 3.0, scVI, BERMUDA, and scanorama were sensitive to batch definitions
Does not require explicit batch information for batch removal DESC is biologically interpretable and can reveal both discrete and pseudo-temporal structures of cells
Small memory footprint and GPU enabled
iMAP iMAP combines AE and GAN for batch effect removal iMAP was shown to separate batch-specific cell types while mixing batch-shared cell types, and outperformed other existing batch correction methods including Harmony, scVI, fastMNN, and Seurat
It consists of two processing stages, each including a separate DL model Capable of handling datasets from Smart-seq2 and 10X Genomics platforms
Demonstrated the stability over hyperparameters, and scalability for thousands of cells.
Clustering, latent representation, dimension reduction, and data augmentation
Dhaka It was proposed to reduce the dimension of scRNA-seq data for efficient stratification of tumor subpopulations Dhaka was shown to have an ARI higher than most other compared methods including t-SNE, PCA, SIMLR, NMF, an autoencoder, MAGIC, and scVI
Dhaka can represent an evolutionary trajectory
scvis VAE network that learns low-dimensional representations scvis was tested on the simulated data and outperformed t-SNE
Captures both local and global neighboring structures scvis is much more scalable than BH t-SNE (t-SNE takes O(M log(M)) time and O(M log(M)) space) for very large datasets (>1 million cells)
scVAE scVAE includes multiple VAE models for denoising gene expression levels and learning the low-dimensional latent representation GMVAE was also compared with Seurat and shown to perform better; however, scVAE performed no better than scVI or scvis
Gaussian Mixture VAE (GMVAE) with negative binomial distribution achieved the highest lower bound and ARI Algorithm applicable to large datasets with millions of cells
VASC Another VAE for dimension reduction and latent representation VASC was compared with PCA, t-SNE, ZIFA, and SIMLR on 20 datasets
Explicitly models dropout with a zero-inflated layer In the study of embryonic development from zygote to blast cells, VASC showed better performance in modeling embryo developmental progression
VASC is reported to handle a large number of cells or cell types
Demonstrated model stability
scDeepCluster AE network that simultaneously learns feature representation and performs clustering via explicit modeling of cell clusters It was tested on simulated data with different dropout rates and compared with DCA, MPSSC, SIMLR, CIDR, PCA + k-means, scvis, and DEC, significantly outperforming them
Running time of scDeepCluster scales linearly with the number of cells, suitable for large scRNA-seq datasets
cscGAN It was designed to augment the existing scRNA-seq samples by generating expression profiles of specific cell types or subpopulations cscGAN was shown to generate high-quality scRNA-seq data for specific cell types.
The cscGAN learns the expression patterns of a specific subpopulation (few cells), and simultaneously learns the cells from all populations (a large number of cells). The augmentation in this method improved the identification of rare cell types and the ability to capture transitional cell states from trajectory analysis
Better scalability than SUGAR (Synthesis Using Geometrically Aligned Random-walks)
Capable of re-establishing developmental trajectories through pseudo-time analysis via cscGAN data augmentation
Multi-functional models (IM: imputation, BC: batch correction, CL: clustering)
scVI Designed to address a range of fundamental analysis tasks, including batch correction, visualization, clustering, and differential expression scVI was shown to be faster than most non-DL algorithms and scalable to handle twice as many cells as non-DL algorithms with a fixed memory
Integrated a normalization procedure and batch correction For imputation, scVI, together with other ZINB-based models, performed better than methods using alternative distributions
Similar scalability as DCA
LDVAE Adaptation of scVI to improve model interpretability For LDVAE, the variations along the different axes of the latent variable establish direct linear relationships with input genes.
SAUCIE It is applied to normalized data instead of count data Results showed that SAUCIE had better or comparable performance to other approaches
SAUCIE has better scalability and faster runtimes than any of the other models
Applications with single-cell CyTOF datasets
scScope AE with recurrent steps designed for imputation and batch correction It was compared with PCA, MAGIC, ZINB-WaVE, SIMLR, AE, scVI, and DCA
Efficiently identifies cell subpopulations from complex datasets with high dropout rates, large numbers of subpopulations, and rare cell types
scScope was shown to scale to >100K cells with high efficiency (faster training than most of the compared approaches)
Cell type Identification
DigitalDLSorter A deconvolution model with a 4-layer DNN DigitalDLSorter achieved excellent agreement in predicting cell-type proportions (linear correlation of 0.99 for colorectal cancer, and a good quadratic relationship for breast cancer).
Designed to identify and quantify the immune cells infiltrating tumors captured in bulk RNA-seq, utilizing single-cell RNA-seq data
scCapsNet It takes log-transformed, normalized expressions as input and follows the general CapsNet model Interpretable capsule network designed for cell type prediction
scCapsNet makes the deep-learning black box transparent through the direct interpretation of internal parameters
netAE VAE-based semi-supervised cell type prediction model Deals with scenarios of having a small number of labeled cells.
Aims to learn a low-dimensional space from which the original space can be accurately reconstructed netAE outperformed most of the baseline methods, including scVI, ZIFA, PCA, and AE, as well as the semi-supervised method scANVI
scDGN scDGN takes the log-transformed, normalized expression as the input scDGN was tested for classifying cell types and aligning batches
Supervised domain adversarial network scDGN outperformed many deep learning and traditional machine learning methods in classification accuracy, including DNN, CaSTLe, MNN, scVI, and Seurat
Function analysis
CNNC CNNC takes expression levels of two genes from many cells and transforms them into a 32 x 32 image-like normalized empirical probability function CNNC outperforms prior methods for inferring TF–gene and protein–protein interactions, causality inference, and functional assignments
Inferring causal interactions between genes from scRNA-seq Was shown to have more than 20% higher AUPRC than other methods and reported almost no false negatives for the top 5% of predictions
scGen scGen follows the general VAE for scRNA-seq data but uses “latent space arithmetics” to learn perturbation responses Compared with other methods including CVAE, style-transfer GAN, linear approaches based on vector arithmetic (VA), and PCA+VA, scGen predicted the full distribution of the ISG15 gene (the gene most strongly regulated by IFN-β) response to IFN-β
Designed to learn cell responses to certain perturbations (drug treatment, gene knockout, etc.) scGen can be used to translate the effect of a stimulation trained in study A to how stimulated cells would look in study B, given a control sample set
Abbreviations: NB: negative binomial distribution; ZINB: zero-inflated negative binomial distribution; TF: transcription factor.
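As a concrete illustration of how one of these multi-functional models is typically driven in practice, the sketch below runs scVI through the scvi-tools package listed in Table 6.1 on a small public PBMC dataset. It is a minimal sketch assuming a recent scanpy/scvi-tools installation; the dataset choice, filtering threshold, and training epochs are illustrative and not the settings used in the reviewed benchmarks.

```python
# Minimal scVI workflow sketch (assumes recent scanpy + scvi-tools; parameters illustrative).
import scanpy as sc
import scvi

adata = sc.datasets.pbmc3k()                     # small public 10X PBMC dataset
sc.pp.filter_genes(adata, min_counts=3)          # drop genes with almost no counts
adata.layers["counts"] = adata.X.copy()          # scVI models raw counts

# For batch correction (BC), a batch_key argument would also be passed here
scvi.model.SCVI.setup_anndata(adata, layer="counts")
model = scvi.model.SCVI(adata)                   # VAE with an NB/ZINB likelihood
model.train(max_epochs=20)

latent = model.get_latent_representation()       # low-dimensional embedding (CL, visualization)
denoised = model.get_normalized_expression()     # denoised expression values (IM)
```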

Table 2

A. Simulated single-cell data/algorithms

Table 6.3: Simulated single-cell data/algorithms
Title Algorithm Number of Cells Simulation Methods Refs
Splatter DCA, DeepImpute, BERMUDA, scDeepCluster, scVI, scScope, solo ~2000 Splatter/R (Tian 2019)
CIDR scIGANs 50 CIDR simulation [54, Reference Not Found]
NB+dropout Dhaka 500 Hierarchical model of NB/Gamma + random dropout
Bulk RNA-seq SAUCIE 1076 1076 CCLE bulk RNA-seq profiles + dropout conditional on expression level
SIMLR scScope 1 million SIMLR, high-dimensional data generated from latent vector (Miao et al. 2018)

B. Human single-cell data sources used by different DL algorithms

Table 6.4: Human single-cell data sources used by different DL algorithms
Title Algorithm Cell Origin Number of Cells Data Sources Refs
68k PBMCs DCA, SAVER-X, LATE, scVAE, scDeepCluster, scCapsNet, scDGN Blood 68,579 10X Single Cell Gene Expression Datasets
Human pluripotent DCA hESCs 1,876 GSE102176 (Lotfollahi, Wolf, and Theis 2019)
CITE-seq SAVER-X Cord blood mononuclear cells 8,005 GSE100866 (Duvenaud 2015)
Midbrain and Dopaminergic Neuron Development SAVER-X Brain/ embryo ventral midbrain cells 1,977 GSE76381 [124, Ref Not Found]
HCA SAVER-X Immune cell, Human Cell Atlas 500,000 HCA data portal
Breast tumor SAVER-X Immune cell in tumor micro-environment 45,000 GSE114725 (Kang et al. 2018)
293T cells DeepImpute, iMAP Embryonic kidney 13,480 10X Single Cell Gene Expression Datasets
Jurkat DeepImpute, iMAP Blood/ lymphocyte 3,200 10X Single Cell Gene Expression Datasets
ESC, Time-course scGAN ESC 350,758 GSE75748 (Haber et al. 2017)
Baron-Hum-1 scGMAI, VASC Pancreatic islets 1,937 GSM2230757 (Hagai et al. 2018)
Baron-Hum-2 scGMAI, VASC Pancreatic islets 1,724 GSM2230758 (Hagai et al. 2018)
Camp scGMAI, VASC Liver cells 303 GSE96981 (Y. Peng et al. 2018)
CEL-seq2 BERMUDA, DESC Pancreas/Islets of Langerhans GSE85241 (Stoeckius et al. 2017)
Darmanis scGMAI, scIGANs, VASC Brain/cortex 466 GSE67835 (Azizi et al. 2018)
Tirosh-brain Dhaka, scvis Oligodendroglioma >4,800 GSE70630 (Chu et al. 2016)
Patel Dhaka Primary glioblastoma cells 875 GSE57872 (210?)
Li scGMAI, VASC Blood 561 GSE146974 (T. Wang et al. 2019)
Tirosh-skin scvis melanoma 4,645 GSE72056 (D. Wang et al. 2021)
xenograft 3, and 4 Dhaka Breast tumor ~250 EGAS00001002170 (Camp et al. 2017)
Petropoulos VASC/netAE Human embryos 1,529 E-MTAB-3929
Pollen scGMAI, VASC 348 SRP041736 (Muraro et al. 2016)
Xin scGMAI, VASC Pancreatic cells (a-, ß-, d-) 1,600 GSE81608 (Darmanis et al. 2015)
Yan scGMAI, VASC embryonic stem cells 124 GSE36552 (Tirosh, Venteicher, et al. 2016)
PBMC3k VASC, scVI Blood 2,700 SRP073767 (Torroja and Sanchez-Cabo 2019)
CyTOF, Dengue SAUCIE Dengue infection 11 M, ~42 antibodies Cytobank: 82023 (Amodio et al. 2019)
CyTOF, ccRCC SAUCIE Immune profile of 73 ccRCC patients 3.5 M, ~40 antibodies Cytobank: 875 (A. P. Patel et al. 2014)
CyTOF, breast SAUCIE 3 patients Flow Repository: FR-FCM-ZYJP (Kang et al. 2018)
Chung, BC DigitalDLSorter Breast tumor 515 GSE75688 (Levine et al. 2015)
Li, CRC DigitalDLSorter Colorectal cancer 2,591 GSE81861 (Qiu et al. 2017)
Pancreatic datasets scDGN Pancreas 14,693 SeuratData
Kang, PBMC scGen PBMC stimulated by IFN-β ~15,000 GSE96583 (Y. X. Wang, Waterman, and Huang 2014)

C. Mouse single-cell data sources used by different DL algorithms

Table 6.5: Mouse single-cell data sources used by different DL algorithms
Title Algorithm Cell Origin Number of Cells Data Sources Refs
Brain cells from E18 mice DCA, SAUCIE Brain Cortex 1,306,127 10x: Single Cell Gene Expression Datasets
Midbrain and Dopaminergic Neuron Development SAVER-X Ventral Midbrain 1,907 GSE76381 (La Manno et al. 2016)
Mouse cell atlas SAVER-X NA 405,796 GSE108097 (Han et al. 2018)
neuron9k DeepImpute Cortex 9,128 10x: Single Cell Gene Expression Datasets
Mouse Visual Cortex DeepImpute Brain cortex 114,601 GSE102827 (Hrvatin et al. 2018)
murine epidermis DeepImpute Epidermis 1,422 GSE67602 (Joost et al. 2016)
myeloid progenitors LATE, DESC, SAUCIE Bone marrow 2,730 GSE72857 (Paul et al. 2015)
Cell-cycle scIGANs mESC 288 E-MTAB-2805 (Buettner et al. 2015)
A single-cell survey NA Intestine 7,721 GSE92332 (Haber et al. 2017)
Tabula Muris iMAP Mouse cells >100K NA
Baron-Mou-1 VASC Pancreas 822 GSM2230761 (Baron et al. 2016)
Biase scGMAI, VASC Embryos/SMARTer 56 GSE57249 (Biase, Cao, and Zhong 2014)
Biase scGMAI, VASC Embryos/Fluidigm 90 GSE59892 (Biase, Cao, and Zhong 2014)
Deng scGMAI, VASC Liver 317 GSE45719 (Chu et al. 2016)
Klein VASC, scDeepCluster, scIGANs Stem Cells 2,717 GSE65525 (Klein et al. 2015)
Goolam VASC Mouse Embryo 124 E-MTAB-3321 (Goolam et al. 2016)
Kolodziejczyk VASC mESC 704 E-MTAB-2600 (Kim et al. 2015)
Usoskin VASC Lumbar 864 GSE59739 (Usoskin et al. 2015)
Zeisel VASC, scVI, SAUCIE, netAE Cortex, hippocampus 3,005 GSE60361 (Zeisel et al. 2015)
Bladder cells scDeepCluster Bladder 12,884 GSE129845 (Baron et al. 2016)
HEMATO scVI Blood cell >10,000 GSE89754 (Tusi et al. 2018)
retinal bipolar cells scVI, scCapsNet, SAUCIE retinal ~25,000 GSE81905 (Shekhar et al. 2016)
Embryo at 9 time points LDVAE embryos from E6.5 to E8.5 116,312 GSE87038 (Pijuan-Sala et al. 2019)
Embryo at 9 time points LDVAE embryos from E9.5 to E13.5 ~2 million GSE119945 (Cao et al. 2019)
CyTOF SAUCIE Mouse thymus 200K, ~38 antibodies Cytobank: 52942 (Setty et al. 2016)
Solo Solo Mouse kidneys ~8,000 GSE140262 (Bernstein et al. 2020)
Nestorowa netAE hematopoietic stem and progenitor cells 1,920 GSE81682 (Nestorowa et al. 2016)
small intestinal epithelium scGen Infected with Salmonella and worm H. polygyrus 1,957 GSE92332 (Haber et al. 2017)

D. Single-cell data derived from other species

Table 6.6: Single-cell data derived from other species
Title Algorithm Species Tissue Number of Cells Data Sources Refs
Worm neuron cells\(^{1}\) scDeepCluster C. elegans Neuron 4,186 GSE98561 (Joost et al. 2016)
Cross species, stimulation with LPS and dsRNA scGen Mouse, rat, rabbit, and pig bone marrow-derived phagocyte 5,000 to 10,000 /species 13 accessions in ArrayExpress (Kanehisa et al. 2017)
1 Processed data is available at https://github.com/ttgump/scDeepCluster/tree/master/scRNA-seq%20data

E. Large single-cell data source used by various algorithms

Table 6.7: Large single-cell data source used by various algorithms
Title Sources Notes
10X Single-cell gene expression dataset https://support.10xgenomics.com/single-cell-gene-expression/datasets Contains a large collection of scRNA-seq datasets generated using the 10X system
Tabula Muris https://tabula-muris.ds.czbiohub.org/ Compendium of scRNA-seq data from mouse
HCA https://data.humancellatlas.org/ Human single-cell atlas
MCA https://figshare.com/s/865e694ad06d5857db4b, or GSE108097 Mouse single-cell atlas
scQuery https://scquery.cs.cmu.edu/ A web server for cell-type matching and key gene visualization. It is also a source of collected scRNA-seq data (processed with a common pipeline)
SeuratData https://github.com/satijalab/seurat-data List of datasets, including PBMC and human pancreatic islet cells
Cytobank https://cytobank.org/ Community platform for big-data cytometry
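For readers who want to pull data from these portals into Python, the hedged sketch below shows the standard scanpy readers for the formats these sources typically distribute; every file and folder name is a placeholder for data you would download yourself from the listed URLs.

```python
# Reading downloaded single-cell data with scanpy (file/folder names are placeholders).
import scanpy as sc

# A matrix folder downloaded from the 10X "Single Cell Gene Expression" datasets page
adata_10x = sc.read_10x_mtx("filtered_gene_bc_matrices/hg19/", var_names="gene_symbols")

# An .h5ad file exported from a portal such as the Human Cell Atlas
adata_h5ad = sc.read_h5ad("hca_subset.h5ad")

# GEO supplementary files (e.g., for MCA, GSE108097) are often plain count tables
adata_csv = sc.read_csv("GSE108097_counts.csv").T   # transpose if genes are rows

print(adata_10x, adata_h5ad, adata_csv, sep="\n")
```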

Table 3

Evaluation metrics used in surveyed DL algorithms

Table 6.8: Evaluation metrics used in surveyed DL algorithms
Evaluation Method Equations Explanation
Pseudobulk RNA-seq The average of normalized (log2-transformed) scRNA-seq counts across cells is calculated, and the correlation coefficient between this pseudobulk profile and the actual bulk RNA-seq profile of the same cell type is then evaluated.
Mean squared error (MSE) \(MSE=\frac{1}{n} \sum_{i=1}^{n}(x_{i}- \hat{x}_{i})^{2}\) MSE assesses the quality of a predictor, or an estimator, from a collection of observed data \(x\), with \(\hat{x}\) being the predicted values.
Pearson correlation \(\rho_{X,Y}=\frac{cov(X,Y)}{\sigma_{X}\sigma_{Y}}\) where cov() is the covariance, and \(\sigma_{X}\) and \(\sigma_{Y}\) are the standard deviations of \(X\) and \(Y\), respectively.
Spearman correlation \(\rho_{s}=\rho_{r_{X},r_{Y}}=\frac{cov(r_X,r_Y)}{\sigma_{r_X}\sigma_{r_Y}}\) The Spearman correlation coefficient is defined as the Pearson correlation coefficient between the rank variables, where \(r_{X}\) is the rank of \(X\).
Entropy of accuracy, Hacc (Tran et al. 2020) \(H_{acc}=-\frac{1}{M} \sum_{i=1}^{M} \sum_{j=1}^{N_i} p_i(x_j)logp_{i}(x_{j})\) Measures the diversity of the ground-truth labels within each predicted cluster group. \(p_{i}(x_{j})\) (or \(q_{i}(x_{j})\)) are the proportions of cells in the \(j\)th ground-truth cluster (or predicted cluster) relative to the total number of cells in the \(i\)th predicted cluster (or ground-truth clusters), respectively.
Entropy of purity, Hpur (Tran et al. 2020) \(H_{pur}=-\frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{M_i}q_i(x_j)logq_{i}(x_{j})\) Measures the diversity of the predicted cluster labels within each ground-truth group
Entropy of mixing (Haghverdi et al. 2018) \(E=\sum_{i=1}^{C}p_{i}\log(p_{i})\) This metric evaluates the mixing of cells from different batches in the neighborhood of each cell. \(C\) is the number of batches, and \(p_{i}\) is the proportion of cells from batch \(i\) among \(N\) nearest cells.
Mutual Information (MI) (Strehl and Ghosh 2002) \(MI(U,V)=\sum_{i=1}^{|U|}\sum_{j=1}^{|V|}P_{UV}(i,j)log(\frac{P_{UV}(i,j)}{P_{U}(i)P_{V}(j)})\) where \(P_{U}(i)=\frac{|U_{i}|}{N}\) and \(P_{V}(j)=\frac{|V_{j}|}{N}\), and the joint distribution is defined as \(P_{UV}(i,j)=\frac{|U_{i} \cap V_{j}|}{N}\). The \(MI\) is a measure of mutual dependency between two cluster assignments \(U\) and \(V\).
Normalized Mutual Information (NMI) [165, BIB not found] \(NMI(U,V)=\frac{2 \times MI(U,V)}{[H(U)+H(V)]}\) where \(H(U)=-\sum_{i} P_{U}(i)log(P_{U}(i))\) and \(H(V)=-\sum_{j} P_{V}(j)log(P_V(j))\). The \(NMI\) is a normalization of the \(MI\) score between 0 and 1.
Kullback–Leibler (KL) divergence [166, BIB not found] \(D_{KL}(P||Q)=\sum_{x \in \chi}P(x)log(\frac{P(x)}{Q(x)})\) where the discrete probability distributions \(P\) and \(Q\) are defined on the same probability space \(\chi\). This relative entropy is a measure of the directed divergence between two distributions.
Jaccard Index \(J(U,V)=\frac{|U \cap V|}{|U \cup V|}\) \(0 \le J(U,V) \le 1\). \(J = 1\) if clusters \(U\) and \(V\) are the same. If \(U\) and \(V\) are both empty, \(J\) is defined as 1.
Fowlkes-Mallows Index for two clustering algorithms (FM) \(FM=\sqrt{ \frac{TP}{TP + FP} \times \frac{TP}{TP+FN} }\) TP as the number of pairs of points that are present in the same cluster in both \(U\) and \(V\); \(FP\) as the number of pairs of points that are present in the same cluster in \(U\) but not in \(V\); \(FN\) as the number of pairs of points that are present in the same cluster in \(V\) but not in \(U\); and TN as the number of pairs of points that are in different clusters in both U and V.
Rand index (RI) \(RI=\frac{(a+b)}{\binom{n}{2}}\) Measure of constancy between two clustering outcomes, where \(a\) (or \(b\)) is the count of pairs of cells in one cluster (or different clusters) from one clustering algorithm but also fall in the same cluster (or different clusters) from the other clustering algorithm.
Adjusted Rand index (ARI) (Hubert and Arabie 1985) \(ARI=\frac{RI-E[RI]}{max(RI)-E[RI]}\) ARI is a corrected-for-chance version of RI, where \(E[RI]\) is the expected Rand Index.
Silhouette index \(s(i)=\frac{b(i)-a(i)}{max(a(i),b(i))}\) where \(a(i)\) is the average dissimilarity of the \(i\)th cell to all other cells in the same cluster, and \(b(i)\) is the average dissimilarity of the \(i\)th cell to all cells in the closest cluster. The range of \(s(i)\) is [-1,1], with 1 indicating a well-clustered cell and -1 a completely misclassified cell.
Maximum Mean Discrepancy (MMD) (Borgwardt et al. 2006) \(MMD(F,p,q)=sup_{f \in F}||\mu_{p}-\mu_{q}||_{f}\) \(MMD\) is a non-parametric distance between distributions based on the reproducing kernel Hilbert space, or, a distance-based measure between two distribution \(p\) and \(q\) based on the mean embeddings \(\mu_{p}\) and \(\mu_{q}\) in a reproducing kernel Hilbert space \(F\).
k-Nearest neighbor batch-effect test (kBET) (Buttner et al. 2019) \(a_{n}^{k}=\sum_{l=1}^{L}\frac{(N_{nl}^{k} - k \bullet f_{l})^{2}}{k \bullet f_{l}} \sim \chi_{L-1}^{2}\) Given a dataset of \(N\) cells from \(L\) batches with \(N_l\) denoting the number of cells in batch \(l\), \(N_{nl}^{k}\) is the number of cells from batch \(l\) in the \(k\)-nearest neighbors of cell \(n\), \(f_{l}\) is the global fraction of cells in batch \(l\), or \(f_{l}=\frac{N_l}{N}\), and \(\chi_{L-1}^{2}\) denotes the \(\chi^{2}\) distribution with \(L-1\) degrees of freedom. It uses a \(\chi^{2}\)-based test for random neighborhoods of fixed size to determine the significance (“well-mixed”).
Local Inverse Simpson’s Index (LISI) (Korsunsky et al. 2019) \(\frac{1}{ \lambda(n)}=\frac{1}{\sum_{l=1}^{L}(p(l))^{2}}\) This is the inverse Simpson’s Index in the \(k\)-nearest neighbors of cell \(n\) for all batches, where \(p(l)\) denotes the proportion of batch \(l\) in the \(k\)-nearest neighbors. The score reports the effective number of batches in the \(k\)-nearest neighbors of cell \(n\).
Homogeneity \(HS=1-\frac{H(P(U|V))}{H(P(U))}\) where \(H()\) is the entropy, \(U\) is the ground-truth assignment, and \(V\) is the predicted assignment. \(HS\) ranges from 0 to 1, where 1 indicates perfectly homogeneous labeling.
Completeness \(CS=1-\frac{H(P(V|U))}{H(P(V))}\) Its values range from 0 to 1, where 1 indicates all members from a ground-truth label are assigned to a single cluster.
V-Measure [169, BIB not found] \(V_{\beta}=\frac{(1+\beta)HS \times CS}{\beta HS + CS}\) where \(\beta\) indicates the weight of \(HS\). \(V\)-Measure is symmetric, i.e. switching the true and predicted cluster labels does not change \(V\)-Measure.
Precision, recall \(Precision = \frac{TP}{TP+FP}, recall=\frac{TP}{TP+FN}\) TP: true positive, FP: false positive, FN: false negative.
Accuracy \(Accuracy = \frac{TP+TN}{N}\) N: all samples tested, TN: true negative
F1-score \(F_{1}=\frac{2Precision \bullet Recall}{Precision+Recall}\) A harmonic mean of precision and recall. It can be extended to \(F_\beta\) where \(\beta\) is a weight between precision and recall (similar to \(V\)-measure).
AUC, AUROC Area under the curve (AUC) of the receiver operating characteristic (ROC) curve. A similar measure can be performed on the Precision-Recall curve (PRC), or AUPRC. Precision-Recall curves summarize the trade-off between the true positive rate and the positive predictive value for a predictive model (mostly for an imbalanced dataset).
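Most of the clustering and classification metrics in Table 6.8 have off-the-shelf implementations. The sketch below is a minimal example on toy labels rather than real benchmark output; it shows how ARI, NMI, HS/CS/V-Measure, the Fowlkes-Mallows and Silhouette indices, the correlation coefficients, and a per-cell entropy of mixing can be computed with scikit-learn and scipy. Variable names and data are illustrative assumptions.

```python
# Computing common evaluation metrics with scikit-learn/scipy (toy data; illustrative only).
import numpy as np
from scipy.stats import pearsonr, spearmanr, entropy
from sklearn import metrics

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 10))                    # embedding used for clustering
labels_true = rng.integers(0, 3, size=300)        # ground-truth cell types
labels_pred = rng.integers(0, 3, size=300)        # predicted cluster assignments

ari = metrics.adjusted_rand_score(labels_true, labels_pred)            # ARI
nmi = metrics.normalized_mutual_info_score(labels_true, labels_pred)   # NMI
hs  = metrics.homogeneity_score(labels_true, labels_pred)              # Homogeneity (HS)
cs  = metrics.completeness_score(labels_true, labels_pred)             # Completeness (CS)
v   = metrics.v_measure_score(labels_true, labels_pred)                # V-Measure
fm  = metrics.fowlkes_mallows_score(labels_true, labels_pred)          # Fowlkes-Mallows index
sil = metrics.silhouette_score(X, labels_pred)                         # Silhouette index

r, _   = pearsonr(X[:, 0], X[:, 1])                                    # Pearson correlation
rho, _ = spearmanr(X[:, 0], X[:, 1])                                   # Spearman correlation

# Entropy of mixing for one cell: batch proportions among its k nearest neighbors
# (scipy's entropy returns -sum(p * log p))
p_batches = np.array([0.5, 0.3, 0.2])
mixing = entropy(p_batches)

print(ari, nmi, hs, cs, v, fm, sil, r, rho, mixing)
```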