Seminar: Frontiers in Biostatistics – Single-Cell RNA-Seq Data Analysis via a Regularized Zero-Inflated Mixture Model Framework
Hear Jianhua Hu, PhD, professor of biostatistics and director of the Cancer Biostatistics Program at Columbia University, speak.
Abstract: Applications of single-cell RNA sequencing (scRNA-seq) in various areas of biomedical research have been booming. This new technology provides unprecedented opportunities to study disease heterogeneity at the cellular level. However, the unique characteristics of scRNA-seq data, including high dimensionality, high dropout rates, and possible batch effects, make such data difficult to analyze. Failing to address these issues appropriately obstructs scientific discovery. Herein, we propose a unified Regularized Zero-inflated Mixture Model framework designed for scRNA-seq data (RZiMM-scRNA) that simultaneously detects cell subgroups and identifies differentially expressed genes based on a newly developed importance score, while accounting for both dropouts and batch effects. We conduct an extensive empirical investigation to demonstrate the promise of RZiMM-scRNA in comparison with several popular methods, including k-means and hierarchical clustering.