Foundation Models for Analyzing Single-Cell RNA Sequencing Data
Abstract
Single-cell RNA sequencing (scRNA-seq) measures gene expression in individual cells, offering deep insight into cellular heterogeneity, development, and disease. Transformer-based foundation models have become central to scRNA-seq analysis, yet most rely on uniform random masking during pretraining, a strategy misaligned with the sparsity, heterogeneity, and zero inflation characteristic of scRNA-seq data. To assess how these models behave under realistic biological variation, we first perform a comprehensive evaluation of four widely used single-cell foundation models (Geneformer, scBERT, scFoundation, and scGPT) across three diverse datasets. This benchmarking reveals substantial variability in model performance, including systematic weaknesses on rare cell populations and degraded accuracy under clinically challenging conditions. Motivated by the broader limitations of random masking in foundation models, we introduce Multinomial Attention Masking (MAM), a biologically informed masking strategy that uses trainable latent representations and cross-attention to identify informative gene positions during pretraining. Across all datasets, models pretrained with MAM consistently achieve higher downstream cell-type classification accuracy than those trained with uniform masking and, in several cases, outperform the original pretrained backbones. Biological validation further shows that MAM preferentially selects highly expressed and functionally meaningful genes, indicating that its gains stem from capturing biologically relevant structure rather than from added algorithmic complexity. This work improves the reliability and utility of single-cell foundation models for researchers and clinicians alike.
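The abstract does not specify MAM's implementation, but the mechanism it describes can be sketched. Below is a minimal PyTorch illustration of the general idea, not the authors' actual code: trainable latent queries cross-attend over gene token embeddings, the resulting attention weights serve as per-gene informativeness scores, and mask positions are sampled from a multinomial over those scores rather than uniformly at random. All names here (MultinomialAttentionMasker, n_latents, mask_ratio) are hypothetical.

```python
# Minimal sketch of multinomial attention masking, assuming a standard
# transformer embedding pipeline. Illustrative only; names and defaults
# are assumptions, not the paper's API.
import torch
import torch.nn as nn


class MultinomialAttentionMasker(nn.Module):
    """Scores gene positions via trainable latent queries and
    cross-attention, then samples mask positions from a multinomial
    over those scores instead of a uniform distribution."""

    def __init__(self, d_model: int, n_latents: int = 16):
        super().__init__()
        # Trainable latent representations that learn, during
        # pretraining, to query for informative gene positions.
        self.latents = nn.Parameter(torch.randn(n_latents, d_model))
        self.attn = nn.MultiheadAttention(d_model, num_heads=4,
                                          batch_first=True)

    def forward(self, gene_emb: torch.Tensor,
                mask_ratio: float = 0.15) -> torch.Tensor:
        # gene_emb: (batch, n_genes, d_model) gene token embeddings
        batch, n_genes, _ = gene_emb.shape
        queries = self.latents.unsqueeze(0).expand(batch, -1, -1)
        # Cross-attention: latents attend over gene positions; the
        # attention weights act as per-gene informativeness scores.
        _, attn_weights = self.attn(queries, gene_emb, gene_emb,
                                    need_weights=True)
        scores = attn_weights.mean(dim=1)            # (batch, n_genes)
        # Draw mask positions from a multinomial over the scores,
        # without replacement, instead of uniformly at random.
        n_mask = max(1, int(mask_ratio * n_genes))
        idx = torch.multinomial(scores + 1e-8, n_mask, replacement=False)
        mask = torch.zeros(batch, n_genes, dtype=torch.bool,
                           device=gene_emb.device)
        mask.scatter_(1, idx, True)
        return mask                                  # True = mask this gene
```

In use, the boolean mask would replace the uniform Bernoulli mask of a standard masked-gene pretraining loop, e.g. `mask = MultinomialAttentionMasker(d_model=128)(gene_emb)`; because the latents are trainable, the sampling distribution can shift toward biologically informative genes as pretraining proceeds, which is consistent with the abstract's observation that MAM preferentially selects highly expressed, functionally meaningful genes.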