A Computational Framework for Cancer Type Classification Using Single-Cell RNA Sequencing and Mathematical Analysis
Sudarshan Gogoi1, Soumen Bera2, Alexander B. Medvinsky3, Amit Chakraborty1
1Department of Mathematics, Sikkim University, Gangtok, Sikkim, India
2Department of Hematology, St. Jude Children's Research Hospital, Memphis, TN, United States
3Institute of Theoretical and Experimental Biophysics, Pushchino, Russia
Abstract. Single-cell RNA sequencing (scRNA-seq) has substantially advanced cancer research, yet accurately classifying cancers with similar gene expression profiles remains difficult. To address this, we developed an integrated computational framework that combines scRNA-seq data from 10x Genomics datasets, mathematical analysis, and Seurat-based clustering to classify and predict cancer types. The key methodological innovation is a mathematical approach that computes Hausdorff distance matrices and analyzes their norms across a range of gene correlation thresholds. Stacked line plots of the computed norms capture distinct trends that differentiate similar from dissimilar cancer types, thereby enabling classification and prediction. The framework classified breast and lung cancers robustly on the basis of dynamic gene network analyses, whereas colorectal and ovarian cancers presented challenges linked to higher intratumoral heterogeneity. The norm patterns reflected distinct transcriptional architectures, including dynamic immune landscapes in breast cancer and linear transcriptional progression in lung cancer. Validation on independent datasets supported the method's reliability in categorizing unseen cancer data and provided statistical confidence for the breast and lung cancer classifications. Beyond classification, the study advances understanding of cancer gene correlation networks, offering insights into transcriptional diversity and tumor microenvironment interactions. By integrating mathematical tools with scRNA-seq data, the framework combines precision with scalability across diverse datasets and establishes a foundation for applications in cancer diagnostics and treatment.
Keywords: single-cell RNA sequencing, 10x Genomics, machine learning, cancer prediction, personalized diagnostics
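The norm analysis summarized in the abstract can be illustrated with a minimal Python sketch. The abstract does not specify how point sets are built from the correlation networks or which matrix norm is used, so everything below is an assumption made for illustration: each sample is a genes-by-cells expression matrix over a shared gene panel, a gene is retained at a given threshold if its strongest absolute correlation with another gene reaches that threshold, each retained gene is represented by its row of the correlation matrix, and the Frobenius norm summarizes the pairwise Hausdorff distance matrix. The helper names `hausdorff`, `thresholded_points`, and `norm_profile` are hypothetical and are not taken from the authors' code.

```python
import numpy as np
from scipy.spatial.distance import directed_hausdorff


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two point sets (rows are points)."""
    return max(directed_hausdorff(a, b)[0], directed_hausdorff(b, a)[0])


def thresholded_points(expr, threshold):
    """Assumed point-set construction (not confirmed by the source).

    expr is a genes x cells matrix. A gene is kept if its largest absolute
    correlation with any other gene is at least `threshold`; each kept gene
    is represented by its row of the gene-gene correlation matrix.
    """
    corr = np.corrcoef(expr)          # genes x genes
    np.fill_diagonal(corr, 0.0)       # ignore self-correlation
    keep = np.abs(corr).max(axis=1) >= threshold
    return corr[keep]


def norm_profile(samples, thresholds):
    """Frobenius norm of the pairwise Hausdorff distance matrix per threshold."""
    norms = []
    n = len(samples)
    for t in thresholds:
        sets = [thresholded_points(s, t) for s in samples]
        dist = np.zeros((n, n))
        for i in range(n):
            for j in range(i + 1, n):
                # Leave the distance at 0 if either set is empty at this threshold.
                if len(sets[i]) and len(sets[j]):
                    dist[i, j] = dist[j, i] = hausdorff(sets[i], sets[j])
        norms.append(np.linalg.norm(dist))
    return np.array(norms)


# Synthetic stand-in data: 3 samples, 8 shared genes, 50 cells each.
rng = np.random.default_rng(0)
samples = [rng.normal(size=(8, 50)) for _ in range(3)]
thresholds = np.linspace(0.1, 0.5, 5)
profile = norm_profile(samples, thresholds)
```

Plotting `profile` against `thresholds` for each group of samples gives one line per group, which is the kind of curve the stacked line plots compare. Treating empty point sets as zero distance is a simplification chosen here to keep the sketch short.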