Optimizing Deep Learning with Dimensionality Reduction for Analyzing the CuMiDa Brain Cancer Gene Expression Dataset

Authors

  • Duwi Lufita Marfiana Nusa Mandiri
  • Fatimah Asmita Rani Universitas Nusa Mandiri
(*) Corresponding Author

DOI:

https://doi.org/10.34288/jri.v6i4.350

Keywords:

Dimensionality Reduction, CuMiDa, Brain Cancer, PCA, TruncatedSVD

Abstract

Machine learning and deep learning have become indispensable tools in bioinformatics, particularly for analyzing high-dimensional gene expression data in cancer diagnosis and classification. This study uses the CuMiDa brain cancer dataset, a curated microarray dataset with 54,676 genes and 130 samples, to evaluate deep learning models combined with dimensionality reduction techniques. Principal Component Analysis (PCA) and Truncated Singular Value Decomposition (TruncatedSVD) were applied to address the challenges of high-dimensional data by reducing noise and computational complexity. Three deep learning models (DNN, MLP, and TabNet) were implemented with various optimizers, including Adam, RMSprop, and SGD. TruncatedSVD outperformed PCA in minimizing loss, most notably for the MLP trained with the L-BFGS optimizer, which achieved near-zero loss. TabNet reached the highest classification accuracy (96%) with both Adam and RMSprop, whereas SGD performed suboptimally across all models. These findings highlight the critical role of dimensionality reduction and optimizer selection in the efficiency and accuracy of deep learning models for cancer classification, and the study offers a framework for improving diagnostic tools in computational oncology.
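The reduce-then-classify workflow summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' code: it assumes scikit-learn, uses a synthetic matrix in place of the CuMiDa data, and the number of classes (3), feature count (2,000 instead of 54,676), number of components (20), and hidden layer size (64) are hypothetical values chosen only to keep the example small.

```python
import numpy as np
from sklearn.decomposition import PCA, TruncatedSVD
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for a gene expression matrix (NOT the CuMiDa data).
# 130 samples as in the abstract; 2,000 features and 3 classes are assumptions.
rng = np.random.default_rng(0)
n_samples, n_genes, n_classes = 130, 2000, 3
y = rng.integers(0, n_classes, size=n_samples)
X = rng.normal(size=(n_samples, n_genes)) + 0.5 * y[:, None]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# The two reducers compared in the study; n_components=20 is an assumption.
reducers = {
    "PCA": PCA(n_components=20, random_state=0),
    "TruncatedSVD": TruncatedSVD(n_components=20, random_state=0),
}

results = {}
for name, reducer in reducers.items():
    # Fit the reducer on the training split only, then project both splits.
    Z_train = reducer.fit_transform(X_train)
    Z_test = reducer.transform(X_test)

    # MLP trained with the L-BFGS solver, one of the settings the abstract mentions.
    clf = MLPClassifier(hidden_layer_sizes=(64,), solver="lbfgs",
                        max_iter=500, random_state=0)
    clf.fit(Z_train, y_train)

    results[name] = {
        "n_components": Z_train.shape[1],
        "test_accuracy": clf.score(Z_test, y_test),
    }
    print(name, results[name])
```

One practical difference between the two reducers is that PCA centers the data before decomposition while TruncatedSVD does not, so the two can yield different projections on uncentered input. Scores from this synthetic example say nothing about the results reported in the paper.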

Published

2024-09-15

How to Cite

Duwi Lufita Marfiana, & Asmita Rani, F. (2024). Optimizing Deep Learning with Dimensionality Reduction for Analyzing the CuMiDa Brain Cancer Gene Expression Dataset. Jurnal Riset Informatika, 6(4), 237–246. https://doi.org/10.34288/jri.v6i4.350