Optimizing Deep Learning with Dimensionality Reduction for Analyzing the CuMiDa Brain Cancer Gene Expression Dataset
DOI: https://doi.org/10.34288/jri.v6i4.350
Keywords: Dimensionality Reduction, CuMiDa, Brain Cancer, PCA, TruncatedSVD
Abstract
In the digital era, machine learning and deep learning have become indispensable tools in bioinformatics, particularly for analyzing high-dimensional gene expression data for cancer diagnosis and classification. This study uses the brain cancer dataset from CuMiDa, a curated microarray database, comprising 54,676 genes and 130 samples, to evaluate deep learning models combined with dimensionality reduction techniques. Principal Component Analysis (PCA) and Truncated Singular Value Decomposition (TruncatedSVD) were applied to address the challenges of high-dimensional data by reducing noise and computational complexity. Three deep learning models (DNN, MLP, and TabNet) were trained with several optimizers, including ADAM, RMSprop, SGD, and LBFGS. TruncatedSVD outperformed PCA in minimizing loss, most notably for the MLP trained with the LBFGS optimizer, which reached near-zero loss. TabNet achieved the highest classification accuracy (96%) with both ADAM and RMSprop, whereas SGD performed comparatively poorly across all models. These findings show that the choice of dimensionality reduction technique and optimizer strongly affects the efficiency and accuracy of deep learning models for cancer classification. This research provides a robust framework for improving diagnostic tools in computational oncology.
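The practical difference between the two reduction techniques compared in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' pipeline: it uses a synthetic matrix with the dataset's 130 samples but only 5,000 hypothetical genes (instead of 54,676) so that it runs quickly, and the choice of k = 50 components is an arbitrary assumption. In scikit-learn these two ideas correspond to `sklearn.decomposition.PCA` and `sklearn.decomposition.TruncatedSVD` (the latter uses a randomized solver by default); here both are written out with an exact `np.linalg.svd` so that the one conceptual difference, centering the genes before the decomposition, is visible.

```python
import numpy as np


def pca_reduce(X, k):
    """Project X (samples x genes) onto its top-k principal components.

    PCA first centers every gene (column) on its mean, so the retained
    directions are those of maximal variance around the average profile.
    Returns the (n_samples, k) scores and each component's share of the
    total variance.
    """
    X_centered = X - X.mean(axis=0)
    # Economy SVD: U is (n, r), S is (r,), with r = min(n_samples, n_genes).
    U, S, _ = np.linalg.svd(X_centered, full_matrices=False)
    scores = U[:, :k] * S[:k]  # equivalent to X_centered @ Vt[:k].T
    explained_ratio = S[:k] ** 2 / np.sum(S ** 2)
    return scores, explained_ratio


def truncated_svd_reduce(X, k):
    """Project X onto its top-k right singular vectors WITHOUT centering.

    Skipping the centering step is the only conceptual difference from PCA
    in this sketch; it is also what lets TruncatedSVD operate on sparse
    matrices, where subtracting the mean would destroy sparsity.
    """
    U, S, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :k] * S[:k]


# Hypothetical stand-in for the expression matrix: 130 samples as in the
# brain cancer dataset, but only 5,000 synthetic "genes" (not 54,676) with
# positive log-normal values, so the example runs quickly.
rng = np.random.default_rng(0)
X = rng.lognormal(mean=0.0, sigma=1.0, size=(130, 5000))

k = 50  # assumed number of components; not taken from the paper

Z_pca, ratio = pca_reduce(X, k)
Z_svd = truncated_svd_reduce(X, k)

print("PCA scores:", Z_pca.shape, "variance kept: %.3f" % ratio.sum())
print("TruncatedSVD scores:", Z_svd.shape)
# PCA scores are centered; the first TruncatedSVD component is not, because
# on all-positive data it mostly tracks the overall expression level.
print("mean of first PCA component: %.2e" % Z_pca[:, 0].mean())
print("mean of first SVD component: %.2f" % Z_svd[:, 0].mean())
```

Because the gene matrix has far more columns than rows, the economy SVD (`full_matrices=False`) yields at most 130 components, which is why either method can shrink tens of thousands of features to a small number of inputs for the DNN, MLP, or TabNet without ever forming a genes-by-genes covariance matrix.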
License
Copyright (c) 2024 Duwi Lufita Marfiana, Fatimah Asmita Rani

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
Jurnal Riset Informatika sets out the legal terms for accessing its digital articles under a Creative Commons Attribution-NonCommercial 4.0 International License. Articles published in Jurnal Riset Informatika are Open Access for the purposes of scientific development, research, and library use.