COMPARATIVE ANALYSIS OF DIMENSIONALITY REDUCTION FOR BREAST CANCER USING MACHINE LEARNING AND DEEP LEARNING
DOI:
https://doi.org/10.34288/jri.v7i3.375
Keywords:
Breast Cancer, High Dimensionality, Machine Learning, Deep Learning
Abstract
Breast cancer is one of the leading causes of death among women worldwide. Accurate early detection is essential to improving patient survival rates, so an efficient and reliable detection method is needed. This study presents a comparative analysis of machine learning and deep learning models combined with various dimensionality reduction techniques to improve the accuracy of breast cancer classification. The dimensionality reduction methods evaluated are Principal Component Analysis (PCA), t-Distributed Stochastic Neighbor Embedding (t-SNE), Independent Component Analysis (ICA), and Linear Discriminant Analysis (LDA). The study uses a dataset from the Molecular Taxonomy of Breast Cancer International Consortium (METABRIC), which includes genetic and clinical data of breast cancer patients. Several classification algorithms are evaluated, including Logistic Regression, Support Vector Machines (SVM), and Convolutional Neural Networks (CNN). Model performance is analyzed using accuracy, precision, recall, and F1-score. The results show that LDA consistently produces better classification performance than the other dimensionality reduction methods across the machine learning and deep learning models tested. These findings underline the importance of choosing the right dimensionality reduction method to make learning algorithms more effective, especially for complex, high-dimensional medical data. The results can inform the development of smarter decision support systems for breast cancer diagnosis.
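The comparison described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: it uses scikit-learn's bundled Wisconsin diagnostic dataset as a stand-in for METABRIC, the function name `compare_reducers`, the component count, and the train/test split are hypothetical choices, and the CNN is left out. t-SNE is also omitted because scikit-learn's `TSNE` offers `fit_transform` but no `transform`, so it cannot be fitted on training data and then applied to held-out data in the same way.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA, FastICA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def compare_reducers(n_components=5, seed=0):
    """Score each (reducer, classifier) pair on a held-out split."""
    # Assumption: stand-in data (Wisconsin diagnostic set), NOT METABRIC.
    X, y = load_breast_cancer(return_X_y=True)
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed
    )

    # Factories so every pipeline gets fresh, unfitted estimators.
    reducers = {
        "PCA": lambda: PCA(n_components=n_components, random_state=seed),
        "ICA": lambda: FastICA(
            n_components=n_components, random_state=seed, max_iter=1000
        ),
        # LDA yields at most (n_classes - 1) components: 1 for a binary label.
        "LDA": lambda: LinearDiscriminantAnalysis(n_components=1),
    }
    classifiers = {
        "LogReg": lambda: LogisticRegression(max_iter=1000),
        "SVM": lambda: SVC(kernel="rbf"),
    }

    results = {}
    for r_name, make_reducer in reducers.items():
        for c_name, make_clf in classifiers.items():
            # Scaling and reduction are fitted on the training split only.
            model = make_pipeline(StandardScaler(), make_reducer(), make_clf())
            model.fit(X_tr, y_tr)
            y_pred = model.predict(X_te)
            precision, recall, f1, _ = precision_recall_fscore_support(
                y_te, y_pred, average="binary"
            )
            results[(r_name, c_name)] = {
                "accuracy": float(accuracy_score(y_te, y_pred)),
                "precision": float(precision),
                "recall": float(recall),
                "f1": float(f1),
            }
    return results


if __name__ == "__main__":
    for (reducer, clf), scores in compare_reducers().items():
        row = ", ".join(f"{k}={v:.3f}" for k, v in scores.items())
        print(f"{reducer:>3} + {clf:<6} {row}")
```

Wrapping the scaler, reducer, and classifier in one pipeline keeps the supervised LDA projection from seeing test labels. Scores from this stand-in dataset say nothing about the METABRIC results reported in the paper.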
License
Copyright (c) 2025 Fatimah Asmita Rani, Duwi Lufita Marfiana

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
The Jurnal Riset Informatika has legal rules for accessing digital electronic articles under a Creative Commons Attribution-NonCommercial 4.0 International License. Articles published in Jurnal Riset Informatika are Open Access, for the purposes of scientific development, research, and libraries.










