COMPARATIVE ANALYSIS OF DIMENSIONALITY REDUCTION FOR BREAST CANCER USING MACHINE LEARNING AND DEEP LEARNING

Authors

  • Fatimah Asmita Rani, Universitas Nusa Mandiri
  • Duwi Lufita Marfiana, Universitas Nusa Mandiri
(*) Corresponding Author

DOI:

https://doi.org/10.34288/jri.v7i3.375

Keywords:

Breast Cancer, High Dimensionality, Machine Learning, Deep Learning

Abstract

Breast cancer is one of the leading causes of death among women worldwide, and accurate early detection is essential to improving patient survival rates. An efficient and reliable detection method is therefore needed. This study presents a comparative analysis of machine learning and deep learning models combined with several dimensionality reduction techniques, with the aim of improving the accuracy of breast cancer classification. The dimensionality reduction methods evaluated are Principal Component Analysis (PCA), t-Distributed Stochastic Neighbor Embedding (t-SNE), Independent Component Analysis (ICA), and Linear Discriminant Analysis (LDA). The study uses a dataset from the Molecular Taxonomy of Breast Cancer International Consortium (METABRIC), which contains genetic and clinical data from breast cancer patients. The classification algorithms evaluated include Logistic Regression, Support Vector Machines (SVM), and Convolutional Neural Networks (CNN). Model performance is measured by accuracy, precision, recall, and F1-score. The results show that LDA consistently yields better classification performance than the other dimensionality reduction methods across the machine learning and deep learning models tested. These findings underline the importance of choosing an appropriate dimensionality reduction method to make learning algorithms more effective, especially for complex, high-dimensional medical data. The results can inform the development of smarter decision support systems for breast cancer diagnosis.
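
The four metrics named in the abstract can be illustrated with a minimal sketch. It assumes a binary classification task with one class treated as positive; the abstract does not state the label encoding or any averaging scheme, so the function name `binary_metrics`, the `positive` parameter, and the example labels below are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BinaryMetrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for binary labels.

    Assumption: `positive` marks the class of interest; every other
    label value is treated as negative.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one sample is required")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs)
    # Ratios with an empty denominator are reported as 0.0 by convention.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return BinaryMetrics(accuracy, precision, recall, f1)


if __name__ == "__main__":
    # Hypothetical labels for illustration only (not METABRIC data).
    y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
    print(binary_metrics(y_true, y_pred))
```

Reporting precision and recall alongside accuracy matters when classes are imbalanced, because a model that rarely predicts the positive class can still score high accuracy while missing most positive cases.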

Published

2025-06-12

How to Cite

Fatimah Asmita Rani, & Lufita Marfiana, D. (2025). COMPARATIVE ANALYSIS OF DIMENSIONALITY REDUCTION FOR BREAST CANCER USING MACHINE LEARNING AND DEEP LEARNING. Jurnal Riset Informatika, 7(3), 156–169. https://doi.org/10.34288/jri.v7i3.375

Section

Articles