BREAST TUMOR CLASSIFICATION USING RANDOM FOREST WITH FEATURE SELECTION AND GRIDSEARCHCV OPTIMIZATION

Authors

  • Priscilia Amanda Leza, Informatics, Information Technology, Universitas Mercu Buana Yogyakarta
  • Mutaqin Akbar
(*) Corresponding Author

DOI:

https://doi.org/10.34288/jri.v8i3.534

Keywords:

Breast Tumor Classification, Random Forest, Feature Importance, Hyperparameter Optimization, GridSearchCV

Abstract

Classifying breast tumors as benign or malignant is an important challenge in medicine because diagnostic errors can lead to delayed treatment or unnecessary procedures. This study analyzes the performance of Random Forest and evaluates the effects of feature selection and GridSearchCV hyperparameter optimization on breast tumor classification. The study used the Wisconsin Breast Cancer Diagnostic Dataset, which consists of 569 samples with 30 numerical features extracted from Fine Needle Aspiration (FNA) examinations. Four Random Forest configurations were compared in sequence: a baseline Random Forest, Random Forest with feature selection, Random Forest with GridSearchCV optimization, and feature selection combined with GridSearchCV. Feature selection ranked features by importance score and used ROC-AUC-based cross-validation to determine the optimal feature subset. Models were evaluated using accuracy, precision, recall, F1-score, ROC-AUC, the confusion matrix, and the train-test gap. All four models achieved the same accuracy (97.37%), precision (1.0000), recall (0.9286), and F1-score (0.9630). However, the integrated model achieved the highest ROC-AUC (0.9977) and the smallest train-test gap (0.0241) while reducing the number of features from 30 to 15. These findings indicate that integrating feature selection with GridSearchCV improves model stability, efficiency, and discriminative capability without reducing classification performance, addressing a limitation of prior studies that applied these techniques separately.
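
The integrated configuration described in the abstract (importance-based feature selection followed by GridSearchCV tuning) might look roughly like the sketch below. It is not the authors' code. The 80/20 stratified split, the candidate subset sizes, the parameter grid, the random seed, and the definition of the train-test gap as a difference in accuracy are all assumptions made for illustration, so the numbers it produces will not necessarily match those reported.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split

# scikit-learn bundles the 569-sample, 30-feature Wisconsin diagnostic data.
# Note: in this copy, label 0 is malignant and label 1 is benign.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42  # assumed split
)

# Step 1: rank features by impurity-based importance from a baseline forest.
baseline = RandomForestClassifier(n_estimators=100, random_state=42)
baseline.fit(X_train, y_train)
ranking = np.argsort(baseline.feature_importances_)[::-1]  # most important first


def cv_auc(k):
    """Mean 5-fold ROC-AUC on the training set using the top-k ranked features."""
    model = RandomForestClassifier(n_estimators=100, random_state=42)
    scores = cross_val_score(
        model, X_train[:, ranking[:k]], y_train, cv=5, scoring="roc_auc"
    )
    return scores.mean()


# Step 2: keep the subset size with the best cross-validated ROC-AUC.
candidate_sizes = [5, 10, 15, 20, 30]  # assumed candidates
best_k = max(candidate_sizes, key=cv_auc)
selected = ranking[:best_k]

# Step 3: tune hyperparameters on the selected features only.
param_grid = {"n_estimators": [100, 200], "max_depth": [None, 5]}  # assumed grid
search = GridSearchCV(
    RandomForestClassifier(random_state=42), param_grid, scoring="roc_auc", cv=5
)
search.fit(X_train[:, selected], y_train)

# Step 4: held-out ROC-AUC and train-test gap (assumed here to mean
# training accuracy minus test accuracy).
best = search.best_estimator_
test_auc = roc_auc_score(y_test, best.predict_proba(X_test[:, selected])[:, 1])
gap = best.score(X_train[:, selected], y_train) - best.score(X_test[:, selected], y_test)

print(f"features kept: {best_k}, test ROC-AUC: {test_auc:.4f}, gap: {gap:.4f}")
```

Ranking and subset selection use only the training split, so the test set stays untouched until the final evaluation.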
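
The shared accuracy, precision, recall, and F1-score quoted in the abstract can be checked against the standard confusion-matrix formulas. The counts below are hypothetical: the abstract does not report a confusion matrix, and these are only one set of values (for an assumed 114-sample test set, malignant as the positive class) that happens to reproduce the stated figures. Other test-set sizes are also consistent with them.

```python
# Hypothetical counts, NOT taken from the paper: chosen only because they
# reproduce the metrics reported in the abstract.
tp, fn = 39, 3   # malignant cases: detected / missed
tn, fp = 72, 0   # benign cases: correctly cleared / falsely flagged

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)            # no false positives, so precision is 1
recall = tp / (tp + fn)               # share of malignant cases detected
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.4f} precision={precision:.4f} "
      f"recall={recall:.4f} f1={f1:.4f}")
# prints: accuracy=0.9737 precision=1.0000 recall=0.9286 f1=0.9630
```

Because these four metrics depend only on thresholded predictions, models with identical confusion matrices can still differ in ROC-AUC, which is computed from the ranking of predicted probabilities.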

Published

2026-06-16

How to Cite

Leza, P. A., & Akbar, M. (2026). BREAST TUMOR CLASSIFICATION USING RANDOM FOREST WITH FEATURE SELECTION AND GRIDSEARCHCV OPTIMIZATION. Jurnal Riset Informatika, 8(3), 397–406. https://doi.org/10.34288/jri.v8i3.534

Section

Articles