THE EFFECTIVENESS ANALYSIS OF RANDOM FOREST ALGORITHMS WITH SMOTE TECHNIQUE IN PREDICTING LUNG CANCER RISK
DOI:
https://doi.org/10.34288/jri.v4i2.159
Keywords:
Lung Cancer, Python, Random Forest, SMOTE
Abstract
Among all types of cancer, lung cancer accounts for the most deaths. Detecting the disease requires screening through X-rays, CT scans, and MRI. Before ordering these tests, however, a doctor ordinarily reviews the patient's medical history and performs a physical examination to study the symptoms and possible risk factors for lung cancer. The lung cancer data set has a class imbalance that affects the performance of the random forest algorithm in predicting lung cancer risk. This study applies the SMOTE technique to the random forest algorithm to increase accuracy in predicting lung cancer risk. Data processing and analysis in this research use the Python programming language. The test results show an accuracy of 88% with an AUC of 0.93. When the random forest method is used to predict lung cancer risk, the SMOTE technique is useful for dealing with class imbalance in the data set.
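The oversampling idea behind SMOTE can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the abstract does not say which libraries or parameters were used, so the function name `smote`, the neighbour count `k`, the seed, and the toy data below are all assumptions made for the example. Each synthetic row is placed at a random point on the line segment between a minority-class sample and one of its nearest minority-class neighbours.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic minority rows (hypothetical helper).

    Each row is interpolated between a randomly chosen minority sample
    and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than rows
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority rows by Euclidean distance to the base row
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical toy data: 8 majority rows vs 3 minority rows, 2 features each
majority = [[float(x), float(x % 3)] for x in range(8)]
minority = [[10.0, 10.0], [11.0, 10.5], [10.5, 11.5]]

new_rows = smote(minority, n_new=len(majority) - len(minority), k=2)
balanced_minority = minority + new_rows
print(len(majority), len(balanced_minority))  # prints: 8 8
```

In practice, the balanced training set would then be passed to a random forest classifier, and resampling is normally applied to the training split only, so that accuracy and AUC are measured on data that still has the original class distribution.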
License
Copyright (c) 2022 Ita Yulianti, Ami Rahmawati, Tati Mardiana

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
Jurnal Riset Informatika governs access to its digital articles under a Creative Commons Attribution-NonCommercial 4.0 International License. Articles published in Jurnal Riset Informatika are Open Access, for the purposes of scientific development, research, and libraries.