COMPARATIVE MACHINE LEARNING ALGORITHMS FOR YOUTUBE SENTIMENT ANALYSIS ON DPR DEMONSTRATION 2025 USING LEXICON

Authors


DOI:

https://doi.org/10.34288/jri.v8i1.470

Keywords:

Sentiment Analysis, Machine Learning, Lexicon-Based, YouTube Comments, DPR Demonstration

Abstract

The high volume of public comments on YouTube about the August 2025 DPR demonstration, totaling 43,910 raw comments, makes efficient sentiment analysis challenging. The time and cost of manually labeling large-scale datasets are a major obstacle to developing predictive models. This study addresses the problem with a hybrid approach that combines Lexicon-Based auto-labeling with a comparative evaluation of five Machine Learning algorithms. The methodology comprised text preprocessing, which yielded 40,097 unique comments; feature extraction using TF-IDF; and an 80:20 train-test data split. The performance of the Support Vector Machine (SVM) algorithm was compared comprehensively against Random Forest, Decision Tree, K-Nearest Neighbors (KNN), and Naive Bayes. In the experiments, the SVM model achieved the best performance, with an accuracy of 96.5% and a weighted F1-score of 0.966. It clearly outperformed the other benchmark algorithms: Random Forest ranked second with 89.2% accuracy, followed by Decision Tree at 85.6%, KNN at 84.6%, and Naive Bayes, the lowest, at 84.0%. These findings indicate that integrating Lexicon-Based labeling with SVM classification is a highly accurate, robust, and efficient solution for sentiment analysis of large-scale social media data in Indonesia.
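The pipeline summarized in the abstract can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The lexicon below is a hypothetical toy dictionary because the abstract does not name the lexicon used. The labeling rule (positive score gives positive, negative gives negative, zero gives neutral) is an assumed convention. The TF-IDF weighting uses the plain tf × log(N/df) form; libraries such as scikit-learn's `TfidfVectorizer` apply a smoothed idf by default. The five classifiers are not reimplemented here. The sketch covers deduplication, lexicon auto-labeling, TF-IDF, the 80:20 split, and the weighted F1-score used to rank the models. All function names are illustrative.

```python
import math
import random
import re
from collections import Counter

# Hypothetical toy lexicon (assumption): the study's actual Indonesian
# sentiment lexicon and its word weights are not given in the abstract.
LEXICON = {
    "bagus": 3, "dukung": 2, "setuju": 2, "mantap": 3,
    "buruk": -3, "korupsi": -4, "marah": -3, "tolak": -2,
}


def tokenize(text):
    """Lowercase the text and keep only alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def deduplicate(comments):
    """Drop repeated comments, comparing lowercased, whitespace-normalized text."""
    seen, unique = set(), []
    for comment in comments:
        key = " ".join(comment.lower().split())
        if key and key not in seen:
            seen.add(key)
            unique.append(comment)
    return unique


def lexicon_label(text):
    """Auto-label by summed lexicon score (assumed rule: >0 positive, <0 negative, 0 neutral)."""
    score = sum(LEXICON.get(tok, 0) for tok in tokenize(text))
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def tfidf(docs):
    """Return one {term: weight} dict per document, weight = tf * log(N / df)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        length = len(toks) or 1
        vectors.append(
            {t: (c / length) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return vectors


def split_80_20(items, seed=42):
    """Shuffle with a fixed seed and return (train, test) in an 80:20 ratio."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * 0.8)
    return shuffled[:cut], shuffled[cut:]


def weighted_f1(y_true, y_pred):
    """Average per-class F1, weighting each class by its support in y_true."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for cls, n_cls in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / n_cls
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        score += f1 * n_cls / total
    return score
```

For example, `lexicon_label("DPR korupsi, tolak!")` scores -6 with this toy lexicon and returns `"negative"`. `weighted_f1(["pos", "pos", "neg", "neu"], ["pos", "neg", "neg", "neu"])` returns 0.75. In practice, the TF-IDF vectors and lexicon labels would be passed to a library implementation of each classifier, for instance scikit-learn's `LinearSVC`, `RandomForestClassifier`, `DecisionTreeClassifier`, `KNeighborsClassifier`, and `MultinomialNB`.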


Author Biographies

Ahmad Abdul Chamid, Universitas Muria Kudus

Lecturer, Informatics Engineering Study Program, Faculty of Engineering, Universitas Muria Kudus.

Ahmad Jazuli, Universitas Muria Kudus

Lecturer, Informatics Engineering Study Program, Faculty of Engineering, Universitas Muria Kudus.



Published

2025-12-15

How to Cite

Samsudin, S., Abdul Chamid, A., & Jazuli, A. (2025). COMPARATIVE MACHINE LEARNING ALGORITHMS FOR YOUTUBE SENTIMENT ANALYSIS ON DPR DEMONSTRATION 2025 USING LEXICON. Jurnal Riset Informatika, 8(1), 74–85. https://doi.org/10.34288/jri.v8i1.470