Clickbait Detection in Indonesia Headline News Using IndoBERT and RoBERTa

Muhammad Edo Syahputra; Ade Putera Kemala; Dimas Ramdhan

doi:10.34288/jri.v5i4.237

Authors

Muhammad Edo Syahputra, Bina Nusantara University
Ade Putera Kemala, Bina Nusantara University
Dimas Ramdhan, Bina Nusantara University

(*) Corresponding Author

Keywords:

Clickbait Detection, Data Augmentation, Deep Learning, Transformer

Abstract

This paper explores clickbait detection using two Transformer models, IndoBERT and RoBERTa. The objective is to improve the models' clickbait detection accuracy by applying balancing and augmentation techniques to the dataset. Three dataset distribution settings were used: unbalanced, balanced, and augmented, with 8513, 6632, and 15503 total samples, respectively. The results demonstrate that balancing improves model performance. Data augmentation also improved the performance of RoBERTa, but it slightly decreased the performance of IndoBERT. These findings underline the importance of considering model selection and dataset characteristics when applying augmentation. IndoBERT with a balanced distribution outperformed both the previous study and the other models used in this research. By incorporating balancing and augmentation techniques, the research surpasses previous studies and contributes to the advancement of clickbait detection accuracy, reaching a 95% F1-score with the unbalanced distribution. As a limitation, the augmentation method in this study improved only the RoBERTa model; performance might be boosted further by gathering more varied datasets. This work highlights the value of leveraging pre-trained Transformer models and dataset-specific handling techniques. The implications include the necessity of dataset balancing for accurate detection and the varying impact of augmentation on different models. These insights help researchers and practitioners make informed decisions for clickbait detection tasks, benefiting content moderation, online user experience, and information reliability. The study emphasizes the significance of using state-of-the-art models and tailored approaches to improve clickbait detection performance.
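
The abstract does not state how the balanced setting was constructed. One common approach, assumed here purely for illustration, is random undersampling: examples from the larger class are dropped at random until every class matches the size of the smallest one. Under that assumption, and if the two classes are exactly equal, the balanced total of 6632 would correspond to 3316 headlines per class. The function name `undersample` and the toy headlines below are hypothetical and do not come from the study's dataset or code.

```python
import random
from collections import defaultdict


def undersample(examples, seed=0):
    """Randomly drop examples so every label keeps as many items as the rarest label.

    `examples` is a list of (text, label) pairs. This is an assumed balancing
    strategy, not necessarily the one used in the paper.
    """
    rng = random.Random(seed)

    # Group the examples by their label.
    by_label = defaultdict(list)
    for text, label in examples:
        by_label[label].append((text, label))

    # The smallest class sets the per-class size of the balanced dataset.
    target = min(len(items) for items in by_label.values())

    balanced = []
    for items in by_label.values():
        # Sample without replacement so no headline is duplicated.
        balanced.extend(rng.sample(items, target))

    rng.shuffle(balanced)
    return balanced


# Toy data: 3 "clickbait" (label 1) and 5 "non-clickbait" (label 0) headlines.
toy = [(f"clickbait headline {i}", 1) for i in range(3)]
toy += [(f"regular headline {i}", 0) for i in range(5)]

balanced = undersample(toy)
counts = {label: sum(1 for _, l in balanced if l == label) for label in (0, 1)}
```

Undersampling discards data, which is consistent with the balanced setting being smaller than the unbalanced one (6632 versus 8513 samples), whereas augmentation adds synthetic variants and therefore grows the dataset.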
