Clickbait Detection in Indonesia Headline News Using IndoBERT and RoBERTa
This paper explores clickbait detection in Indonesian news headlines using two pre-trained Transformer models, IndoBERT and RoBERTa. The objective is to improve detection accuracy by applying balancing and augmentation techniques to the dataset. The results show that balancing improves model performance. Data augmentation also improved the performance of RoBERTa but slightly decreased that of IndoBERT, which underlines the importance of considering both model selection and dataset characteristics when applying augmentation. IndoBERT with a balanced data distribution outperformed both the previous study and the other models evaluated in this research. The implications are that dataset balancing is necessary for accurate detection and that the effect of augmentation varies across models. More broadly, the results highlight the value of pairing pre-trained Transformer models with dataset-handling techniques tailored to the task. These insights can help researchers and practitioners make informed decisions for clickbait detection tasks, with benefits for content moderation, online user experience, and information reliability.
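The abstract does not say which balancing method was applied, so the following is only an illustrative sketch of one common option, random undersampling of the majority class. The function name `undersample`, the toy headlines, and the label strings are all hypothetical and are not taken from the paper or its dataset.

```python
import random
from collections import Counter


def undersample(headlines, labels, seed=0):
    """Randomly drop examples so each class keeps as many items as the rarest class.

    Assumption: this is one generic balancing strategy, not necessarily
    the one used in the study.
    """
    rng = random.Random(seed)

    # Group the headlines by their class label.
    by_label = {}
    for text, label in zip(headlines, labels):
        by_label.setdefault(label, []).append(text)

    # Every class is cut down to the size of the smallest class.
    target = min(len(items) for items in by_label.values())

    balanced = []
    for label, items in by_label.items():
        for text in rng.sample(items, target):
            balanced.append((text, label))

    # Shuffle so the classes are interleaved before any training step.
    rng.shuffle(balanced)
    return balanced


# Toy placeholder data (hypothetical, not from the paper's dataset).
headlines = ["headline a", "headline b", "headline c", "headline d", "headline e"]
labels = ["non-clickbait", "non-clickbait", "non-clickbait", "clickbait", "clickbait"]

balanced = undersample(headlines, labels)
counts = Counter(label for _, label in balanced)
```

After the call, `counts` holds the same number of examples for each class. The balanced pairs could then be passed to whatever tokenization and fine-tuning pipeline is in use.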
Copyright (c) 2023 Muhammad Edo Syahputra, Ade Putera Kemala, Ade, Dimas Ramdhan Ramdhan
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
An author who publishes in the Jurnal Riset Informatika agrees to the following terms:
- The author retains the copyright and grants the journal the right of first publication of the work, which is simultaneously licensed under the Creative Commons Attribution-NonCommercial 4.0 License. This license allows others to share the work with an acknowledgment of the work's authorship and initial publication in this journal.
- The author is permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) before and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of the published work (See The Effect of Open Access).
Read more about the Creative Commons Attribution-NonCommercial 4.0 Licence here: https://creativecommons.org/licenses/by-nc/4.0/.