Active Learning Query by Committee Labeling Method to Increase Accuracy and Efficiency of Sentiment Analysis Classification

Dipa Anasta Iskandar; R. Mohamad Atok

doi:10.34288/jri.v7i4.386

Authors

Dipa Anasta Iskandar, Sepuluh Nopember Institute of Technology
R. Mohamad Atok, Sepuluh Nopember Institute of Technology

(*) Corresponding Author

Keywords:

Sentiment Analysis, Labeling Method, Query by Committee, Active Learning

Abstract

This study proposes a Query by Committee (QBC) labeling method to improve the accuracy of classification models, specifically XLM-RoBERTa, and to increase labeling efficiency compared with manual, supervised labeling, which generally requires more time and resources. The dataset consists of unannotated reviews of healthcare-industry applications scraped from Google Play. Six labeling strategies were each used to produce training labels for fine-tuning XLM-RoBERTa models under identical hyperparameter settings: Rating-based labeling, Lexicon-based labeling, QBC for Rating-VADER labeling, QBC for Rating-Pseudo labeling, QBC for VADER-Pseudo labeling, and triplet QBC for Rating-Pseudo-VADER labeling. Each labeled dataset was split using stratified random sampling, and class weights were set to “auto” during training to address label imbalance. All models were then tested on the IndoNLU SmSA test dataset and compared in terms of accuracy, precision, recall, and F1-score. The triplet QBC approach (combining Rating, VADER, and Pseudo labeling) outperformed all other methods, achieving an accuracy of 91.4%, a precision of 91.28%, a recall of 91.4%, and an F1-score of 91.21%. These findings indicate that the QBC labeling method can serve as an effective and efficient alternative to manual annotation for similar classification tasks.
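The abstract does not spell out how the committee members are combined, so the following is only a minimal sketch of one plausible reading: each labeling source (rating, lexicon, pseudo-label) casts one vote per review, and a label is accepted only when enough members agree. The function names, the star-rating thresholds, the `min_agree` parameter, and the sample votes are all illustrative assumptions, not the authors' implementation.

```python
from collections import Counter


def rating_to_label(stars):
    """Map a 1-5 star rating to a sentiment label.

    The thresholds below are an assumption made for illustration.
    """
    if stars <= 2:
        return "negative"
    if stars == 3:
        return "neutral"
    return "positive"


def committee_label(votes, min_agree):
    """Return the most common label if at least `min_agree` members chose it.

    Returns None when the committee does not reach the required agreement.
    """
    label, count = Counter(votes).most_common(1)[0]
    return label if count >= min_agree else None


# Hypothetical votes: the lexicon and pseudo labels would come from a
# lexicon scorer and a pretrained classifier in practice.
reviews = [
    {"rating": 5, "lexicon": "positive", "pseudo": "positive"},
    {"rating": 1, "lexicon": "neutral", "pseudo": "negative"},
    {"rating": 3, "lexicon": "positive", "pseudo": "negative"},
]

accepted, disputed = [], []
for review in reviews:
    votes = [rating_to_label(review["rating"]), review["lexicon"], review["pseudo"]]
    label = committee_label(votes, min_agree=2)
    if label is None:
        disputed.append(review)  # no agreement: set aside, e.g. for manual review
    else:
        accepted.append((review, label))
```

With `min_agree=2` this behaves as a majority vote over three members; setting `min_agree=3` would require unanimity, which yields fewer but presumably cleaner training labels. Which rule the study used is not stated in the abstract.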
