STEMMINDO: A WEB-BASED INDONESIAN STEMMING ENGINE USING ENHANCED CONFIX STRIPPING

Authors

  • Novi Prisma Yunita, Universitas Amikom Yogyakarta
  • Helmi, Universitas Jenderal Soedirman
(*) Corresponding Author

DOI:

https://doi.org/10.34288/jri.v8i3.521

Keywords:

Stemming, Indonesian Language, Root Word, Enhanced Confix Stripping, Web-based Application

Abstract

Stemming is an essential preprocessing stage in Natural Language Processing (NLP), particularly for Indonesian, which has complex affixation patterns. Most Indonesian stemmers are distributed as programming libraries, which makes them hard to use for beginners, educators, and researchers who do not program. This study presents Stemmindo, a lightweight web-based application for finding Indonesian root words that implements the Enhanced Confix Stripping (ECS) algorithm on the Laravel framework. Unlike conventional stemming libraries, the system provides a real-time, modular interface that lets users explore Indonesian morphological processing without writing code. The novelty of this research lies in implementing ECS within an accessible web-based educational tool. The system was evaluated with affixation pattern testing, rule-based testing, and real-text evaluation. Testing on 20 affixation patterns achieved 90% accuracy (18 of 20), while evaluation on 100 words representing 33 derivation prefix rules achieved 94% accuracy. After failure-handling strategies were applied through exception lists and rule-level accommodations, accuracy rose to 98%. The real-text evaluation used 1,742 words collected from Indonesian educational web content. After preprocessing and filtering, 564 unique words were evaluated, of which 366 (about 65%) produced stems that matched the corpus; the remaining 198 cases consisted mainly of named entities, noisy input, ambiguous forms, overstemming, and understemming. These findings indicate that the proposed system handles common Indonesian morphological patterns well while remaining practical for educational and experimental NLP use. Future work includes improving reduplication handling, expanding lexical resources, and enhancing accessibility features.
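
To make the idea of dictionary-backed affix stripping with an exception list more concrete, the following is a minimal Python sketch. It is not the authors' Laravel implementation and does not reproduce the full ECS rule set: the root-word list, the exception entries, the prefix table, and all function names are hypothetical and chosen only for illustration. It shows the general pattern of checking a dictionary, consulting an exception list, peeling suffix layers, and trying prefix rules that restore a dropped initial consonant.

```python
# Tiny illustrative root-word list (assumption); a real system would load a full dictionary.
ROOT_WORDS = {"ajar", "baca", "main", "makan", "tulis", "sapu", "pukul"}

# Hypothetical exception list: surface forms mapped directly to their roots.
EXCEPTIONS = {"belajar": "ajar", "pelajar": "ajar"}

PARTICLES = ("lah", "kah", "pun")
POSSESSIVES = ("nya", "ku", "mu")
DERIVATIONAL_SUFFIXES = ("kan", "an", "i")

# Simplified prefix table (assumption): each prefix is paired with the letters
# to try restoring in front of the remainder, e.g. "menulis" -> "t" + "ulis".
PREFIX_RULES = [
    ("meny", ("s",)),
    ("meng", ("", "k")),
    ("mem", ("", "p")),
    ("men", ("", "t")),
    ("me", ("",)),
    ("ber", ("",)),
    ("ter", ("",)),
    ("di", ("",)),
    ("ke", ("",)),
    ("pe", ("",)),
    ("se", ("",)),
]


def strip_one(word, suffixes):
    """Remove the first matching suffix, keeping at least two characters."""
    for suffix in suffixes:
        if word.endswith(suffix) and len(word) - len(suffix) >= 2:
            return word[: -len(suffix)]
    return word


def strip_prefix(word):
    """Return a dictionary root reachable by removing one prefix, else None."""
    for prefix, restores in PREFIX_RULES:
        if word.startswith(prefix):
            rest = word[len(prefix):]
            for letter in restores:
                if letter + rest in ROOT_WORDS:
                    return letter + rest
    return None


def stem(word):
    """Return a root word if one is found; otherwise return the input unchanged."""
    word = word.lower()
    if word in ROOT_WORDS:
        return word
    if word in EXCEPTIONS:
        return EXCEPTIONS[word]
    # Peel suffix layers one at a time, keeping every intermediate form so that
    # prefix removal is tried on the full word before any suffix is dropped.
    candidates = [word]
    for layer in (PARTICLES, POSSESSIVES, DERIVATIONAL_SUFFIXES):
        candidates.append(strip_one(candidates[-1], layer))
    for candidate in candidates:
        if candidate in ROOT_WORDS:
            return candidate
        root = strip_prefix(candidate)
        if root is not None:
            return root
    return word


def accuracy(pairs):
    """Fraction of (word, expected_root) pairs that the stemmer gets right."""
    correct = sum(1 for word, expected in pairs if stem(word) == expected)
    return correct / len(pairs)
```

In this sketch, trying prefix removal on the unmodified word first is what keeps a form such as "dimakan" from being cut to "dima" by the "-kan" suffix rule. Words with no matching root, such as named entities, are returned unchanged, which mirrors the unmatched cases reported in the real-text evaluation.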

Published

2026-06-16

How to Cite

Yunita, N. P., & Jannah, H. R. (2026). STEMMINDO: A WEB-BASED INDONESIAN STEMMING ENGINE USING ENHANCED CONFIX STRIPPING. Jurnal Riset Informatika, 8(3), 369–377. https://doi.org/10.34288/jri.v8i3.521

Section

Articles