Analysis of Indonesian Language Dataset for Tax Court Cases: Multiclass Classification of Court Verdicts


  • Ade Putera Kemala, Bina Nusantara University
  • Hafizh Ash Shiddiqi, Bina Nusantara University



Keywords: BERT, Classification, Deep learning, NLP, Tax


Tax is an obligation established by law that requires citizens to contribute a portion of their income to the state. The Tax Court is the judicial authority for taxpayers seeking justice in tax disputes, and it handles cases involving many types of taxes every day. This paper analyzes an Indonesian-language dataset of tax court cases, with the aim of predicting court verdicts through multiclass classification. The dataset is preprocessed, and class imbalance is addressed through oversampling-based data augmentation and label weighting. Two models, bi-LSTM and IndoBERT, are used for classification. The final model, based on IndoBERT, achieved a score of 75.83%. The results demonstrate the efficacy of both models in predicting court verdicts. This research has implications for predicting court conclusions from limited case details and offers insights for legal decision-making. The findings contribute to the field of legal data analysis by showing the potential of NLP techniques for understanding and predicting court outcomes, which could improve the efficiency of legal proceedings.
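The two imbalance-handling techniques named in the abstract can be illustrated with a minimal sketch. The abstract does not specify the exact oversampling scheme or weighting formula, so the code below assumes plain random oversampling and inverse-frequency weights; the function names, the toy texts, and the verdict labels are hypothetical placeholders, not taken from the paper or its dataset.

```python
import random
from collections import Counter


def class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count).

    Assumed formula for illustration; rarer classes get larger weights.
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {label: n_samples / (n_classes * c) for label, c in counts.items()}


def random_oversample(texts, labels, seed=0):
    """Duplicate minority-class examples at random until every class
    has as many examples as the largest class."""
    rng = random.Random(seed)
    by_label = {}
    for text, label in zip(texts, labels):
        by_label.setdefault(label, []).append(text)
    target = max(len(items) for items in by_label.values())

    out_texts, out_labels = [], []
    for label, items in by_label.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for text in items + extra:
            out_texts.append(text)
            out_labels.append(label)
    return out_texts, out_labels


# Hypothetical toy data: placeholder texts and verdict labels.
texts = ["case a", "case b", "case c", "case d", "case e", "case f"]
labels = ["granted", "granted", "granted", "granted", "rejected", "partial"]

weights = class_weights(labels)
balanced_texts, balanced_labels = random_oversample(texts, labels)
```

In practice, oversampling is normally applied to the training split only, so that duplicated examples do not leak into validation or test data. The weights would typically be passed to the loss function of the classifier rather than used on their own.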



Author Biographies

Ade Putera Kemala, Bina Nusantara University

School of Computer Science, Data Science

Hafizh Ash Shiddiqi, Bina Nusantara University

School of Computer Science, Computer Science






How to Cite

Kemala, A. P., & Shiddiqi, H. A. (2023). Analysis of Indonesian Language Dataset for Tax Court Cases: Multiclass Classification of Court Verdicts. Jurnal Riset Informatika, 5(3), 419–424.



