Comparative Analysis of Using Word Embedding in Deep Learning for Text Classification

Authors

  • Mukhamad Rizal Ilham, Universitas Amikom Yogyakarta
  • Arif Dwi Laksito, Universitas Amikom Yogyakarta
(*) Corresponding Author

Keywords:

Deep Learning, LSTM, Sentiment Analysis, Text Classification

Abstract

Natural language processing (NLP) is a group of theory-driven computational techniques for automatically interpreting and representing human language. From parsing and part-of-speech (POS) tagging to machine translation and dialogue systems, NLP enables computers to carry out natural-language tasks at all levels. This study compares two word embedding techniques used for text representation, FastText and GloVe, and evaluates their effectiveness in text classification with a Long Short-Term Memory (LSTM) network. The research proceeds in five stages: dataset collection, pre-processing, word embedding, data splitting, and deep learning. In the experiments, FastText outperformed GloVe, with accuracy reaching 90%. Increasing the number of epochs did not significantly improve the accuracy of the LSTM model with either GloVe or FastText. It can be concluded that the FastText word embedding technique is superior to the GloVe technique.
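The stages listed in the abstract can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not give the dataset, pre-processing rules, split ratio, or LSTM configuration, so every function name, parameter, and vector below is a hypothetical choice made for illustration. The sketch covers pre-processing, building an embedding matrix, and splitting the data. It also contrasts a pure lookup table (GloVe-like, where an unseen word gets a zero vector) with a simplified subword fallback (FastText-like, where an unseen word is represented by the average of its known character n-gram vectors). Real FastText hashes n-grams into buckets and uses longer n-gram ranges by default; that detail is omitted here. The LSTM itself is not shown.

```python
import random


def tokenize(text):
    """Lowercase and split on whitespace; a stand-in for real pre-processing."""
    return text.lower().split()


def build_vocab(texts):
    """Map each token to an integer id; id 0 is reserved for padding/unknown."""
    vocab = {}
    for text in texts:
        for tok in tokenize(text):
            vocab.setdefault(tok, len(vocab) + 1)
    return vocab


def char_ngrams(word, n_min=3, n_max=4):
    """Character n-grams of the word wrapped in '<' and '>' boundary markers."""
    marked = f"<{word}>"
    return [marked[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(marked) - n + 1)]


def embedding_matrix(vocab, vectors, dim, subword_vectors=None):
    """Row i holds the vector for token id i.

    Words missing from `vectors` keep a zero row (lookup-only behaviour).
    If `subword_vectors` is given, they instead get the average of their
    known n-gram vectors (a simplified FastText-style fallback).
    """
    matrix = [[0.0] * dim for _ in range(len(vocab) + 1)]
    for word, idx in vocab.items():
        if word in vectors:
            matrix[idx] = list(vectors[word])
        elif subword_vectors is not None:
            grams = [subword_vectors[g] for g in char_ngrams(word)
                     if g in subword_vectors]
            if grams:
                matrix[idx] = [sum(col) / len(grams) for col in zip(*grams)]
    return matrix


def encode(text, vocab, max_len):
    """Turn a text into a fixed-length list of token ids, padded with 0."""
    ids = [vocab.get(tok, 0) for tok in tokenize(text)][:max_len]
    return ids + [0] * (max_len - len(ids))


def train_test_split(items, test_ratio=0.2, seed=0):
    """Shuffle a copy of `items` and cut it into train and test parts."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_ratio))
    return shuffled[:cut], shuffled[cut:]


# Toy data: all texts and vectors below are made up for illustration.
texts = ["Good movie", "bad movie", "goodness"]
vocab = build_vocab(texts)
word_vecs = {"good": [1.0, 0.0], "movie": [0.0, 1.0], "bad": [-1.0, 0.0]}
subword_vecs = {"<go": [1.0, 0.0], "goo": [0.5, 0.5]}

lookup_only = embedding_matrix(vocab, word_vecs, dim=2)
with_subwords = embedding_matrix(vocab, word_vecs, dim=2,
                                 subword_vectors=subword_vecs)

print(lookup_only[vocab["goodness"]])    # zero vector: word not in the table
print(with_subwords[vocab["goodness"]])  # averaged n-gram vectors
```

In a full pipeline, a matrix like this would typically initialise the embedding layer of the LSTM classifier, and the encoded, padded sequences from `encode` would be its inputs.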

 

Keywords: Word Embedding

Published

2023-03-25

How to Cite

Ilham, M. R., & Laksito, A. D. (2023). Comparative Analysis of Using Word Embedding in Deep Learning for Text Classification. Jurnal Riset Informatika, 5(2), 195–202. Retrieved from http://ejournal.kresnamediapublisher.com/index.php/jri/article/view/208

Section

Articles