Comparative Analysis of Using Word Embedding in Deep Learning for Text Classification
Abstract
Natural language processing (NLP) is a set of theory-driven computational techniques for automatically interpreting and representing human language. From parsing and part-of-speech (POS) tagging to machine translation and dialogue systems, NLP enables computers to carry out natural language tasks at all levels. In this research, we compared two word embedding techniques used for text representation, FastText and GloVe. The study aims to evaluate and compare the effectiveness of these embeddings for text classification with an LSTM (Long Short-Term Memory) model. The research stages are dataset collection, pre-processing, word embedding, data splitting, and finally training the deep learning model. In the experiments, FastText outperformed GloVe, with accuracy reaching 90%. Increasing the number of epochs did not significantly improve the accuracy of the LSTM model with either GloVe or FastText. We conclude that the FastText word embedding technique is superior to GloVe for this task.
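The stages listed in the abstract (pre-processing, word embedding, data splitting) can be illustrated with a minimal, standard-library-only sketch. This is not the authors' code: the toy texts, labels, three-dimensional "pretrained" vectors, and all function names are hypothetical, and the LSTM itself is omitted. In practice the embedding matrix would be filled from GloVe or FastText vectors and passed, together with the encoded sequences, to an embedding layer followed by an LSTM in a deep learning framework.

```python
import random
import re


def tokenize(text):
    """Lowercase and keep alphabetic tokens only (a simple stand-in for pre-processing)."""
    return re.findall(r"[a-z]+", text.lower())


def build_vocab(token_lists):
    """Map each distinct token to an integer id; id 0 is reserved for padding."""
    vocab = {"<pad>": 0}
    for tokens in token_lists:
        for tok in tokens:
            vocab.setdefault(tok, len(vocab))
    return vocab


def build_embedding_matrix(vocab, pretrained, dim):
    """One row per vocabulary id; words missing from `pretrained` stay as zero vectors."""
    matrix = [[0.0] * dim for _ in range(len(vocab))]
    for word, idx in vocab.items():
        if word in pretrained:
            matrix[idx] = list(pretrained[word])
    return matrix


def encode(tokens, vocab, max_len):
    """Convert tokens to ids, then truncate or right-pad with 0 to max_len."""
    ids = [vocab[t] for t in tokens][:max_len]
    return ids + [0] * (max_len - len(ids))


def train_test_split(items, test_ratio, seed=0):
    """Shuffle reproducibly and split into (train, test)."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_test = int(len(items) * test_ratio)
    return items[n_test:], items[:n_test]


# Toy data: hypothetical texts, labels, and 3-dimensional "pretrained" vectors.
texts = ["Great movie, loved it!", "Terrible plot.", "Loved the plot", "Terrible movie"]
labels = [1, 0, 1, 0]
pretrained = {
    "great": [0.9, 0.1, 0.0],
    "loved": [0.8, 0.2, 0.1],
    "terrible": [-0.9, 0.1, 0.0],
    "movie": [0.0, 0.5, 0.5],
}

token_lists = [tokenize(t) for t in texts]
vocab = build_vocab(token_lists)
embedding_matrix = build_embedding_matrix(vocab, pretrained, dim=3)
encoded = [encode(toks, vocab, max_len=4) for toks in token_lists]
train, test = train_test_split(list(zip(encoded, labels)), test_ratio=0.25)
```

Swapping GloVe for FastText in such a pipeline would change only the contents of `pretrained`, so the rest of the pipeline can stay fixed while the two embeddings are compared.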


Copyright (c) 2023 Mukhamad Rizal Ilham, Arif Dwi Laksito

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
An author who publishes in the Jurnal Riset Informatika agrees to the following terms:
- The author retains the copyright and grants the journal the right of first publication, with the work simultaneously licensed under the Creative Commons Attribution-NonCommercial 4.0 License, which allows others to share the work with an acknowledgment of the work's authorship and initial publication in this journal.
- The author is permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) before and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of the published work (See The Effect of Open Access).
Read more about the Creative Commons Attribution-NonCommercial 4.0 Licence here: https://creativecommons.org/licenses/by-nc/4.0/.