TOPIC CLASSIFICATION ON TWITTER USING CNN WITH WORD2VEC FEATURE EXPANSION

  • Rifaldy Bintang Ramadhan, School of Computing, Universitas Telkom, Indonesia
  • Erwin Budi Setiawan, School of Computing, Universitas Telkom, Indonesia
Keywords: CNN, feature expansion, topic classification, twitter, word2vec

Abstract

Twitter is a social networking site that lets users communicate with their followers through short messages known as "tweets." Each tweet is limited to 280 characters. This limit leads users to write short tweets and to use many word variations, which makes tweets difficult to understand without knowing their topic, so tweets should be classified by topic. This study aims to classify the topics of tweets using Word2Vec feature expansion to reduce vocabulary ambiguity in topic classification. This type of research is system design research. Feature expansion is a machine learning technique used to derive new features (or variables) from a dataset's existing features. It is intended to increase a model's complexity and expressive power in order to improve performance and generalization. The data were processed using a Convolutional Neural Network (CNN). The results indicate that Word2Vec feature expansion contributes meaningfully to topic classification of Twitter data, and that the CNN helps overcome some of the obstacles in analyzing short texts with high word variation.
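
The abstract does not spell out the expansion procedure, so the following is only a minimal sketch of one common variant: a vocabulary feature that is absent from a tweet is switched on when one of its most similar words appears in the tweet. The word vectors are made-up toy values standing in for trained Word2Vec embeddings, and the names `VECTORS`, `top_similar`, and `expand_features` are hypothetical, not taken from the paper.

```python
import math

# Toy vectors with invented values; a real system would use trained word2vec embeddings.
VECTORS = {
    "football": [0.90, 0.10, 0.00],
    "soccer":   [0.85, 0.15, 0.05],
    "goal":     [0.70, 0.30, 0.10],
    "election": [0.00, 0.90, 0.20],
    "vote":     [0.05, 0.85, 0.25],
}

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def top_similar(word, k):
    """Return the k words most similar to `word`, excluding the word itself."""
    scored = [(cosine(VECTORS[word], vec), other)
              for other, vec in VECTORS.items() if other != word]
    scored.sort(reverse=True)
    return [other for _, other in scored[:k]]

def expand_features(tokens, vocabulary, k=1):
    """Binary bag-of-words where a missing feature is set to 1
    if any of its top-k similar words occurs in the tweet."""
    present = set(tokens)
    features = []
    for word in vocabulary:
        if word in present:
            features.append(1)
        elif word in VECTORS and any(s in present for s in top_similar(word, k)):
            features.append(1)  # expanded: a similar word stands in for the feature
        else:
            features.append(0)
    return features

vocabulary = ["football", "election"]
tweet = ["soccer", "goal", "tonight"]

plain = [1 if w in tweet else 0 for w in vocabulary]
expanded = expand_features(tweet, vocabulary, k=1)
print(plain, expanded)
```

In this toy case the plain representation of the tweet is all zeros, while the expanded one activates the "football" feature because "soccer" is its nearest neighbour. The expanded vectors would then be passed to the classifier, which in the study is a CNN.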

Published
2024-02-08
How to Cite
[1]
R. Bintang Ramadhan and E. Budi Setiawan, “TOPIC CLASSIFICATION ON TWITTER USING CNN WITH WORD2VEC FEATURE EXPANSION”, J. Tek. Inform. (JUTIF), vol. 5, no. 1, pp. 139-144, Feb. 2024.