SENTIMENT ANALYSIS AND ENTITY DETECTION ON NEWS HEADLINES TO SUPPORT INVESTMENT DECISIONS
Abstract
Sound investment decisions depend in part on information available in the media. News headlines can give an early picture of market sentiment and ongoing trends. This research uses sentiment analysis and entity detection on news headlines as tools to support investment decisions. Using machine learning-based sentiment analysis and Named Entity Recognition (NER), the study identifies opinions and entities such as company names, stock indices, and industry sectors in news headlines. Three machine learning algorithms (SVM, Naive Bayes, and Random Forest) are compared using cross-validation. SVM performs best, with a weighted average F1-score of 76.68%. Hyperparameter optimization of the SVM is then performed with Optuna, an approach that is new in the context of sentiment analysis on news headlines in Indonesia, and it raises the weighted average F1-score to 78.14%. For NER, a rule-based method built on the Jaro-Winkler string similarity function is used. The combined sentiment analysis and NER results are presented in a Google Looker Studio dashboard, with the underlying data processed periodically and automatically by Google Workflows. Thanks to NER, the research expands the scope of analysis from one or a few issuers to all entities published on news portals, making the results relevant to investment decisions that respond to dynamic market changes.
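The abstract does not spell out the NER rules, so the following is only a minimal sketch of how rule-based entity detection with Jaro-Winkler similarity might look. The function names, the 0.9 threshold, the word-window matching strategy, and the sample entity list are all assumptions made for illustration and are not taken from the study.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity between two strings, in the range [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters count as matching only within this distance of each other.
    match_dist = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i in range(len1):
        start = max(0, i - match_dist)
        end = min(i + match_dist + 1, len2)
        for j in range(start, end):
            if not matched2[j] and s1[i] == s2[j]:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    out_of_order = 0
    k = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order // 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1: str, s2: str, scaling: float = 0.1) -> float:
    """Jaro similarity boosted for a shared prefix of up to 4 characters."""
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return base + prefix * scaling * (1 - base)


def detect_entities(headline, known_entities, threshold=0.9):
    """Return known entities that approximately match a word window.

    Hypothetical rule: compare every window of N consecutive words
    (N = word count of the entity name) against the entity name.
    The threshold of 0.9 is an assumed value, not one from the study.
    """
    words = headline.lower().split()
    found = []
    for entity in known_entities:
        target = entity.lower()
        size = len(target.split())
        for i in range(len(words) - size + 1):
            window = " ".join(words[i:i + size])
            if jaro_winkler(window, target) >= threshold:
                if entity not in found:
                    found.append(entity)
                break
    return found


# Hypothetical entity dictionary and a headline containing a misspelling.
entities = ["Bank Mandiri", "Telkom Indonesia"]
result = detect_entities("Bank Mandri shares rise 2%", entities)
```

Approximate matching of this kind lets a dictionary of issuer names tolerate small spelling variations in headlines; in practice the threshold would need tuning against labeled headlines to balance missed entities against false matches.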
Copyright (c) 2024 Ajar Parama Adhi, Khairil Umuri, Gandung Triyono

This work is licensed under a Creative Commons Attribution 4.0 International License.