Enhancing BERTopic with Neural Network Clustering for Thematic Analysis of U.S. Presidential Speeches

Authors

  • Sajarwo Anggai, Graduate Program of Informatics Engineering, Universitas Pamulang, Indonesia
  • Rafi Mahmud Zain, Graduate Program of Informatics Engineering, Universitas Pamulang, Indonesia
  • Tukiyat, Graduate Program of Informatics Engineering, Universitas Pamulang, Indonesia
  • Arya Adhyaksa Waskita, Research Center for Data and Information Sciences, National Research and Innovation Agency, Indonesia

DOI:

https://doi.org/10.52436/1.jutif.2025.6.4.5090

Keywords:

Autoencoder, BERTopic, Deep Clustering, Political Discourse, Topic Modeling

Abstract

Understanding the underlying themes in presidential speeches is critical for analyzing political discourse and determining public policy direction. However, topic modeling in this context is difficult, particularly when clustering semantically rich topics from high-dimensional embeddings. This study seeks to improve topic modeling performance by incorporating a Neural Network Clustering (NNC) approach into the BERTopic pipeline. We analyze 2,747 speeches delivered by U.S. President Joe Biden (2021–2025) and compare three clustering techniques: HDBSCAN, KMeans, and the proposed Autoencoder-based NNC. The evaluation metrics (UMass, NPMI, Topic Diversity) show that NNC produces the most coherent and diverse topic clusters (UMass = -0.4548, NPMI = 0.0234, Diversity = 0.3950). These findings show that NNC can overcome the limitations of density-based and centroid-based clustering in high-dimensional semantic spaces. The study contributes to the field of Natural Language Processing by demonstrating how neural-based clustering can improve topic modeling, particularly for complex, real-world political corpora.
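
To make two of the reported metrics concrete, the sketch below shows one common way to compute Topic Diversity and NPMI coherence. It is an illustration under stated assumptions, not the authors' evaluation code: the abstract does not give the exact definitions used, so Topic Diversity is taken here as the fraction of unique words among the top words of all topics, and NPMI is estimated from document-level co-occurrence. The function names `topic_diversity` and `npmi_coherence` and the toy data are hypothetical.

```python
import math
from itertools import combinations


def topic_diversity(topics, top_k=10):
    """Fraction of unique words among the top_k words of every topic.

    Assumed definition: 1.0 means no word is shared between topics.
    """
    top_words = [word for topic in topics for word in topic[:top_k]]
    if not top_words:
        return 0.0
    return len(set(top_words)) / len(top_words)


def npmi_coherence(topic, documents):
    """Mean NPMI over all word pairs of one topic.

    Probabilities are estimated from document-level co-occurrence
    (an assumption; sliding windows are another common choice).
    """
    doc_sets = [set(doc) for doc in documents]
    n_docs = len(doc_sets)
    if n_docs == 0:
        return 0.0

    def prob(*words):
        # Share of documents containing all of the given words.
        return sum(all(w in d for w in words) for d in doc_sets) / n_docs

    scores = []
    for w_i, w_j in combinations(topic, 2):
        p_ij = prob(w_i, w_j)
        if p_ij == 0.0:
            scores.append(-1.0)  # never co-occur: NPMI lower bound
        elif p_ij == 1.0:
            scores.append(0.0)   # convention chosen here: PMI is 0, ratio undefined
        else:
            pmi = math.log(p_ij / (prob(w_i) * prob(w_j)))
            scores.append(pmi / -math.log(p_ij))
    return sum(scores) / len(scores) if scores else 0.0


# Toy usage with made-up topics and tokenized documents.
toy_topics = [["economy", "jobs", "tax"], ["jobs", "health", "care"]]
toy_docs = [["economy", "jobs"], ["economy", "jobs"], ["health"], ["health"]]
diversity = topic_diversity(toy_topics, top_k=3)
coherence = npmi_coherence(["economy", "jobs"], toy_docs)
```

Under these definitions, a Diversity of 0.3950 would mean that roughly 40% of the top words across all topics are unique, and an NPMI near zero indicates word pairs that co-occur only slightly more often than chance.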

Published

2025-08-18

How to Cite

[1]
S. Anggai, R. M. Zain, T. Tukiyat, and A. A. Waskita, “Enhancing BERTopic with Neural Network Clustering for Thematic Analysis of U.S. Presidential Speeches”, J. Tek. Inform. (JUTIF), vol. 6, no. 4, pp. 1957–1970, Aug. 2025.