ANALYZING STACK OVERFLOW DISCUSSIONS ON C, JAVA, AND PYTHON: A MIXED-METHOD STUDY ON QUESTION TYPES AND TOPICS
Abstract
Modern software development is significantly shaped by the evolution of programming languages. The increasing complexity of these languages demands effective tools and resources for learning and troubleshooting. As a result, forums such as Stack Overflow (SO) have become crucial for addressing technical issues that arise during program execution, especially for novice programmers. Although discussions on SO are common, there has been no clear description of the question types and topics for three main programming languages, i.e., C, Java, and Python. This gap is problematic because it limits the ability of educators, platform designers, and developers to address the specific needs of users. Without such insights, novice programmers may struggle to find relevant guidance, potentially hindering their learning and slowing the adoption of best practices. To fill this gap, we conducted a qualitative and quantitative study of the discussions related to these three languages on SO. Using a dataset of 4,499,718 questions extracted from SOTorrent, we applied a manual labeling method to classify questions into categories such as “How,” “What,” and “Why.” Furthermore, we applied Latent Dirichlet Allocation (LDA) for topic modeling to identify the prevalent discussion topics. The results show that “How” questions dominate across all languages, particularly in Python (60.94%), reflecting a high demand for practical implementation guidance. Analysis of discussion topics indicates that C discussions center on system programming and low-level operations, while Java discussions focus more on application development and object-oriented programming. In contrast, Python discussions focus more on data handling and data structures. These insights suggest that, while practical support is necessary for learners, a deeper understanding of programming concepts and customized instructional resources are also important for supporting developers.
The findings contribute to the community and relevant fields by offering actionable insights to improve the usability of SO as a learning and problem-solving platform.
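As a rough illustration of how a per-language share such as the reported “How” percentage could be derived from manually labeled questions, the following minimal sketch tallies question types by language. The function name `question_type_shares` and the toy labels are hypothetical and made up for this example; they are not the study's data or the authors' code.

```python
from collections import Counter


def question_type_shares(labeled_questions):
    """Return {language: {question_type: percent}} from (language, type) pairs."""
    counts = {}
    for language, qtype in labeled_questions:
        # One Counter per language accumulates the manually assigned labels.
        counts.setdefault(language, Counter())[qtype] += 1

    shares = {}
    for language, counter in counts.items():
        total = sum(counter.values())
        # Convert raw counts to percentages, rounded to two decimals.
        shares[language] = {
            qtype: round(100 * n / total, 2) for qtype, n in counter.items()
        }
    return shares


# Toy, made-up labels for illustration only (not the study's data).
sample = [
    ("python", "How"), ("python", "How"), ("python", "Why"), ("python", "What"),
    ("java", "How"), ("java", "What"),
    ("c", "Why"), ("c", "How"), ("c", "How"),
]

shares = question_type_shares(sample)
for language, dist in shares.items():
    print(language, dist)
```

Grouping by language before normalizing keeps each percentage relative to that language's own question count, which is how a figure like a per-language “How” share would need to be read.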
Copyright (c) 2025 Yusuf Sulistyo Nugroho, Aldin Nasrun Minalloh, Keke Rachma Devi, Syful Islam

This work is licensed under a Creative Commons Attribution 4.0 International License.