Enhancing Clustering Performance through Benchmarking of Dimensionality Reduction Techniques on Educational Data

Authors

  • Eko Priyanto Magister of Computer Science, Universitas Amikom Purwokerto, Indonesia
  • Berlilana Magister of Computer Science, Universitas Amikom Purwokerto, Indonesia
  • Imam Tahyudin Magister of Computer Science, Universitas Amikom Purwokerto, Indonesia

DOI:

https://doi.org/10.52436/1.jutif.2025.6.2.4297

Keywords:

Clustering, Dimensionality Reduction, K-Means, Non-Negative Matrix Factorization, Principal Component Analysis

Abstract

This study evaluates the effectiveness of dimensionality reduction techniques in enhancing clustering performance on a tracer study dataset of 500 alumni from UMNU Kebumen, containing 58 variables. The objective was to identify the optimal combination of dimensionality reduction and clustering methods for uncovering patterns in alumni profiles, job search strategies, and employment outcomes. Principal Component Analysis (PCA), Non-Negative Matrix Factorization (NMF), t-Distributed Stochastic Neighbor Embedding (t-SNE), and Uniform Manifold Approximation and Projection (UMAP) were applied, followed by clustering with K-Means, DBSCAN, and Hierarchical Clustering. NMF achieved the highest clustering quality, particularly with K-Means and Hierarchical Clustering, outperforming PCA. NMF also produced more compact, better-separated clusters, with a Calinski-Harabasz Index of 287.96 compared with 125.88 for PCA. While t-SNE and UMAP delivered competitive results, their computational times of 245.8 and 76.5 seconds, respectively, make them less practical for large datasets. The novelty of this study lies in its comprehensive evaluation of dimensionality reduction techniques combined with diverse clustering algorithms to assess how the two interact. The results provide actionable guidance: NMF for accuracy-critical tasks and PCA for time-sensitive applications. As the volume of high-dimensional educational data grows, efficient clustering strategies become critical for extracting meaningful insights and supporting data-driven decision-making in education and workforce planning. Addressing these challenges is essential to optimizing institutional strategies, improving student employability, and aligning the workforce more closely with industry demands.
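The benchmark described in the abstract reduces the data, clusters each embedding, and scores the result while timing the reduction step. The sketch below illustrates that loop with scikit-learn; it is not the authors' code. Assumptions: synthetic `make_blobs` data stands in for the tracer dataset, with two output components, four clusters, and `eps=0.3` for DBSCAN chosen for illustration only, and UMAP is omitted because it lives in the separate `umap-learn` package. Scores and times printed by this sketch describe the synthetic data and say nothing about the paper's reported figures.

```python
"""Hypothetical benchmark sketch: dimensionality reduction followed by clustering.

Assumptions (not from the paper): synthetic data replaces the tracer-study table,
2 components, 4 clusters, DBSCAN eps=0.3 on standardized embeddings, and UMAP is
left out because it requires the separate umap-learn package.
"""
import math
import time

from sklearn.cluster import DBSCAN, AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import NMF, PCA
from sklearn.manifold import TSNE
from sklearn.metrics import calinski_harabasz_score
from sklearn.preprocessing import MinMaxScaler, StandardScaler

N_CLUSTERS = 4  # hypothetical cluster count chosen for the synthetic data

# Synthetic stand-in with the same shape as the tracer data: 500 rows x 58 columns.
X, _ = make_blobs(n_samples=500, n_features=58, centers=N_CLUSTERS, random_state=0)
# NMF rejects negative input, so scale every feature into [0, 1] first.
X = MinMaxScaler().fit_transform(X)

reducers = {
    "PCA": PCA(n_components=2, random_state=0),
    "NMF": NMF(n_components=2, init="nndsvda", max_iter=1000, random_state=0),
    "t-SNE": TSNE(n_components=2, random_state=0),
}


def make_clusterers():
    # Fresh estimators for every embedding so no fitted state is shared.
    return {
        "K-Means": KMeans(n_clusters=N_CLUSTERS, n_init=10, random_state=0),
        "Hierarchical": AgglomerativeClustering(n_clusters=N_CLUSTERS),
        "DBSCAN": DBSCAN(eps=0.3, min_samples=5),
    }


def ch_score(Z, labels):
    # DBSCAN labels noise as -1; score only clustered points and return NaN
    # when fewer than two clusters remain, since the index is undefined there.
    mask = labels != -1
    if len(set(labels[mask])) < 2:
        return math.nan
    return calinski_harabasz_score(Z[mask], labels[mask])


results = {}
for r_name, reducer in reducers.items():
    start = time.perf_counter()
    Z = reducer.fit_transform(X)
    reduce_seconds = time.perf_counter() - start
    # Standardize each embedding so a single DBSCAN eps means the same thing
    # for every reducer (this rescaling also affects the CH values).
    Z = StandardScaler().fit_transform(Z)
    for c_name, clusterer in make_clusterers().items():
        labels = clusterer.fit_predict(Z)
        results[(r_name, c_name)] = (ch_score(Z, labels), reduce_seconds)

for (r_name, c_name), (score, seconds) in results.items():
    print(f"{r_name:6s} + {c_name:12s} CH={score:9.2f}  reduction time={seconds:.2f}s")
```

The Calinski-Harabasz Index returned by `calinski_harabasz_score` is the between-cluster dispersion divided by k - 1, over the within-cluster dispersion divided by n - k (k clusters, n points), so higher values indicate tighter, better-separated clusters. The min-max scaling step is what makes NMF usable at all here; PCA and t-SNE would also accept standardized data with negative values.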

Published

2025-04-26

How to Cite

[1]
E. Priyanto, B. Berlilana, and I. Tahyudin, “Enhancing Clustering Performance through Benchmarking of Dimensionality Reduction Techniques on Educational Data”, J. Tek. Inform. (JUTIF), vol. 6, no. 2, pp. 641–654, Apr. 2025.