Interpretable Machine Learning for Employee Recruitment Prediction Using Boruta, CatBoost, Lasso, Logistic Regression, NLP, and RFE Feature Selection

Authors

  • Aswan Supriyadi Sunge, Informatics Engineering Department, Faculty of Engineering, Pelita Bangsa University, Indonesia
  • Suzanna, Informatics Information Systems Department, Binus Online Learning, Bina Nusantara University, Indonesia
  • Hamzah Muhammad Mardi Putra, Management Department, Faculty of Economics and Business, Pelita Bangsa University, Indonesia

DOI:

https://doi.org/10.52436/1.jutif.2025.6.4.4810

Keywords:

Employee Selection, Feature Selection, Interpretable Models, Recruitment Prediction

Abstract

Employee recruitment is a crucial process in human resource management, with a direct impact on company performance and success. In the digital era, Machine Learning (ML) is increasingly used in candidate selection because it can improve efficiency, accuracy, and transparency. This research is important because conventional recruitment methods often suffer from subjective bias, slow processing, and a limited ability to assess a candidate’s true potential. ML offers a more objective, data-driven, and faster approach, enabling companies to identify the best candidates more effectively. This study aims to identify the main features that influence recruitment decisions and to evaluate the effectiveness and interpretability of six ML-based methods: Boruta, CatBoost, Lasso Regression, Logistic Regression, Natural Language Processing (NLP), and Recursive Feature Elimination (RFE). The dataset consists of 1,501 samples with 10 features and one class variable (0 = Not Hired, 1 = Hired). Each method is evaluated on its ability to identify the features that contribute most to the classification result. The study has several limitations, particularly potential bias in the data, such as demographic bias reflected in historical recruitment decisions, which could lead the ML models to replicate or even reinforce such biases. In addition, the limited dataset size may affect the models' ability to generalize to new data. In this study, the main criterion used to compare the methods is the most dominant feature, that is, the top-ranked feature produced by each method.
The test results show that Boruta identifies Gender as the most influential feature, while CatBoost, Lasso Regression, Logistic Regression, and NLP consistently rank Recruitment Strategy as the most significant feature in predicting candidate eligibility. RFE, in contrast, ranks Distance from the Company as the top feature influencing recruitment decisions. The novelty of this study lies in integrating interpretable feature selection methods into the real-world context of recruitment decision-making. This approach emphasizes not only prediction accuracy but also transparency and a clear understanding of the rationale behind each decision. It supports the development of a fairer and more accountable selection process, particularly by minimizing unconscious bias in data-driven recruitment systems. From a practical standpoint, the findings are relevant for human resource professionals, as the identified key features can be used to design more objective selection strategies and improve the efficiency of candidate evaluation. The study therefore contributes to the advancement of modern, technology-based recruitment systems that prioritize fairness and decision-making efficiency. The analysis could be strengthened further by elaborating the evaluation metrics, for example by reporting the overall accuracy of each model or comparing the models with alternative approaches to give a more comprehensive view of their performance.
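
The comparison criterion described in the abstract (the top-ranked feature per method) can be illustrated with a minimal Python sketch. The dictionary below only restates the top features reported in the abstract; the helper names `top_feature` and `consensus` are hypothetical and are not taken from the paper, and ranking by absolute score is an assumption about how a "most dominant" feature might be chosen from coefficient-style importances.

```python
from collections import Counter

# Top-ranked feature per method, as reported in the abstract.
TOP_FEATURE_BY_METHOD = {
    "Boruta": "Gender",
    "CatBoost": "Recruitment Strategy",
    "Lasso Regression": "Recruitment Strategy",
    "Logistic Regression": "Recruitment Strategy",
    "NLP": "Recruitment Strategy",
    "RFE": "Distance from the Company",
}


def top_feature(scores: dict[str, float]) -> str:
    """Return the feature with the largest absolute importance score.

    Hypothetical helper: assumes each method yields one numeric score
    per feature (e.g. a coefficient or an importance value).
    """
    return max(scores, key=lambda name: abs(scores[name]))


def consensus(top_by_method: dict[str, str]) -> list[tuple[str, int]]:
    """Count how many methods rank each feature first, most frequent first."""
    return Counter(top_by_method.values()).most_common()


if __name__ == "__main__":
    total = len(TOP_FEATURE_BY_METHOD)
    for feature, votes in consensus(TOP_FEATURE_BY_METHOD):
        print(f"{feature}: ranked first by {votes} of {total} methods")
```

Tallying the reported results this way makes the level of agreement explicit: four of the six methods place Recruitment Strategy first, while Boruta and RFE each select a different feature.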

Published

2025-08-18

How to Cite

[1]
A. S. Sunge, S. Suzanna, and H. M. Mardi Putra, “Interpretable Machine Learning for Employee Recruitment Prediction Using Boruta, CatBoost, Lasso, Logistic Regression, NLP, and RFE Feature Selection”, J. Tek. Inform. (JUTIF), vol. 6, no. 4, pp. 2153–2170, Aug. 2025.