Evaluating the Impact of Model Complexity on the Accuracy of ID3 and Modified ID3: A Case Study of the Max_Depth Parameter

Authors

  • Asrianda, Informatics, Engineering Faculty, Universitas Malikussaleh, Indonesia
  • Herman Mawengkang, Faculty of Computer Science and Information Technology, Universitas Sumatera Utara, Medan, Indonesia
  • Poltak Sihombing, Faculty of Computer Science and Information Technology, Universitas Sumatera Utara, Medan, Indonesia
  • Mahyuddin K. M. Nasution, Faculty of Computer Science and Information Technology, Universitas Sumatera Utara, Medan, Indonesia

DOI:

https://doi.org/10.52436/1.jutif.2025.6.5.4864

Keywords:

accuracy, classical ID3, decision tree, modified ID3, overfitting, performance, tree depth

Abstract

The complexity of a decision tree structure has a direct impact on the generalization capability of classification algorithms. This study evaluates the performance of the classical ID3 algorithm and a modified version of it as tree depth (the max_depth parameter) varies. The primary objective is to identify the depth at which accuracy peaks and to assess each algorithm's robustness against overfitting. Experiments were conducted across tree depths ranging from 1 to 20, with accuracy as the main evaluation metric. The results indicate that both algorithms achieved peak performance at a depth of 3, followed by a notable decline. While classical ID3 exhibited a gradual decrease in accuracy, the modified ID3 showed a sharp drop and then stagnated between depths 11 and 20. These findings suggest that the modified ID3 improves sensitivity in selecting informative attributes but also increases the risk of overfitting in the absence of structural regularization mechanisms. The study therefore recommends regularization strategies such as pruning and cross-validation to mitigate the performance degradation caused by model complexity. This research contributes to the theoretical understanding of how tree depth influences classification performance and offers practical guidance for developing adaptive, stable, and accurate decision tree-based classification systems.
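To make the depth-sweep setup concrete, the following is a minimal sketch of the kind of experiment described above, under stated assumptions: it uses scikit-learn, with DecisionTreeClassifier(criterion="entropy") as a rough stand-in for classical ID3 (scikit-learn builds CART trees, and the authors' modified ID3 is not publicly reproduced here), and load_breast_cancer only as a placeholder dataset rather than the data used in the study.

```python
# Hedged sketch: sweep max_depth from 1 to 20 and report cross-validated
# accuracy at each depth, mirroring the evaluation outlined in the abstract.
# Assumptions: scikit-learn is installed; criterion="entropy" approximates
# classical ID3 (scikit-learn actually implements CART); the dataset is a
# placeholder, not the study's data; 5-fold CV and random_state=42 are
# illustrative choices.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

for depth in range(1, 21):
    clf = DecisionTreeClassifier(
        criterion="entropy", max_depth=depth, random_state=42
    )
    scores = cross_val_score(clf, X, y, cv=5, scoring="accuracy")
    print(f"max_depth={depth:2d}  mean CV accuracy={scores.mean():.4f}")
```

Cross-validating each depth already reflects one of the regularization strategies the abstract recommends; cost-complexity pruning could be sketched analogously by sweeping DecisionTreeClassifier's ccp_alpha parameter instead of max_depth.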

Published

2025-10-22

How to Cite

[1]
A. Asrianda, H. Mawengkang, P. Sihombing, and M. K. M. Nasution, “Evaluating the Impact of Model Complexity on the Accuracy of ID3 and Modified ID3: A Case Study of the Max_Depth Parameter”, J. Tek. Inform. (JUTIF), vol. 6, no. 5, pp. 3707–3718, Oct. 2025.