Optimizing YOLO11 for Dense Crowd Counting under Severe Occlusion via Head-Detection Fine-Tuning

Joko Sutrisno; Sri Winarno; Affandy Affandy

doi:10.52436/1.jutif.2026.7.2.5699

Authors

Joko Sutrisno, Faculty of Computer Science, Dian Nuswantoro University, Semarang, Indonesia
Sri Winarno, Faculty of Computer Science, Dian Nuswantoro University, Semarang, Indonesia
Affandy, Faculty of Computer Science, Dian Nuswantoro University, Semarang, Indonesia

Keywords:

Crowd counting, fine-tuning, head detection, visual occlusion, YOLO11

Abstract

Accurate, real-time people counting is essential for crowd management and public safety, yet it remains difficult in high-density scenes because of severe visual occlusion. Although the recently released YOLO11 architecture introduces the C3k2 and C2PSA modules, its performance as a pre-trained model for people counting has not been fully explored. This study evaluates a head-detection-based fine-tuning strategy for YOLO11 against the default pre-trained baseline. Fine-tuning is analyzed in three scenarios: S1 (full fine-tuning at 960 pixels), S2 (partial backbone freezing at 960 pixels), and S3 (partial freezing at 640 pixels). All scenarios were fine-tuned on the CC_Mach_1 dataset from Roboflow Universe, which consists of high-density images annotated for head detection. The baseline pre-trained YOLO11, which relies on full-body features, performs very poorly, with an mAP@0.5 of 0.017 and a Mean Absolute Error (MAE) of 100.3. The fine-tuned scenarios improve on this substantially. S1 achieves the best results, with an mAP@0.5 of 0.682 and an MAE of 37.8, a 62% reduction relative to the baseline. S2 remains competitive with an MAE of 39.6, whereas the MAE of S3 rises to 46.9, indicating that a lower input resolution limits the model's ability to identify small-scale head features. These findings provide empirical evidence that domain-specific fine-tuning for head detection substantially improves the robustness of YOLO11 against occlusion. Beyond accuracy, this detection-based approach offers a more computationally efficient alternative to traditional density-map-based methods, making it well suited to real-time surveillance systems for large-scale public monitoring.
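The counting metric reported above can be illustrated with a minimal sketch. It assumes that the per-image count is the number of head detections at or above a confidence threshold, and that MAE is the mean absolute difference between predicted and ground-truth counts. The function names, the 0.25 threshold, and the toy counts are hypothetical and not taken from the study; only the MAE values 100.3 and 37.8 come from the abstract.

```python
from typing import Sequence


def count_heads(confidences: Sequence[float], conf_threshold: float = 0.25) -> int:
    """Count detections whose confidence meets the threshold.

    The 0.25 default is an assumption for illustration only.
    """
    return sum(1 for c in confidences if c >= conf_threshold)


def mean_absolute_error(predicted: Sequence[int], actual: Sequence[int]) -> float:
    """Mean absolute difference between predicted and true per-image counts."""
    if len(predicted) != len(actual) or len(predicted) == 0:
        raise ValueError("predicted and actual must be non-empty and equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


def relative_reduction(baseline: float, improved: float) -> float:
    """Percentage by which `improved` is lower than `baseline`."""
    return (baseline - improved) / baseline * 100.0


if __name__ == "__main__":
    # Toy detector confidences for two images (invented values).
    detections = [[0.91, 0.64, 0.30, 0.12], [0.88, 0.20]]
    predicted_counts = [count_heads(d) for d in detections]
    true_counts = [4, 3]  # invented ground-truth head counts
    print("MAE on toy data:", mean_absolute_error(predicted_counts, true_counts))

    # Check the reduction stated in the abstract: baseline MAE 100.3, S1 MAE 37.8.
    print("Reduction (%):", round(relative_reduction(100.3, 37.8), 1))
```

The last line reproduces the arithmetic behind the reported 62% figure: (100.3 - 37.8) / 100.3 is approximately 0.623.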
