Interpretable and Statistically Validated Comparative Evaluation of EfficientNetB0, MobileNetV2, and ResNet50 for Bold and Natural Makeup Classification on CelebA

Aurelia Chiara Suryabangun; Abdussalam Abdussalam

doi:10.52436/1.jutif.2026.7.3.5806

Authors

Aurelia Chiara Suryabangun, Faculty of Computer Science, Universitas Dian Nuswantoro, Semarang, Indonesia
Abdussalam, Faculty of Computer Science, Universitas Dian Nuswantoro, Semarang, Indonesia

DOI:

https://doi.org/10.52436/1.jutif.2026.7.3.5806

Keywords:

Bold makeup, Cross-validation, EfficientNetB0, Facial makeup classification, MobileNetV2, Natural makeup, ResNet50

Abstract

Facial makeup classification plays a critical role in beauty technology, visual style analysis, and intelligent web-based image inference. Distinguishing bold makeup from natural makeup is challenging because of subtle visual overlap, borderline facial appearance, and inconsistent makeup intensity across images. Although numerous prior studies have applied deep learning to facial analysis, most report only conventional performance metrics and do not address statistical validation, probability calibration, or interpretability, a gap that limits reliable model selection in visually subtle classification tasks. This study presents an interpretable and statistically validated comparative evaluation of three transfer learning architectures (EfficientNetB0, MobileNetV2, and ResNet50) for binary makeup classification using a curated CelebA-based dataset. The final dataset comprises 12,000 facial images equally divided into natural_makeup and bold_makeup classes, with separate training, validation, and clean test subsets. Models were evaluated using holdout testing, 10-fold cross-validation, McNemar statistical testing, calibration analysis, confidence intervals, ROC and PR curves, and Grad-CAM visualization. Experimental results show that EfficientNetB0 achieved the best overall performance, with 0.7900 Accuracy, 0.7898 Macro-F1, 0.8829 ROC-AUC, and 0.8461 PR-AUC on the clean holdout test set. In 10-fold cross-validation, EfficientNetB0 achieved 0.7801 ± 0.0093 Accuracy and 0.8780 ± 0.0090 ROC-AUC. It also showed the strongest calibration, with the lowest Expected Calibration Error (ECE = 0.0558) and Brier Score (0.1449) among the compared models. The selected model was then deployed in a FastAPI-based backend system for web-based prediction.
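The two calibration measures named above, Expected Calibration Error and Brier Score, can be illustrated with a minimal sketch. The abstract does not state the binning scheme or number of bins, so the 10 equal-width confidence bins, the 0.5 decision threshold, and the function names below are assumptions made for illustration only, not the authors' implementation.

```python
from typing import Sequence


def brier_score(y_true: Sequence[int], p_pos: Sequence[float]) -> float:
    """Mean squared difference between predicted P(class = 1) and the 0/1 label."""
    return sum((p - y) ** 2 for y, p in zip(y_true, p_pos)) / len(y_true)


def expected_calibration_error(
    y_true: Sequence[int], p_pos: Sequence[float], n_bins: int = 10
) -> float:
    """ECE over equal-width bins of the predicted-class confidence.

    Assumption: binary task, prediction is class 1 when p >= 0.5, and the
    confidence is the probability assigned to the predicted class.
    """
    n = len(y_true)
    # Per bin: [sample count, sum of confidences, number of correct predictions]
    bins = [[0, 0.0, 0] for _ in range(n_bins)]
    for y, p in zip(y_true, p_pos):
        pred = 1 if p >= 0.5 else 0
        conf = p if pred == 1 else 1.0 - p
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        bins[idx][0] += 1
        bins[idx][1] += conf
        bins[idx][2] += int(pred == y)

    ece = 0.0
    for count, conf_sum, correct in bins:
        if count:
            accuracy = correct / count
            mean_conf = conf_sum / count
            ece += (count / n) * abs(accuracy - mean_conf)
    return ece


if __name__ == "__main__":
    # Toy labels and probabilities, invented for this example only.
    labels = [1, 0, 1, 0]
    probs = [0.9, 0.1, 0.8, 0.3]
    print("Brier:", brier_score(labels, probs))
    print("ECE:  ", expected_calibration_error(labels, probs))
```

Lower values are better for both measures: the Brier Score penalizes every probabilistic error directly, while ECE summarizes how far average confidence drifts from observed accuracy within each bin.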
From a broader Informatics and Computer Science perspective, this study contributes a rigorous and reproducible evaluation framework that integrates statistical validation, calibration assessment, and interpretability, enabling more reliable model selection in visually subtle facial analysis tasks and supporting practical deployment in intelligent systems.
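The McNemar testing mentioned in the abstract compares two classifiers on the same test images by looking only at the images where they disagree. The sketch below uses the exact two-sided binomial form of the test. The abstract does not say whether the exact or the chi-square variant was used, so this choice and the function name `mcnemar_exact` are assumptions for illustration.

```python
from math import comb
from typing import Sequence, Tuple


def mcnemar_exact(
    y_true: Sequence[int], pred_a: Sequence[int], pred_b: Sequence[int]
) -> Tuple[int, int, float]:
    """Exact two-sided McNemar test on paired predictions.

    Returns (b, c, p_value), where
      b = samples model A classifies correctly and model B misclassifies,
      c = samples model A misclassifies and model B classifies correctly.
    Under the null hypothesis, each discordant sample is equally likely
    to fall in b or c, so min(b, c) follows Binomial(b + c, 0.5).
    """
    b = sum(1 for y, a, m in zip(y_true, pred_a, pred_b) if a == y and m != y)
    c = sum(1 for y, a, m in zip(y_true, pred_a, pred_b) if a != y and m == y)
    n = b + c
    if n == 0:
        return b, c, 1.0  # no disagreements, so no evidence of a difference
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return b, c, min(1.0, 2.0 * tail)


if __name__ == "__main__":
    # Toy predictions, invented for this example only.
    truth = [1] * 10
    model_a = [1] * 10
    model_b = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
    b, c, p = mcnemar_exact(truth, model_a, model_b)
    print(f"b={b}, c={c}, p={p:.6f}")
```

Because the test conditions on discordant pairs, two models with similar accuracy can still differ significantly if their errors fall on different images, which is why it complements the aggregate metrics reported above.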
