Comparative Evaluation of Linear Regression and Ensemble Learning Models for Daily Calorie Prediction Using a Public Lifestyle Dataset with Structured Preprocessing and Recursive Feature Elimination
DOI:
https://doi.org/10.52436/1.jutif.2026.7.3.5621
Keywords:
Calorie Estimation, Feature Selection, Lifestyle Data, Regression Analysis, Supervised Learning
Abstract
Accurate daily calorie estimates are essential for personalized nutrition and the prevention of diet-related conditions, yet lifestyle variability can reduce the effectiveness of one-size-fits-all recommendations. This study aims to develop an accurate lifestyle-based calorie estimation model by comparing an interpretable linear approach with ensemble machine learning methods. A publicly available lifestyle dataset from Kaggle was used, containing demographic variables, anthropometric measurements, food intake, dietary patterns, and physical activity attributes. A preprocessing pipeline was applied, comprising outlier handling by interquartile range (IQR) capping, categorical encoding, normalization, and feature selection via Recursive Feature Elimination (RFE) to identify the most relevant predictors. Four models (Linear Regression, Random Forest, XGBoost, and LightGBM) were trained and evaluated, and the ensemble models were then tuned using GridSearchCV. Performance was assessed using R², Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and training time. Linear Regression achieved the best overall performance (R² = 0.9650, MAE = 80.95, RMSE = 101.71, training time = 8.95 seconds). Among the ensembles, the tuned XGBoost performed best (R² = 0.9646, MAE = 81.34, RMSE = 102.35, training time = 10.55 seconds). Compared with the tuned XGBoost, Linear Regression had a lower MAE (by 0.39), a lower RMSE (by 0.64), a higher R² (by 0.0004), and a shorter training time (by 1.60 seconds), indicating that the added complexity did not yield meaningful gains on this structured dataset. These findings suggest that, for structured lifestyle data, interpretable linear models can match or outperform complex ensembles while remaining computationally efficient for real-time or edge-deployed health applications.
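As an illustration of two steps named in the abstract, the following minimal Python sketch shows IQR capping and the three error metrics (R², MAE, RMSE). It is not the authors' implementation: the function names `iqr_cap` and `regression_metrics`, the 1.5 × IQR fence multiplier, the quartile interpolation method, and the toy calorie values are all assumptions made for this example and do not come from the study or its dataset.

```python
import math
import statistics


def iqr_cap(values, k=1.5):
    """Clip values to [Q1 - k*IQR, Q3 + k*IQR].

    k=1.5 is the conventional fence multiplier and is an assumption here;
    the abstract does not state which multiplier was used.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, low), high) for v in values]


def regression_metrics(y_true, y_pred):
    """Return (R^2, MAE, RMSE) for paired observations and predictions."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)           # residual sum of squares
    rmse = math.sqrt(ss_res / n)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return r2, mae, rmse


# Toy data (hypothetical): one extreme value is pulled back to the upper fence.
capped = iqr_cap([1800, 1900, 2000, 2100, 2200, 9000])

# Toy predictions (hypothetical) scored with the three metrics.
r2, mae, rmse = regression_metrics([2000, 2200, 2400], [2100, 2200, 2300])

# Gaps between the two best models, using the figures reported in the abstract.
mae_gap = round(81.34 - 80.95, 2)      # tuned XGBoost MAE minus Linear Regression MAE
rmse_gap = round(102.35 - 101.71, 2)   # tuned XGBoost RMSE minus Linear Regression RMSE
r2_gap = round(0.9650 - 0.9646, 4)     # Linear Regression R^2 minus tuned XGBoost R^2
```

Capping keeps every record in the dataset and only limits the influence of extreme values, which matters for a least-squares model such as Linear Regression. The last three lines reproduce the differences quoted in the abstract (0.39, 0.64, and 0.0004).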
License
Copyright (c) 2026 Yunandra Wahyu Utama, Majid Rahardi

This work is licensed under a Creative Commons Attribution 4.0 International License.





