Multivariate Forecasting of Paddy Production: A Comparative Study of Machine Learning Models
DOI:
https://doi.org/10.52436/1.jutif.2025.6.3.4681
Keywords:
Forecasting, Machine Learning, Multivariate Regression, Paddy Production, Random Forest
Abstract
Accurate rice production forecasting plays an important role in national food security planning. This study evaluates four machine learning algorithms, namely Random Forest, XGBoost, Support Vector Regression (SVR), and Linear Regression, in predicting three target variables simultaneously: harvest area, productivity, and production. The dataset consists of annual province-level data for Indonesia from 2018 to 2024, obtained from the Central Statistics Agency (BPS). Evaluation used five metrics: MAE, RMSE, MAPE, R², and training time. In the experiments, the Random Forest Regressor performed best in the 80:20 scenario, with an MAE of 76,259.52, an RMSE of 154,036.91, a MAPE of 0.61%, and an R² of 0.997. XGBoost was competitive, with an MAE of 79,381.44 and a shorter training time. In contrast, SVR performed worst, with a MAPE of 198.56% and an R² of 0.209. Linear Regression, used as the baseline, recorded an MAE of 1,194,355.28 and an R² of 0.503, indicating that a linear model is not effective enough for this data. The 80:20 scenario is considered the best configuration because it balances model accuracy and generalization. These findings suggest that ensemble algorithms, especially Random Forest and XGBoost, could be applied in practice by agricultural agencies or local governments to design data-driven policies for more proactive and predictive rice production management. The study also contributes to applied informatics by demonstrating how machine learning models can be used for multivariate forecasting of complex, real-world problems, supporting the development of intelligent decision-support systems in the agricultural domain.
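The four error metrics named in the abstract (MAE, RMSE, MAPE, R²) can be illustrated with a minimal sketch in plain Python. This is not the authors' code: the function names and the sample values are assumptions made for illustration only, and the abstract does not say how the metrics were aggregated across the three target variables, so the sketch handles a single target. Training time, the fifth metric, is not covered here.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error: average size of the errors, in the target's units."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error: like MAE, but penalizes large errors more."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent. Undefined if any true value is 0."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    """Coefficient of determination: 1 minus residual over total sum of squares."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical values for one target variable (not data from the study).
y_true = [100.0, 200.0, 300.0, 400.0]
y_pred = [110.0, 190.0, 310.0, 390.0]

print(f"MAE:  {mae(y_true, y_pred):.2f}")
print(f"RMSE: {rmse(y_true, y_pred):.2f}")
print(f"MAPE: {mape(y_true, y_pred):.2f}%")
print(f"R2:   {r2(y_true, y_pred):.3f}")
```

Because MAE and RMSE are expressed in the units of the target, their magnitudes are not comparable across targets with different scales, whereas MAPE and R² are scale-free. That may be one reason a study of this kind reports all four together.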
License
Copyright (c) 2025 Feri Yasin, Muhammad Raafi'u Firmansyah, Dasril Aldo, Muhammad Afrizal Amrustian

This work is licensed under a Creative Commons Attribution 4.0 International License.