Comparison of the Application of Linear Regression with Sliding Window Validation and K-Fold Cross-Validation for Forecasting Covid-19 Recovered Cases
DOI:
https://doi.org/10.34288/jri.v6i3.288
Keywords:
Covid-19, Forecasting, Linear Regression, Sliding Window Validation
Abstract
Confirmed cases and deaths due to Covid-19 continue to increase day by day throughout the world. This has resulted in a global health crisis that affects all sectors of life. The government has declared a movement to suppress the spread of Covid-19, so it is necessary to understand the patterns in Covid-19 data. Researchers contribute scientifically by applying Machine Learning methods to find patterns of death or recovery due to Covid-19. Linear Regression combined with Sliding Window preprocessing is appropriate for forecasting time series data. This research obtained an RMSE of 0.320 for Linear Regression with Sliding Window Validation and an RMSE of 0.320 for Linear Regression with K-Fold cross-validation. The study concludes that Linear Regression with Sliding Window Validation performs better than K-Fold cross-validation in forecasting Covid-19 recovered cases in China. The sliding window validation method is reported to increase forecasting accuracy on time series data compared with a standard validation method, namely K-Fold cross-validation. Further research is needed to test different types of time series data, either by comparing sliding window validation with K-Fold cross-validation or by developing other validation models.
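The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a synthetic series in place of the recovered-case data, an ordinary least-squares fit via NumPy, and hypothetical helper names (`make_lagged`, `sliding_window_splits`, `kfold_splits`) and window sizes chosen only for illustration. The sliding-window splits always train on earlier observations and test on later ones, while the K-Fold splits may train on data that comes after the test fold.

```python
import numpy as np

def make_lagged(series, n_lags):
    """Turn a 1-D series into (X, y): each row of X holds the n_lags previous values."""
    series = np.asarray(series, dtype=float)
    X = np.array([series[i:i + n_lags] for i in range(len(series) - n_lags)])
    y = series[n_lags:]
    return X, y

def fit_linear(X, y):
    """Ordinary least squares with an intercept; returns [intercept, slopes...]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_linear(coef, X):
    return coef[0] + X @ coef[1:]

def rmse(y_true, y_pred):
    """Root mean square error."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

def sliding_window_splits(n, train_size, test_size, step):
    """Yield (train, test) index arrays; the test block always follows the train block."""
    start = 0
    while start + train_size + test_size <= n:
        train = np.arange(start, start + train_size)
        test = np.arange(start + train_size, start + train_size + test_size)
        yield train, test
        start += step

def kfold_splits(n, k):
    """Yield (train, test) index arrays for k contiguous folds, ignoring time order."""
    folds = np.array_split(np.arange(n), k)
    for i in range(k):
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, folds[i]

def cross_validated_rmse(X, y, splits):
    """Average RMSE of a linear model over the given splits."""
    scores = []
    for train, test in splits:
        coef = fit_linear(X[train], y[train])
        scores.append(rmse(y[test], predict_linear(coef, X[test])))
    return float(np.mean(scores))

# Synthetic stand-in for a daily case series (assumption, not the paper's data).
rng = np.random.default_rng(0)
t = np.arange(120)
series = 0.5 * t + 5 * np.sin(t / 6) + rng.normal(0, 1, size=t.size)

X, y = make_lagged(series, n_lags=5)
sw_rmse = cross_validated_rmse(X, y, sliding_window_splits(len(y), 60, 10, 10))
kf_rmse = cross_validated_rmse(X, y, kfold_splits(len(y), 10))
print(f"sliding window RMSE: {sw_rmse:.3f}")
print(f"k-fold RMSE:         {kf_rmse:.3f}")
```

The scores printed by this sketch depend entirely on the synthetic series and the chosen window sizes, so they say nothing about the values reported in the paper.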
Copyright (c) 2024 Tyas Setiyorini, Frieyadie
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
The Jurnal Riset Informatika has legal rules for accessing digital electronic articles under a Creative Commons Attribution-NonCommercial 4.0 International License. Articles published in Jurnal Riset Informatika are Open Access, for the purpose of scientific development, research, and libraries.