Comparison of the Application of Linear Regression with Sliding Window Validation and K-Fold Cross-Validation for Forecasting Covid-19 Recovered Cases

Authors

  • Tyas Setiyorini, Universitas Nusa Mandiri
  • Frieyadie Frieyadie, Universitas Nusa Mandiri
(*) Corresponding Author

DOI:

https://doi.org/10.34288/jri.v6i3.288

Keywords:

Covid-19, Forecasting, Linear Regression, Sliding Window Validation

Abstract

Confirmed cases and deaths due to Covid-19 continue to increase day by day throughout the world. The result is a global health crisis that affects all sectors of life. The government has declared a movement to suppress the spread of Covid-19, so it is necessary to understand the patterns in Covid-19 data. Researchers contribute scientifically by applying Machine Learning methods to find patterns of death or recovery due to Covid-19. Linear Regression combined with the Sliding Window method is appropriate for forecasting time series data. This research obtained an RMSE of 0.320 for Linear Regression with Sliding Window Validation and an RMSE of 0.320 for Linear Regression with K-Fold cross-validation. On this basis, the study concludes that Linear Regression with Sliding Window Validation performs better than Linear Regression with K-Fold cross-validation in forecasting Covid-19 recovered cases in China. Sliding Window Validation is reported to increase accuracy when forecasting time series data compared to the standard validation method, K-Fold cross-validation. Further research is needed to test different types of time series data, either by comparing Sliding Window Validation with K-Fold cross-validation or by developing other validation models.
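The two validation schemes compared in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the series is synthetic, the model is a single-lag linear regression (the previous value predicts the current one), and the function names, window sizes, and fold count are assumptions chosen for illustration only.

```python
import math
import random


def make_lagged(series):
    """Turn a series into (previous value, current value) pairs."""
    return [(series[i - 1], series[i]) for i in range(1, len(series))]


def fit_line(pairs):
    """Ordinary least squares for y = a + b * x with one feature."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx if sxx else 0.0
    return mean_y - slope * mean_x, slope


def rmse(model, pairs):
    """Root mean squared error of the fitted line on the given pairs."""
    a, b = model
    return math.sqrt(sum((y - (a + b * x)) ** 2 for x, y in pairs) / len(pairs))


def sliding_window_rmse(pairs, train_size, test_size):
    """Train on a window, test on the points right after it, then slide."""
    scores = []
    start = 0
    while start + train_size + test_size <= len(pairs):
        train = pairs[start:start + train_size]
        test = pairs[start + train_size:start + train_size + test_size]
        scores.append(rmse(fit_line(train), test))
        start += test_size
    if not scores:
        raise ValueError("series too short for the chosen window sizes")
    return sum(scores) / len(scores)


def k_fold_rmse(pairs, k):
    """Contiguous K-Fold: each fold is tested once, the rest is training data.

    Note that training data may come from after the test fold in time.
    """
    fold_size = len(pairs) // k
    scores = []
    for fold in range(k):
        lo = fold * fold_size
        hi = len(pairs) if fold == k - 1 else lo + fold_size
        scores.append(rmse(fit_line(pairs[:lo] + pairs[hi:]), pairs[lo:hi]))
    return sum(scores) / len(scores)


# Synthetic saturating curve with noise (an assumption, not the paper's data).
rng = random.Random(0)
series = [100 * (1 - math.exp(-0.1 * t)) + rng.gauss(0, 1) for t in range(60)]
pairs = make_lagged(series)

sw_score = sliding_window_rmse(pairs, train_size=20, test_size=5)
kf_score = k_fold_rmse(pairs, k=5)
print(f"sliding window RMSE: {sw_score:.3f}")
print(f"k-fold RMSE:         {kf_score:.3f}")
```

The key difference is in what each scheme is allowed to train on: the sliding window only ever uses observations that precede the test points, whereas K-Fold may fit the model on observations that come after the fold being tested. Which scheme yields the lower RMSE depends on the data and the window settings.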

Published

2024-06-15

How to Cite

Setiyorini, T., & Frieyadie, F. (2024). Comparison of the Application of Linear Regression with Sliding Window Validation and K-Fold Cross-Validation for Forecasting Covid-19 Recovered Cases. Jurnal Riset Informatika, 6(3), 159–166. https://doi.org/10.34288/jri.v6i3.288

Section

Articles
