Prediction Of Flight Delays Using Feature Engineering, Catboost, And Bayesian Optimization To Improve Model Performance

Authors

  • Ilham Maulana, Universitas Nusa Mandiri
  • Siti Ernawati, Universitas Nusa Mandiri
  • Risa Wati, Universitas Bina Sarana Informatika
(*) Corresponding Author

DOI:

https://doi.org/10.34288/jri.v7i2.346

Keywords:

Flight delays, CatBoost, Feature Engineering, Bayesian Optimization, Random Over Sampling

Abstract

Flight delays are a major problem in the aviation industry, reducing both operational efficiency and customer satisfaction. This study aims to develop a more accurate flight delay prediction model by combining a CatBoostClassifier with Feature Engineering (FE), Bayesian Optimization (Bayes), and Random Over Sampling (ROS). In the evaluation, Feature Engineering and Bayesian Optimization improve on the baseline CatBoost model: the CatBoost+FE+Bayes combination reaches an accuracy of 83.32%, compared with 82.95% for the unmodified CatBoost model. Adding Random Over Sampling (CatBoost+FE+Bayes+ROS) lowers accuracy to 81.44%. On the other metrics, CatBoost+FE+Bayes has the highest F1-score, 0.62, indicating a balance between precision and recall. It also has the highest Area Under the Curve (AUC), 0.7793, followed by CatBoost+FE at 0.7768 and the unmodified CatBoost model at 0.7643; with ROS, the AUC drops to 0.6787. These findings suggest that Feature Engineering and Bayesian Optimization improve flight delay prediction, whereas resampling techniques such as ROS do not always benefit the tested model and can degrade classification performance. The resulting model is expected to improve prediction quality and to benefit the aviation industry by supporting operational efficiency and reducing the negative impact of delays on passengers.
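For readers unfamiliar with Random Over Sampling, the idea can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the abstract does not say which library or settings were used, and the function name `random_over_sample`, the `dep_hour` field, and the toy data are hypothetical. The sketch duplicates randomly chosen minority-class rows until every class is as large as the biggest one, and it assumes the resampling is applied to the training split only.

```python
import random
from collections import Counter


def random_over_sample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every
    class has as many rows as the largest class.

    Hypothetical helper for illustration; not taken from the paper.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())

    # Start from the original data, then append the sampled duplicates.
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        extra = [rng.choice(pool) for _ in range(target - n)]
        out_rows.extend(extra)
        out_labels.extend([cls] * len(extra))
    return out_rows, out_labels


# Hypothetical toy training split: 8 on-time flights (0) and 2 delayed (1).
train_rows = [{"dep_hour": h} for h in range(10)]
train_labels = [0] * 8 + [1] * 2

balanced_rows, balanced_labels = random_over_sample(train_rows, train_labels)
print(Counter(balanced_labels))
```

Because the added rows are exact copies of existing minority examples, over-sampling introduces no new information. That is one plausible reason it can fail to help, or even hurt, a model that already handles the imbalance reasonably well; the abstract itself reports only the observed drop, not its cause.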



Published

2025-03-15

How to Cite

Maulana, I., Ernawati, S., & Wati, R. (2025). Prediction Of Flight Delays Using Feature Engineering, Catboost, And Bayesian Optimization To Improve Model Performance. Jurnal Riset Informatika, 7(2), 8–15. https://doi.org/10.34288/jri.v7i2.346

Section

Articles
