Explainable AI-Driven TabNet Model Enhanced with Bayesian Optimization for Lung Cancer Prediction and Interpretation

Ilham Maulana

doi:10.34288/jri.v7i1.354

Authors

Ilham Maulana, Universitas Nusa Mandiri

(*) Corresponding Author

Keywords:

TabNet, Bayesian Optimization, Explainable AI, LIME, Lung Cancer, Risk Prediction

Abstract

This study develops an accurate and explainable lung cancer risk prediction model by combining TabNet, tuned with Bayesian Optimization, with the Explainable AI (XAI) method LIME (Local Interpretable Model-Agnostic Explanations). TabNet was selected for its efficiency in processing tabular data and its ability to produce high-accuracy predictions. In the initial stage, the dataset was preprocessed through standardization and split into training and testing sets, and a TabNet model was evaluated without hyperparameter optimization. This baseline achieved an accuracy of 95.83%, precision of 95.87%, recall of 95.76%, and F1-score of 95.81%. Bayesian Optimization was then applied using the Optuna library to search for the best hyperparameter combination for TabNet. The optimized model improved on every metric, reaching an accuracy of 98.33%, precision of 98.48%, recall of 98.21%, and F1-score of 98.32%, a gain of about 2.5 percentage points over the baseline on each. LIME was then applied to the optimized model to identify the features that contributed most to each prediction, making the model's lung cancer risk predictions more transparent. The combination of TabNet, Bayesian Optimization, and Explainable AI thus yields a lung cancer prediction model that is both accurate and interpretable. The model can help medical professionals identify key risk factors and provides a transparent explanation for each prediction.
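
The four reported metrics can be illustrated with a minimal sketch. The abstract does not state the number of classes or the averaging scheme, so the macro averaging used here is an assumption, and the function name `macro_metrics`, the class labels, and the toy predictions are hypothetical and not taken from the study's dataset.

```python
from collections import Counter


def macro_metrics(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall, and F1.

    Macro averaging (an assumption, not confirmed by the abstract)
    computes each metric per class and then takes the unweighted mean.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    precisions, recalls, f1s = [], [], []
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Hypothetical risk levels and predictions, for illustration only.
y_true = ["low", "low", "medium", "medium", "high", "high"]
y_pred = ["low", "medium", "medium", "medium", "high", "high"]

scores = macro_metrics(y_true, y_pred)
for name, value in scores.items():
    print(f"{name}: {value:.4f}")
```

Under macro averaging, precision, recall, and accuracy generally differ from one another, as they do in the reported results, but the abstract alone does not confirm which averaging the study used.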
