Prediction of Lung Cancer Risk Through Machine Learning Based on Lifestyle Questionnaire Data

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

Abstract

Lung cancer remains the leading cause of cancer-related deaths worldwide, and early detection is crucial for improving treatment outcomes and survival rates. Alternative methods for the early identification of lung cancer therefore deserve serious consideration, especially in low-resource settings. This work presents a machine learning approach to lung cancer risk recognition. We used two publicly available datasets of lifestyle questionnaires from Kaggle, which were collected, processed, and analyzed to develop predictive models. The data were balanced using the SMOTE technique, and seven classifiers were evaluated: Support Vector Classifier, Logistic Regression, Decision Tree, Random Forest, XGBoost, Stochastic Gradient Descent, and Artificial Neural Networks. The best results were obtained with XGBoost, optimized using the GridSearchCV method. We allocated 70% of the data for training and reserved the remaining 30% for testing. On the first dataset, the model achieved an accuracy and F1 score of 96.50%, with precision and sensitivity of 96.51%. On the second dataset, it achieved an accuracy of 95.83%, an F1 score of 95.83%, a precision of 96.27%, and a sensitivity of 95.83%. Moreover, using LIME for local interpretability, we identified unhealthy behaviors such as alcohol consumption, smoking, and obesity as the primary influences on the model's predictions, improving our understanding of the factors driving lung cancer risk in these datasets.
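The four figures reported above (accuracy, precision, sensitivity, F1 score) all derive from the counts of a confusion matrix. The following is a minimal sketch of those definitions for the binary, positive-class case, using only the Python standard library. The abstract does not say how the metrics were averaged across classes, so the single-positive-class form is an assumption; the names `BinaryMetrics` and `binary_metrics` are hypothetical, and the toy labels are invented for illustration and are not taken from either dataset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BinaryMetrics:
    accuracy: float
    precision: float
    sensitivity: float  # also called recall or true positive rate
    f1: float


def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, sensitivity and F1 for one positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one sample is required")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Ratios with an empty denominator are reported as 0.0 by convention here.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return BinaryMetrics(accuracy, precision, sensitivity, f1)


# Toy labels, invented for illustration (1 = high risk, 0 = low risk).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

In this toy case there are 3 true positives, 1 false positive, 1 false negative and 5 true negatives, so precision and sensitivity are both 3/4 while accuracy is 8/10. In a workflow like the one described, these metrics would be computed on the held-out 30% test split.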

Original language: English
Title of host publication: 2024 IEEE International Conference on Automation/26th Congress of the Chilean Association of Automatic Control, ICA-ACCA 2024
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9798350378115
State: Published - 2024
Externally published: Yes
Event: 2024 IEEE International Conference on Automation/26th Congress of the Chilean Association of Automatic Control, ICA-ACCA 2024 - Santiago, Chile
Duration: 20 Oct 2024 to 23 Oct 2024

Publication series

Name: 2024 IEEE International Conference on Automation/26th Congress of the Chilean Association of Automatic Control, ICA-ACCA 2024

Keywords

  • classification
  • explainability
  • exploratory data analysis
  • lifestyle
  • lung cancer
  • machine learning
