Prediction of Lung Cancer Risk Through Machine Learning Based on Lifestyle Questionnaire Data

Samir Chavez-Caceres, Angel Hincho-Jove, Eveling Castro-Gutierrez, Aurea Soriano-Vargas

Research output: Chapter in book/report/conference proceeding; Conference contribution; peer-reviewed

Abstract

Lung cancer remains the leading cause of cancer-related deaths worldwide, and early detection is crucial for improving treatment outcomes and survival rates. Alternative methods for early identification of lung cancer therefore deserve serious consideration, especially in low-resource settings. This work presents a machine-learning alternative for lung cancer risk recognition. We used two publicly available lifestyle-questionnaire datasets from Kaggle, which were collected, processed, and analyzed to develop predictive models. The data were balanced using SMOTE (Synthetic Minority Over-sampling Technique), and seven classifiers were evaluated: Support Vector Classifier, Logistic Regression, Decision Tree, Random Forest, XGBoost, Stochastic Gradient Descent, and Artificial Neural Networks. The best results were obtained with XGBoost, with hyperparameters tuned using GridSearchCV. We allocated 70% of the data for training and reserved the remaining 30% for testing. On the first dataset, the model achieved an accuracy and F1 score of 96.50%, with precision and sensitivity of 96.51%. On the second dataset, it achieved an accuracy of 95.83%, an F1 score of 95.83%, a precision of 96.27%, and a sensitivity of 95.83%. Moreover, using LIME for local interpretability, we identified unhealthy behaviors such as alcohol consumption, smoking, and obesity as the primary influences on the model's predictions, enhancing our understanding of the factors driving the risk of lung cancer in these datasets.
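To make two of the steps named in the abstract concrete, the following is a minimal, standard-library-only sketch of the idea behind SMOTE balancing and of how accuracy, precision, sensitivity, and F1 are computed. It is not the authors' code: the function names (`smote_oversample`, `binary_metrics`), the toy feature rows, and the labels are all hypothetical. The abstract also does not say which averaging scheme was used for the reported scores, so the sketch assumes a single positive class.

```python
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Create n_new synthetic rows by interpolating between a minority row
    and one of its k nearest minority neighbours (the core SMOTE idea).
    Needs at least two minority rows."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority rows by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, sensitivity (recall), and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + sensitivity
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "sensitivity": sensitivity,
        "f1": 2 * precision * sensitivity / denom if denom else 0.0,
    }


# Hypothetical toy data: 6 low-risk rows and 2 high-risk rows, two features each.
majority = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [0.3, 0.2], [0.2, 0.2], [0.0, 0.3]]
minority = [[1.0, 1.0], [1.2, 0.8]]

# Generate enough synthetic high-risk rows to reach a 6-vs-6 balance.
synthetic = smote_oversample(minority, n_new=len(majority) - len(minority), k=1)
balanced_minority = minority + synthetic

# Hypothetical test labels and predictions, only to exercise the metric formulas.
metrics = binary_metrics([1, 1, 1, 0, 0, 0, 0, 1], [1, 1, 0, 0, 0, 1, 0, 1])
```

Each synthetic row lies on the line segment between two real minority rows, which is why SMOTE adds variety without copying rows verbatim. In practice, oversampling is commonly applied to the training portion only, so that synthetic rows do not leak into the held-out test set. The abstract does not specify the order the authors used.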

Original language: English
Title of host publication: 2024 IEEE International Conference on Automation/26th Congress of the Chilean Association of Automatic Control, ICA-ACCA 2024
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (electronic): 9798350378115
DOI
Status: Published - 2024
Externally published
Event: 2024 IEEE International Conference on Automation/26th Congress of the Chilean Association of Automatic Control, ICA-ACCA 2024 - Santiago, Chile
Duration: 20 Oct 2024 to 23 Oct 2024

Publication series

Name: 2024 IEEE International Conference on Automation/26th Congress of the Chilean Association of Automatic Control, ICA-ACCA 2024

Conference

Conference: 2024 IEEE International Conference on Automation/26th Congress of the Chilean Association of Automatic Control, ICA-ACCA 2024
Country/Territory: Chile
City: Santiago
Period: 20/10/24 to 23/10/24
