Feature selection influence on machine-learning-based classifiers of the shear failure mode of PC girders

Jhon Tovar, Luis Alberto Bedriñana, Christian Málaga-Chuquitaype

Research output: Contribution to journal › Article › peer-review

Abstract

The shear failure of prestressed concrete (PC) girders is a complex problem due to the numerous influencing parameters. The shear failure mode is directly related to the shear capacity of PC girders, yet conventional shear models cannot directly predict it. Recently, Machine Learning (ML) methods have been applied to such problems in Structural Engineering; however, there is no clear consensus on the optimal quantity and type of input features required to develop an efficient ML classifier. This paper examines the influence of different feature selection techniques on the performance of ML classifiers in predicting the shear failure mode of PC girders. It also presents and discusses a framework for developing an explainable data-driven model for the shear failure mode classification of PC girders through optimal feature selection. To this end, a comprehensive dataset of 668 experimental tests of PC girders is assembled. Wrapper (Forward, Backward, Recursive Feature Elimination, and Exhaustive Selection) and Filter (ANOVA F-test and Correlation Clustering) methods are applied to identify the most relevant subset of input features. The selected features from each method are then used to train different ML models (based on Random Forest, XGBoost, and AdaBoost) to obtain an efficient ML classifier with an optimal number of input features. A classifier trained with the full set of available features is also used for comparison. Most of the evaluated methods required around 5–10 features to maintain adequate performance. Moreover, all the ML models trained with the optimal number and combinations of features, as produced by the different feature selection methods, achieved higher performance (F1_score above 0.83) than the classifier trained with the full set of features (F1_score = 0.82). However, Filter methods showed better performance than Wrapper methods, with less computational expense.
It was also noted that the feature selection methods that provided the best performance were those that not only removed irrelevant features but also chose features that represent important aspects of the problem. Among the evaluated models, Correlation Clustering (CC) provided the most accurate ML classifier (Accuracy = 0.851 and F1_score = 0.851) using just 8 input features (around 50% of the total available features). Lastly, an explainability analysis of the selected ML model (based on the CC method) highlighted the importance of identifying the most relevant variables before training ML classifiers. This work provides a reference for engineers to select, compare, and validate feature selection methods for classification problems in Structural Engineering.
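As a rough illustration of how a Filter method such as the ANOVA F-test ranks input features, the sketch below scores each feature by its one-way ANOVA F-statistic across failure-mode classes and keeps the top k. It is only a minimal sketch, not the authors' implementation: the feature names (`span_to_depth`, `unrelated`), the toy values, the binary labels, and the helper names `anova_f_score` and `select_top_k` are all hypothetical and are not taken from the paper or its dataset.

```python
from statistics import mean


def anova_f_score(values, labels):
    """One-way ANOVA F-statistic of a single feature across class labels."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    n, k = len(values), len(groups)
    grand_mean = mean(values)
    # Variation of the class means around the overall mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())
    # Variation of the samples around their own class mean
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def select_top_k(features, labels, k):
    """Rank features by F-score (highest first) and return the k best names."""
    scores = {name: anova_f_score(col, labels) for name, col in features.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


# Hypothetical toy data: two candidate features, two failure-mode classes (0 and 1)
features = {
    "span_to_depth": [1.5, 1.8, 2.0, 3.5, 3.8, 4.0],
    "unrelated": [5.0, 1.0, 3.0, 4.0, 2.0, 3.0],
}
labels = [0, 0, 0, 1, 1, 1]

selected, scores = select_top_k(features, labels, k=1)
print(selected, scores)
```

A Filter method of this kind scores each feature once, without training a classifier, which is consistent with the lower computational expense reported above. A Wrapper method would instead retrain the classifier for each candidate subset of features.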

Original language: English
Article number: 109746
Journal: Structures
Volume: 80
Status: Published - Oct 2025
