TY - JOUR

T1 - Over-sampling for data augmentation in data-driven models for the shear strength prediction of RC membranes

AU - Bedriñana, Luis Alberto

AU - Landeo, Jostin Gabriel

AU - Sucasaca, Julio Cesar

AU - Málaga-Chuquitaype, Christian

N1 - Publisher Copyright: © 2024 Institution of Structural Engineers

PY - 2024/2

Y1 - 2024/2

AB - Complex reinforced concrete (RC) structures are generally assessed as a group of individual membrane elements subjected to combined in-plane stresses; however, accurately predicting the shear strength of such elements remains a complex task. In addition, the limited availability of experimental data on RC panels, whose statistical distribution is also skewed towards lower strength values, limits the development of data-driven models. Thus, it is crucial to explore data augmentation techniques to support the development of more accurate and generalizable predictive models in structural engineering. This paper evaluates over-sampling techniques for data augmentation and their use in the creation of an explainable, data-driven model for the shear strength prediction of RC panels. A dataset of 195 experimental tests of RC panels under different loading conditions is initially collected. Five over-sampling techniques are implemented to extend the original dataset and to reduce the imbalance. Three ensemble models (Random Forest, AdaBoost, and XGBoost) are trained with each of the generated datasets. All the over-sampling techniques produced predictive models with better performance than the models trained on the original dataset; however, the results show that applying Random Over-Sampling (ROS) can significantly increase the performance metrics of the model (by around 39% for some metrics) compared to the model trained on the original dataset, without any overfitting issues. This strategy enabled the development of an accurate XGBoost model (R² = 0.97 for the testing set). The explainability of the final predictive model (the XGBoost model obtained from ROS) is evaluated using SHAP (SHapley Additive exPlanations) analysis. The proposed predictive model outperformed traditional mechanics-based models (an improvement of approximately 27% over SMCS and 33% over MCFT for some performance metrics) and showed a more controlled error distribution over the range of variables. The proposed model was also more accurate (mean prediction ratio of 0.98) than a sophisticated finite element analysis (mean prediction ratio of 0.84) for six specimens of the original dataset.

KW - In-plane stresses

KW - Machine Learning

KW - Membrane element

KW - Nonlinear finite element

KW - Random over-sampling

KW - Reinforced concrete panels

KW - Shear strength

KW - SMOTE

UR - http://www.scopus.com/inward/record.url?scp=85182634996&partnerID=8YFLogxK

U2 - 10.1016/j.istruc.2024.105870

DO - 10.1016/j.istruc.2024.105870

M3 - Article

AN - SCOPUS:85182634996

SN - 2352-0124

VL - 60

JO - Structures

JF - Structures

M1 - 105870

ER -