Sarcasm detection is a particularly complex problem in Natural Language Processing. In recent years, the architecture and functionality of the models that address it have improved. However, the sarcastic texts used to train these models, and the methods used to collect them, have received little attention. Traditional collection methods produce biased, noisy datasets that contain errors, and they do not distinguish between the two types of sarcasm: intended and perceived. Consequently, this study conducts a quantitative analysis of the impact that the collection methods behind English sarcasm datasets have on sarcasm detection models. To that end, it uses public datasets and builds two new datasets with the Reactive Supervision method (Shmueli et al., 2020). It then compares the collection methods in detail by training state-of-the-art models on a representative dataset from each one. The results suggest that better models can be obtained with a collection method that yields a clean dataset and identifies the type of sarcasm to be detected. The results also confirm the findings of previous studies and open avenues for future work.
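Reactive Supervision (Shmueli et al., 2020) labels tweets based on the replies they receive, and this is what makes it possible to separate intended from perceived sarcasm. The following is a rough illustration only. It assumes a simplified rule: a first-person cue such as "I was being sarcastic" marks the cue author's own earlier tweet in the thread as intended sarcasm, and a second-person cue such as "you're being sarcastic" marks the tweet being replied to as perceived sarcasm. The `Tweet` class, the cue patterns and `find_sarcastic_tweet` are hypothetical. They do not reproduce the exact cue list or rules of the original method, nor the pipeline used in this thesis.

```python
import re
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Hypothetical, simplified cue patterns (NOT the original method's cue list).
# Curly apostrophes are accepted because they are common in tweets.
FIRST_PERSON_CUE = re.compile(
    r"\bi(?:['’]m|\s+am|\s+was)\s+(?:just\s+)?being\s+sarcastic\b", re.IGNORECASE
)
SECOND_PERSON_CUE = re.compile(
    r"\byou(?:['’]re|\s+are|\s+were)\s+(?:just\s+)?being\s+sarcastic\b", re.IGNORECASE
)


@dataclass
class Tweet:
    tweet_id: str
    author: str
    text: str
    parent_id: Optional[str] = None  # id of the tweet this one replies to


def find_sarcastic_tweet(
    cue: Tweet, thread: Dict[str, Tweet]
) -> Optional[Tuple[str, str]]:
    """Return (sarcastic_tweet_id, sarcasm_type) for a cue tweet, or None."""
    if FIRST_PERSON_CUE.search(cue.text):
        # Walk up the reply chain to the cue author's own earlier tweet:
        # the author is explaining their own sarcasm, so it is "intended".
        current = thread.get(cue.parent_id or "")
        while current is not None:
            if current.author == cue.author:
                return current.tweet_id, "intended"
            current = thread.get(current.parent_id or "")
        return None
    if SECOND_PERSON_CUE.search(cue.text):
        # Another user calls out the tweet being replied to: "perceived".
        parent = thread.get(cue.parent_id or "")
        if parent is not None and parent.author != cue.author:
            return parent.tweet_id, "perceived"
    return None


# Hypothetical thread: A posts, B replies literally, A clarifies; C calls A out.
thread = {
    "t1": Tweet("t1", "A", "Great, another Monday. I just love it."),
    "t2": Tweet("t2", "B", "Glad you enjoy Mondays!", parent_id="t1"),
    "t3": Tweet("t3", "A", "I was being sarcastic", parent_id="t2"),
    "t4": Tweet("t4", "C", "You're being sarcastic, right?", parent_id="t1"),
}
print(find_sarcastic_tweet(thread["t3"], thread))  # ('t1', 'intended')
print(find_sarcastic_tweet(thread["t4"], thread))  # ('t1', 'perceived')
```

The same tweet can receive both labels from different replies, which is why a collection method that records the sarcasm type yields information that traditional methods, which do not distinguish the two types, discard.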
- Natural Language Processing
- Sentiment Analysis
- Sarcasm Detection
- Data extraction
- Sarcasm
Analysis of the Collection Methods for Sarcastic Texts
Velasquez Gushiken, A. (Author). 2023
Student thesis: Undergraduate thesis