Using Text Classification for Identifying Harmful Language on Social Media
DOI:
https://doi.org/10.62647/
Abstract
Offensive language is becoming increasingly common in user-generated content across social media platforms. Such language can intimidate or offend an individual or a group. Researchers have been studying the automatic detection and prevention of offensive language for some time and have produced a variety of supervised approaches and training datasets. In this work, we propose a text classification architecture comprising a modular cleaning step, a tokenizer, three embedding approaches, and eight classifiers. Our experiments on a dataset collected from Twitter show encouraging results for detecting offensive language. With hyperparameter tuning taken into account, three algorithms (AdaBoost, SVM, and MLP) achieved the highest average F1-score with the widely used TF-IDF embedding approach.
Index Terms—offensive language detection, social media, machine learning, text mining
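As a rough illustration of the front end of the pipeline described in the abstract (cleaning, tokenization, TF-IDF embedding), the following is a minimal sketch in plain Python. The cleaning rules, the tokenizer pattern, the smoothed IDF variant, the function names, and the three example tweets are all assumptions made for illustration; the abstract does not specify them, and the classifier stage is omitted.

```python
import math
import re
from collections import Counter


def clean(text):
    """Hypothetical cleaning step: lowercase, then drop URLs and @mentions."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)
    text = re.sub(r"@\w+", " ", text)
    return text


def tokenize(text):
    """Hypothetical tokenizer: runs of lowercase letters and apostrophes."""
    return re.findall(r"[a-z']+", text)


def tfidf(docs):
    """Return (vocab, vectors) where each vector holds TF-IDF weights.

    Uses relative term frequency and a smoothed IDF,
    ln((1 + N) / (1 + df)) + 1, which is one common variant (assumed here).
    """
    tokenized = [tokenize(clean(d)) for d in docs]
    n_docs = len(tokenized)

    # Document frequency: number of documents containing each term.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    vocab = sorted(df)
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in vocab}

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero on empty documents
        vectors.append([counts[t] / total * idf[t] for t in vocab])
    return vocab, vectors


# Made-up example inputs, not taken from the paper's dataset.
tweets = ["@user you are awful http://t.co/x", "You are kind", "awful awful day"]
vocab, vectors = tfidf(tweets)
```

The resulting vectors could then be passed to any of the classifiers named in the abstract; keeping cleaning, tokenization, and embedding as separate functions mirrors the modular design the abstract describes.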
License

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.