Improving Fake News Detection Through Word Embedding-Based Text Augmentation and XGBoost Ensemble Learning
DOI:
https://doi.org/10.62647/
Keywords:
Fake News Detection, Text Data Augmentation, Word Embedding, Word2Vec Skip-gram, Back Translation, Synonym Replacement, Function Word Reduction, Machine Learning Classifiers, Natural Language Processing, WEL Fake News Dataset.
Abstract
The extended fake news classification system combines data augmentation, ensemble learning, and deep learning to address two obstacles: small text datasets and intricate language patterns. To increase variety in the training corpus and reduce overfitting, the corpus is expanded with three text augmentation strategies: Function Word Reduction, Synonym Replacement, and Back Translation. The augmented text is then converted by the Word2Vec Skip-gram model into numerical representations that capture semantic word associations and contextual information.
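As a rough illustration of two of the augmentation strategies named above, the following minimal sketch applies Function Word Reduction and Synonym Replacement to a whitespace-tokenized sentence. The word list, the synonym table, the function names, and the probabilities are all hypothetical choices made for this example and are not taken from the paper. Back Translation is omitted because it requires an external translation model.

```python
import random

# Hypothetical function-word list and synonym table, for illustration only.
# A real system would likely use a fuller stop-word list and a thesaurus.
FUNCTION_WORDS = {"the", "a", "an", "of", "to", "in", "on", "and", "is", "that"}
SYNONYMS = {"fake": ["false", "bogus"], "news": ["reports"], "claims": ["asserts"]}


def reduce_function_words(tokens, drop_prob, rng):
    """Drop each function word with probability drop_prob; keep content words."""
    return [
        t for t in tokens
        if t.lower() not in FUNCTION_WORDS or rng.random() >= drop_prob
    ]


def replace_synonyms(tokens, replace_prob, rng):
    """Swap each word that has a known synonym with probability replace_prob."""
    out = []
    for t in tokens:
        options = SYNONYMS.get(t.lower())
        if options and rng.random() < replace_prob:
            out.append(rng.choice(options))
        else:
            out.append(t)
    return out


def augment(text, rng, drop_prob=0.5, replace_prob=0.5):
    """Produce one augmented variant of text (assumed pipeline order)."""
    tokens = text.split()
    tokens = reduce_function_words(tokens, drop_prob, rng)
    tokens = replace_synonyms(tokens, replace_prob, rng)
    return " ".join(tokens)


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so runs are repeatable
    for _ in range(3):
        print(augment("the article claims that the news is fake", rng))
```

Each call yields a slightly different variant of the input sentence, which is how augmentation broadens a small training corpus without collecting new labeled articles.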
The extended system combines deep learning architectures, which provide strong feature extraction, with ensemble algorithms including LightGBM, CatBoost, and XGBoost. By learning intricate patterns that more conventional ML models miss, these models improve accuracy, precision, recall, and F1-score. In addition, a Flask-based graphical user interface lets users submit news stories and have them classified instantly. By integrating augmentation, advanced modeling, and user-centered design, the extended system provides an accurate, scalable, and practical solution for fake news detection across digital platforms.
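For readers unfamiliar with the four evaluation metrics mentioned above, the sketch below computes them for a binary classifier from lists of true and predicted labels. The label convention (1 = fake, 0 = real) and the function name `binary_metrics` are assumptions made for this example. The sample labels are invented and are not results from the paper.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall, and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    # Guard every ratio against an empty denominator.
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Invented labels for illustration: 1 = fake, 0 = real (assumed convention).
    y_true = [1, 1, 1, 0]
    y_pred = [1, 1, 0, 0]
    print(binary_metrics(y_true, y_pred))
```

Precision measures how many articles flagged as fake really are fake, while recall measures how many fake articles were caught; F1 is their harmonic mean.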
License
Copyright (c) 2025 Dr Sheik Meerasharif, Gubbala Madhu Bhushan (Author)

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.











