VISION TRANSFORMER FOR IMAGE CLASSIFICATION USING KB DATASET
Keywords: Vision Transformer, KB dataset, Image Classification

Abstract
Image classification has witnessed remarkable advancements with the emergence of Vision Transformers (ViTs), which leverage self-attention mechanisms to capture global dependencies in image data. This study explores the application of a Vision Transformer for classifying the KB dataset, which comprises 20 diverse image classes. The KB dataset presents unique challenges due to its class diversity and inter-class similarities, making it an ideal benchmark for evaluating the performance of transformer-based architectures. We outline a comprehensive workflow, including data preprocessing, model architecture design, and fine-tuning of pretrained ViT models. Our results demonstrate the effectiveness of Vision Transformers in achieving high classification accuracy while maintaining robustness to noisy and complex patterns in the dataset. Comparative analyses with convolutional neural networks (CNNs) reveal the superior generalization capabilities of ViTs for this multi-class classification task. This work underscores the potential of ViTs in advancing image classification for challenging datasets and highlights avenues for further research in their optimization and scalability.
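To make the transformer workflow described above concrete, the following is a minimal NumPy sketch of the two core ViT operations, splitting an image into flattened patches and applying single-head self-attention over the patch sequence. All function names, the 224×224 input size, and the 16×16 patch size are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def image_to_patches(img, patch=16):
    """Split an (H, W, C) image into flattened non-overlapping patches.

    Returns an array of shape (num_patches, patch * patch * C).
    """
    H, W, C = img.shape
    gh, gw = H // patch, W // patch  # patch grid dimensions
    x = img[:gh * patch, :gw * patch].reshape(gh, patch, gw, patch, C)
    # Group each patch's pixels together, then flatten each patch.
    x = x.transpose(0, 2, 1, 3, 4).reshape(gh * gw, patch * patch * C)
    return x

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over patch tokens."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = (q @ k.T) / np.sqrt(k.shape[-1])  # (n_patches, n_patches)
    return softmax(scores) @ v                 # every patch attends globally
```

For a 224×224 RGB image with 16×16 patches, this yields a sequence of 196 tokens of dimension 768, which is what allows the attention step to model global dependencies across the whole image rather than a local receptive field as in a CNN.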

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.