Sight GPT
DOI:
https://doi.org/10.62647/
Keywords:
Sight GPT, visually impaired users, multimodal interface, speech interaction, haptic feedback, computer vision, natural language processing, image segmentation, background removal, accessibility, AI-assisted visualization, smartphone applications.
Abstract
Sight GPT is a web-based application designed to assist visually impaired users in obtaining detailed information about images captured with their smartphones. The platform enables interactive exploration of photographs and the surrounding visual environment through a multimodal interface that combines speech input, haptic feedback, and visual cues. Primarily, Sight GPT aims to help visually impaired users mentally visualize content they cannot perceive, while also accommodating users with partial vision. The system integrates advanced Natural Language Processing (NLP) and Computer Vision techniques to provide precise answers to user queries about on-screen images. Image analysis is further enhanced by background removal and segmentation models, which isolate and emphasize salient elements of the scene. Preliminary evaluations demonstrate the effectiveness of the application in improving accessibility and user experience, marking an initial step toward leveraging AI-powered multimodal systems to support visually impaired individuals.
License
Copyright (c) 2026 A Hima Bindu, Anpa Chandana, Kadam Geethanjali, Challa Manumitha (Author)

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.











