Lip Reading from a Muted Video Using ML
DOI: https://doi.org/10.62647/

Keywords: Lip Reading, Visual Speech Recognition, Computer Vision, Machine Learning, Deep Learning, Video Processing, Lip Movement Analysis, Speech-to-Text, Human–Computer Interaction (HCI), Pattern Recognition, OpenCV, TensorFlow

Abstract
In many real-world situations, audio may not be available or usable for understanding speech. Videos recorded in noisy environments, surveillance footage, or recordings without sound make it difficult to know what a person is saying. Lip reading from muted videos is an important research area in computer vision and machine learning that aims to recognize speech by analyzing lip movements rather than relying on audio signals. The objective of this project is to develop a system that detects and converts speech from muted videos into text using visual information from lip movements. The system accepts a silent video as input and processes it through several stages: video frame extraction, preprocessing, lip region detection, motion analysis, and machine learning-based prediction. By analyzing the movement patterns of the lips across multiple frames, the system predicts the spoken content and converts it into readable text.
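The motion-analysis stage described above can be illustrated with a minimal sketch using only NumPy: frames are cropped to the lip region and consecutive crops are differenced to yield per-frame motion features. The function name, crop coordinates, and synthetic frames below are illustrative assumptions, not details taken from the paper's implementation:

```python
import numpy as np

def lip_motion_features(frames, lip_box):
    """Simple motion features over the lip region of a frame sequence.

    frames: list of 2-D grayscale frames (H x W uint8 arrays)
    lip_box: (top, bottom, left, right) crop bounding the lip region
    Returns one mean-absolute-difference value per consecutive frame pair.
    """
    t, b, l, r = lip_box
    crops = [f[t:b, l:r].astype(np.float32) for f in frames]
    return [float(np.mean(np.abs(crops[i + 1] - crops[i])))
            for i in range(len(crops) - 1)]

# Synthetic example: three 8x8 frames whose "lip" patch brightens over time,
# standing in for real grayscale crops extracted with OpenCV.
frames = [np.zeros((8, 8), dtype=np.uint8) for _ in range(3)]
frames[1][4:8, 2:6] = 10
frames[2][4:8, 2:6] = 30
features = lip_motion_features(frames, (4, 8, 2, 6))
```

In a full pipeline, feature vectors like these (or the raw lip crops themselves) would be fed across time to a sequence model that maps visual patterns to text.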
The system is developed in Python using OpenCV, NumPy, TensorFlow, and Streamlit. OpenCV handles video processing and frame extraction, NumPy handles numerical computation, and the machine learning model analyzes the visual patterns of lip movements to predict the spoken text. Streamlit provides a simple web interface where users can upload muted videos and view the predicted text. This project demonstrates that speech can be recognized from visual lip movements without relying on audio signals. Such a system can be useful in silent video analysis, surveillance, communication support for hearing-impaired individuals, and speech recognition in noisy environments. The project also provides a foundation for future work with advanced deep learning models to increase accuracy and to support recognition of longer and more complex speech from muted videos.
License
Copyright (c) 2026 Syeda Fatima, R. Manasa, K. Sushma, T. Akshitha (Author)

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.