Vocal Gist
DOI:
https://doi.org/10.62647/IJITCE2025V13I3PP52-58
Keywords:
YouTube Video Summarization, Transcript Extraction, youtube_transcript_api, Google Generative AI, Gemini 2.0 Flash, AI Text Summarization, Streamlit Web App, Multilingual Translation, Text File
Abstract
This project introduces "Vocal Gist," a web-based application designed to streamline the process of extracting and synthesizing information from YouTube videos. Addressing the challenge of time-consuming video consumption and the difficulty in quickly grasping key content, Vocal Gist provides an efficient solution for users seeking concise insights.
The application first extracts the full transcript of a given YouTube video URL using the youtube_transcript_api. This raw text is then passed to a Google Generative AI model (gemini-2.0-flash), which summarizes the content into detailed, bullet-pointed notes, typically within 250 words. To improve accessibility, Vocal Gist can also translate the generated summaries into various languages and lets users download the notes as a plain text file for offline reference or further use.
Developed using Python, with Streamlit for the intuitive graphical user interface and google-generativeai for leveraging advanced AI capabilities, Vocal Gist significantly reduces the effort required to glean essential information from video content. It serves as a valuable resource for students, researchers, content creators, and anyone requiring quick, multilingual access to video summaries.
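The pipeline described in the abstract (transcript extraction, then summarization by gemini-2.0-flash) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names (extract_video_id, join_transcript, build_prompt, fetch_transcript, summarize_with_gemini) and the prompt wording are hypothetical. The call YouTubeTranscriptApi.get_transcript belongs to older releases of youtube_transcript_api, and newer releases may expose a different interface, so treat that line as version-dependent.

```python
from urllib.parse import urlparse, parse_qs


def extract_video_id(url: str) -> str:
    """Pull the video ID out of a youtube.com/watch or youtu.be URL."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host == "youtu.be":
        return parsed.path.lstrip("/").split("/")[0]
    if host.endswith("youtube.com"):
        ids = parse_qs(parsed.query).get("v")
        if ids:
            return ids[0]
    raise ValueError(f"Could not find a video ID in: {url}")


def join_transcript(segments) -> str:
    """Merge transcript segments (dicts with a 'text' key) into one string."""
    return " ".join(s["text"].strip() for s in segments if s["text"].strip())


def build_prompt(transcript: str, word_limit: int = 250) -> str:
    """Hypothetical prompt; the wording used by the authors is not given."""
    return (
        "Summarize the following YouTube transcript as detailed bullet "
        f"points, within {word_limit} words.\n\n{transcript}"
    )


def fetch_transcript(video_id: str) -> str:
    """Assumes an older youtube_transcript_api release (get_transcript)."""
    from youtube_transcript_api import YouTubeTranscriptApi  # lazy import
    return join_transcript(YouTubeTranscriptApi.get_transcript(video_id))


def summarize_with_gemini(prompt: str, api_key: str) -> str:
    """Send the prompt to gemini-2.0-flash via google-generativeai."""
    import google.generativeai as genai  # lazy import
    genai.configure(api_key=api_key)
    model = genai.GenerativeModel("gemini-2.0-flash")
    return model.generate_content(prompt).text
```

In a Streamlit front end, the resulting notes could be offered for download with st.download_button, passing the summary string as data and a .txt file name. How the translation step is performed is not stated in the abstract, so it is left out of this sketch.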
License
Copyright (c) 2025 Authors

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.