An Automated Framework for Summarizing YouTube Videos Using NLP

In recent times, YouTube has increasingly become the preferred platform for consuming educational content. To learn complex and intricate concepts, a student must sit through many hours of YouTube videos, where the average video is about 20 minutes long. YouTube Video Summarizer was conceptualized to help users judge whether the content of a given YouTube video is relevant to what they are looking for. It is a Chrome extension that quickly generates a summary of a YouTube video from the video's English-language transcript. Automation allows for seamless generation of a synopsis, so users need not spend hours watching the content to determine its relevance.


Introduction
YouTube is an online video-sharing platform founded by Chad Hurley, Steve Chen, and Jawed Karim in 2005. It is the second most visited website, right behind Google Search. Since its inception, it has frequently been used as an educational platform, teaching millions of students across the globe.
YouTube Video Frame Summarizer [1]-[4] generates the summary of a YouTube video by extracting chunks of information and piecing them together through an NLP model, using the automatically generated English-language transcript provided with the video.

Rationale
The intention behind this project was to make YouTube videos easier to understand at a glance by providing a gist of the content, leaving the user to decide whether the content is relevant and informative. This eases the tedious process of scouring a long video for information.

Goal
The goal of the project is to output a concise, grammatically correct text summary of a given YouTube video that indicates the relevance of the content without the need to watch the video. Relevance is judged by the end user based on the kind of content they are looking for.

Literature Review
The Web is the most frequently used networking aid; it satisfies the requirements of all types of users and offers a solution for almost any type of problem. When developing a web portal, its appearance is a critical part of development. A good appearance readily attracts more visitors, which is a measure of the portal's success. Designing and developing a well-structured, good-looking portal requires choosing a proper technology. The technological needs of a good web portal can be met by Python and Flask [1].
YouTube has now become one of the biggest platforms for entertainment, study, cooking, and much more. Although its user base ranges from young to old, YouTube is most popular among young people, who prefer the variety of content, interactive features, and instant gratification of YouTube videos to regular television. Many use it for entertainment, to learn how to do something, to keep up with the latest music videos from their favourite artists, and more. YouTube is available in almost all countries and in more than 50 languages. As with other Google services, a Google account is all you need to create and use a YouTube account. The only problem is that we do not always have enough free time to watch a particular video. That is where this paper comes in: it provides a short summary of the YouTube video, which not only saves time but also gives crisp content to refer to when writing a summary or synopsis [2].

Python
Python was created by Guido van Rossum, who began implementing it in December 1989. Van Rossum was the lead developer for the project until 12 July 2018. Python is the primary programming language for this project. The local web server and the core summarizing logic were developed in Python using Flask [5]-[7] and the HuggingFace Transformers [8] library.
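As a rough illustration of how a local Flask server could accept requests from the Chrome extension, the sketch below exposes a single endpoint that receives a video URL and returns a summary as JSON. The route name `/summary`, the helper `extract_video_id`, and the injected `summarize` callable are all hypothetical and are not taken from the project's code.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse

YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com"}


def extract_video_id(url: str) -> Optional[str]:
    """Return the video ID from a YouTube watch or youtu.be URL, else None."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host == "youtu.be":
        return parsed.path.lstrip("/") or None
    if host in YOUTUBE_HOSTS and parsed.path == "/watch":
        values = parse_qs(parsed.query).get("v")
        return values[0] if values else None
    return None


def create_app(summarize):
    """Build a Flask app; `summarize` maps a video ID to summary text."""
    # Imported lazily so the URL helper works without Flask installed.
    from flask import Flask, jsonify, request

    app = Flask(__name__)

    @app.route("/summary")  # hypothetical route name
    def summary():
        video_id = extract_video_id(request.args.get("url", ""))
        if video_id is None:
            return jsonify(error="not a YouTube video URL"), 400
        return jsonify(video_id=video_id, summary=summarize(video_id))

    return app
```

Passing the summarizer in as a callable keeps the web layer independent of the model, so the server can be exercised with a stub before the model is loaded.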

a. Machine Learning
Machine Learning [9]

i. HuggingFace Transformers
HuggingFace Transformers [8], [10] is a collection of APIs and tools for downloading state-of-the-art pretrained NLP models. This project utilized the T5 [11], [12] model, developed by Google. T5 is a text-to-text transformer that can be applied to a variety of text tasks such as translation and summarization. It is an encoder-decoder model. T5 is available in five sizes: t5-small, t5-base, t5-large, t5-3b, and t5-11b. The t5-base model was used in this project for transcript summarization.
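A minimal sketch of how t5-base might be applied to a long transcript through the Transformers library is given below. It assumes the transcript is split into word chunks that are summarized one by one and then joined; the chunk size, the 512-token truncation limit, and the generation settings are illustrative assumptions, not the values used in this project.

```python
def chunk_words(text, max_words=400):
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def load_t5_summarizer(model_name="t5-base"):
    """Return a function that summarizes one chunk of text with T5."""
    # Lazy import: requires the transformers, torch and sentencepiece packages.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

    def summarize_chunk(chunk):
        # T5 selects its task from a text prefix; "summarize: " requests a summary.
        inputs = tokenizer("summarize: " + chunk, return_tensors="pt",
                           max_length=512, truncation=True)
        # Assumed generation settings: beam search with a capped output length.
        output_ids = model.generate(**inputs, max_length=120, num_beams=4)
        return tokenizer.decode(output_ids[0], skip_special_tokens=True)

    return summarize_chunk


def summarize_transcript(transcript, summarize_chunk, max_words=400):
    """Summarize each chunk and join the partial summaries in order."""
    return " ".join(summarize_chunk(c) for c in chunk_words(transcript, max_words))
```

A call such as `summarize_transcript(text, load_t5_summarizer())` would download the model on first use. Chunking is needed because a transcript is usually longer than the input length the model accepts in one pass.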

ii. Preprocessing
Data preprocessing is the practice of preparing raw data for use by a machine learning algorithm. Real-world data can contain noise and missing values and may be in an unsuitable format, making it impossible to use directly in machine learning models.
Preprocessing is an essential step that cleans the data and makes it usable by a machine learning model, which improves the model's efficacy. The steps involved in preprocessing are:
• Removing punctuation such as . , ! $ ( ) * % @
• Removing stop words
• Tokenization
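The three preprocessing steps can be sketched in a few lines of Python. The stop-word list below is a tiny illustrative set rather than the one used in this project, and tokenization is done by whitespace splitting as a simplifying assumption. The text is tokenized before stop words are filtered, since stop-word removal operates on individual tokens.

```python
import string

# Small illustrative stop-word list (assumption); a real system would use a fuller one.
STOP_WORDS = {"a", "an", "and", "is", "of", "the", "to", "in"}


def preprocess(text):
    """Strip punctuation, tokenize on whitespace, and drop stop words."""
    # Step 1: remove every ASCII punctuation character.
    no_punct = text.translate(str.maketrans("", "", string.punctuation))
    # Step 2: tokenize into lowercase words.
    tokens = no_punct.lower().split()
    # Step 3: filter out stop words.
    return [t for t in tokens if t not in STOP_WORDS]


print(preprocess("The goal of the project is a concise summary!"))
# ['goal', 'project', 'concise', 'summary']
```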

Results
To measure the accuracy of the generated summaries [13]-[20], 12 YouTube videos of various genres were randomly picked. The outputs were then measured against summaries written by humans, which were used as reference summaries. Figure 1 illustrates the F1-Scores of the individual videos selected for summarization. Selected videos are indicated by their respective numbers in Table 1. The three metrics tested (ROUGE-1, ROUGE-2, and ROUGE-L) have each been given an F1-Score for every video.
ROUGE is considered a good metric for estimating the summarization and translation capabilities of a model. It is effective in assessing the accuracy of the summarized text, but when a summary uses synonyms or different sentence sequences to convey the same meaning as the reference, it is generally given a low score. ROUGE only looks for exact matches and does not take semantics into account, which explains the F1-Scores in Figure 1. Despite this flaw, our model outperforms other transcript summarizing models by achieving relatively high F1-Scores for baseline summarization [26].
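The exact-match limitation can be seen in a small example. The sketch below uses a simplified unigram-overlap F1 (not the official scoring package) and two made-up sentences: a paraphrase that shares no words with the reference scores zero even though its meaning is the same.

```python
from collections import Counter


def rouge1_f1(reference, candidate):
    """Unigram-overlap F1 between a reference and a candidate summary."""
    ref = Counter(reference.lower().split())
    cand = Counter(candidate.lower().split())
    overlap = sum((ref & cand).values())  # words matched exactly
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


reference = "the film was very good"
print(rouge1_f1(reference, "the film was very good"))      # 1.0
print(rouge1_f1(reference, "this movie is really great"))  # 0.0
```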

Conclusion
The YouTube Video Summarizer framework provides users with a coherent, syntactically and grammatically accurate summary of any YouTube video that has a default English-language transcript. It helps users grasp the crux of the video at a glance by summarizing the words spoken throughout its duration. It achieves summarization using the tools provided by the HuggingFace library together with the Google T5 NLP model. After fine-tuning the model for video summarization, the results achieved were greater than the average F1-Scores of alternative models and of numerous other implementations of the same model.

Future Scope
There is scope for further tuning and evaluating the performance of the model and for implementing the following features:
• Automatic Subtitle Generation using Speech-To-Text NLP Models

iii. Tokenization
• Converts a sentence into a collection of words, called tokens
• Breaks the text into smaller portions
• Discovers the meaning of the text by inspecting the words and their sequences

Fig. 1. F1-Scores for 12 YouTube videos

A technique called ROUGE [21]-[24] was utilized to evaluate the performance of the current model. The metrics tested were ROUGE-N (1 and 2) and ROUGE-L [25]. We compute the F1-Score for each of these metrics from its precision and recall values. The F1-Scores give a valid measure of our model's performance, since they depend on the model capturing essential words (recall) while avoiding irrelevant words (precision).
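For reference, the F1-Score computation can be sketched as follows. This is a simplified re-implementation for illustration, assuming whitespace tokenization and an equally weighted F1, and it is not the scoring package used to produce Figure 1. ROUGE-N counts overlapping n-grams, while ROUGE-L uses the length of the longest common subsequence (LCS) as the match count.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f1(match, cand_total, ref_total):
    """Combine precision (match/candidate) and recall (match/reference)."""
    if match == 0:
        return 0.0
    precision, recall = match / cand_total, match / ref_total
    return 2 * precision * recall / (precision + recall)


def rouge_n(reference, candidate, n):
    """ROUGE-N F1: overlap of n-grams between reference and candidate."""
    ref, cand = ngrams(reference.split(), n), ngrams(candidate.split(), n)
    match = sum((ref & cand).values())  # clipped n-gram overlap
    return f1(match, sum(cand.values()), sum(ref.values()))


def lcs_length(a, b):
    """Length of the longest common subsequence, by dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(reference, candidate):
    """ROUGE-L F1: LCS length relative to candidate and reference lengths."""
    ref, cand = reference.split(), candidate.split()
    return f1(lcs_length(ref, cand), len(cand), len(ref))
```

For the reference "the cat sat on the mat" and the candidate "the cat lay on the mat", five of six unigrams and three of five bigrams match, and the LCS has length five, so the single substituted word lowers ROUGE-2 more than ROUGE-1 or ROUGE-L.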

Table 1 lists the videos whose transcripts were extracted and summarized. The Transcript Length column gives the total number of words spoken in the YouTube video. The Summary Length column gives the total number of words in the generated summary. Together, these columns indicate the efficiency of the summarization model for a given transcript length.
Machine Learning plays a key role. It is used in tasks like recognizing human faces or driving autonomous vehicles. With the growing volume of data, a substantial body of research indicates that Machine Learning is now fundamental to technological progress.
b. Natural Language Processing
NLP is a technology that can understand human language. It is used for activities like mail spam detection and texting. It helps computers converse with humans, and it helps in speech recognition and text analytics. NLP has two phases: Data Preprocessing and Algorithm Generation. The former involves cleaning the raw data and transforming it into formats that the machine can interpret. NLP involves two different techniques: syntax analysis and semantic analysis. Syntax includes Stemming, Parsing, Sentence Breaking, Morphological Segmentation, and Word Segmentation. Semantics includes Named Entity Recognition, Natural Language Generation, and Word Sense Disambiguation.

Table 1. List of videos selected for summarization

• Multilingual Support For Summary Generation
• Deploying The Project On A Cloud Database
• Displaying Additional Content Like Video Transcript, Transcript Length