Issue | E3S Web Conf., Volume 419, 2023, V International Scientific Forum on Computer and Energy Sciences (WFCES 2023)
---|---
Article Number | 02029
Number of page(s) | 8
Section | Applied IT Technologies in Energy and Industry
DOI | https://doi.org/10.1051/e3sconf/202341902029
Published online | 25 August 2023
Prerequisites for the development of the system of automatic comparison of video and audio tracks by the speaker’s articulation
1 Department of Computer Systems, Kazan National Research Technical University named after A. N. Tupolev – KAI, Kazan, Russia
2 Department of Automated Information Processing and Control Systems, Kazan National Research Technical University named after A. N. Tupolev – KAI, Kazan, Russia
* Corresponding author: landwatersun@mail.ru
Deep learning and reinforcement learning open up new possibilities for automatically matching video and audio data. This article explores the key steps in developing such a system, from matching phonemes with lip movements to selecting appropriate machine-learning models. It also discusses the importance of a well-designed reward function, the balance between exploration and exploitation, and the difficulty of collecting training data. The article emphasizes the value of pre-trained models and transfer learning, and of correctly evaluating and interpreting results in order to improve the system and achieve high-quality content. Finally, it stresses the need for effective mapping quality metrics and visualization methods to analyze system performance fully and identify possible areas for improvement.
© The Authors, published by EDP Sciences, 2023
This is an Open Access article distributed under the terms of the Creative Commons Attribution License 4.0, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.