VoiceBridge: An AI-powered framework for low-cost multilingual video dubbing into Indian regional languages

M. V. Madhusudhan; Annmary Jojo; D. Aqsa Mehreen; H. S. Agampreeth

Published in February 23, 2026 (Vol. 20, Issue 1, 2026)

Keywords

multilingual; artificial intelligence; word error rate; character error rate; accuracy; Whisper model

Abstract

VoiceBridge is an AI-powered, low-cost multilingual dubbing framework built to make English video content available in Indian regional languages, specifically the South Indian languages Kannada, Tamil, Telugu, and Malayalam. VoiceBridge combines bleeding-edge open-source technologies, namely Whisper for Automatic Speech Recognition (ASR), IndicTrans2 for Text Translation (TT), and Coqui or Indic-TTS for Text-to-Speech (TTS), into an end-to-end pipeline of transcription, translation, speech synthesis, and video dubbing that is affordable, culturally relevant, and easily scalable. The framework features a simplified interface that lets users upload videos, translate speech, and produce dubbed outputs without any background knowledge of the underlying processes or technologies. Performance evaluation gave promising results: a Word Error Rate (WER) of 11.9% and a Character Error Rate (CER) of 11.09%, indicating good recognition and translation accuracy despite minor differences in pronunciation and a 90-millisecond audio-video delay. By using open-source models and adapting them for low-resource languages, VoiceBridge serves as a bridge towards narrowing the digital language gap and as a means of providing access to educational and informational video content to diverse linguistic communities.
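
For readers unfamiliar with the two metrics reported above, the following is a minimal sketch of how WER and CER are commonly computed: the Levenshtein (edit) distance between a reference and a hypothesis, divided by the reference length, taken over words for WER and over characters for CER. The function names and the sample sentences are illustrative assumptions; the abstract does not state which tooling or text normalization the authors used, so this should not be read as their evaluation code.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: minimum substitutions, insertions and deletions."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning i reference items into an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edit distance over reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: character-level edit distance over reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Hypothetical sample pair (not taken from the paper's test data).
reference = "the quick brown fox"
hypothesis = "the quick brown box"
print(f"WER = {wer(reference, hypothesis):.2%}")
print(f"CER = {cer(reference, hypothesis):.2%}")
```

In this sample one of four words is wrong, while only one of nineteen characters differs, which illustrates why the two metrics can diverge for the same transcript. In practice, casing and punctuation are usually normalized before scoring; that step is omitted here.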

Article Information

Article ID: ACSE1200001
Paper ID: ACSE-01-000001
Pages: 1-20
Published Date: 2026-03-16

How to Cite

M. V. Madhusudhan, Annmary Jojo, D. Aqsa Mehreen, & H. S. Agampreeth (2026). VoiceBridge: An AI-powered framework for low-cost multilingual video dubbing into Indian regional languages. Advances in Computer Science and Engineering, 20(1), 1-20. https://acse.scholarjms.com/articles/1
