
Leveraging Transfer Learning for Voice Cloning in Bengali Language

EasyChair Preprint 15582, version 1

Versions: 12→history

14 pages•Date: December 16, 2024

T Taruneshwaran, Sanjay Chidambaram, Parthvi Manoj, S Sakthi Swaroopan and G Jyothish Lal

Abstract

Voice cloning is a significant innovation in AI that has transformed human-machine interaction. Unlike conventional voice synthesis methods, voice cloning needs only a small amount of data to re-create a speaker's voice, and it can offer personalized options for communication. Building compact yet capable models that work from scarce voice samples nevertheless remains a major challenge. The challenge is greater still for low-resource languages such as Bengali, where data is scarce and regional accents are intricate. Our study investigates voice cloning for Bengali using transfer learning from speaker verification models. We adapt the SV2TTS framework to Bengali using the Mozilla Common Voice Bengali dataset, which contains voices spanning a wide variety of accents and dialects. By retraining the encoder, synthesizer, and vocoder components to capture the phonetic features specific to Bengali, our approach generates realistic, high-quality voice replications. Evaluation with the Mean Opinion Score method indicates that the cloned voices sound natural and closely resemble the target speaker. These findings show the promise of the approach for under-resourced languages, with applications in customized communication, voice acting, and speech-based assistive tools. The research focuses on developing methods and models for Bengali speech processing that address the challenges of low-resource language processing, and it provides a foundation for further advances in Bengali speech technologies.
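As an illustration of the evaluation mentioned in the abstract, a Mean Opinion Score is conventionally the arithmetic mean of listener ratings on a 1 to 5 scale, often reported with a confidence interval. The sketch below is a minimal example under that assumption; the function name `mean_opinion_score`, the normal-approximation interval, and the sample ratings are all hypothetical and are not taken from the paper.

```python
import math
import statistics


def mean_opinion_score(ratings):
    """Return (mos, ci95_half_width) for a list of 1-5 listener ratings."""
    if not ratings:
        raise ValueError("ratings must not be empty")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must lie on the 1-5 scale")
    mos = statistics.fmean(ratings)
    if len(ratings) < 2:
        # A single rating gives no spread estimate.
        return mos, 0.0
    # Normal-approximation 95% interval: 1.96 * s / sqrt(n)
    half_width = 1.96 * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mos, half_width


# Hypothetical naturalness ratings, not data from the paper.
naturalness = [4, 5, 4, 3, 4, 5, 4, 4]
mos, ci = mean_opinion_score(naturalness)
print(f"MOS = {mos:.2f} +/- {ci:.2f}")
```

In practice, naturalness and speaker similarity are usually rated separately, so a function like this would be called once per criterion.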

Keyphrases: AI-driven, Bengali, Bengali language, Mean Opinion Score, Mel Spectrograms, Mel-spectrogram, Speaker specific, Voice Cloning, Voice replication, acoustics speech and signal processing, automatic speech recognition and translation, Bengali synthesis, cloning with a few samples, encoder synthesizer and vocoder, encoder synthesizer and vocoder components, large Bengali speech recognition dataset, low resource languages like Bengali, low-resource language, Mozilla Common Voice Bengali dataset, predicted vs target mel, recognition and translation for low resource, speaker encoder, speaker's unique voice features, speaker's voice, speaker text to speech synthesis, speaker verification to multi speaker, speech synthesis, text-to-speech, TTS models and vocoder combinations, verification to multi speaker text, voice cloning for Bengali, vs target mel spectrogram

Links:

https://easychair.org/publications/preprint/pxmX

BibTeX entry

BibTeX does not have the right entry for preprints. This is a hack for producing the correct reference:

@booklet{EasyChair:15582,
  author    = {T Taruneshwaran and Sanjay Chidambaram and Parthvi Manoj and S Sakthi Swaroopan and G Jyothish Lal},
  title     = {Leveraging Transfer Learning for Voice Cloning in Bengali Language},
  howpublished = {EasyChair Preprint 15582},
  year      = {EasyChair, 2024}}
