Fine Tuning and Comparing Tacotron 2, Deep Voice 3, and FastSpeech 2 TTS Models in a Low Resource Environment

T. Gopalakrishnan, Syed Ayaz Imam, Archit Aggarwal

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution

2 Citations (Scopus)

Abstract

Text-to-speech (TTS) models generate speech from an input sequence of characters. Existing TTS systems require a large, high-quality dataset and vast computational resources for training. However, most publicly available datasets do not meet such standards, and access to powerful GPUs is not always possible. Hence, in our work, we trained and compared three TTS models, Tacotron 2, FastSpeech 2, and Deep Voice 3, on a Tesla T4 GPU using a subset of the LJSpeech 1.1 dataset. We then conducted a listener survey to analyze the performance of the models when trained on small datasets, and found that Tacotron 2 synthesized the most realistic-sounding speech. In the survey, Tacotron 2 achieved a mean opinion score (MOS) of 4.25 ± 0.17 at a 95% confidence interval and sounded the most natural to our listeners when compared with the ground truth.
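A MOS is the arithmetic mean of listener ratings (conventionally on a 1 to 5 scale), and a "± x at 95% confidence" figure is usually the half-width of a confidence interval around that mean. The abstract does not say how its interval was computed, so the sketch below assumes the common normal approximation, half-width = 1.96 · s / √n, where s is the sample standard deviation and n the number of ratings. The function name `mos_with_ci` and the ratings list are hypothetical illustrations, not the authors' code or survey data.

```python
import math
import statistics


def mos_with_ci(ratings, z=1.96):
    """Return (mean opinion score, CI half-width) for a list of ratings.

    Hypothetical helper. Assumes a normal approximation, so the
    half-width is z * sample_std / sqrt(n); z = 1.96 gives an
    approximate 95% confidence interval.
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings to estimate spread")
    mean = statistics.fmean(ratings)
    sd = statistics.stdev(ratings)  # sample standard deviation (divides by n - 1)
    half_width = z * sd / math.sqrt(len(ratings))
    return mean, half_width


# Hypothetical 1-5 ratings for one model; NOT the survey data from the paper.
ratings = [5, 4, 4, 5, 3, 4, 5, 4, 4, 5]
mean, hw = mos_with_ci(ratings)
print(f"MOS = {mean:.2f} ± {hw:.2f}")
```

With only a handful of raters, a Student's t critical value (for example `scipy.stats.t.ppf(0.975, n - 1)`) is a common replacement for 1.96 and yields a slightly wider interval.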

Original language: English
Title of host publication: IEEE International Conference on Data Science and Information System, ICDSIS 2022
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781665498012
Publication status: Published, 2022
Event: 2022 IEEE International Conference on Data Science and Information System, ICDSIS 2022, Hassan, India
Duration: 29-07-2022 to 30-07-2022

Publication series

Name: IEEE International Conference on Data Science and Information System, ICDSIS 2022

Conference

Conference: 2022 IEEE International Conference on Data Science and Information System, ICDSIS 2022
Country/Territory: India
City: Hassan
Period: 29-07-22 to 30-07-22

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
  • Information Systems
  • Safety, Risk, Reliability and Quality
  • Water Science and Technology
  • Control and Optimization
