
NU Voice Conversion System for the Voice Conversion Challenge 2018

EasyChair Preprint 103

8 pages. Date: April 28, 2018

Abstract

This paper presents the NU (Nagoya University) voice conversion (VC) system for the HUB task of the Voice Conversion Challenge 2018 (VCC 2018). The NU VC system consists of two modules: a speech parameter conversion module and a waveform-processing module. In the speech parameter conversion module, a deep learning framework estimates the spectral parameters of a target speaker given those of a source speaker; specifically, a deep neural network (DNN) and a deep mixture density network (DMDN) are used as the model structures. In the waveform-processing module, the converted waveform is generated by a WaveNet-based vocoder from the estimated spectral parameters and linearly transformed F0 parameters. To use the WaveNet-based vocoder, several generation flows based on an analysis-synthesis framework are available for obtaining the speech parameter set, and a system selection process chooses the best flow for each utterance. In the VCC 2018 results, the NU VC system ranked second, with an overall mean opinion score (MOS) of 3.44 for speech quality and 85% accuracy for speaker similarity.
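The abstract mentions linearly transformed F0 parameters without giving the exact formula. The following is a minimal sketch of one common form of such a transformation in voice conversion, a mean-variance mapping in the log-F0 domain; whether the NU system uses exactly this form is an assumption. The function names `log_f0_stats` and `convert_f0`, and the convention that unvoiced frames carry an F0 of 0, are hypothetical choices made for illustration.

```python
import numpy as np


def log_f0_stats(f0):
    """Return mean and standard deviation of log-F0 over voiced frames (F0 > 0)."""
    f0 = np.asarray(f0, dtype=float)
    log_f0 = np.log(f0[f0 > 0])
    return log_f0.mean(), log_f0.std()


def convert_f0(f0_src, src_mean, src_std, tgt_mean, tgt_std):
    """Map source F0 to the target speaker's range with a linear transform
    in the log domain (assumed form, not confirmed by the abstract).

    Unvoiced frames, marked here by F0 == 0, are left at 0.
    """
    f0_src = np.asarray(f0_src, dtype=float)
    f0_conv = np.zeros_like(f0_src)
    voiced = f0_src > 0
    log_f0 = np.log(f0_src[voiced])
    # Normalize with the source statistics, then rescale with the target statistics.
    f0_conv[voiced] = np.exp((log_f0 - src_mean) / src_std * tgt_std + tgt_mean)
    return f0_conv
```

In practice the statistics would be computed once per speaker from training utterances with `log_f0_stats` and then applied frame by frame to each source utterance before the converted parameters are passed to the vocoder.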

Keyphrases: NU VC system, WaveNet vocoder, deep learning, voice conversion challenge 2018, waveform modeling

BibTeX entry
BibTeX does not have the right entry for preprints. This is a hack for producing the correct reference:
@booklet{EasyChair:103,
  author    = {Patrick Lumban Tobing and Yichiao Wu and Tomoki Hayashi and Kazuhiro Kobayashi and Tomoki Toda},
  title     = {NU Voice Conversion System for the Voice Conversion Challenge 2018},
  doi       = {10.29007/csr4},
  howpublished = {EasyChair Preprint 103},
  year      = {EasyChair, 2018}}