VocV1 (VogenVoc): Audio Samples

Demo page for the paper “A Fast High-Fidelity Source-Filter Vocoder with Lightweight Neural Modules”


Abstract

The quality of the raw audio waveform generated by a vocoder affects many audio generation tasks. In recent years, the dominance of source-filter vocoders has been strongly challenged by neural vocoders, which deliver far superior synthesized audio quality. However, neural vocoders also introduced new limitations, including low runtime efficiency and unstable pitch, especially in models without an explicit periodic excitation input; neither has ever been a problem for source-filter vocoders. In this paper we present a novel approach that takes the best of both. We begin with an in-depth examination of every building block in WORLD, one of the best-performing source-filter vocoders built on plain signal processing algorithms, identify the blocks that do not work well, and replace them with small, lightweight, task-specific neural network models. We also rearrange the vocoding pipeline so that the building blocks work together more smoothly. Our objective and subjective evaluations show that our method achieves synthesized audio quality competitive with that of neural vocoders at a much lower computational cost, while retaining the spectral envelope acoustic feature and the high pitch accuracy of conventional source-filter vocoders.

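To illustrate the source-filter idea the abstract builds on, the sketch below generates an explicit periodic excitation (an impulse train at a fixed F0) and shapes it with a spectral envelope. This is a generic textbook simplification, not the paper's method and not WORLD's actual synthesis: it assumes a constant F0, no aperiodic (noise) component, and a zero-phase filter applied to the whole signal at once. The names `pulse_train`, `apply_envelope` and `toy_envelope`, and the two-bump envelope itself, are hypothetical.

```python
import numpy as np


def pulse_train(f0_hz, duration_s, sample_rate):
    """Periodic excitation: one unit impulse per pitch period (constant F0)."""
    n = int(round(duration_s * sample_rate))
    excitation = np.zeros(n)
    period = sample_rate / f0_hz
    # Put an impulse at the sample nearest to each multiple of the pitch period.
    positions = np.round(np.arange(0, n, period)).astype(int)
    excitation[positions[positions < n]] = 1.0
    return excitation


def apply_envelope(excitation, envelope_fn, sample_rate):
    """Filter the excitation by multiplying its spectrum with a real,
    non-negative spectral envelope (zero-phase, whole-signal filtering)."""
    spectrum = np.fft.rfft(excitation)
    freqs = np.fft.rfftfreq(len(excitation), d=1.0 / sample_rate)
    return np.fft.irfft(spectrum * envelope_fn(freqs), n=len(excitation))


def toy_envelope(freqs):
    """Hypothetical envelope: two Gaussian 'formant' bumps at 700 Hz and 1200 Hz."""
    return (np.exp(-((freqs - 700.0) / 150.0) ** 2)
            + 0.5 * np.exp(-((freqs - 1200.0) / 200.0) ** 2))


sr = 16000
src = pulse_train(200.0, 0.5, sr)           # source: 200 Hz impulse train
out = apply_envelope(src, toy_envelope, sr)  # filter: spectral envelope
```

Because the excitation is strictly periodic, the output energy sits only at harmonics of F0, and the envelope decides which harmonics are loudest; pitch is controlled directly by the excitation, which is the property the abstract credits source-filter vocoders with.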

Synthesis Pipeline
Figure: Synthesis pipeline used in this work. Red-orange denotes complex-valued quantities; blue-gray denotes real-valued quantities.


Time Performance Comparison
Figure: Runtime comparison among vocoders under various hardware settings. Each number is the time, in milliseconds, needed to synthesize 1 second of audio. All tests were run on a single CPU thread.
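Since each figure value is the time needed for 1 second of audio, it converts directly into a real-time factor (RTF = processing time / audio duration; values below 1 mean faster than real time). The helper names below are hypothetical, and the 50 ms input is an illustrative placeholder, not a number read from the figure.

```python
def real_time_factor(processing_ms: float, audio_seconds: float = 1.0) -> float:
    """RTF = processing time / duration of synthesized audio (lower is faster)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return (processing_ms / 1000.0) / audio_seconds


def speedup_over_real_time(processing_ms: float, audio_seconds: float = 1.0) -> float:
    """How many times faster than real time, i.e. 1 / RTF."""
    if processing_ms <= 0:
        raise ValueError("processing_ms must be positive")
    return (audio_seconds * 1000.0) / processing_ms


# Hypothetical value, not taken from the figure above.
example_ms = 50.0
print(real_time_factor(example_ms))        # 0.05
print(speedup_over_real_time(example_ms))  # 20.0
```

Because every test ran on a single CPU thread, the RTF values are per-thread figures; they are comparable across vocoders on the same hardware, but not directly across hardware settings.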

Use headphones for the best experience.

Female Singers

BA KGZW 057+7

Reference
WORLD
Ours
SiFi-GAN
HiFi-GAN
UnivNet

DA SXQN 004+7

Reference
WORLD
Ours
SiFi-GAN
HiFi-GAN
UnivNet

EU CCNN 058+8

Reference
WORLD
Ours
SiFi-GAN
HiFi-GAN
UnivNet

LC FHCB 031+8

Reference
WORLD
Ours
SiFi-GAN
HiFi-GAN
UnivNet

NH JQ 000+7

Reference
WORLD
Ours
SiFi-GAN
HiFi-GAN
UnivNet

CY BSJXGG 243+6 (Unseen vocal range)

Reference
WORLD
Ours
SiFi-GAN
HiFi-GAN
UnivNet

Male Singers

AX SN 046+6

Reference
WORLD
Ours
SiFi-GAN
HiFi-GAN
UnivNet

DH BMG 045+7

Reference
WORLD
Ours
SiFi-GAN
HiFi-GAN
UnivNet

DZ CSY 000+6

Reference
WORLD
Ours
SiFi-GAN
HiFi-GAN
UnivNet

SS AYC 051+7

Reference
WORLD
Ours
SiFi-GAN
HiFi-GAN
UnivNet

TS TZXS 203+8

Reference
WORLD
Ours
SiFi-GAN
HiFi-GAN
UnivNet