Abstract

Voice Conversion (VC) has gained attention due to its rapid development and increased accessibility. However, this also raises the potential for misuse, which makes it crucial to thoroughly assess the performance of VC models. Current research predominantly evaluates VC models on English, neglecting other languages, and typically considers only a single conversion scenario. To address this research gap, this paper evaluates four VC models, namely kNN-VC, FreeVC, QuickVC, and RVC, on German speakers across three conversion settings: any-to-any conversion (i.e., without fine-tuning), VC with speaker fine-tuning, and VC with language (German) fine-tuning. Additionally, we examine how the amount of target-speaker audio used for generation, ranging from 10 to 2400 seconds, influences conversion performance.