clvp dev

by susnato

Open source · 105k downloads · 0 likes

Embedding · API & Local

About

The CLVP Dev model is a component of the Tortoise-TTS speech synthesis system. Its architecture is inspired by CLIP but uses two distinct encoders: one for text tokens and one for MEL tokens, which represent the spectral characteristics of the audio signal. Because both encoders map their input into a shared embedding space, the model can score how well a candidate utterance matches its text. Tortoise-TTS uses this score to compare generated speech candidates and keep the best ones, which improves the alignment between the text and the generated voice and yields more natural and expressive results. Typical uses of the overall system include creating voiceovers, generating dialogue for virtual characters, and producing audio content from text.

Documentation

DISCLAIMER: I do not own any of the weights in this repository. All weights belong to James Betker, the author of the paper "Better speech synthesis through scaling". I am storing the weights temporarily for the integration of tortoise-tts into Hugging Face. Please refer to this PR to know more.

About

The CLVP model is an integral part of tortoise-tts, presented in the paper "Better speech synthesis through scaling" by James Betker. CLVP uses an architecture similar to the CLIP text encoder, except that it uses two of them: one for text tokens and the other for MEL tokens.
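The two-encoder idea can be illustrated with a small, self-contained sketch. This is not the CLVP implementation: the ToyEncoder class, the vocabulary sizes, the embedding width and the token ids below are all hypothetical, and the "encoders" are fixed random embedding tables rather than trained transformers. The sketch only shows the general CLIP-style pattern, in which each encoder turns its token sequence into one unit-length vector and the cosine similarity between the two vectors is used to rank speech candidates against a text.

```python
import math
import random


class ToyEncoder:
    """Stand-in for a trained encoder: mean-pools a fixed random embedding table.

    Assumption: the real CLVP encoders are transformers with learned weights;
    this class only mimics the "token ids in, one unit-length vector out" shape.
    """

    def __init__(self, vocab_size, dim, seed):
        rng = random.Random(seed)  # fixed seed keeps the sketch deterministic
        self.table = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                      for _ in range(vocab_size)]

    def encode(self, token_ids):
        if not token_ids:
            raise ValueError("token_ids must not be empty")
        dim = len(self.table[0])
        # Average the embedding rows of all tokens, one dimension at a time.
        pooled = [sum(self.table[t][d] for t in token_ids) / len(token_ids)
                  for d in range(dim)]
        # L2-normalise so that a dot product equals the cosine similarity.
        norm = math.sqrt(sum(x * x for x in pooled)) or 1.0
        return [x / norm for x in pooled]


def rank_candidates(text_encoder, speech_encoder, text_ids, candidates):
    """Score each MEL-token sequence against the text; return (best_index, scores)."""
    text_emb = text_encoder.encode(text_ids)
    scores = []
    for mel_ids in candidates:
        speech_emb = speech_encoder.encode(mel_ids)
        scores.append(sum(a * b for a, b in zip(text_emb, speech_emb)))
    best_index = max(range(len(scores)), key=scores.__getitem__)
    return best_index, scores


# Hypothetical vocabulary sizes and token ids, chosen only for illustration.
text_encoder = ToyEncoder(vocab_size=256, dim=16, seed=0)
speech_encoder = ToyEncoder(vocab_size=1024, dim=16, seed=1)

text_ids = [12, 47, 101, 5]
candidates = [[3, 900, 17, 512], [44, 44, 230, 8, 761], [1000, 2, 64]]

best_index, scores = rank_candidates(text_encoder, speech_encoder, text_ids, candidates)
print("scores:", [round(s, 3) for s in scores])
print("best candidate:", best_index)
```

With random tables the scores carry no meaning. In a trained model the two encoders are optimised so that matching text and speech pairs score higher than mismatched ones, which is what makes the ranking useful.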

Capabilities & Tags

transformers · pytorch · clvp · feature-extraction · endpoints_compatible

Links & Resources