by FabioSarracino
Open source · 868 downloads · 95 likes
VibeVoice Large Q8 is an AI model optimized for voice generation, distinguished by its ability to run in 8-bit while preserving perfect audio quality, unlike other quantized models that often produce noise. Thanks to a selective quantization technique, it cuts the model size by 38% (11.6 GB instead of 18.7 GB) while using less VRAM (12 GB instead of 20 GB), making it accessible to graphics cards like the RTX 3060 or 4070 Ti. It excels in applications that need an optimal balance between performance and quality, such as professional audio production or resource-constrained environments, while remaining compatible with tools like ComfyUI. This model positions itself as a reliable option for users who want the power of voice models without sacrificing the clarity of the final output.
If you've tried other 8-bit quantized VibeVoice models, you probably got nothing but static noise. This one actually works.
The secret? Selective quantization: I only quantized the language model (the most robust part), while keeping audio-critical components (diffusion head, VAE, connectors) at full precision.
Most 8-bit models you'll find online quantize everything aggressively. The result: audio components get quantized → numerical errors propagate → the output is pure noise.
I only quantized what can be safely quantized without losing quality.
Result: 52% of parameters quantized, 48% at full precision = perfect audio quality.
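The intuition is easy to show in plain Python. This is a toy illustration of symmetric int8 quantization, not the actual pipeline (real 8-bit loading is handled by bitsandbytes); the weight values are made up:

```python
def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization: map [-max|w|, max|w|] to [-127, 127]."""
    scale = max(abs(w) for w in weights) / 127.0
    return [round(w / scale) for w in weights], scale

def dequantize(q, scale):
    return [v * scale for v in q]

# Language-model weights are robust to the small rounding error of int8...
lm_weights = [0.12, -0.50, 0.33, 0.07]
q, scale = quantize_int8(lm_weights)
restored = dequantize(q, scale)
max_err = max(abs(a - b) for a, b in zip(lm_weights, restored))
# ...each weight is off by at most scale/2. That's fine for the LM, but in the
# diffusion head / VAE the same errors would propagate through the audio path,
# which is why those components stay at full precision.
```

The per-weight error bound (`scale/2`) is harmless in the language model but compounds through the audio-generation path, which is what turns fully-quantized models into noise generators.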
| Model | Size | Audio Quality | Status |
|---|---|---|---|
| Original VibeVoice | 18.7 GB | ⭐⭐⭐⭐⭐ | Full precision |
| Other 8-bit models | 10.6 GB | 💥 NOISE | ❌ Don't work |
| This model | 11.6 GB | ⭐⭐⭐⭐⭐ | ✅ Perfect |
+1.0 GB vs other 8-bit models = perfect audio instead of noise. Worth it.
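As a quick sanity check on the size numbers in the table:

```python
original_gb, q8_gb = 18.7, 11.6
reduction = 1 - q8_gb / original_gb
print(f"{reduction:.0%}")  # 38%
```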
```python
from transformers import AutoModelForCausalLM, AutoProcessor
import torch
import scipy.io.wavfile as wavfile

# Load model
model = AutoModelForCausalLM.from_pretrained(
    "FabioSarracino/VibeVoice-Large-Q8",
    device_map="auto",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
)
processor = AutoProcessor.from_pretrained(
    "FabioSarracino/VibeVoice-Large-Q8",
    trust_remote_code=True,
)

# Generate audio
text = "Hello, this is VibeVoice speaking."
inputs = processor(text, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=None)

# Save as a 24 kHz WAV file
audio = output.speech_outputs[0].cpu().numpy()
wavfile.write("output.wav", 24000, audio)
```
Install the custom node:
```bash
cd ComfyUI/custom_nodes
git clone https://github.com/Enemyx-net/VibeVoice-ComfyUI
```
Download this model to `ComfyUI/models/vibevoice/`
Restart ComfyUI and use it normally!
⚠️ Not supported: CPU, Apple Silicon (MPS), AMD GPUs
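A minimal fail-fast guard (a sketch; the function name is hypothetical) avoids cryptic bitsandbytes errors on unsupported backends:

```python
def assert_supported_backend(cuda_available: bool, device_type: str = "cuda") -> None:
    """Fail fast: this model only runs on CUDA GPUs (no CPU, MPS, or ROCm)."""
    if device_type != "cuda" or not cuda_available:
        raise RuntimeError(
            "VibeVoice-Large-Q8 requires a CUDA GPU; "
            f"got device_type={device_type!r}, cuda_available={cuda_available}"
        )

# Typical call before loading the model:
#   import torch
#   assert_supported_backend(torch.cuda.is_available())
```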
Requirements: `transformers>=4.51.3`, `bitsandbytes>=0.43.0`, and loading with `device_map="auto"`. If `bitsandbytes` is missing: `pip install "bitsandbytes>=0.43.0"`
This shouldn't happen! If it does:
- `pip install --upgrade transformers`
- `torch.cuda.is_available()` should return `True`

```bibtex
@misc{vibevoice-q8-2025,
  title={VibeVoice-Large-Q8: Selective 8-bit Quantization for Audio Quality},
  author={Fabio Sarracino},
  year={2025},
  url={https://huggingface.co/FabioSarracino/VibeVoice-Large-Q8}
}
```
```bibtex
@misc{vibevoice2024,
  title={VibeVoice: High-Quality Text-to-Speech with Large Language Models},
  author={Microsoft Research},
  year={2024},
  url={https://github.com/microsoft/VibeVoice}
}
```
MIT License.
If this model helped you, leave a ⭐ on GitHub!
Created by Fabio Sarracino
The first 8-bit VibeVoice model that actually works