by FabioSarracino
VibeVoice Large Q8 is a voice-generation model that stands out for running in 8-bit while preserving audio quality, unlike other quantized models that often introduce noise. Through a selective quantization technique, it cuts the model size by 38% (11.6 GB instead of 18.7 GB) and uses less VRAM (12 GB instead of 20 GB), making it accessible to mid-range GPUs like the RTX 3060 or 4070 Ti. It excels in applications that need a balance of performance and quality, such as professional audio production or resource-limited environments, while remaining compatible with tools like ComfyUI. In short: a reliable way to harness voice models without compromising the clarity of the final output.
If you've tried other 8-bit quantized VibeVoice models, you probably got nothing but static noise. This one actually works.
The secret? Selective quantization: I only quantized the language model (the most robust part), while keeping audio-critical components (diffusion head, VAE, connectors) at full precision.
Most 8-bit models you'll find online quantize everything aggressively: the audio components get quantized → numerical errors propagate → audio = pure noise.
I only quantized what can be safely quantized without losing quality.
Result: 52% of parameters quantized, 48% at full precision = perfect audio quality.
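Conceptually, the selection boils down to a name filter over the model's modules. The module-name patterns below are illustrative assumptions, not the checkpoint's actual names; with Hugging Face transformers the same policy maps onto `BitsAndBytesConfig`'s `llm_int8_skip_modules` option.

```python
# Sketch of selective quantization: quantize only the language model,
# skip audio-critical modules. The name patterns here are assumptions
# for illustration, not the checkpoint's real module names.
AUDIO_CRITICAL = ("diffusion_head", "acoustic", "semantic", "connector")

def should_quantize(module_name: str) -> bool:
    """True only for modules that are safe to quantize (the language model)."""
    return not any(key in module_name for key in AUDIO_CRITICAL)

# With transformers + bitsandbytes, the same policy can be expressed as:
#   BitsAndBytesConfig(load_in_8bit=True,
#                      llm_int8_skip_modules=list(AUDIO_CRITICAL))
```

For example, `should_quantize("language_model.layers.0.mlp")` is `True`, while `should_quantize("diffusion_head.proj")` is `False`.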
| Model | Size | Audio Quality | Status |
|---|---|---|---|
| Original VibeVoice | 18.7 GB | ⭐⭐⭐⭐⭐ | Full precision |
| Other 8-bit models | 10.6 GB | 💥 NOISE | ❌ Don't work |
| This model | 11.6 GB | ⭐⭐⭐⭐⭐ | ✅ Perfect |
+1.0 GB vs other 8-bit models = perfect audio instead of noise. Worth it.
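The headline size reduction checks out arithmetically:

```python
# Size reduction from full precision (18.7 GB) to this model (11.6 GB)
full_gb, q8_gb = 18.7, 11.6
reduction = (full_gb - q8_gb) / full_gb
print(f"Size reduction: {reduction:.0%}")  # → Size reduction: 38%
```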
```python
from transformers import AutoModelForCausalLM, AutoProcessor
import torch
import scipy.io.wavfile as wavfile

# Load model
model = AutoModelForCausalLM.from_pretrained(
    "FabioSarracino/VibeVoice-Large-Q8",
    device_map="auto",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
)
processor = AutoProcessor.from_pretrained(
    "FabioSarracino/VibeVoice-Large-Q8",
    trust_remote_code=True,
)

# Generate audio
text = "Hello, this is VibeVoice speaking."
inputs = processor(text, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=None)

# Save as 24 kHz WAV
audio = output.speech_outputs[0].cpu().numpy()
wavfile.write("output.wav", 24000, audio)
```
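Note that `scipy.io.wavfile.write` stores float arrays as 32-bit float WAV, which some players don't handle. A small helper (not part of the model's API, just a convenience sketch) converts the output to 16-bit PCM first:

```python
import numpy as np

def to_int16_pcm(audio: np.ndarray) -> np.ndarray:
    """Clip float audio to [-1, 1] and scale to 16-bit PCM."""
    clipped = np.clip(audio, -1.0, 1.0)
    return (clipped * 32767.0).astype(np.int16)

# e.g. wavfile.write("output.wav", 24000, to_int16_pcm(audio))
```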
Install the custom node:

```bash
cd ComfyUI/custom_nodes
git clone https://github.com/Enemyx-net/VibeVoice-ComfyUI
```

Download this model to `ComfyUI/models/vibevoice/`, then restart ComfyUI and use it normally!
⚠️ Not supported: CPU, Apple Silicon (MPS), AMD GPUs
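A quick sanity check before loading can catch unsupported hardware early. This sketch relies on the fact that ROCm (AMD) builds of PyTorch set `torch.version.hip`, so a plain `torch.cuda.is_available()` alone would not exclude AMD GPUs:

```python
import torch

def cuda_gpu_available() -> bool:
    """True only on NVIDIA CUDA builds: excludes CPU, Apple MPS, and AMD ROCm."""
    if getattr(torch.version, "hip", None):  # ROCm (AMD) build of PyTorch
        return False
    return torch.cuda.is_available()
```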
Requirements: `transformers>=4.51.3` and `bitsandbytes>=0.43.0` (the model loads with `device_map="auto"`). If bitsandbytes is missing:

```bash
pip install "bitsandbytes>=0.43.0"
```

Noisy audio? This shouldn't happen! If it does:
- Upgrade transformers: `pip install --upgrade transformers`
- Check that `torch.cuda.is_available()` returns `True`

Citation:

```bibtex
@misc{vibevoice-q8-2025,
  title={VibeVoice-Large-Q8: Selective 8-bit Quantization for Audio Quality},
  author={Fabio Sarracino},
  year={2025},
  url={https://huggingface.co/FabioSarracino/VibeVoice-Large-Q8}
}
```
```bibtex
@misc{vibevoice2024,
  title={VibeVoice: High-Quality Text-to-Speech with Large Language Models},
  author={Microsoft Research},
  year={2024},
  url={https://github.com/microsoft/VibeVoice}
}
```
MIT License.
If this model helped you, leave a ⭐ on GitHub!
Created by Fabio Sarracino
The first 8-bit VibeVoice model that actually works