
VibeVoice Large Q8

by FabioSarracino

Open source · 971 downloads · 95 likes

Rating: 2.5 (95 reviews) · Audio · API & Local
About

VibeVoice Large Q8 is an 8-bit quantized voice-generation model that keeps the audio quality of the original, unlike other quantized VibeVoice models, which often produce noise. It quantizes selectively: only the language model is converted to 8-bit, while the audio-critical components stay at full precision. This reduces the model size by 38% (11.6 GB instead of 18.7 GB) and VRAM use to about 12 GB instead of 20 GB, so it runs on 12 GB GPUs such as the RTX 3060 or 4070 Ti. It suits applications that need a balance between resource use and quality, such as professional audio production or resource-limited environments, and it is compatible with ComfyUI.

Documentation

VibeVoice-Large-Q8 - Selective 8bit Quantization

The first 8-bit VibeVoice model that actually works

🤗 Model • 💻 ComfyUI • 📖 Docs


🎯 Why This Model is Different

If you've tried other 8-bit quantized VibeVoice models, you probably got nothing but static noise. This one actually works.

The secret? Selective quantization: I only quantized the language model (the most robust part), while keeping audio-critical components (diffusion head, VAE, connectors) at full precision.

Results

  • ✅ Perfect audio, identical to the original model
  • ✅ 11.6 GB instead of 18.7 GB (-38%)
  • ✅ Uses ~12 GB VRAM instead of 20 GB
  • ✅ Works on 12 GB GPUs (RTX 3060, 4070 Ti, etc.)

🚨 The Problem with Other 8-bit Models

Most 8-bit models you'll find online quantize everything aggressively.

Result: audio components get quantized → numerical errors propagate → audio = pure noise.
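
As a rough illustration of where such numerical errors come from, the sketch below rounds a few floats to 8-bit integers with a single absmax scale and measures what is lost. The scheme and the weights are assumptions chosen for illustration; the model card does not say which quantizer the failing models use.

```python
# Illustrative absmax int8 quantization (an assumed scheme, not taken from the model card).

def quantize_int8(values):
    """Map floats to integers in [-127, 127] using one absmax scale."""
    scale = max(abs(v) for v in values) / 127.0
    if scale == 0.0:
        scale = 1.0  # all-zero input: any scale works
    return [round(v / scale) for v in values], scale


def dequantize_int8(codes, scale):
    """Recover approximate floats from the int8 codes."""
    return [q * scale for q in codes]


weights = [0.5, -1.27, 0.003, 0.9]  # made-up example weights
codes, scale = quantize_int8(weights)
restored = dequantize_int8(codes, scale)
errors = [abs(w - r) for w, r in zip(weights, restored)]

print(codes)        # [50, -127, 0, 90]
print(restored[2])  # 0.0: the small weight 0.003 is rounded away entirely
```

Each value can be off by up to half a quantization step. A language model tends to tolerate that; components that produce the waveform may not, which is the failure described above.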


✅ The Solution: Selective Quantization

I only quantized what can be safely quantized without losing quality.

Result: 52% of parameters quantized, 48% at full precision = perfect audio quality.
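
A minimal sketch of the selection step, assuming modules can be told apart by name prefix. The module names and parameter counts below are hypothetical and were picked only to mirror the 52% / 48% split; they are not the real VibeVoice layer names.

```python
# Hypothetical module names and parameter counts, for illustration only.
SKIP_PREFIXES = ("diffusion_head", "vae", "connector")  # kept at full precision


def split_modules(param_counts, skip_prefixes=SKIP_PREFIXES):
    """Return (names to quantize, names to keep, quantized share of parameters)."""
    quantize, keep = [], []
    for name in param_counts:
        (keep if name.startswith(skip_prefixes) else quantize).append(name)
    total = sum(param_counts.values())
    quantized_share = sum(param_counts[n] for n in quantize) / total
    return quantize, keep, quantized_share


toy_model = {
    "language_model.layers.0": 300,
    "language_model.layers.1": 220,
    "diffusion_head.proj": 200,
    "vae.decoder": 180,
    "connector.acoustic": 100,
}
to_quantize, kept, share = split_modules(toy_model)
print(to_quantize)                  # ['language_model.layers.0', 'language_model.layers.1']
print(f"{share:.0%} quantized")     # 52% quantized
```

With bitsandbytes through transformers, one way to leave modules unquantized is the `llm_int8_skip_modules` argument of `BitsAndBytesConfig`. Whether this model was built that way is not stated here.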


📊 Quick Comparison

| Model | Size | Audio Quality | Status |
|---|---|---|---|
| Original VibeVoice | 18.7 GB | ⭐⭐⭐⭐⭐ | Full precision |
| Other 8-bit models | 10.6 GB | 💥 NOISE | ❌ Don't work |
| This model | 11.6 GB | ⭐⭐⭐⭐⭐ | ✅ Perfect |

+1.0 GB vs other 8-bit models = perfect audio instead of noise. Worth it.
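
The size figures can be checked with a few lines of arithmetic. This sketch only uses the three sizes quoted in the comparison; the variable names are my own.

```python
ORIGINAL_GB = 18.7      # full-precision model
SELECTIVE_Q8_GB = 11.6  # this model
OTHER_Q8_GB = 10.6      # other 8-bit models

# Relative saving compared with the original model.
reduction = (ORIGINAL_GB - SELECTIVE_Q8_GB) / ORIGINAL_GB
# Cost of keeping the audio components at full precision.
extra_vs_other = SELECTIVE_Q8_GB - OTHER_Q8_GB

print(f"Size reduction: {reduction:.0%}")                             # Size reduction: 38%
print(f"Extra size vs other 8-bit models: {extra_vs_other:.1f} GB")   # 1.0 GB
```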


💻 How to Use It

With Transformers

Python
from transformers import AutoModelForCausalLM, AutoProcessor
import torch
import scipy.io.wavfile as wavfile

# Load model
model = AutoModelForCausalLM.from_pretrained(
    "FabioSarracino/VibeVoice-Large-Q8",
    device_map="auto",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
)

processor = AutoProcessor.from_pretrained(
    "FabioSarracino/VibeVoice-Large-Q8",
    trust_remote_code=True
)

# Generate audio
text = "Hello, this is VibeVoice speaking."
inputs = processor(text, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=None)

# Save as a 24 kHz WAV file (cast to float32 first: NumPy has no bfloat16 dtype)
audio = output.speech_outputs[0].float().cpu().numpy()
wavfile.write("output.wav", 24000, audio)

With ComfyUI (recommended)

  1. Install the custom node:

    Bash
    cd ComfyUI/custom_nodes
    git clone https://github.com/Enemyx-net/VibeVoice-ComfyUI
    
  2. Download this model to ComfyUI/models/vibevoice/

  3. Restart ComfyUI and use it normally!
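
Step 2 could be scripted roughly as follows, assuming the `huggingface_hub` package is installed. The subfolder name `VibeVoice-Large-Q8` and the helper names are assumptions; check the ComfyUI node's own documentation for the layout it expects.

```python
from pathlib import Path

REPO_ID = "FabioSarracino/VibeVoice-Large-Q8"


def model_dir(comfyui_root):
    """Target folder for the model files (the subfolder name is an assumption)."""
    return Path(comfyui_root) / "models" / "vibevoice" / "VibeVoice-Large-Q8"


def download_model(comfyui_root="ComfyUI"):
    """Fetch the repository snapshot into the ComfyUI models folder."""
    # Imported here so the path helper works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    target = model_dir(comfyui_root)
    target.mkdir(parents=True, exist_ok=True)
    snapshot_download(repo_id=REPO_ID, local_dir=str(target))
    return target


print(model_dir("ComfyUI").as_posix())  # ComfyUI/models/vibevoice/VibeVoice-Large-Q8
```

Calling `download_model()` from the directory that contains `ComfyUI` starts the roughly 11.6 GB download.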


💾 System Requirements

Minimum

  • VRAM: 12 GB
  • RAM: 16 GB
  • GPU: NVIDIA with CUDA (required)
  • Storage: 12 GB (the model files total 11.6 GB)

Recommended

  • VRAM: 16+ GB
  • RAM: 32 GB
  • GPU: RTX 3090/4090, A5000 or better

⚠️ Not supported: CPU, Apple Silicon (MPS), AMD GPUs


⚠️ Limitations

  1. Requires an NVIDIA GPU with CUDA: it won't work on CPU, Apple Silicon (MPS), or AMD GPUs
  2. Inference only - don't use for fine-tuning
  3. Requires:
    • transformers>=4.51.3
    • bitsandbytes>=0.43.0
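
The version requirements can be checked before loading the model. This is a minimal sketch using only the standard library; the helper names are my own, and the version parser is deliberately simple (it ignores suffixes such as `rc1` or `.dev0`).

```python
from importlib import metadata

REQUIRED = {"transformers": "4.51.3", "bitsandbytes": "0.43.0"}


def version_tuple(version):
    """Turn '4.51.3' into (4, 51, 3); non-numeric suffixes are ignored."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    return tuple(parts)


def check_requirements(required=REQUIRED):
    """Return {package: 'ok' | 'too old (x.y.z)' | 'missing'}."""
    report = {}
    for name, minimum in required.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        ok = version_tuple(installed) >= version_tuple(minimum)
        report[name] = "ok" if ok else f"too old ({installed})"
    return report


print(check_requirements())
```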

🆚 When to Use This Model

✅ Use this 8-bit if:

  • You have 12-16 GB VRAM
  • You want maximum quality with reduced size
  • You need a production-ready model
  • You want the best size/quality balance

Use full precision (18.7 GB) if:

  • You have plenty of VRAM (24+ GB)
  • You're doing research requiring absolute precision

Use 4-bit NF4 (~6.6 GB) if:

  • You only have 8-10 GB VRAM
  • You can accept a small quality trade-off
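
The guidance above can be condensed into a small helper. The lists leave gaps (10 to 12 GB and 16 to 24 GB), so the cut-offs at 12 and 24 GB are assumptions, as is the function name; the research use case for full precision is not modeled.

```python
def pick_variant(vram_gb):
    """Suggest a VibeVoice variant for a given amount of VRAM in GB."""
    if vram_gb >= 24:
        return "full precision (18.7 GB)"
    if vram_gb >= 12:  # assumed cut-off: the 8-bit model needs 12 GB minimum
        return "8-bit selective (11.6 GB)"
    if vram_gb >= 8:
        return "4-bit NF4 (~6.6 GB)"
    return "none of the listed variants"


for vram in (8, 12, 16, 24):
    print(vram, "GB ->", pick_variant(vram))
```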

🔧 Troubleshooting

"OutOfMemoryError" during loading

  • Close other GPU applications
  • Use device_map="auto"
  • Reduce batch size to 1

"BitsAndBytes not found"

Bash
pip install "bitsandbytes>=0.43.0"

Audio sounds distorted

This shouldn't happen! If it does:

  1. Verify you downloaded the correct model
  2. Update transformers: pip install --upgrade transformers
  3. Check CUDA: torch.cuda.is_available() should return True

📚 Citation

Bibtex
@misc{vibevoice-q8-2025,
  title={VibeVoice-Large-Q8: Selective 8-bit Quantization for Audio Quality},
  author={Fabio Sarracino},
  year={2025},
  url={https://huggingface.co/FabioSarracino/VibeVoice-Large-Q8}
}

Original Model

Bibtex
@misc{vibevoice2024,
  title={VibeVoice: High-Quality Text-to-Speech with Large Language Models},
  author={Microsoft Research},
  year={2024},
  url={https://github.com/microsoft/VibeVoice}
}

🔗 Related Resources

  • Original Model - Full precision base
  • ComfyUI Node - ComfyUI integration

📜 License

MIT License.


🤝 Support

  • Issues: GitHub Issues
  • Questions: HuggingFace Discussions

If this model helped you, leave a ⭐ on GitHub!


Created by Fabio Sarracino

🤗 HuggingFace • 💻 GitHub

Capabilities & Tags

transformers, safetensors, vibevoice, text-to-speech, audio, tts, voice, quantized, 8bit, bitsandbytes