

dreamshaperXL v21TurboDPMSDE

by tonera

Open source · 459 downloads · 0 likes

0.0 (0 reviews) · Image · API & Local
About

DreamShaperXL v21 Turbo DPMSDE is an AI model specialized in generating images from text, optimized for fast and efficient execution. It employs advanced quantization techniques (such as SVDQuant) to reduce memory usage and accelerate inference while maintaining high visual quality. This model is particularly well-suited for environments with limited GPU resources, thanks to its integration with the Nunchaku tool, which utilizes 4-bit quantized weights (FP4/INT4). Its primary use cases include artistic creation, generating visuals for creative or professional projects, and experimenting with diverse styles. What sets it apart is its balance between performance and quality, enabling reduced generation times without sacrificing the richness of details in the produced images.

Documentation

Model Card (SVDQuant)


Model Name

  • Model repo: tonera/dreamshaperXL_v21TurboDPMSDE
  • Base (Diffusers weights path): tonera/dreamshaperXL_v21TurboDPMSDE (repo root)
  • Quantized UNet weights: tonera/dreamshaperXL_v21TurboDPMSDE/svdq-<precision>_r32-dreamshaperXL_v21TurboDPMSDE.safetensors
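The `<precision>` placeholder in the weights filename is resolved at load time. A minimal sketch of the naming convention follows; the helper below is hypothetical (not part of Nunchaku), and in practice Nunchaku's `get_precision()` performs the GPU detection, returning `"fp4"` on RTX 50 series (Blackwell) GPUs and `"int4"` on earlier architectures:

```python
# Sketch of the quantized-weights naming convention used in this repo.
# resolve_weights_filename is a hypothetical helper for illustration only.
def resolve_weights_filename(model: str, precision: str) -> str:
    if precision not in ("fp4", "int4"):
        raise ValueError("precision must be 'fp4' or 'int4'")
    return f"svdq-{precision}_r32-{model}.safetensors"

print(resolve_weights_filename("dreamshaperXL_v21TurboDPMSDE", "int4"))
# svdq-int4_r32-dreamshaperXL_v21TurboDPMSDE.safetensors
```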

Quantization / Inference Tech

  • Inference engine: Nunchaku (https://github.com/nunchaku-ai/nunchaku)

Nunchaku is a high-performance inference engine for 4-bit (FP4/INT4) low-bit neural networks. Its goal is to significantly reduce VRAM usage and improve inference speed while preserving generation quality as much as possible. It implements and productionizes post-training quantization methods such as SVDQuant, and reduces the overhead introduced by low-rank branches via operator/kernel fusion and other optimizations.

The SDXL quantized weights in this repository (e.g. svdq-*_r32-*.safetensors) are intended to be used with Nunchaku for efficient inference on supported GPUs.

Quantization Quality (fp8)

Text (N=25):

| Metric | Mean | p50 | p90 | Best | Worst |
|--------|------|-----|-----|------|-------|
| PSNR | 16.6145 | 16.8903 | 18.686 | 19.0489 | 13.1796 |
| SSIM | 0.683617 | 0.697688 | 0.769644 | 0.818764 | 0.492368 |
| LPIPS | 0.289557 | 0.283484 | 0.349915 | 0.170336 | 0.414013 |
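For reference, PSNR in the table above is the standard peak signal-to-noise ratio (higher is better; SSIM higher is better, LPIPS lower is better). A minimal pure-Python sketch over flattened pixel sequences — how the numbers above were actually computed is not specified here:

```python
import math

def psnr(a, b, max_val=255.0):
    """Peak signal-to-noise ratio between two equal-length pixel sequences."""
    mse = sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)

# Two flat 4-pixel "images" differing by 16 everywhere:
print(round(psnr([0, 0, 0, 0], [16, 16, 16, 16]), 2))  # 24.05
```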

Performance

Below is the inference performance comparison (Diffusers vs Nunchaku-UNet).

  • Inference config: bf16 / steps=30 / guidance_scale=5.0
  • Resolutions (5 images each, batch=5): 1024x1024, 1024x768, 768x1024, 832x1216, 1216x832
  • Software versions: torch 2.9 / cuda 12.8 / nunchaku 1.1.0+torch2.9 / diffusers 0.37.0.dev0
  • Optimization switches: no torch.compile, no explicit cudnn tuning flags

Cold-start performance (end-to-end for the first image)

| GPU | Metric | Diffusers | Nunchaku | Speedup | Gain |
|-----|--------|-----------|----------|---------|------|
| RTX 5090 | load | 3.505s | 3.432s | 1.02x | +2.1% |
| RTX 5090 | cold_infer | 2.944s | 2.447s | 1.20x | +16.9% |
| RTX 5090 | cold_e2e | 6.449s | 5.880s | 1.10x | +8.8% |
| RTX 3090 | load | 3.787s | 3.442s | 1.10x | +9.1% |
| RTX 3090 | cold_infer | 7.503s | 5.231s | 1.43x | +30.3% |
| RTX 3090 | cold_e2e | 11.290s | 8.673s | 1.30x | +23.2% |

Steady-state performance (5 consecutive images after warmup)

| GPU | Metric | Diffusers | Nunchaku | Speedup | Gain |
|-----|--------|-----------|----------|---------|------|
| RTX 5090 | total (5 images) | 12.937s | 9.813s | 1.32x | +24.2% |
| RTX 5090 | avg (per image) | 2.587s | 1.963s | 1.32x | +24.2% |
| RTX 3090 | total (5 images) | 33.413s | 22.975s | 1.45x | +31.2% |
| RTX 3090 | avg (per image) | 6.683s | 4.595s | 1.45x | +31.2% |

Notes:

  • The longer load time on RTX 3090 is due to extra one-time processing when loading quantized weights.
  • During inference (cold_infer and steady-state), Nunchaku shows clear speedups on both GPUs.
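The Speedup and Gain columns follow directly from the raw timings: speedup is the Diffusers time divided by the Nunchaku time, and gain is the fraction of wall-clock time saved. A quick check against the RTX 5090 cold_infer row (the helper name is illustrative):

```python
def speedup_and_gain(baseline_s: float, optimized_s: float) -> tuple[float, float]:
    """Return (speedup factor, fractional time saved) for two wall-clock timings."""
    speedup = baseline_s / optimized_s
    gain = 1.0 - optimized_s / baseline_s
    return speedup, gain

# RTX 5090 cold_infer timings from the table above: 2.944s vs 2.447s
s, g = speedup_and_gain(2.944, 2.447)
print(f"{s:.2f}x, +{g * 100:.1f}%")  # 1.20x, +16.9%
```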

Nunchaku Installation Required

  • Official installation docs (recommended source of truth): https://nunchaku.tech/docs/nunchaku/installation/installation.html

(Recommended) Install the official prebuilt wheel

  • Prerequisite: PyTorch >= 2.5 (follow the wheel requirements)
  • Install Nunchaku wheel: choose a wheel matching your torch/cuda/python versions from GitHub Releases / HuggingFace / ModelScope (note cp311 means Python 3.11):
    • https://github.com/nunchaku-ai/nunchaku/releases
```bash
# Example (select the correct wheel URL for your torch/cuda/python versions)
pip install https://github.com/nunchaku-ai/nunchaku/releases/download/vX.Y.Z/nunchaku-X.Y.Z+torch2.9-cp311-cp311-linux_x86_64.whl
```
  • Tip (RTX 50 series): typically prefer CUDA >= 12.8, and prefer FP4 models for compatibility/performance (follow official docs).
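Wheel filenames encode the Nunchaku version, torch version, and CPython tag. A small sketch of that naming convention, inferred from the example URL above — the helper is illustrative, and you should always copy the exact URL from the releases page:

```python
def nunchaku_wheel_name(version: str, torch_version: str, python_version: str) -> str:
    """Build a Nunchaku Linux x86_64 wheel filename from version strings.

    Illustrative only; take the authoritative URL from GitHub Releases.
    """
    cp_tag = "cp" + python_version.replace(".", "")  # "3.11" -> "cp311"
    return f"nunchaku-{version}+torch{torch_version}-{cp_tag}-{cp_tag}-linux_x86_64.whl"

# Matches the benchmark environment above (nunchaku 1.1.0+torch2.9, Python 3.11):
print(nunchaku_wheel_name("1.1.0", "2.9", "3.11"))
# nunchaku-1.1.0+torch2.9-cp311-cp311-linux_x86_64.whl
```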

Usage Example (Diffusers + Nunchaku UNet)

```python
import torch
from diffusers import StableDiffusionXLPipeline

from nunchaku.models.unets.unet_sdxl import NunchakuSDXLUNet2DConditionModel
from nunchaku.utils import get_precision

MODEL = "dreamshaperXL_v21TurboDPMSDE"
REPO_ID = f"tonera/{MODEL}"

if __name__ == "__main__":
    # Load the SVDQuant 4-bit UNet; get_precision() selects fp4 or int4
    # based on the current GPU.
    unet = NunchakuSDXLUNet2DConditionModel.from_pretrained(
        f"{REPO_ID}/svdq-{get_precision()}_r32-{MODEL}.safetensors"
    )

    pipe = StableDiffusionXLPipeline.from_pretrained(
        REPO_ID,
        unet=unet,
        torch_dtype=torch.bfloat16,
        use_safetensors=True,
    ).to("cuda")

    prompt = "Make Pikachu hold a sign that says 'Nunchaku is awesome', yarn art style, detailed, vibrant colors"
    image = pipe(prompt=prompt, guidance_scale=5.0, num_inference_steps=30).images[0]
    image.save("sdxl.png")
```
Capabilities & Tags

diffusers · safetensors · sdxl · quantization · svdquant · nunchaku · fp4 · int4 · text-to-image · endpoints_compatible
Specifications

  • Category: Image
  • Access: API & Local
  • License: Open Source
  • Pricing: Open Source
  • Rating: 0.0
