by tonera
DreamShaperXL v21 Turbo DPMSDE is an SDXL-based text-to-image model packaged for fast, memory-efficient inference. The weights are quantized with SVDQuant to 4 bits (FP4/INT4) and run through the Nunchaku inference engine, which reduces VRAM usage and speeds up generation while aiming to preserve visual quality. This makes the model well suited to GPUs with limited memory. Typical uses include artistic creation, visuals for creative or professional projects, and experimenting with diverse styles.
- Model: tonera/dreamshaperXL_v21TurboDPMSDE
- Diffusers pipeline weights: tonera/dreamshaperXL_v21TurboDPMSDE (repo root)
- Quantized UNet weights: tonera/dreamshaperXL_v21TurboDPMSDE/svdq-<precision>_r32-dreamshaperXL_v21TurboDPMSDE.safetensors
- Inference engine: Nunchaku (https://github.com/nunchaku-ai/nunchaku)

Nunchaku is a high-performance inference engine for 4-bit (FP4/INT4) low-bit neural networks. Its goal is to significantly reduce VRAM usage and improve inference speed while preserving generation quality as much as possible. It implements and productionizes post-training quantization methods such as SVDQuant, and reduces the overhead introduced by low-rank branches via operator/kernel fusion and other optimizations.
The SDXL quantized weights in this repository (e.g. svdq-*_r32-*.safetensors) are intended to be used with Nunchaku for efficient inference on supported GPUs.
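The weight file name follows the pattern `svdq-<precision>_r32-<model>.safetensors`. The sketch below only illustrates how that path is assembled. `quantized_unet_path` is a hypothetical helper, not part of Nunchaku, and it assumes that `<precision>` is either `int4` or `fp4` (the two 4-bit formats named above) and that `r32` denotes a rank of 32. Whether both variants are actually published in this repository should be checked on the repo's file list.

```python
def quantized_unet_path(repo_id: str, model: str, precision: str, rank: int = 32) -> str:
    """Build '<repo_id>/svdq-<precision>_r<rank>-<model>.safetensors'.

    Hypothetical helper for illustration only; it does not touch the network.
    """
    if precision not in ("int4", "fp4"):  # assumption: the only two precisions
        raise ValueError(f"unexpected precision: {precision!r}")
    return f"{repo_id}/svdq-{precision}_r{rank}-{model}.safetensors"


MODEL = "dreamshaperXL_v21TurboDPMSDE"
REPO_ID = f"tonera/{MODEL}"

for precision in ("int4", "fp4"):
    print(quantized_unet_path(REPO_ID, MODEL, precision))
```

In practice the precision string would normally come from `nunchaku.utils.get_precision()` rather than being hard-coded, so the same script picks the file that matches the local GPU.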
PSNR: mean=16.6145 p50=16.8903 p90=18.686 best=19.0489 worst=13.1796 (N=25)
SSIM: mean=0.683617 p50=0.697688 p90=0.769644 best=0.818764 worst=0.492368 (N=25)
LPIPS: mean=0.289557 p50=0.283484 p90=0.349915 best=0.170336 worst=0.414013 (N=25)
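For readers unfamiliar with the first metric, here is a minimal sketch of PSNR between a reference image and a candidate image, written in plain Python over flat pixel lists. The card does not state how its numbers were computed or which images served as the reference, so this is the textbook definition under the assumption of 8-bit pixel values, not the author's evaluation script. The pixel values are made up for illustration.

```python
import math


def psnr(reference, candidate, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel sequences."""
    if not reference or len(reference) != len(candidate):
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixels.
    mse = sum((r - c) ** 2 for r, c in zip(reference, candidate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# Illustrative values only, not taken from the evaluation above.
reference = [10, 20, 30, 40]
candidate = [12, 18, 33, 39]
print(f"PSNR: {psnr(reference, candidate):.2f} dB")
```

Higher PSNR and SSIM mean the two images are closer, while lower LPIPS means they are perceptually closer, which is why the "best" LPIPS value above is the smallest one.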
Below is the inference performance comparison (Diffusers vs Nunchaku-UNet).
- bf16 / steps=30 / guidance_scale=5.0
- 1024x1024, 1024x768, 768x1024, 832x1216, 1216x832
- torch 2.9 / cuda 12.8 / nunchaku 1.1.0+torch2.9 / diffusers 0.37.0.dev0
- torch.compile, no explicit cudnn tuning flags

| GPU | Metric | Diffusers | Nunchaku | Speedup | Gain |
|---|---|---|---|---|---|
| RTX 5090 | load | 3.505s | 3.432s | 1.02x | +2.1% |
| RTX 5090 | cold_infer | 2.944s | 2.447s | 1.20x | +16.9% |
| RTX 5090 | cold_e2e | 6.449s | 5.880s | 1.10x | +8.8% |
| RTX 3090 | load | 3.787s | 3.442s | 1.10x | +9.1% |
| RTX 3090 | cold_infer | 7.503s | 5.231s | 1.43x | +30.3% |
| RTX 3090 | cold_e2e | 11.290s | 8.673s | 1.30x | +23.2% |

| GPU | Metric | Diffusers | Nunchaku | Speedup | Gain |
|---|---|---|---|---|---|
| RTX 5090 | total (5 images) | 12.937s | 9.813s | 1.32x | +24.2% |
| RTX 5090 | avg (per image) | 2.587s | 1.963s | 1.32x | +24.2% |
| RTX 3090 | total (5 images) | 33.413s | 22.975s | 1.45x | +31.2% |
| RTX 3090 | avg (per image) | 6.683s | 4.595s | 1.45x | +31.2% |
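
The Speedup and Gain columns appear to be derived from the two timing columns: speedup as the Diffusers time divided by the Nunchaku time, and gain as the fraction of the Diffusers time that was saved. The card does not state these formulas, so the sketch below is an inference from the table values; `speedup_and_gain` is a hypothetical helper name. Because the printed timings are rounded to milliseconds, a recomputed value can differ from the table in the last digit.

```python
def speedup_and_gain(baseline_s: float, optimized_s: float) -> tuple[float, float]:
    """Return (speedup factor, time saved in percent of the baseline)."""
    speedup = baseline_s / optimized_s
    gain_pct = (baseline_s - optimized_s) / baseline_s * 100.0
    return speedup, gain_pct


# (Diffusers seconds, Nunchaku seconds) copied from the cold_infer rows above.
rows = {
    "RTX 5090 cold_infer": (2.944, 2.447),
    "RTX 3090 cold_infer": (7.503, 5.231),
}

for name, (diffusers_s, nunchaku_s) in rows.items():
    speedup, gain = speedup_and_gain(diffusers_s, nunchaku_s)
    print(f"{name}: {speedup:.2f}x, +{gain:.1f}%")
```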
Notes:
- Installation docs: https://nunchaku.tech/docs/nunchaku/installation/installation.html
- PyTorch >= 2.5 (follow the wheel requirements)
- Wheels: https://github.com/nunchaku-ai/nunchaku/releases (`cp311` means Python 3.11):

```shell
# Example (select the correct wheel URL for your torch/cuda/python versions)
pip install https://github.com/nunchaku-ai/nunchaku/releases/download/vX.Y.Z/nunchaku-X.Y.Z+torch2.9-cp311-cp311-linux_x86_64.whl
```
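
To pick the matching wheel, it helps to know the CPython tag of the interpreter that will run Nunchaku. The sketch below derives that tag and fills in the file name pattern from the example command. Both helpers are hypothetical, the version is left as the `X.Y.Z` placeholder, and the result is only a candidate name: the releases page remains the source of truth for which wheels exist.

```python
import sys


def cpython_tag(version_info=None) -> str:
    """Return the CPython wheel tag, e.g. 'cp311' for Python 3.11."""
    major, minor = (version_info or sys.version_info)[:2]
    return f"cp{major}{minor}"


def wheel_filename(nunchaku_version: str, torch_version: str, py_tag: str,
                   platform: str = "linux_x86_64") -> str:
    """Fill in the file name pattern used by the example pip command."""
    return f"nunchaku-{nunchaku_version}+torch{torch_version}-{py_tag}-{py_tag}-{platform}.whl"


print("This interpreter:", cpython_tag())
print(wheel_filename("X.Y.Z", "2.9", cpython_tag((3, 11))))
```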
CUDA >= 12.8, and prefer FP4 models for compatibility/performance (follow official docs).

```python
import torch
from diffusers import StableDiffusionXLPipeline

from nunchaku.models.unets.unet_sdxl import NunchakuSDXLUNet2DConditionModel
from nunchaku.utils import get_precision

MODEL = "dreamshaperXL_v21TurboDPMSDE"
REPO_ID = f"tonera/{MODEL}"

if __name__ == "__main__":
    # Load the quantized UNet; get_precision() supplies the <precision>
    # part of the file name for the current GPU.
    unet = NunchakuSDXLUNet2DConditionModel.from_pretrained(
        f"{REPO_ID}/svdq-{get_precision()}_r32-{MODEL}.safetensors"
    )

    # Build the SDXL pipeline from the repo root, swapping in the quantized UNet.
    pipe = StableDiffusionXLPipeline.from_pretrained(
        REPO_ID,
        unet=unet,
        torch_dtype=torch.bfloat16,
        use_safetensors=True,
    ).to("cuda")

    prompt = "Make Pikachu hold a sign that says 'Nunchaku is awesome', yarn art style, detailed, vibrant colors"
    image = pipe(prompt=prompt, guidance_scale=5.0, num_inference_steps=30).images[0]
    image.save("sdxl.png")
```