by stabilityai
SD-Turbo is a text-to-image generation model designed to produce photorealistic images in a single pass through the network, enabling ultra-fast synthesis. Developed by Stability AI, it leverages an adversarial distillation method to drastically reduce the number of steps required for generation while maintaining high quality. Ideal for real-time applications, it is particularly well-suited for creative, artistic, or educational uses, as well as research on generative models. The model stands out for its speed and efficiency, though for even more refined results, the SDXL-Turbo version is recommended. Its commercial use is governed by a specific license.
SD-Turbo is a fast generative text-to-image model that can synthesize photorealistic images from a text prompt in a single network evaluation.
We release SD-Turbo as a research artifact for studying small, distilled text-to-image models. For increased quality and prompt understanding,
we recommend SDXL-Turbo.
Please note: For commercial use, please refer to https://stability.ai/license.
SD-Turbo is a distilled version of Stable Diffusion 2.1, trained for real-time synthesis. SD-Turbo is based on a novel training method called Adversarial Diffusion Distillation (ADD) (see the technical report), which allows sampling large-scale foundational image diffusion models in 1 to 4 steps at high image quality. This approach uses score distillation to leverage large-scale off-the-shelf image diffusion models as a teacher signal and combines this with an adversarial loss to ensure high image fidelity even in the low-step regime of one or two sampling steps.
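To make the combination of the two training signals concrete, here is a toy sketch of a student objective formed as an adversarial term plus a weighted distillation term. It is not the training code from the technical report: the function names, the use of a negated mean discriminator score, the mean-squared-error distance, and the `distill_weight` parameter are all illustrative assumptions, and plain Python lists stand in for image tensors.

```python
from statistics import fmean


def adversarial_term(disc_scores):
    """Generator-side adversarial term (assumed form): higher scores lower the loss."""
    return -fmean(disc_scores)


def distillation_term(student_pred, teacher_pred):
    """Mean squared error between the student output and the teacher's estimate."""
    if len(student_pred) != len(teacher_pred):
        raise ValueError("student and teacher predictions must have the same length")
    return fmean([(s - t) ** 2 for s, t in zip(student_pred, teacher_pred)])


def add_style_loss(disc_scores, student_pred, teacher_pred, distill_weight=1.0):
    """Weighted sum of the two terms; distill_weight is a hypothetical placeholder."""
    adv = adversarial_term(disc_scores)
    distill = distillation_term(student_pred, teacher_pred)
    return adv + distill_weight * distill


loss = add_style_loss([1.0, 3.0], [0.0, 2.0], [0.0, 0.0], distill_weight=0.5)
```

In this toy setup the adversarial term pushes the student toward outputs the discriminator scores highly, while the distillation term keeps it close to the teacher; the technical report remains the authority on the actual loss definitions and weights.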
For research purposes, we recommend our generative-models GitHub repository (https://github.com/Stability-AI/generative-models),
which implements the most popular diffusion frameworks (both training and inference).
The charts above evaluate user preference for SD-Turbo over other single- and multi-step models.
SD-Turbo evaluated at a single step is preferred by human voters in terms of image quality and prompt following over LCM-LoRA XL and LCM-LoRA 1.5.
Note: For increased quality, we recommend the larger SDXL-Turbo. For details on the user study, see the research paper.
The model is intended for both non-commercial and commercial usage. Possible research areas and tasks include
For commercial use, please refer to https://stability.ai/membership.
Excluded uses are described below.
pip install diffusers transformers accelerate --upgrade
SD-Turbo does not make use of guidance_scale or negative_prompt; we disable guidance by setting guidance_scale=0.0.
The model works best at an image size of 512x512, but larger image sizes work as well.
A single step is enough to generate high quality images.
from diffusers import AutoPipelineForText2Image
import torch
# Load the half-precision (fp16) weights and move the pipeline to the GPU.
pipe = AutoPipelineForText2Image.from_pretrained("stabilityai/sd-turbo", torch_dtype=torch.float16, variant="fp16")
pipe.to("cuda")
prompt = "A cinematic shot of a baby raccoon wearing an intricate Italian priest robe."
# A single sampling step; guidance_scale=0.0 disables classifier-free guidance.
image = pipe(prompt=prompt, num_inference_steps=1, guidance_scale=0.0).images[0]
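The settings described above (guidance disabled, a small number of steps, 512x512 by default) can be gathered in a small helper so they are not forgotten at call time. This is only a sketch: `build_turbo_kwargs` is a hypothetical name, the 1 to 4 step range is taken from the model description rather than enforced by the library, and the divisible-by-8 size check reflects an assumption about what the underlying Stable Diffusion pipeline accepts.

```python
def build_turbo_kwargs(prompt, steps=1, height=512, width=512):
    """Build keyword arguments for a text-to-image call with guidance disabled."""
    if not 1 <= steps <= 4:
        # Range taken from the model description (sampling in 1 to 4 steps).
        raise ValueError("steps should be between 1 and 4")
    if height % 8 != 0 or width % 8 != 0:
        # Assumption: the pipeline expects sizes that are multiples of 8.
        raise ValueError("height and width should be divisible by 8")
    return {
        "prompt": prompt,
        "num_inference_steps": steps,
        "guidance_scale": 0.0,  # SD-Turbo does not use guidance
        "height": height,
        "width": width,
    }


kwargs = build_turbo_kwargs("a watercolor lighthouse at dusk", steps=2)
```

With a loaded pipeline, the result would be used as `image = pipe(**kwargs).images[0]`.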
When using SD-Turbo for image-to-image generation, make sure that num_inference_steps * strength is greater than or equal
to 1. The image-to-image pipeline runs for int(num_inference_steps * strength) steps, e.g. 2 * 0.5 = 1 step in our example
below.
from diffusers import AutoPipelineForImage2Image
from diffusers.utils import load_image
import torch
# Load the half-precision (fp16) weights and move the pipeline to the GPU.
pipe = AutoPipelineForImage2Image.from_pretrained("stabilityai/sd-turbo", torch_dtype=torch.float16, variant="fp16")
pipe.to("cuda")
# Download the starting image and resize it to the model's preferred resolution.
init_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/cat.png").resize((512, 512))
prompt = "cat wizard, gandalf, lord of the rings, detailed, fantasy, cute, adorable, Pixar, Disney, 8k"
# int(2 * 0.5) = 1 denoising step is actually run; guidance stays disabled.
image = pipe(prompt, image=init_image, num_inference_steps=2, strength=0.5, guidance_scale=0.0).images[0]
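The step rule for image-to-image generation can be checked before calling the pipeline. The helper below is a minimal sketch that applies the int(num_inference_steps * strength) formula stated above; `img2img_steps` is a hypothetical name, and the requirement that strength lies in [0, 1] is an assumption about the pipeline's input validation.

```python
def img2img_steps(num_inference_steps, strength):
    """Return the number of denoising steps the image-to-image call would run."""
    if not 0.0 <= strength <= 1.0:
        # Assumption: the pipeline only accepts strength values in [0, 1].
        raise ValueError("strength should be in [0, 1]")
    steps = int(num_inference_steps * strength)
    if steps < 1:
        raise ValueError(
            "num_inference_steps * strength must be at least 1, "
            f"got {num_inference_steps} * {strength}"
        )
    return steps


steps_in_example = img2img_steps(2, 0.5)  # the settings used in the example above
```

For instance, `img2img_steps(1, 0.5)` raises a `ValueError` because int(0.5) is 0, which is why the example uses two inference steps with a strength of 0.5.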
The model was not trained to produce factual or true representations of people or events, so using it to generate such content is out of scope for its abilities. The model should not be used in any way that violates Stability AI's Acceptable Use Policy.