par stabilityai
Open source · 879k downloads · 2556 likes
SDXL-Turbo est un modèle de génération d'images par IA capable de créer des visuels photoréalistes à partir d'une simple description textuelle en un temps record, grâce à une évaluation unique du réseau. Développé par Stability AI, il se distingue par sa rapidité exceptionnelle, produisant des résultats de haute qualité en seulement 1 à 4 étapes de calcul, contre des dizaines habituellement nécessaires. Idéal pour des applications en temps réel, il s'adapte aussi bien à la création artistique qu'à des outils éducatifs ou commerciaux, tout en restant accessible pour la recherche. Ce modèle se démarque par sa méthode de distillation adversariale, qui préserve la fidélité des images malgré sa rapidité, offrant un équilibre unique entre performance et précision.
SDXL-Turbo is a fast generative text-to-image model that can synthesize photorealistic images from a text prompt in a single network evaluation.
A real-time demo is available here: http://clipdrop.co/stable-diffusion-turbo
Please note: For commercial use, please refer to https://stability.ai/license.
SDXL-Turbo is a distilled version of SDXL 1.0, trained for real-time synthesis. SDXL-Turbo is based on a novel training method called Adversarial Diffusion Distillation (ADD) (see the technical report), which allows sampling large-scale foundational image diffusion models in 1 to 4 steps at high image quality. This approach uses score distillation to leverage large-scale off-the-shelf image diffusion models as a teacher signal and combines this with an adversarial loss to ensure high image fidelity even in the low-step regime of one or two sampling steps.
For research purposes, we recommend our generative-models Github repository (https://github.com/Stability-AI/generative-models),
which implements the most popular diffusion frameworks (both training and inference).
The charts above evaluate user preference for SDXL-Turbo over other single- and multi-step models.
SDXL-Turbo evaluated at a single step is preferred by human voters in terms of image quality and prompt following over LCM-XL evaluated at four (or fewer) steps.
In addition, we see that using four steps for SDXL-Turbo further improves performance.
For details on the user study, we refer to the research paper.
The model is intended for both non-commercial and commercial usage. You can use this model for non-commercial or research purposes under this license. Possible research areas and tasks include
For commercial use, please refer to https://stability.ai/membership.
Excluded uses are described below.
pip install diffusers transformers accelerate --upgrade
SDXL-Turbo does not make use of guidance_scale or negative_prompt, we disable it with guidance_scale=0.0.
Preferably, the model generates images of size 512x512 but higher image sizes work as well.
A single step is enough to generate high quality images.
from diffusers import AutoPipelineForText2Image
import torch
pipe = AutoPipelineForText2Image.from_pretrained("stabilityai/sdxl-turbo", torch_dtype=torch.float16, variant="fp16")
pipe.to("cuda")
prompt = "A cinematic shot of a baby racoon wearing an intricate italian priest robe."
image = pipe(prompt=prompt, num_inference_steps=1, guidance_scale=0.0).images[0]
When using SDXL-Turbo for image-to-image generation, make sure that num_inference_steps * strength is larger or equal
to 1. The image-to-image pipeline will run for int(num_inference_steps * strength) steps, e.g. 0.5 * 2.0 = 1 step in our example
below.
from diffusers import AutoPipelineForImage2Image
from diffusers.utils import load_image
import torch
pipe = AutoPipelineForImage2Image.from_pretrained("stabilityai/sdxl-turbo", torch_dtype=torch.float16, variant="fp16")
pipe.to("cuda")
init_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/cat.png").resize((512, 512))
prompt = "cat wizard, gandalf, lord of the rings, detailed, fantasy, cute, adorable, Pixar, Disney, 8k"
image = pipe(prompt, image=init_image, num_inference_steps=2, strength=0.5, guidance_scale=0.0).images[0]
The model was not trained to be factual or true representations of people or events, and therefore using the model to generate such content is out-of-scope for the abilities of this model. The model should not be used in any way that violates Stability AI's Acceptable Use Policy.
The model is intended for both non-commercial and commercial usage.