by stabilityai
Stable Diffusion 3 Medium is a cutting-edge generative AI model developed by Stability AI, capable of transforming text descriptions into realistic and highly detailed images. Powered by its innovative *Multimodal Diffusion Transformer* (MMDiT) architecture, it excels in understanding complex prompts, generating precise typography, and optimizing resource efficiency while delivering high-quality visual results. Tailored for artists, designers, and creators, it finds applications in graphic design, illustration, education, and generative AI research. What sets it apart is its ability to interpret nuanced instructions while maintaining superior stylistic coherence and visual richness. Primarily intended for non-commercial use or under a dedicated license, it caters to users seeking to explore text-to-image creation in a responsible and innovative manner.


Stable Diffusion 3 Medium is a Multimodal Diffusion Transformer (MMDiT) text-to-image model that features greatly improved performance in image quality, typography, complex prompt understanding, and resource-efficiency.
For more technical details, please refer to the Research paper.
Please note: this model is released under the Stability Non-Commercial Research Community License. For a Creator License or an Enterprise License, visit Stability.ai or contact us for commercial licensing details.
For local or self-hosted use, we recommend ComfyUI for inference.
Stable Diffusion 3 Medium is available on our Stability API Platform.
Stable Diffusion 3 models and workflows are available on Stable Assistant and on Discord via Stable Artisan.
We used synthetic data and filtered publicly available data to train our models. The model was pre-trained on 1 billion images. The fine-tuning data includes 30M high-quality aesthetic images focused on specific visual content and style, as well as 3M preference data images.
Make sure you upgrade to the latest version of diffusers:
pip install -U diffusers
Then you can run:
import torch
from diffusers import StableDiffusion3Pipeline

# Load the pipeline in half precision and move it to the GPU.
pipe = StableDiffusion3Pipeline.from_pretrained("stabilityai/stable-diffusion-3-medium-diffusers", torch_dtype=torch.float16)
pipe = pipe.to("cuda")

# Generate one image from a text prompt.
image = pipe(
    "A cat holding a sign that says hello world",
    negative_prompt="",
    num_inference_steps=28,
    guidance_scale=7.0,
).images[0]
# In a notebook, a bare expression displays the image; in a script, call image.save(...) instead.
image
Refer to the documentation for more details on optimization and image-to-image support.
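As one illustration of the optimizations mentioned above, the following is a minimal sketch rather than an official recipe. It reuses the model ID and call arguments from the example above, optionally enables model CPU offload to lower peak GPU memory, fixes a seed for repeatable output, and saves the result to disk. The helper names (`generation_kwargs`, `generate`), the output file name, and the default seed are assumptions introduced for illustration; CPU offload additionally assumes the accelerate package is installed.

```python
from pathlib import Path

# Same repository ID as in the example above.
MODEL_ID = "stabilityai/stable-diffusion-3-medium-diffusers"


def generation_kwargs(prompt, steps=28, guidance=7.0):
    """Collect the call arguments used in the example above (hypothetical helper)."""
    return {
        "prompt": prompt,
        "negative_prompt": "",
        "num_inference_steps": steps,
        "guidance_scale": guidance,
    }


def generate(prompt, out_path="sd3_output.png", seed=0, offload=True):
    """Generate one image and save it to out_path (hypothetical helper)."""
    # Imported lazily so generation_kwargs can be used without a GPU stack.
    import torch
    from diffusers import StableDiffusion3Pipeline

    pipe = StableDiffusion3Pipeline.from_pretrained(
        MODEL_ID, torch_dtype=torch.float16
    )
    if offload:
        # Keeps submodules on the CPU and moves each to the GPU only while it runs.
        pipe.enable_model_cpu_offload()
    else:
        pipe = pipe.to("cuda")

    # A seeded generator makes repeated runs start from the same noise.
    generator = torch.Generator(device="cpu").manual_seed(seed)
    image = pipe(**generation_kwargs(prompt), generator=generator).images[0]
    image.save(out_path)
    return Path(out_path)
```

Calling `generate("A cat holding a sign that says hello world")` would download the weights on first use and write `sd3_output.png` to the working directory. Offloading trades some speed for lower memory use, so passing `offload=False` may be preferable on GPUs with enough memory.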
Intended uses include the following:
All uses of the model should be in accordance with our Acceptable Use Policy.
The model was not trained to produce factual or true representations of people or events, so generating such content is outside the scope of its abilities.
As part of our safety-by-design and responsible AI deployment approach, we implement safety measures throughout the development of our models, from the time we begin pre-training a model to the ongoing development, fine-tuning, and deployment of each model. We have implemented a number of safety mitigations that are intended to reduce the risk of severe harms; however, we recommend that developers conduct their own testing and apply additional mitigations based on their specific use cases.
For more about our approach to Safety, please visit our Safety page.
Our evaluation methods include structured evaluations and internal and external red-teaming for specific, severe harms such as child sexual abuse and exploitation, extreme violence and gore, sexually explicit content, and non-consensual nudity. Testing was conducted primarily in English and may not cover all possible harms. As with any model, the model may, at times, produce inaccurate, biased, or objectionable responses to user prompts.
Please report any issues with the model or contact us: