by playgroundai
Playground v2.5 is an AI-powered image generation model capable of producing aesthetically pleasing visuals in high resolution (1024x1024 pixels) or in portrait and landscape formats. It stands out for its superior visual quality, outperforming competing models like SDXL, DALL-E 3, or Midjourney 5.2 according to user studies. The model excels particularly in generating realistic or artistic images from text descriptions, with a strong focus on aligning with human preferences, especially in representations of people. Its strengths lie in its ability to produce fine details and harmonious compositions while offering great flexibility for various creative use cases.
This repository contains a model that generates highly aesthetic images of resolution 1024x1024, as well as portrait and landscape aspect ratios. You can use the model with Hugging Face 🧨 Diffusers.

Playground v2.5 is a diffusion-based text-to-image generative model, and a successor to Playground v2.
Playground v2.5 is the state-of-the-art open-source model in aesthetic quality. Our user studies demonstrate that our model outperforms SDXL, Playground v2, PixArt-α, DALL-E 3, and Midjourney 5.2.
For details on the development and training of our model, please refer to our blog post and technical report.
Install diffusers >= 0.27.0 and the relevant dependencies.
pip install "diffusers>=0.27.0"
pip install transformers accelerate safetensors
Notes:
- The pipeline uses the EDMDPMSolverMultistepScheduler scheduler by default, for crisper fine details. It's an EDM formulation of the DPM++ 2M Karras scheduler. guidance_scale=3.0 is a good default for this scheduler.
- The pipeline also supports the EDMEulerScheduler scheduler. It's an EDM formulation of the Euler scheduler. guidance_scale=5.0 is a good default for this scheduler.

Then, run the following snippet:
from diffusers import DiffusionPipeline
import torch

# Load the fp16 weights and move the pipeline to the GPU.
pipe = DiffusionPipeline.from_pretrained(
    "playgroundai/playground-v2.5-1024px-aesthetic",
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")

# Optional: set the DPM++ 2M Karras (EDM) scheduler explicitly for crisper fine details.
# from diffusers import EDMDPMSolverMultistepScheduler
# pipe.scheduler = EDMDPMSolverMultistepScheduler()

prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
image = pipe(prompt=prompt, num_inference_steps=50, guidance_scale=3).images[0]
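The notes above pair each scheduler with a suggested guidance_scale. As a rough sketch of how the alternative EDMEulerScheduler might be used, the block below keeps those two defaults in a small lookup and applies the Euler one at generation time. The names `RECOMMENDED_GUIDANCE`, `recommended_guidance`, and `generate_with_euler` are hypothetical helpers, not part of diffusers, and building the scheduler with `from_config` is an assumption based on the usual diffusers pattern rather than something this model card prescribes.

```python
# Guidance-scale defaults quoted in the notes above, keyed by scheduler class name.
RECOMMENDED_GUIDANCE = {
    "EDMDPMSolverMultistepScheduler": 3.0,
    "EDMEulerScheduler": 5.0,
}


def recommended_guidance(scheduler_name: str) -> float:
    """Return the guidance_scale suggested for a scheduler (hypothetical helper)."""
    try:
        return RECOMMENDED_GUIDANCE[scheduler_name]
    except KeyError:
        raise ValueError(
            f"No recommended guidance_scale for {scheduler_name!r}"
        ) from None


def generate_with_euler(prompt: str, num_inference_steps: int = 50):
    """Generate one image with EDMEulerScheduler.

    Needs a CUDA GPU, diffusers >= 0.27.0, and access to the model weights.
    """
    # Imported lazily so the lookup helper works without the GPU stack installed.
    import torch
    from diffusers import DiffusionPipeline, EDMEulerScheduler

    pipe = DiffusionPipeline.from_pretrained(
        "playgroundai/playground-v2.5-1024px-aesthetic",
        torch_dtype=torch.float16,
        variant="fp16",
    ).to("cuda")
    # Assumption: rebuilding the scheduler from the pipeline's existing config
    # carries over the compatible settings (the usual diffusers pattern).
    pipe.scheduler = EDMEulerScheduler.from_config(pipe.scheduler.config)
    return pipe(
        prompt=prompt,
        num_inference_steps=num_inference_steps,
        guidance_scale=recommended_guidance("EDMEulerScheduler"),
    ).images[0]
```

Calling `generate_with_euler("Astronaut in a jungle, cold color palette, muted colors, detailed, 8k")` should return a PIL image if the hardware and weights are available. Keeping the defaults in one lookup makes it harder to switch schedulers and forget to adjust guidance_scale.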
Support for Automatic1111 and ComfyUI is coming soon. We will update this model card with instructions when ready.
This model card only provides a brief summary of our user study results. For extensive details on how we perform user studies, please check out our technical report.
We conducted studies to measure overall aesthetic quality, as well as the specific areas we aimed to improve with Playground v2.5, namely multi-aspect-ratio generation and human preference alignment.

The aesthetic quality of Playground v2.5 dramatically outperforms the current state-of-the-art open-source models SDXL and PixArt-α, as well as Playground v2. Because the performance differential between Playground v2.5 and SDXL was so large, we also tested our aesthetic quality against world-class closed-source models like DALL-E 3 and Midjourney 5.2, and found that Playground v2.5 outperforms them as well.

Similarly, for multi-aspect-ratio generation, we outperform SDXL by a large margin.
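To try portrait or landscape output yourself, you need a width and height to request. This model card does not list the supported resolutions, so the sketch below rests on an assumption: it keeps the pixel count close to 1024x1024 and snaps each side to a multiple of 64, a convention borrowed from SDXL-style pipelines. `dims_for_aspect` is a hypothetical helper, not part of diffusers.

```python
import math


def dims_for_aspect(aspect_ratio: float,
                    target_area: int = 1024 * 1024,
                    multiple: int = 64) -> tuple[int, int]:
    """Return (width, height) near target_area with width/height close to aspect_ratio.

    The area target and the multiple-of-64 snapping are assumptions, not
    values documented for Playground v2.5.
    """
    if aspect_ratio <= 0:
        raise ValueError("aspect_ratio must be positive")

    def snap(side: float) -> int:
        # Round to the nearest multiple, never going below one multiple.
        return max(multiple, round(side / multiple) * multiple)

    width = math.sqrt(target_area * aspect_ratio)
    height = math.sqrt(target_area / aspect_ratio)
    return snap(width), snap(height)


square = dims_for_aspect(1.0)
landscape = dims_for_aspect(16 / 9)
portrait = dims_for_aspect(9 / 16)
print(square, landscape, portrait)
```

The resulting values can be passed as the `width` and `height` arguments of the pipeline call, for example `pipe(prompt=prompt, width=w, height=h, ...)`. Check the technical report for the aspect ratios the model was actually trained on.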

Next, we benchmark Playground v2.5 specifically on people-related images, to test human preference alignment. We compared Playground v2.5 against two commonly used baseline models: SDXL and RealStock v2, a community fine-tune of SDXL that was trained on a realistic people dataset.
Playground v2.5 outperforms both baselines by a large margin.

| Model | Overall FID |
|---|---|
| SDXL-1-0-refiner | 9.55 |
| playground-v2-1024px-aesthetic | 7.07 |
| playground-v2.5-1024px-aesthetic | 4.48 |

Lastly, we report metrics using our MJHQ-30K benchmark, which we open-sourced with the v2 release. We report both the overall FID and the per-category FID. All FID metrics are computed at resolution 1024x1024. Our results show that Playground v2.5 outperforms both Playground v2 and SDXL in overall FID and in all category FIDs, especially in the people and fashion categories. This is in line with the results of the user study, which indicates a correlation between human preferences and the FID score on the MJHQ-30K benchmark.
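For readers unfamiliar with the metric, the standard definition of the Fréchet Inception Distance is given below. Here $\mu_r, \Sigma_r$ are the mean and covariance of Inception features from the reference images, and $\mu_g, \Sigma_g$ are the same statistics from the generated images. This is the general formula only; the exact feature extractor and evaluation settings used for MJHQ-30K are not restated in this card, so refer to the technical report for those.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```

Lower values mean the two feature distributions are closer, which is why the 4.48 reported for Playground v2.5 counts as an improvement over 7.07 and 9.55.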
@misc{li2024playground,
title={Playground v2.5: Three Insights towards Enhancing Aesthetic Quality in Text-to-Image Generation},
author={Daiqing Li and Aleks Kamko and Ehsan Akhgari and Ali Sabet and Linmiao Xu and Suhail Doshi},
year={2024},
eprint={2402.17245},
archivePrefix={arXiv},
primaryClass={cs.CV}
}