by cagliostrolab
Animagine XL 3.0 is an advanced open-source anime image generation model based on Stable Diffusion XL. It excels at creating detailed and stylized images, with significant improvements in hand anatomy, concept comprehension, and prompt interpretation. Designed to learn concepts rather than adhere strictly to aesthetic criteria, it delivers greater fidelity to anime styles and specific universe elements. Perfect for artists, content creators, or enthusiasts, it generates high-quality illustrations suited to both modern and retro styles. What sets it apart is its ability to produce consistent and nuanced results while remaining accessible through user-friendly interfaces like Gradio or Google Colab.

Animagine XL 3.0 is the latest version of the sophisticated open-source anime text-to-image model, building upon the capabilities of its predecessor, Animagine XL 2.0. Developed based on Stable Diffusion XL, this iteration boasts superior image generation with notable improvements in hand anatomy, efficient tag ordering, and enhanced knowledge about anime concepts. Unlike the previous iteration, we focused on training the model to learn concepts rather than just aesthetics.
Animagine XL 3.0 is accessible through user-friendly platforms such as Gradio and Google Colab.
To use Animagine XL 3.0, install the required libraries as follows:

```shell
pip install diffusers --upgrade
pip install transformers accelerate safetensors
```
Example script for generating images with Animagine XL 3.0:
```python
import torch
from diffusers import (
    StableDiffusionXLPipeline,
    EulerAncestralDiscreteScheduler,
    AutoencoderKL,
)

# Load the fp16-fixed VAE component
vae = AutoencoderKL.from_pretrained(
    "madebyollin/sdxl-vae-fp16-fix",
    torch_dtype=torch.float16,
)

# Configure the pipeline
pipe = StableDiffusionXLPipeline.from_pretrained(
    "Linaqruf/animagine-xl-3.0",
    vae=vae,
    torch_dtype=torch.float16,
    use_safetensors=True,
)
pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)
pipe.to("cuda")

# Define prompts and generate the image
prompt = "1girl, arima kana, oshi no ko, solo, upper body, v, smile, looking at viewer, outdoors, night"
negative_prompt = "nsfw, lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry, artist name"
image = pipe(
    prompt,
    negative_prompt=negative_prompt,
    width=832,
    height=1216,
    guidance_scale=7,
    num_inference_steps=28,
).images[0]
image.save("output.png")
```
Prompting differs slightly in this iteration. For optimal results, follow the structured prompt template, because the model was trained on prompts in this order:

1girl/1boy, character name, from what series, everything else in any order.
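The template above can be sketched as a small helper. This is an illustrative snippet, not part of the model's API; `build_prompt` is a hypothetical name.

```python
# Hypothetical helper following the recommended tag ordering:
# 1girl/1boy, character name, series name, then everything else.
def build_prompt(subject, character, series, *extras):
    return ", ".join([subject, character, series, *extras])

prompt = build_prompt(
    "1girl", "arima kana", "oshi no ko",
    "solo", "upper body", "smile", "looking at viewer",
)
print(prompt)
# → 1girl, arima kana, oshi no ko, solo, upper body, smile, looking at viewer
```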
Like the previous iteration, this model was trained with special tags that steer the result toward quality, content rating, and the period when the content was created. The model still works without these special tags, but using them is recommended for more predictable control over the output.
| Quality Modifier | Score Criterion |
|---|---|
| masterpiece | > 150 |
| best quality | 100 to 150 |
| high quality | 75 to 100 |
| medium quality | 25 to 75 |
| normal quality | 0 to 25 |
| low quality | -5 to 0 |
| worst quality | < -5 |
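Quality modifiers are plain tags prepended to the prompt. A minimal sketch (the prompt content is only an example):

```python
# Sketch: prepending quality modifiers from the table above.
base_prompt = "1girl, arima kana, oshi no ko, solo, upper body"
prompt = "masterpiece, best quality, " + base_prompt
print(prompt)
# → masterpiece, best quality, 1girl, arima kana, oshi no ko, solo, upper body
```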
| Rating Modifier | Rating Criterion |
|---|---|
| rating: general | General |
| rating: sensitive | Sensitive |
| rating: questionable, nsfw | Questionable |
| rating: explicit, nsfw | Explicit |
These tags steer the result toward modern or vintage anime art styles, ranging from newest to oldest.

| Year Tag | Year Range |
|---|---|
| newest | 2022 to 2023 |
| late | 2019 to 2021 |
| mid | 2015 to 2018 |
| early | 2011 to 2014 |
| oldest | 2005 to 2010 |
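Year tags are appended like any other tag. A small illustrative sketch (prompt content is an example only):

```python
# Sketch: appending a year tag from the table above to bias the art style.
base_prompt = "1girl, arima kana, oshi no ko, solo"
modern_prompt = base_prompt + ", newest"   # leans toward 2022-2023 styles
vintage_prompt = base_prompt + ", oldest"  # leans toward 2005-2010 styles
print(modern_prompt)
# → 1girl, arima kana, oshi no ko, solo, newest
```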
To guide the model towards generating high-aesthetic images, use negative prompts like:
nsfw, lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry, artist name
For higher quality outcomes, prepend prompts with:
masterpiece, best quality
However, exercise caution when using masterpiece and best quality, as these tags are frequently associated with NSFW datasets. It is safer to add nsfw, rating: sensitive to the negative prompt and rating: general to the positive prompt. A lower classifier-free guidance scale (CFG) of around 5-7 is recommended, along with fewer than 30 sampling steps and the Euler Ancestral (Euler a) sampler.
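The recommendations above can be collected into a set of generation arguments for the pipeline shown earlier. This is a sketch; `gen_kwargs` is an illustrative name, and the exact prompts are examples:

```python
# Sketch of the recommended sampling settings: CFG 5-7, fewer than 30
# steps, rating: general in the positive and nsfw, rating: sensitive in
# the negative prompt. Pass as pipe(**gen_kwargs) with the Euler a
# scheduler configured as in the earlier example.
gen_kwargs = {
    "prompt": "masterpiece, best quality, rating: general, 1girl, arima kana, oshi no ko, solo",
    "negative_prompt": "nsfw, rating: sensitive, lowres, bad anatomy, bad hands, worst quality, low quality",
    "guidance_scale": 6.0,      # within the recommended 5-7 range
    "num_inference_steps": 28,  # below 30
    "width": 832,
    "height": 1216,
}
```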
This model supports generating images at the following dimensions:
| Dimensions | Aspect Ratio |
|---|---|
| 1024 x 1024 | 1:1 Square |
| 1152 x 896 | 9:7 |
| 896 x 1152 | 7:9 |
| 1216 x 832 | 19:13 |
| 832 x 1216 | 13:19 |
| 1344 x 768 | 7:4 Horizontal |
| 768 x 1344 | 4:7 Vertical |
| 1536 x 640 | 12:5 Horizontal |
| 640 x 1536 | 5:12 Vertical |
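Since only the dimensions above are supported, arbitrary target sizes should be snapped to the closest entry. A hypothetical helper (`nearest_supported` is not part of any library) that picks the supported resolution with the nearest aspect ratio:

```python
# Supported Animagine XL 3.0 resolutions, from the table above.
SUPPORTED = [
    (1024, 1024), (1152, 896), (896, 1152), (1216, 832), (832, 1216),
    (1344, 768), (768, 1344), (1536, 640), (640, 1536),
]

def nearest_supported(width, height):
    """Return the supported (width, height) closest in aspect ratio."""
    target = width / height
    return min(SUPPORTED, key=lambda wh: abs(wh[0] / wh[1] - target))

print(nearest_supported(800, 1200))  # → (832, 1216)
```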
| Stage | Epochs | UNet Learning Rate | Train Text Encoder | Text Encoder Learning Rate | Batch Size | Mixed Precision | Noise Offset |
|---|---|---|---|---|---|---|---|
| Feature Alignment Stage | 10 | 7.5e-6 | True | 3.75e-6 | 48 x 2 | fp16 | N/A |
| Refining UNet Stage | 10 | 2e-6 | False | N/A | 48 | fp16 | 0.0357 |
| Aesthetic Tuning Stage | 10 | 1e-6 | False | N/A | 48 | fp16 | 0.0357 |
| Configuration Item | Animagine XL 2.0 | Animagine XL 3.0 |
|---|---|---|
| GPU | A100 80G | 2 x A100 80G |
| Dataset | 170k + 83k images | 1,271,990 + 3,500 images |
| Shuffle Separator | N/A | True |
| Global Epochs | 20 | 20 |
| Learning Rate | 1e-6 | 7.5e-6 |
| Batch Size | 32 | 48 x 2 |
| Train Text Encoder | True | True |
| Train Special Tags | True | True |
| Image Resolution | 1024 | 1024 |
| Bucket Resolution | 2048 x 512 | 2048 x 512 |
Source code and training config are available here: https://github.com/cagliostrolab/sd-scripts/tree/main/notebook
While "Animagine XL 3.0" represents a significant advancement in anime text-to-image generation, it's important to acknowledge its limitations to understand its best use cases and potential areas for future improvement.
These limitations highlight areas for potential refinement in future iterations and underscore the importance of careful prompt crafting for optimal results. Understanding these constraints can help users better navigate the model's capabilities and tailor their expectations accordingly.
We extend our gratitude to the entire team and community that contributed to the development of Animagine XL 3.0, including our partners and collaborators who provided resources and insights crucial for this iteration.
This model is licensed under the CreativeML Open RAIL++-M License.
To ensure full compatibility with the upstream SDXL ecosystem and standard usage rights, this model adheres strictly to the original SDXL terms.
Note: This license supersedes any previous community license tags (e.g., FAIPL) applied to earlier versions of this repository, ensuring full compatibility with the standard SDXL ecosystem.
Please refer to the full license agreement for complete details.