by cagliostrolab
Open source · 121k downloads · 776 likes
Animagine XL 3.0 is an advanced open-source anime text-to-image model built on Stable Diffusion XL. It excels at creating detailed, stylized images, with notable improvements in hand anatomy, concept understanding, and prompt interpretation. Designed to learn concepts rather than merely optimize for aesthetics, it offers better fidelity to anime-specific styles and elements. Ideal for artists, content creators, and enthusiasts, it can generate high-quality illustrations suited to both modern and retro styles. What sets it apart is its ability to produce consistent, nuanced results while remaining accessible through user-friendly interfaces such as Gradio or Google Colab.

Animagine XL 3.0 is the latest version of the sophisticated open-source anime text-to-image model, building upon the capabilities of its predecessor, Animagine XL 2.0. Developed based on Stable Diffusion XL, this iteration boasts superior image generation with notable improvements in hand anatomy, efficient tag ordering, and enhanced knowledge about anime concepts. Unlike the previous iteration, we focused on training the model to learn concepts rather than just aesthetics.
Animagine XL 3.0 is accessible through user-friendly platforms such as Gradio and Google Colab:
To use Animagine XL 3.0, install the required libraries as follows:

```bash
pip install diffusers --upgrade
pip install transformers accelerate safetensors
```
Example script for generating images with Animagine XL 3.0:

```python
import torch
from diffusers import (
    StableDiffusionXLPipeline,
    EulerAncestralDiscreteScheduler,
    AutoencoderKL,
)

# Load the fp16-fixed VAE component
vae = AutoencoderKL.from_pretrained(
    "madebyollin/sdxl-vae-fp16-fix",
    torch_dtype=torch.float16
)

# Configure the pipeline
pipe = StableDiffusionXLPipeline.from_pretrained(
    "Linaqruf/animagine-xl-3.0",
    vae=vae,
    torch_dtype=torch.float16,
    use_safetensors=True,
)
pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)
pipe.to("cuda")

# Define prompts and generate the image
prompt = "1girl, arima kana, oshi no ko, solo, upper body, v, smile, looking at viewer, outdoors, night"
negative_prompt = "nsfw, lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry, artist name"

image = pipe(
    prompt,
    negative_prompt=negative_prompt,
    width=832,
    height=1216,
    guidance_scale=7,
    num_inference_steps=28
).images[0]
image.save("output.png")
```
Prompting differs somewhat in this iteration. For optimal results, follow the structured prompt template the model was trained on:

1girl/1boy, character name, from what series, everything else in any order.
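As an illustrative sketch of this template (the `build_prompt` helper is our own, not part of the model or library), a prompt can be assembled like this:

```python
def build_prompt(subject, character, series, *extras):
    """Assemble a prompt following the trained tag order:
    subject count, character name, source series, then everything else."""
    return ", ".join([subject, character, series, *extras])

prompt = build_prompt(
    "1girl", "arima kana", "oshi no ko",
    "solo", "upper body", "smile", "looking at viewer",
)
```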
Like the previous iteration, this model was trained with special tags that steer the result toward quality, content rating, and the period when the content was created. The model can still do the job without these special tags, but using them is recommended to make its output easier to control.
| Quality Modifier | Score Criterion |
|---|---|
| masterpiece | > 150 |
| best quality | 100-150 |
| high quality | 75-100 |
| medium quality | 25-75 |
| normal quality | 0-25 |
| low quality | -5-0 |
| worst quality | < -5 |
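The score bands above can be expressed as a small lookup. This is a sketch of how the mapping works; the thresholds mirror the table, while the helper name and boundary handling are our own assumptions:

```python
def quality_tag(score):
    """Map an aesthetic score to the quality modifier the model was trained on."""
    if score > 150:
        return "masterpiece"
    if score >= 100:
        return "best quality"
    if score >= 75:
        return "high quality"
    if score >= 25:
        return "medium quality"
    if score >= 0:
        return "normal quality"
    if score >= -5:
        return "low quality"
    return "worst quality"
```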
| Rating Modifier | Rating Criterion |
|---|---|
| rating: general | General |
| rating: sensitive | Sensitive |
| rating: questionable, nsfw | Questionable |
| rating: explicit, nsfw | Explicit |
These tags help to steer the result toward modern or vintage anime art styles, ranging from newest to oldest.
| Year Tag | Year Range |
|---|---|
| newest | 2022 to 2023 |
| late | 2019 to 2021 |
| mid | 2015 to 2018 |
| early | 2011 to 2014 |
| oldest | 2005 to 2010 |
To guide the model towards generating high-aesthetic images, use negative prompts like:
nsfw, lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry, artist name
For higher quality outcomes, prepend prompts with:
masterpiece, best quality
However, exercise caution when using masterpiece and best quality, as these tags are frequently associated with NSFW data. It is better to add nsfw, rating: sensitive to the negative prompt and rating: general to the positive prompt. It is also recommended to use a lower classifier-free guidance (CFG) scale of around 5-7, fewer than 30 sampling steps, and the Euler Ancestral (Euler a) sampler.
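Putting these recommendations together, a prompt and settings configuration might look like the following sketch (the specific tag choices and numeric values are illustrative examples within the ranges suggested above, not hard requirements):

```python
# Suggested generation settings (illustrative values within the recommended ranges)
generation_kwargs = {
    "guidance_scale": 6.0,       # CFG in the recommended 5-7 range
    "num_inference_steps": 28,   # below the recommended ceiling of 30
}

# Keep quality tags in the positive prompt, but counter their NSFW bias
# by pinning the rating explicitly on both sides.
prompt = ", ".join([
    "masterpiece", "best quality", "rating: general",
    "1girl", "arima kana", "oshi no ko", "solo", "smile",
])
negative_prompt = ", ".join([
    "nsfw", "rating: sensitive",
    "lowres", "bad anatomy", "bad hands",
    "worst quality", "low quality", "jpeg artifacts", "watermark",
])
```

These strings and keyword arguments can then be passed to the pipeline call shown earlier.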
This model supports generating images at the following dimensions:
| Dimensions | Aspect Ratio |
|---|---|
| 1024 x 1024 | 1:1 Square |
| 1152 x 896 | 9:7 |
| 896 x 1152 | 7:9 |
| 1216 x 832 | 19:13 |
| 832 x 1216 | 13:19 |
| 1344 x 768 | 7:4 Horizontal |
| 768 x 1344 | 4:7 Vertical |
| 1536 x 640 | 12:5 Horizontal |
| 640 x 1536 | 5:12 Vertical |
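The supported resolutions can be kept in a lookup and validated before calling the pipeline. This is a sketch; the constant and helper are our own, not part of diffusers:

```python
# All (width, height) pairs the model supports natively.
SUPPORTED_DIMENSIONS = {
    (1024, 1024), (1152, 896), (896, 1152),
    (1216, 832), (832, 1216), (1344, 768),
    (768, 1344), (1536, 640), (640, 1536),
}

def check_dimensions(width, height):
    """Raise if the requested size is not one the model was trained on."""
    if (width, height) not in SUPPORTED_DIMENSIONS:
        raise ValueError(f"{width}x{height} is not a supported resolution")
    return width, height
```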
| Stage | Epochs | UNet Learning Rate | Train Text Encoder | Text Encoder Learning Rate | Batch Size | Mixed Precision | Noise Offset |
|---|---|---|---|---|---|---|---|
| Feature Alignment Stage | 10 | 7.5e-6 | True | 3.75e-6 | 48 x 2 | fp16 | N/A |
| Refining UNet Stage | 10 | 2e-6 | False | N/A | 48 | fp16 | 0.0357 |
| Aesthetic Tuning Stage | 10 | 1e-6 | False | N/A | 48 | fp16 | 0.0357 |
| Configuration Item | Animagine XL 2.0 | Animagine XL 3.0 |
|---|---|---|
| GPU | A100 80G | 2 x A100 80G |
| Dataset | 170k + 83k images | 1,271,990 + 3,500 images |
| Shuffle Separator | N/A | True |
| Global Epochs | 20 | 20 |
| Learning Rate | 1e-6 | 7.5e-6 |
| Batch Size | 32 | 48 x 2 |
| Train Text Encoder | True | True |
| Train Special Tags | True | True |
| Image Resolution | 1024 | 1024 |
| Bucket Resolution | 2048 x 512 | 2048 x 512 |
Source code and training config are available here: https://github.com/cagliostrolab/sd-scripts/tree/main/notebook
While Animagine XL 3.0 represents a significant advancement in anime text-to-image generation, it's important to acknowledge its limitations to understand its best use cases and potential areas for future improvement.
These limitations highlight areas for potential refinement in future iterations and underscore the importance of careful prompt crafting for optimal results. Understanding these constraints can help users better navigate the model's capabilities and tailor their expectations accordingly.
We extend our gratitude to the entire team and community that contributed to the development of Animagine XL 3.0, including our partners and collaborators who provided resources and insights crucial for this iteration.
This model is licensed under the CreativeML Open RAIL++-M License.
To ensure full compatibility with the upstream SDXL ecosystem and standard usage rights, this model adheres strictly to the original SDXL terms.
Note: This license supersedes any previous community license tags (e.g., FAIPL) applied to earlier versions of this repository, ensuring full compatibility with the standard SDXL ecosystem.
Please refer to the full license agreement for complete details.