by cagliostrolab
Animagine XL 3.1 is an AI model specialized in generating stylized images, primarily focused on the world of Japanese animation. It excels at creating detailed and coherent visuals inspired by the artistic styles of manga and anime, offering a wide range of themes and characters. Its capabilities include producing original illustrations, concept art, or even complex scenes, while providing flexibility to adapt styles and moods. Ideal for artists, content creators, or enthusiasts, it stands out for its precision in details and its ability to adhere to the aesthetic codes of animated works. This model serves as a powerful tool for bringing visual ideas to life with an artistic and professional touch.
Animagine XL 3.1 is an update in the Animagine XL V3 series, enhancing the previous version, Animagine XL 3.0. This open-source, anime-themed text-to-image model has been improved for generating anime-style images with higher quality. It includes a broader range of characters from well-known anime series, an optimized dataset, and new aesthetic tags for better image creation. Built on Stable Diffusion XL, Animagine XL 3.1 aims to be a valuable resource for anime fans, artists, and content creators by producing accurate and detailed representations of anime characters.
Try the demo powered by Gradio on Hugging Face Spaces, or open the demo in Google Colab.
First, install the required libraries:

```bash
pip install diffusers transformers accelerate safetensors --upgrade
```
Then run image generation with the following example code:
```python
import os

import torch
from diffusers import DiffusionPipeline

# Load the model in half precision; requires a CUDA-capable GPU.
pipe = DiffusionPipeline.from_pretrained(
    "cagliostrolab/animagine-xl-3.1",
    torch_dtype=torch.float16,
    use_safetensors=True,
)
pipe.to("cuda")

prompt = "1girl, souryuu asuka langley, neon genesis evangelion, solo, upper body, v, smile, looking at viewer, outdoors, night"
negative_prompt = "nsfw, lowres, (bad), text, error, fewer, extra, missing, worst quality, jpeg artifacts, low quality, watermark, unfinished, displeasing, oldest, early, chromatic aberration, signature, extra digits, artistic error, username, scan, [abstract]"

image = pipe(
    prompt,
    negative_prompt=negative_prompt,
    width=832,
    height=1216,
    guidance_scale=7,
    num_inference_steps=28,
).images[0]

# Create the output directory before saving, so the save does not fail.
os.makedirs("./output", exist_ok=True)
image.save("./output/asuka_test.png")
```
For optimal results, it's recommended to follow this structured prompt template, since the model was trained on captions ordered this way:

```
1girl/1boy, character name, series name, everything else in any order
```
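As an illustration only (this helper is not part of the model or its API), the training-order template and the recommended quality prefix could be assembled with a small function like this:

```python
def build_prompt(subject: str, character: str, series: str, *details: str) -> str:
    """Assemble a prompt in the order the model was trained on:
    quality prefix, subject tag, character name, series name, then other tags.

    Hypothetical helper for illustration; the quality prefix tags follow the
    model card's recommendation.
    """
    quality_prefix = ["masterpiece", "best quality", "very aesthetic", "absurdres"]
    parts = quality_prefix + [subject, character, series] + list(details)
    return ", ".join(parts)

prompt = build_prompt(
    "1girl", "souryuu asuka langley", "neon genesis evangelion",
    "solo", "upper body", "smile", "looking at viewer",
)
```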
Animagine XL 3.1 utilizes special tags to steer results toward a desired quality, rating, creation date, and aesthetic. While the model can generate images without these tags, using them helps achieve better results.
Quality tags now consider both scores and post ratings to ensure a balanced quality distribution. We've refined labels for greater clarity, such as changing 'high quality' to 'great quality'.
| Quality Modifier | Score Criterion |
|---|---|
| masterpiece | > 95% |
| best quality | > 85% & ≤ 95% |
| great quality | > 75% & ≤ 85% |
| good quality | > 50% & ≤ 75% |
| normal quality | > 25% & ≤ 50% |
| low quality | > 10% & ≤ 25% |
| worst quality | ≤ 10% |
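For illustration, the score bands in the table above can be expressed as a simple threshold lookup. This hypothetical helper is not part of the model's tooling; it just mirrors the published bands:

```python
def quality_tag(score_percentile: float) -> str:
    """Map a score percentile (0-100) to the quality tag bands in the table above.

    Illustrative only; thresholds are exclusive on the lower bound,
    matching the '>' column of the table.
    """
    bands = [
        (95, "masterpiece"),
        (85, "best quality"),
        (75, "great quality"),
        (50, "good quality"),
        (25, "normal quality"),
        (10, "low quality"),
    ]
    for threshold, tag in bands:
        if score_percentile > threshold:
            return tag
    return "worst quality"
```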
We've also streamlined our rating tags for simplicity and clarity, aiming to establish global rules that can be applied across different models. For example, the tag 'rating: general' is now simply 'general', and 'rating: sensitive' has been condensed to 'sensitive'.
| Rating Modifier | Rating Criterion |
|---|---|
| safe | General |
| sensitive | Sensitive |
| nsfw | Questionable |
| explicit, nsfw | Explicit |
We've also redefined the year range to steer results towards specific modern or vintage anime art styles more accurately. This update simplifies the range, focusing on relevance to current and past eras.
| Year Tag | Year Range |
|---|---|
| newest | 2021 to 2024 |
| recent | 2018 to 2020 |
| mid | 2015 to 2017 |
| early | 2011 to 2014 |
| oldest | 2005 to 2010 |
We've enhanced our tagging system with aesthetic tags to refine content categorization based on visual appeal. These tags are derived from evaluations made by a specialized ViT (Vision Transformer) image classification model, specifically trained on anime data. For this purpose, we utilized the model shadowlilac/aesthetic-shadow-v2, which assesses the aesthetic value of content before it undergoes training. This ensures that each piece of content is not only relevant and accurate but also visually appealing.
| Aesthetic Tag | Score Range |
|---|---|
| very aesthetic | > 0.71 |
| aesthetic | > 0.45 & < 0.71 |
| displeasing | > 0.27 & < 0.45 |
| very displeasing | ≤ 0.27 |
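For illustration, the aesthetic score bands above translate to a straightforward mapping. This hypothetical helper is not part of the aesthetic-shadow-v2 classifier; it only encodes the table's bands:

```python
def aesthetic_tag(score: float) -> str:
    """Map a classifier score in [0, 1] to an aesthetic tag per the table above.

    Illustrative only; boundary values follow the '>' thresholds in the table.
    """
    if score > 0.71:
        return "very aesthetic"
    if score > 0.45:
        return "aesthetic"
    if score > 0.27:
        return "displeasing"
    return "very displeasing"
```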
To guide the model towards generating high-aesthetic images, use negative prompts like:

```
nsfw, lowres, (bad), text, error, fewer, extra, missing, worst quality, jpeg artifacts, low quality, watermark, unfinished, displeasing, oldest, early, chromatic aberration, signature, extra digits, artistic error, username, scan, [abstract]
```

For higher quality outcomes, prepend prompts with:

```
masterpiece, best quality, very aesthetic, absurdres
```
It's recommended to use a lower classifier-free guidance (CFG) scale of around 5-7, fewer than 30 sampling steps, and Euler Ancestral (Euler a) as the sampler.
This model supports generating images at the following dimensions:
| Dimensions | Aspect Ratio |
|---|---|
| 1024 x 1024 | 1:1 Square |
| 1152 x 896 | 9:7 |
| 896 x 1152 | 7:9 |
| 1216 x 832 | 19:13 |
| 832 x 1216 | 13:19 |
| 1344 x 768 | 7:4 Horizontal |
| 768 x 1344 | 4:7 Vertical |
| 1536 x 640 | 12:5 Horizontal |
| 640 x 1536 | 5:12 Vertical |
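Since requests often come in as an arbitrary aspect ratio, a small helper can snap to the nearest supported resolution from the table above. This is a hypothetical convenience function, not part of the model or diffusers:

```python
# Supported (width, height) pairs from the table above.
SUPPORTED_DIMENSIONS = [
    (1024, 1024), (1152, 896), (896, 1152), (1216, 832), (832, 1216),
    (1344, 768), (768, 1344), (1536, 640), (640, 1536),
]

def closest_resolution(target_ratio: float) -> tuple:
    """Return the supported (width, height) whose aspect ratio is closest
    to target_ratio (width / height). Illustrative helper only."""
    return min(SUPPORTED_DIMENSIONS, key=lambda wh: abs(wh[0] / wh[1] - target_ratio))
```

For example, a 16:9 request maps to 1344 x 768 rather than an unsupported resolution, which tends to give better results with SDXL-based models.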
Animagine XL 3.1 was trained on 2x A100 80GB GPUs for approximately 15 days, totaling over 350 GPU hours. The training process consisted of three stages:
| Stage | Epochs | UNet lr | Train Text Encoder | Batch Size | Noise Offset | Optimizer | LR Scheduler | Grad Acc Steps | GPUs |
|---|---|---|---|---|---|---|---|---|---|
| Pretraining | 10 | 1e-5 | True | 16 | N/A | AdamW | Cosine Annealing Warm Restart | 3 | 2 |
| Finetuning 1st Stage | 10 | 2e-6 | False | 48 | 0.0357 | Adafactor | Constant with Warmup | 1 | 1 |
| Finetuning 2nd Stage | 15 | 1e-6 | False | 48 | 0.0357 | Adafactor | Constant with Warmup | 1 | 1 |
| Configuration Item | Animagine XL 3.0 | Animagine XL 3.1 |
|---|---|---|
| GPU | 2 x A100 80G | 2 x A100 80G |
| Dataset | 1,271,990 | 873,504 |
| Shuffle Separator | True | True |
| Num Epochs | 10 | 10 |
| Learning Rate | 7.5e-6 | 1e-5 |
| Text Encoder Learning Rate | 3.75e-6 | 1e-5 |
| Effective Batch Size | 48 x 1 x 2 | 16 x 3 x 2 |
| Optimizer | Adafactor | AdamW |
| Optimizer Args | Scale Parameter: False, Relative Step: False, Warmup Init: False | Weight Decay: 0.1, Betas: (0.9, 0.99) |
| LR Scheduler | Constant with Warmup | Cosine Annealing Warm Restart |
| LR Scheduler Args | Warmup Steps: 100 | Num Cycles: 10, Min LR: 1e-6, LR Decay: 0.9, First Cycle Steps: 9,099 |
Source code and training config are available here: https://github.com/cagliostrolab/sd-scripts/tree/main/notebook
The development and release of Animagine XL 3.1 would not have been possible without the invaluable contributions and support from the following individuals and organizations:
Thank you all for your support and expertise in pushing the boundaries of anime-style image generation.
While Animagine XL 3.1 represents a significant advancement in anime-style image generation, it is important to acknowledge its limitations:
By acknowledging these limitations, we aim to provide transparency and set realistic expectations for users of Animagine XL 3.1. Despite these constraints, we believe that the model represents a significant step forward in anime-style image generation and offers a powerful tool for artists, designers, and enthusiasts alike.
This model is licensed under the CreativeML Open RAIL++-M License.
To ensure full compatibility with the upstream SDXL ecosystem and standard usage rights, this model adheres strictly to the original SDXL terms, which include:
Note: This license supersedes any previous community license tags (e.g., FAIPL) applied to earlier versions of this repository, ensuring full compatibility with the standard SDXL ecosystem.
Please refer to the full license agreement for complete details.