by cagliostrolab
Open source · 275k downloads · 403 likes
Animagine XL 4.0 is an AI model specialized in generating anime images from text descriptions, offering improved visual quality and detail accuracy compared to previous versions. Thanks to extensive training on millions of images and targeted optimizations, it produces more stable illustrations with more realistic anatomical proportions, richer colors, and fewer artifacts. Ideal for artists, content creators, and enthusiasts, it excels at creating stylized works, whether illustrations, concept art, or narrative visuals. What sets it apart is its ability to interpret complex prompts while maintaining an artistic coherence characteristic of the anime aesthetic.

Animagine XL 4.0, also stylized as Anim4gine, is the ultimate anime-themed finetuned SDXL model and the latest installment of the Animagine XL series. Despite being a continuation, the model was retrained from Stable Diffusion XL 1.0 on a massive dataset of 8.4M diverse anime-style images from various sources, with a knowledge cut-off of January 7th, 2025, and finetuned for approximately 2,650 GPU hours. As in the previous version, the model was trained using the tag-ordering method for identity and style training. With the release of Animagine XL 4.0 Opt (Optimized), the model has been further refined with an additional dataset, improving stability, anatomical accuracy, noise reduction, color saturation, and overall color accuracy. These enhancements make Animagine XL 4.0 Opt more consistent and visually appealing while maintaining the signature quality of the series.
The model can be used through Hugging Face Spaces, ComfyUI, or Stable Diffusion WebUI, or directly with the `diffusers` library:

```bash
pip install diffusers transformers accelerate safetensors --upgrade
```
The example below uses the `lpw_stable_diffusion_xl` community pipeline, which enables better handling of long, weighted, and detailed prompts. The model weights are already uploaded in FP16 format, so there is no need to pass `variant="fp16"` to the `from_pretrained` call.
```python
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "cagliostrolab/animagine-xl-4.0",
    torch_dtype=torch.float16,
    use_safetensors=True,
    custom_pipeline="lpw_stable_diffusion_xl",
    add_watermarker=False
)
pipe.to('cuda')

prompt = "1girl, arima kana, oshi no ko, hoshimachi suisei, hoshimachi suisei \(1st costume\), cosplay, looking at viewer, smile, outdoors, night, v, masterpiece, high score, great score, absurdres"
negative_prompt = "lowres, bad anatomy, bad hands, text, error, missing finger, extra digits, fewer digits, cropped, worst quality, low quality, low score, bad score, average score, signature, watermark, username, blurry"

image = pipe(
    prompt,
    negative_prompt=negative_prompt,
    width=832,
    height=1216,
    guidance_scale=5,
    num_inference_steps=28
).images[0]
image.save("./arima_kana.png")
```
A summary of the prompt guidelines is shown in the accompanying image.

The model was trained with tag-based captions and the tag-ordering method. Use this structured template:
`1girl/1boy/1other, character name, series name, rating, everything else in any order, quality enhancement tags`
Append these quality tags at the end of your positive prompt:

masterpiece, high score, great score, absurdres

And use these tags as the negative prompt:

lowres, bad anatomy, bad hands, text, error, missing finger, extra digits, fewer digits, cropped, worst quality, low quality, low score, bad score, average score, signature, watermark, username, blurry
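The tag ordering above can be assembled programmatically. The sketch below is a hypothetical helper (not part of the model's tooling; the function and parameter names are illustrative) that joins the components in the recommended order and appends the quality tags:

```python
# Hypothetical helper: builds a prompt following the recommended tag order.
QUALITY_TAGS = "masterpiece, high score, great score, absurdres"

def build_prompt(subject, character, series, rating, *extra_tags):
    """Join prompt components in the recommended order, ending with quality tags."""
    parts = [subject, character, series, rating, *extra_tags, QUALITY_TAGS]
    return ", ".join(p for p in parts if p)

prompt = build_prompt(
    "1girl", "arima kana", "oshi no ko", "safe",
    "looking at viewer", "smile", "outdoors", "night",
)
print(prompt)
```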
| Orientation | Dimensions | Aspect Ratio |
|---|---|---|
| Square | 1024 x 1024 | 1:1 |
| Landscape | 1152 x 896 | 9:7 |
| Landscape | 1216 x 832 | 3:2 |
| Landscape | 1344 x 768 | 7:4 |
| Landscape | 1536 x 640 | 12:5 |
| Portrait | 896 x 1152 | 7:9 |
| Portrait | 832 x 1216 | 2:3 |
| Portrait | 768 x 1344 | 4:7 |
| Portrait | 640 x 1536 | 5:12 |
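To stay on one of the supported resolutions, a request can be snapped to the nearest table entry. This is a small illustrative sketch (the data is copied from the table above; the helper name is hypothetical):

```python
# Supported resolutions from the table above, as (width, height) pairs.
SUPPORTED = [
    (1024, 1024),                                        # square
    (1152, 896), (1216, 832), (1344, 768), (1536, 640),  # landscape
    (896, 1152), (832, 1216), (768, 1344), (640, 1536),  # portrait
]

def closest_resolution(target_ratio):
    """Return the supported (width, height) whose w/h ratio is nearest the target."""
    return min(SUPPORTED, key=lambda wh: abs(wh[0] / wh[1] - target_ratio))

# A 2:3 portrait request maps to the 832 x 1216 bucket.
print(closest_resolution(2 / 3))  # → (832, 1216)
```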
Example prompt following the template:

1girl, firefly \(honkai: star rail\), honkai \(series\), honkai: star rail, safe, casual, solo, looking at viewer, outdoors, smile, reaching towards viewer, night, masterpiece, high score, great score, absurdres
The model supports various special tags that can be used to control different aspects of the image generation process. These tags are carefully weighted and tested to provide consistent results across different prompts.
Quality tags are fundamental controls that directly influence the overall image quality and detail level. Available quality tags:
- `masterpiece`
- `best quality`
- `low quality`
- `worst quality`

[Sample images: one generated with the "masterpiece, best quality" tags and one with the "low quality, worst quality" tags, each with the negative prompt left empty.]
Score tags provide a more nuanced control over image quality compared to basic quality tags. They have a stronger impact on steering output quality in this model. Available score tags:
- `high score`
- `great score`
- `good score`
- `average score`
- `bad score`
- `low score`

[Sample images: one generated with the "high score, great score" tags and one with the "bad score, low score" tags, each with the negative prompt left empty.]
Temporal tags allow you to influence the artistic style based on specific time periods or years. This can be useful for generating images with era-specific artistic characteristics. Supported year tags:
`year 2005` through `year 2025`, using the format `year {n}`

[Sample images: Hatsune Miku generated with the "year 2007" tag and with the "year 2023" tag.]
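As a minimal sketch of the `year {n}` format (the 2005–2025 range check reflects the span given above; the helper itself is hypothetical):

```python
def year_tag(n):
    """Format a temporal tag; the model card lists years 2005 through 2025."""
    if not 2005 <= n <= 2025:
        raise ValueError(f"year {n} is outside the supported 2005-2025 range")
    return f"year {n}"

print(year_tag(2007))  # → "year 2007"
```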
Rating tags help control the content safety level of generated images. These tags should be used responsibly and in accordance with applicable laws and platform policies. Supported ratings:
- `safe`
- `sensitive`
- `nsfw`
- `explicit`

The model was trained using state-of-the-art hardware and optimized hyperparameters to ensure the highest quality output. Below are the detailed technical specifications and parameters used during the training process:
| Parameter | Value |
|---|---|
| Hardware | 7 x H100 80GB SXM5 |
| Num Images | 8,401,464 |
| UNet Learning Rate | 2.5e-6 |
| Text Encoder Learning Rate | 1.25e-6 |
| Scheduler | Constant With Warmup |
| Warmup Steps | 5% |
| Batch Size | 32 |
| Gradient Accumulation Steps | 2 |
| Training Resolution | 1024x1024 |
| Optimizer | Adafactor |
| Input Perturbation Noise | 0.1 |
| Debiased Estimation Loss | Enabled |
| Mixed Precision | fp16 |
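Assuming the batch size in the table is per GPU (an assumption; the card does not specify), the effective global batch size and the resulting steps per epoch work out as follows:

```python
num_images = 8_401_464    # "Num Images" from the table above
gpus = 7                  # 7 x H100 80GB SXM5
batch_per_gpu = 32        # assumption: table's batch size is per GPU
grad_accum = 2            # gradient accumulation steps

effective_batch = gpus * batch_per_gpu * grad_accum  # samples per optimizer step
steps_per_epoch = num_images // effective_batch
print(effective_batch, steps_per_epoch)  # 448 18753
```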
This long-term project would not have been possible without the groundbreaking work, innovative contributions, and comprehensive documentation provided by Stability AI, Novel AI, and Waifu Diffusion Team. We are especially grateful for the kickstarter grant from Main that enabled us to progress beyond V2. For this iteration, we would like to express our sincere gratitude to everyone in the community for their continuous support, particularly:
We extend our heartfelt appreciation to our dedicated team members who have contributed significantly to this project, including but not limited to:
We're excited to introduce new fundraising methods through GitHub Sponsors to support training, research, and model development. Your support helps us push the boundaries of what's possible with AI.
You can help us with:
Donate: Contribute via ETH, USDT, or USDC to the address below, or sponsor us on GitHub.
Share: Spread the word about our models and share your creations!
Feedback: Let us know how we can improve.
Donation Address:
ETH/USDT/USDC(e): 0xd8A1dA94BA7E6feCe8CfEacc1327f498fCcBFC0C
GitHub Sponsors: https://github.com/sponsors/cagliostrolab/
Feel free to join our Discord server.
This model adopts the original CreativeML Open RAIL++-M License from Stability AI without any modifications or additional restrictions. The license terms remain exactly as specified in the original SDXL license, which includes:
Please refer to the original SDXL license for the complete and authoritative terms and conditions.