by GuangyuanSD
FLUX.2 klein 9B Blitz ComfyUI is a specialized model for face swapping, optimized for ultra-realistic and natural results. It incorporates BFS (Best Face Swap) technology, which eliminates the rigid artifacts of older methods while faithfully preserving facial identity, expressions, and lighting. Designed for exceptional inference speed, it enables generation in just 4 to 5 steps with a fixed CFG, even on consumer-grade hardware. Ideal for applications requiring precise and seamless face replacements—such as photo editing, special effects, or content creation—it stands out for delivering flawless integrations without compromising quality. Its approach combines the power of the accelerated FLUX.2 klein 9B with targeted optimizations, ensuring both fast and realistic performance.
This is the next-level face-swap specialized evolution of the Dark Beast lineage, built on the lightning-fast FLUX.2 Klein 9B accelerated model from Black Forest Labs.
Engineered with targeted optimizations for face-swapping workflows, it integrates BFS (Best Face Swap) technology to completely eliminate the rigid, unnatural look that plagued earlier face replacements — delivering seamless, lifelike integrations with preserved identity, expression, and lighting.
It also fully fixes the portrait reference issue from the previous DB BlitZ versions, ensuring correct reference adherence every time.
Special thanks to https://github.com/alisson-anjos for providing the powerful BFS foundation that powers this breakthrough. 🟦

Important notes:
This version is exclusively designed around the Klein 9B accelerated edition — no base model exists.
Usage is identical to Black Forest Labs' official FLUX.2 Klein 9B accelerated release: ultra-low step counts (e.g., 4-5), a fixed CFG of 1, and blazing inference speed on consumer hardware.
In one sentence: Dark Beast's ferocious soul meets BFS (Best Face Swap) technology — more natural, and truly unstoppable! 🟦
For more information about BFS (Best Face Swap), see:
https://huggingface.co/Alissonerdx
Alternatively, it can be applied directly to any Klein 9B / Qwen Edit base or fine-tuned model via LoRA adapter parameter injection.

DarkBeast5steps_extracted_lora_r256 has been uploaded; it works well with FLUX.2 Klein 9B models.
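The LoRA-injection route described above can be sketched with diffusers' standard LoRA loader. This is a minimal sketch, not the card's official recipe: the base repo id is taken from the fine-tuning note below, while the LoRA repo location and the `weight_name` filename are assumptions based on the extracted LoRA's name.

```python
import torch
import diffusers

# Load a FLUX.2 Klein 9B base pipeline (repo id as named in this card).
pipe = diffusers.Flux2KleinPipeline.from_pretrained(
    "black-forest-labs/FLUX.2-klein-9B", torch_dtype=torch.bfloat16
)

# Inject the extracted Blitz LoRA. Repo and filename here are illustrative
# assumptions — point them at wherever the r256 LoRA is actually hosted.
pipe.load_lora_weights(
    "GuangyuanSD/FLUX.2-klein-9B-Blitz-Diffusers",
    weight_name="DarkBeast5steps_extracted_lora_r256.safetensors",
)

pipe.enable_model_cpu_offload()
image = pipe(
    "portrait photo, dramatic lighting",
    guidance_scale=1.0,        # fixed CFG=1, per the recommended settings
    num_inference_steps=5,     # 5-step BlitZ workflow
).images[0]
image.save("klein-blitz-lora.png")
```

The same `load_lora_weights` call should work against Klein 9B fine-tunes, since the LoRA only patches the transformer's adapter-compatible layers.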
Fine-tuning of black-forest-labs/FLUX.2-klein-9B with BF16 / FP8e4m3fn / NVFP4 quantization, merged with @alcaitiff's klein-9b-unchained-xxx.
This is the ultimate speed-optimized Dark Beast V1 evolution, based on FLUX.2 Klein 9B,
engineered specifically for lightning-fast low-step, CFG=1 workflows (5 steps).
Also available in NVFP4 quantized format, optimized for acceleration on Blackwell architecture GPUs.
( like RTX50XX, PRO6000, B200, and others )
Non-50-series GPUs are also supported (automatic 16-bit operation); verified in my ComfyUI 0.11 environment.
Fully preserves the signature Dark Beast style, rich detail, and intense Dark Beast aesthetic of the standard lineage
Refined through advanced targeted distillation & fine-tuning, now perfectly dialed in for zero-CFG guidance at minimal steps
BlitZ-level inference speed — breathtaking high-quality images in just 5 steps ⚡
Recommended settings: 5 steps, CFG=1 (fixed), any seed you want
In one sentence: Taking Klein’s already blazing speed and cranking it to absolute BlitZ velocity while keeping every drop of that ferocious Dark Beast soul! 🟦
Lightning-fast generation awaits — unleash it now! 🚀
Usage:

```shell
pip install sdnq
```

```python
import torch
import diffusers
from sdnq import SDNQConfig  # importing sdnq registers it with diffusers and transformers
from sdnq.common import use_torch_compile as triton_is_available
from sdnq.loader import apply_sdnq_options_to_model

pipe = diffusers.Flux2KleinPipeline.from_pretrained(
    "GuangyuanSD/FLUX.2-klein-9B-Blitz-Diffusers", torch_dtype=torch.bfloat16
)

# Enable INT8 MatMul for AMD, Intel ARC and Nvidia GPUs:
if triton_is_available and (torch.cuda.is_available() or torch.xpu.is_available()):
    pipe.transformer = apply_sdnq_options_to_model(pipe.transformer, use_quantized_matmul=True)
    pipe.text_encoder = apply_sdnq_options_to_model(pipe.text_encoder, use_quantized_matmul=True)

# pipe.transformer = torch.compile(pipe.transformer)  # optional, for faster speeds
pipe.enable_model_cpu_offload()

prompt = "A cat holding a sign that says hello world"
image = pipe(
    prompt=prompt,
    height=1024,
    width=1024,
    guidance_scale=1.0,
    num_inference_steps=4,
    generator=torch.manual_seed(0),
).images[0]
image.save("flux-klein-Blitz.png")
```
Original BF16 vs Blitz fine-tune comparison:
| Model | Size | Visualization |
|---|---|---|
| Original BF16 | 18.2 GB | (image) |
| Blitz fine-tune | 18.2 GB | (image) |
Big thanks to @alcaitiff for the awesome work and killer contributions to training Z-Image and Klein models! Seriously impressive stuff! 🚀