by thibaud
The *ControlNet OpenPose SDXL 1.0* model is a ControlNet for SDXL that uses OpenPose (v2) pose conditioning to guide image generation. It enables visuals where characters precisely follow a given posture or composition while preserving the quality and rich detail characteristic of SDXL. Its primary use cases include artistic creation, illustrating dynamic scenes, and generating visual content that requires exact positioning of elements. What sets it apart is its ability to combine realistic human poses with the creative flexibility of SDXL, giving artists, designers, and content creators fine control over the structure of their images.
These are ControlNet weights trained on stabilityai/stable-diffusion-xl-base-1.0 with OpenPose (v2) conditioning. You can find some example images below.
prompt: a ballerina, romantic sunset, 4k photo


(The image above is from ComfyUI; you can drag and drop it into ComfyUI to load it as a workflow.)
License: refers to OpenPose's license.
First, install all the required libraries:

```bash
pip install -q controlnet_aux transformers accelerate
pip install -q git+https://github.com/huggingface/diffusers
```
Now, we're ready to make Darth Vader dance:
```python
import torch
from controlnet_aux import OpenposeDetector
from diffusers import StableDiffusionXLControlNetPipeline, ControlNetModel
from diffusers.utils import load_image

# Compute the OpenPose conditioning image from a reference photo.
openpose = OpenposeDetector.from_pretrained("lllyasviel/ControlNet")
image = load_image(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/person.png"
)
openpose_image = openpose(image)

# Initialize the ControlNet pipeline.
controlnet = ControlNetModel.from_pretrained("thibaud/controlnet-openpose-sdxl-1.0", torch_dtype=torch.float16)
pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", controlnet=controlnet, torch_dtype=torch.float16
)
pipe.enable_model_cpu_offload()

# Infer.
prompt = "Darth vader dancing in a desert, high quality"
negative_prompt = "low quality, bad quality"
images = pipe(
    prompt,
    negative_prompt=negative_prompt,
    num_inference_steps=25,
    num_images_per_prompt=4,
    image=openpose_image.resize((1024, 1024)),
    generator=torch.manual_seed(97),
).images
images[0]
```
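
Optionally, you can swap the default sampler for diffusers' `UniPCMultistepScheduler` (which the original example also imports) and save the generated images to disk. A minimal sketch; the file names here are illustrative:

```python
from diffusers import UniPCMultistepScheduler

# Optional: swap in the UniPC sampler, which often works well at low step counts.
# Do this before calling the pipeline.
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)

# Save each generated sample to disk (illustrative file names).
for i, img in enumerate(images):
    img.save(f"vader_pose_{i}.png")
```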
Here are some generated examples:

It was trained with the training script provided by HF🤗, available here.
This checkpoint was first trained for 15,000 steps on LAION 6a, resized so that the smaller image dimension is at most 768.

- Hardware: one 1xA100 machine (thanks a lot to HF🤗 for providing the compute!)
- Batching: data parallel with a single-GPU batch size of 2 and gradient accumulation of 8
- Learning rate: constant at 8e-5
- Mixed precision: fp16
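
For reference, an invocation of diffusers' example ControlNet SDXL training script with these hyperparameters could look roughly like the sketch below. The dataset name and output directory are placeholders, and the exact flags may differ between diffusers versions; this is not the author's actual command.

```bash
accelerate launch train_controlnet_sdxl.py \
  --pretrained_model_name_or_path="stabilityai/stable-diffusion-xl-base-1.0" \
  --dataset_name="<your-pose-conditioned-dataset>" \
  --resolution=768 \
  --train_batch_size=2 \
  --gradient_accumulation_steps=8 \
  --learning_rate=8e-5 \
  --lr_scheduler="constant" \
  --mixed_precision="fp16" \
  --max_train_steps=15000 \
  --output_dir="controlnet-openpose-sdxl"
```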