by diffusers
Open source · 16k downloads · 529 likes
ControlNet Canny SDXL 1.0 is a ControlNet for Stable Diffusion XL that uses Canny edge maps to guide image generation. It generates images that closely follow the contours and shapes of a reference image while retaining the quality and level of detail characteristic of SDXL. The model is well suited to applications that need close alignment with existing structures, such as photo retouching, stylized illustration, or generating images from sketches. Its distinguishing strength is the ability to produce coherent, detailed results even for complex scenes or demanding artistic compositions.
These are ControlNet weights trained on stabilityai/stable-diffusion-xl-base-1.0 with Canny conditioning. Example images, with their prompts, are shown below.
prompt: a couple watching a romantic sunset, 4k photo

prompt: ultrarealistic shot of a furry blue bird

prompt: a woman, close up, detailed, beautiful, street photography, photorealistic, detailed, Kodak ektar 100, natural, candid shot

prompt: Cinematic, neoclassical table in the living room, cinematic, contour, lighting, highly detailed, winter, golden hour

prompt: a tornado hitting grass field, 1980's film grain. overcast, muted colors.

Make sure to first install the libraries:
pip install accelerate transformers safetensors opencv-python diffusers
And then we're ready to go:
from diffusers import ControlNetModel, StableDiffusionXLControlNetPipeline, AutoencoderKL
from diffusers.utils import load_image
from PIL import Image
import torch
import numpy as np
import cv2

prompt = "aerial view, a futuristic research complex in a bright foggy jungle, hard lighting"
negative_prompt = "low quality, bad quality, sketches"

# Reference image from which the Canny edges are extracted.
image = load_image("https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/sd_controlnet/hf-logo.png")

controlnet_conditioning_scale = 0.5  # recommended for good generalization

# Load the ControlNet weights, the fp16-fix VAE, and the SDXL base pipeline.
controlnet = ControlNetModel.from_pretrained(
    "diffusers/controlnet-canny-sdxl-1.0",
    torch_dtype=torch.float16,
)
vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16)
pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    controlnet=controlnet,
    vae=vae,
    torch_dtype=torch.float16,
)
pipe.enable_model_cpu_offload()

# Build the conditioning image: Canny edges replicated to three channels.
image = np.array(image)
image = cv2.Canny(image, 100, 200)
image = image[:, :, None]
image = np.concatenate([image, image, image], axis=2)
image = Image.fromarray(image)

images = pipe(
    prompt,
    negative_prompt=negative_prompt,
    image=image,
    controlnet_conditioning_scale=controlnet_conditioning_scale,
).images
images[0].save("hug_lab.png")
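The array lines just before `Image.fromarray` in the script above turn the single-channel Canny output into a three-channel image. A minimal sketch of only that step follows, using a synthetic edge map so that it runs without OpenCV or a GPU. The helper name `edges_to_rgb` is hypothetical and is not part of diffusers.

```python
import numpy as np


def edges_to_rgb(edges: np.ndarray) -> np.ndarray:
    """Replicate a 2-D uint8 edge map into an (H, W, 3) array.

    Hypothetical helper: equivalent to `image[:, :, None]` followed by
    `np.concatenate([image, image, image], axis=2)` in the script.
    """
    if edges.ndim != 2:
        raise ValueError("expected a single-channel (H, W) edge map")
    return np.repeat(edges[:, :, None], 3, axis=2)


# Synthetic stand-in for the output of cv2.Canny: 0 = no edge, 255 = edge.
fake_edges = np.zeros((4, 6), dtype=np.uint8)
fake_edges[1, 2:5] = 255

rgb = edges_to_rgb(fake_edges)
print(rgb.shape, rgb.dtype)
```

The resulting array can be passed to `Image.fromarray` exactly as in the script.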

For more details, check out the official documentation of StableDiffusionXLControlNetPipeline.
Our training script was built on top of the official training script that we provide here.
This checkpoint was first trained for 20,000 steps on laion 6a, with images resized to a maximum minimum dimension (shorter side) of 384. It was then trained for a further 20,000 steps on laion 6a resized to a maximum minimum dimension of 1024 and filtered to contain only images whose minimum dimension is 1024. We found that this additional high-resolution fine-tuning was necessary for image quality.
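One way to read the resizing and filtering rules above is sketched below in plain Python. This is an interpretation, not the training code: it assumes that "max minimum dimension" means the shorter side is scaled down to at most the target with the aspect ratio preserved, that smaller images are left unchanged, and that the filter keeps images whose shorter side is at least 1024 pixels. Both function names are hypothetical.

```python
from typing import Tuple


def resize_to_max_min_dim(width: int, height: int, target: int) -> Tuple[int, int]:
    """Scale (width, height) so the shorter side is at most `target`.

    Assumption: images whose shorter side is already <= target are
    returned unchanged rather than upscaled.
    """
    shorter = min(width, height)
    if shorter <= target:
        return width, height
    scale = target / shorter
    return round(width * scale), round(height * scale)


def keeps_high_res(width: int, height: int, minimum: int = 1024) -> bool:
    """Assumed second-stage filter: keep images with shorter side >= minimum."""
    return min(width, height) >= minimum


print(resize_to_max_min_dim(1920, 1080, 384))
print(keeps_high_res(1280, 720))
```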
Compute: one 8xA100 machine.
Batch size: data parallel with a per-GPU batch size of 8, for a total batch size of 64.
Learning rate: constant 1e-4, scaled by the batch size for a total learning rate of 64e-4.
Mixed precision: fp16.
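The batch-size and learning-rate figures above follow from simple arithmetic, shown here as a small sketch. The variable names are illustrative, and reading "8xA100" as eight GPUs is an assumption consistent with the stated total batch size; only the numbers come from the training details.

```python
num_gpus = 8                # assumed from "one 8xA100 machine"
per_gpu_batch_size = 8      # stated per-GPU batch size
base_learning_rate = 1e-4   # stated constant learning rate before scaling

# Data parallelism: every GPU processes its own batch at each step.
total_batch_size = num_gpus * per_gpu_batch_size

# Linear scaling of the learning rate by the total batch size.
scaled_learning_rate = base_learning_rate * total_batch_size

print(total_batch_size)
print(f"{scaled_learning_rate:.1e}")
```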