by InstantX
The SD3.5 Large IP Adapter is an extension for the SD3.5-Large model that lets a reference image act as a prompt alongside the text prompt during image generation. Its image encoder is intended to improve visual fidelity and make visual references blend more naturally into the result. It suits artists and creators who want generations that follow the style or elements of a reference image, and users who want to refine their generations with precise visual constraints. Its distinguishing feature is that it processes images almost like text.
This repository contains an IP-Adapter for the SD3.5-Large model, released by researchers from the InstantX Team. In this adapter the image works just like text, so it may at times be unresponsive or interfere with the text prompt. We hope you enjoy this model; have fun and share your creative works with us on Twitter.
This is a regular IP-Adapter in which the new layers are added to all 38 blocks. We use google/siglip-so400m-patch14-384 to encode the image for its superior performance, and adopt a TimeResampler to project the encoded features. The number of image tokens is set to 64.
The code has not been integrated into diffusers yet; for now, please use the local files in this repository.
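To make the "64 image tokens" figure concrete, here is a minimal NumPy sketch of how a resampler-style projection can compress a variable number of encoder patch tokens into a fixed number of tokens using learned queries and cross-attention. It is not the repository's TimeResampler, which presumably also conditions on the diffusion timestep and uses several layers and heads. The shapes 729 x 1152 are an assumption about the SigLIP so400m encoder output at 384 px (27 x 27 patches), the inner width of 128 is purely illustrative, and `resample_tokens` is a hypothetical helper name.

```python
import numpy as np


def resample_tokens(image_tokens, queries, w_k, w_v):
    """Single-head cross-attention: each query attends over all image tokens."""
    keys = image_tokens @ w_k      # (n_patches, dim)
    values = image_tokens @ w_v    # (n_patches, dim)
    scores = queries @ keys.T / np.sqrt(queries.shape[-1])  # (n_queries, n_patches)
    scores -= scores.max(axis=-1, keepdims=True)            # numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ values        # (n_queries, dim)


rng = np.random.default_rng(0)

# Assumed encoder output: 27 * 27 = 729 patch tokens of width 1152.
n_patches, enc_dim = 729, 1152
# 64 query tokens as stated in the model card; 128 is an illustrative width.
n_queries, dim = 64, 128

image_tokens = rng.standard_normal((n_patches, enc_dim))
queries = rng.standard_normal((n_queries, dim))  # learned parameters in a real model
w_k = rng.standard_normal((enc_dim, dim)) / np.sqrt(enc_dim)
w_v = rng.standard_normal((enc_dim, dim)) / np.sqrt(enc_dim)

ip_tokens = resample_tokens(image_tokens, queries, w_k, w_v)
print(ip_tokens.shape)  # (64, 128)
```

Whatever the number of patches the encoder produces, the output always has one row per query, which is why the token count can be fixed ahead of time.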
import torch
from PIL import Image

# Local files from this repository (not yet part of diffusers)
from models.transformer_sd3 import SD3Transformer2DModel
from pipeline_stable_diffusion_3_ipa import StableDiffusion3Pipeline

model_path = 'stabilityai/stable-diffusion-3.5-large'
ip_adapter_path = './ip-adapter.bin'
image_encoder_path = "google/siglip-so400m-patch14-384"

# Load the transformer and the base pipeline in bfloat16
transformer = SD3Transformer2DModel.from_pretrained(
    model_path, subfolder="transformer", torch_dtype=torch.bfloat16
)
pipe = StableDiffusion3Pipeline.from_pretrained(
    model_path, transformer=transformer, torch_dtype=torch.bfloat16
).to("cuda")

# Attach the IP-Adapter weights and the image encoder (64 image tokens)
pipe.init_ipadapter(
    ip_adapter_path=ip_adapter_path,
    image_encoder_path=image_encoder_path,
    nb_token=64,
)

ref_img = Image.open('./assets/1.jpg').convert('RGB')

# Please note that SD3.5 Large is sensitive to high-resolution generation such as 1536x1536
image = pipe(
    width=1024,
    height=1024,
    prompt='a cat',
    negative_prompt="lowres, low quality, worst quality",
    num_inference_steps=24,
    guidance_scale=5.0,
    generator=torch.Generator("cuda").manual_seed(42),
    clip_image=ref_img,
    ipadapter_scale=0.5,
).images[0]
image.save('./result.jpg')
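Because the reference image can compete with the text prompt, it may help to compare several values of ipadapter_scale at a fixed seed instead of relying on the single value 0.5 used above. The following is a rough sketch under stated assumptions, not part of the repository: `sweep_ipadapter_scale`, `make_generator`, and `fake_pipe` are hypothetical names, and only the keyword arguments `clip_image`, `ipadapter_scale`, and `generator` are taken from the example above.

```python
from types import SimpleNamespace


def sweep_ipadapter_scale(pipe, ref_img, scales, make_generator, **pipe_kwargs):
    """Run the same prompt at several adapter strengths.

    `make_generator` is called once per run so that every scale starts
    from an identically seeded generator.
    """
    images = {}
    for scale in scales:
        result = pipe(
            clip_image=ref_img,
            ipadapter_scale=scale,
            generator=make_generator(),
            **pipe_kwargs,
        )
        images[scale] = result.images[0]
    return images


# Stand-in pipeline (hypothetical) so the helper can be tried without a GPU.
def fake_pipe(**kwargs):
    return SimpleNamespace(images=[f"image@{kwargs['ipadapter_scale']}"])


demo = sweep_ipadapter_scale(
    fake_pipe,
    ref_img=None,
    scales=[0.3, 0.5, 0.7],
    make_generator=lambda: None,
    prompt="a cat",
)
print(demo)
```

With the real pipeline, the call would pass `pipe`, `ref_img`, and `lambda: torch.Generator("cuda").manual_seed(42)` as `make_generator`, plus the remaining arguments from the example (`prompt`, `width`, `height`, and so on) as keyword arguments.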
For ComfyUI support, please refer to Slickytail/ComfyUI-InstantX-IPAdapter-SD3.
The model is released under the stabilityai-ai-community license. All copyrights reserved.
This project is sponsored by HuggingFace and fal.ai. Thanks to Slickytail for supporting the ComfyUI node.
If you find this project useful in your research, please cite us via:
@misc{sd35-large-ipa,
    author = {InstantX Team},
    title = {InstantX SD3.5-Large IP-Adapter Page},
    year = {2024},
}