by longlian
LMD+ is an AI model specialized in generating images from text, significantly enhancing the ability of existing models to interpret and follow complex prompts. It uses an upstream large language model (LLM) to analyze instructions and plan the arrangement of elements in the image, thereby improving spatial accuracy, understanding of negations, attribute assignment, and object counting. Unlike traditional approaches, it addresses these challenges in a unified manner without requiring specific training for each case. LMD+ is built on Stable Diffusion v1.4 and incorporates additional adapters for finer control while remaining compatible with existing tools. Ideal for creative applications requiring high fidelity to textual descriptions, it stands out for its flexibility and innovative approach to delivering more consistent and nuanced results.
Paper | Project Page | 5-minute Blog Post | Demo | Code | Citation | Related work: LLM-grounded Video Diffusion Models
LMD and LMD+ greatly improve the prompt-following ability of text-to-image generation models by introducing an LLM as a front-end prompt parser and layout planner. They improve spatial reasoning, understanding of negation, attribute binding, generative numeracy, etc., in a unified manner without explicitly targeting each case. LMD is completely training-free (i.e., it uses the SD model off-the-shelf). LMD+ adds trained adapters for better control. This is a reproduction of the LMD+ model used in our work. Our full codebase is available here.
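For intuition, below is a minimal, hypothetical sketch of the first stage (the LLM as prompt parser and layout planner). The prompt template, JSON output format, box convention, and the choice of the OpenAI client are illustrative assumptions, not the exact template or parser used in the LMD codebase; see the linked code for the real implementation.

```python
# Illustrative sketch of LMD's stage 1: an LLM turns the user prompt into a
# layout (object phrases + bounding boxes + a negative prompt). The template,
# JSON schema, and normalized [x0, y0, x1, y1] box convention below are
# simplifying assumptions, not the exact format used by the LMD codebase.
import json
from openai import OpenAI  # any instruction-following LLM works; OpenAI is just an example

client = OpenAI()
caption = "A realistic photo of a gray cat and an orange dog on the grass, without trees"

layout_request = (
    "Plan an image layout for the caption below. Respond with JSON containing "
    "'objects', a list of [phrase, [x0, y0, x1, y1]] with coordinates in [0, 1], "
    "and 'negative', a comma-separated list of things that must not appear.\n"
    f"Caption: {caption}"
)
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": layout_request}],
)
layout = json.loads(response.choices[0].message.content)

phrases = [obj[0] for obj in layout["objects"]]   # e.g. ["a gray cat", "an orange dog"]
boxes = [obj[1] for obj in layout["objects"]]     # e.g. [[0.10, 0.45, 0.40, 0.90], ...]
negative_prompt = layout["negative"]              # e.g. "trees"
# Stage 2 then generates the image conditioned on these phrases and boxes.
```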
This LMD+ model is based on Stable Diffusion v1.4 and integrates the adapters trained with GLIGEN. The model can be used directly with our LLMGroundedDiffusionPipeline, a simplified LMD+ pipeline without per-box generation.
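As a usage sketch with diffusers: the snippet below loads this checkpoint through the llm_grounded_diffusion community pipeline and generates an image from a prompt plus planned phrases and boxes. The repository id (longlian/lmd_plus), the hard-coded layout, and the argument values are assumptions for illustration; check the diffusers community-pipeline documentation for the exact signature in your version.

```python
# Minimal sketch: run this LMD+ checkpoint through the diffusers
# "llm_grounded_diffusion" community pipeline (LLMGroundedDiffusionPipeline).
# The repo id and the example layout below are illustrative assumptions.
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "longlian/lmd_plus",                       # assumed repo id of this model
    custom_pipeline="llm_grounded_diffusion",  # community pipeline implementing LMD+
    torch_dtype=torch.float16,
)
pipe.enable_model_cpu_offload()

# In the full LMD workflow these phrases/boxes come from the LLM layout planner (stage 1).
prompt = "a waterfall and a modern high speed train in a beautiful forest with fall foliage"
phrases = ["a waterfall", "a modern high speed train"]
boxes = [[0.14, 0.21, 0.43, 0.71], [0.50, 0.44, 0.85, 0.73]]  # normalized [x0, y0, x1, y1]

image = pipe(
    prompt=prompt,
    phrases=phrases,
    boxes=boxes,
    gligen_scheduled_sampling_beta=0.4,  # fraction of steps with box conditioning (GLIGEN-style)
    num_inference_steps=50,
).images[0]

image.save("lmd_plus_generation.jpg")
```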
See the original SD Model Card here.
@article{lian2023llmgrounded,
  title={LLM-grounded Diffusion: Enhancing Prompt Understanding of Text-to-Image Diffusion Models with Large Language Models},
  author={Lian, Long and Li, Boyi and Yala, Adam and Darrell, Trevor},
  journal={arXiv preprint arXiv:2305.13655},
  year={2023}
}