ACE-Step 1.5 XL — Base (4B DiT)

Model Details

This is the XL (4B) Base variant of ACE-Step 1.5. It uses a larger DiT decoder (~4B parameters) for higher audio quality, and it is the foundation model that supports every task: text-to-music, cover, repaint, extract, lego, and complete.

XL Architecture

Parameter	Value
DiT Decoder hidden_size	2560
DiT Decoder layers	32
DiT Decoder attention heads	32
Encoder hidden_size	2048
Encoder layers	8
Total params	~4B
Weights size (bf16)	~18.8 GB
Inference steps	50 (with CFG)

GPU Requirements

VRAM	Support
≥12 GB	With CPU offload + INT8 quantization
≥16 GB	With CPU offload
≥20 GB	Without offload
≥24 GB	Full quality (XL + 4B LM)

All LM models (0.6B / 1.7B / 4B) are fully compatible with XL.
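The VRAM tiers above can be read as a simple lookup. The following is a minimal Python sketch, assuming the thresholds in the table are lower bounds in GB. The name `pick_profile` and the returned labels are hypothetical and are not part of the ACE-Step CLI or API.

```python
# Hypothetical helper: maps available VRAM (GB) to the support tier listed
# in the GPU Requirements table. Not part of the ACE-Step codebase.
TIERS = [
    (24, "Full quality (XL + 4B LM)"),
    (20, "Without offload"),
    (16, "With CPU offload"),
    (12, "With CPU offload + INT8 quantization"),
]


def pick_profile(vram_gb: float) -> str:
    """Return the highest tier whose VRAM threshold is met."""
    for min_gb, label in TIERS:  # ordered from largest to smallest threshold
        if vram_gb >= min_gb:
            return label
    return "Below the listed minimum (12 GB)"


print(pick_profile(16))
```

Keeping the tiers in descending order lets the first match win, so a 16 GB card resolves to the CPU-offload tier rather than the INT8 one.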

Key Features

💰 Commercial-Ready: Trained on legally compliant datasets. Generated music can be used for commercial purposes.
📚 Safe Training Data: Licensed music, royalty-free/public domain, and synthetic (MIDI-to-Audio) data.
🎯 Full Task Support: Text2Music, Cover, Repaint, Extract, Lego, Complete.
🔮 Higher Quality: 4B parameters provide richer audio quality compared to the 2B variants.

Quick Start

Bash

# Install ACE-Step
git clone https://github.com/ace-step/ACE-Step-1.5.git
cd ACE-Step-1.5
pip install -e .

# Download this model
huggingface-cli download ACE-Step/acestep-v15-xl-base --local-dir ./checkpoints/acestep-v15-xl-base

# Run with Gradio UI
python acestep --config-path acestep-v15-xl-base
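The download step can also be done from Python. This is a sketch under stated assumptions: it uses `snapshot_download` from the `huggingface_hub` package (which must be installed), and the helper names `checkpoint_dir` and `download` are hypothetical, chosen only to mirror the `--local-dir` layout in the command above.

```python
from pathlib import Path

# Repository id taken from the huggingface-cli command in Quick Start.
REPO_ID = "ACE-Step/acestep-v15-xl-base"


def checkpoint_dir(repo_id: str, root: str = "./checkpoints") -> Path:
    """Build the local directory used in the CLI example: <root>/<model name>."""
    return Path(root) / repo_id.split("/")[-1]


def download(repo_id: str = REPO_ID) -> Path:
    """Fetch the full repository snapshot into the checkpoint directory."""
    # Imported lazily so checkpoint_dir() works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    target = checkpoint_dir(repo_id)
    snapshot_download(repo_id=repo_id, local_dir=str(target))
    return target


# Usage (downloads the full checkpoint, so it is left commented out):
# download()
```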

Model Zoo

XL (4B) DiT Models

DiT Model	CFG	Steps	Quality	Diversity	Tasks	Hugging Face	ModelScope
`acestep-v15-xl-base`	✅	50	High	High	All (extract, lego, complete)	This repo	Link
`acestep-v15-xl-sft`	✅	50	Very High	Medium	Standard	Link	Link
`acestep-v15-xl-turbo`	❌	8	Very High	Medium	Standard	Link	Link

2B DiT Models

DiT Model	CFG	Steps	Hugging Face	ModelScope
`acestep-v15-turbo` (default)	❌	8	Link	Link
`acestep-v15-sft`	✅	50	Link	Link
`acestep-v15-base`	✅	50	Link	Link
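The CFG and step settings from the two DiT tables can be captured in a small lookup, which is handy when scripting runs across variants. This is an illustrative sketch only: `DiTVariant`, `DIT_MODELS`, `default_steps`, and `cfg_models` are hypothetical names, and the values are copied from the tables above rather than read from any ACE-Step config.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiTVariant:
    uses_cfg: bool  # whether the variant runs with classifier-free guidance
    steps: int      # inference steps listed in the Model Zoo tables


# Values transcribed from the XL (4B) and 2B DiT tables.
DIT_MODELS = {
    "acestep-v15-xl-base": DiTVariant(uses_cfg=True, steps=50),
    "acestep-v15-xl-sft": DiTVariant(uses_cfg=True, steps=50),
    "acestep-v15-xl-turbo": DiTVariant(uses_cfg=False, steps=8),
    "acestep-v15-turbo": DiTVariant(uses_cfg=False, steps=8),
    "acestep-v15-sft": DiTVariant(uses_cfg=True, steps=50),
    "acestep-v15-base": DiTVariant(uses_cfg=True, steps=50),
}


def default_steps(name: str) -> int:
    """Return the listed inference step count for a DiT variant."""
    return DIT_MODELS[name].steps


def cfg_models() -> list[str]:
    """Return the names of variants that use CFG, sorted alphabetically."""
    return sorted(n for n, v in DIT_MODELS.items() if v.uses_cfg)


print(default_steps("acestep-v15-xl-base"), cfg_models())
```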

LM Models (all compatible with XL)

LM Model	Params	Audio Understanding	Composition	Hugging Face	ModelScope
`acestep-5Hz-lm-0.6B`	0.6B	Medium	Medium	Link	Link
`acestep-5Hz-lm-1.7B`	1.7B	Medium	Medium	Included in main	Included in main
`acestep-5Hz-lm-4B`	4B	Strong	Strong	Link	Link

Acknowledgements

This project is co-led by ACE Studio and StepFun.

Citation

BibTeX

@misc{gong2026acestep,
    title={ACE-Step 1.5: Pushing the Boundaries of Open-Source Music Generation},
    author={Junmin Gong and Yulin Song and Wenxiao Zhao and Sen Wang and Shengyuan Xu and Jing Guo},
    howpublished={\url{https://github.com/ace-step/ACE-Step-1.5}},
    year={2026},
    note={GitHub repository}
}
