by ACE-Step
ACE Step v1 Chinese Rap LoRA is a specialized model designed for generating Mandarin rap vocals. Trained on carefully curated and cleaned datasets, it captures the stylistic nuances of Chinese hip-hop and electronic music. The model excels in precise Mandarin phonetics, faithfully reproducing rap vocal techniques such as *mumble rap* or *trap flow*, while offering a wide range of expressive outputs—from melodic flows to stylized effects. It enables users to create original tracks, enhance existing productions with underground or experimental influences, or blend Chinese rap with other musical genres for richer, more detailed results. What sets it apart is its hybrid approach, combining a strong musical foundation (via ACE-Step) with precise vocal controls. This allows users to fine-tune parameters like timbre, clarity, or delivery techniques to tailor the output to their creative vision. Though tailored for Chinese rap, the model also demonstrates ACE-Step’s universal potential as a music generation tool, capable of transcending linguistic and cultural barriers to inspire new forms of artistic expression.
This is a hybrid rap voice model. We meticulously curated Chinese rap/hip-hop datasets for training, with rigorous data cleaning and recaptioning. The results demonstrate:
For audio examples, see: https://ace-step.github.io/#RapMachine
Vocal Controls
vocal_timbre
techniques (List)
mumble rap, chopper rap, melodic rap, lyrical rap, trap flow, double-time rap
auto-tune, reverb, delay, distortion
whispered, shouted, spoken word, narration, singing
ad-libs, call-and-response, harmonized
While a Chinese rap LoRA might seem niche for non-Chinese communities, we consistently demonstrate through such projects that ACE-Step, as a music generation foundation model, holds boundless potential. It doesn't just improve pronunciation in one language; it also spawns new styles.
The universal human appreciation of music is a precious asset. Like abstract LEGO blocks, these elements will eventually combine in more organic ways. May our open-source contributions propel the evolution of musical history forward.
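As a rough illustration of the vocal controls listed earlier (`vocal_timbre` and `techniques`), the sketch below assembles them into a single comma-separated tag string. It assumes the controls are supplied as plain text tags alongside genre tags; this card does not confirm that format. The helper name `build_vocal_prompt`, the flat `KNOWN_TECHNIQUES` set, and the example timbre value `"raspy"` are hypothetical and not part of ACE-Step.

```python
# Technique tags copied from this card; treating them as one flat set is an assumption.
KNOWN_TECHNIQUES = {
    "mumble rap", "chopper rap", "melodic rap", "lyrical rap", "trap flow",
    "double-time rap", "auto-tune", "reverb", "delay", "distortion",
    "whispered", "shouted", "spoken word", "narration", "singing",
    "ad-libs", "call-and-response", "harmonized",
}


def build_vocal_prompt(base_tags, vocal_timbre, techniques):
    """Join genre tags, one timbre tag, and technique tags into one string.

    Hypothetical helper: the comma-separated layout is an assumption,
    not a documented ACE-Step input format.
    """
    unknown = [t for t in techniques if t not in KNOWN_TECHNIQUES]
    if unknown:
        raise ValueError(f"unknown technique tags: {unknown}")
    parts = list(base_tags) + [vocal_timbre] + list(techniques)
    # dict.fromkeys drops duplicate tags while preserving their order.
    return ", ".join(dict.fromkeys(parts))


prompt = build_vocal_prompt(
    ["Chinese rap", "trap"],      # genre tags (illustrative)
    "raspy",                      # vocal_timbre (illustrative value)
    ["trap flow", "auto-tune"],   # techniques taken from the list on this card
)
print(prompt)
```

Validating against a fixed set catches misspelled tags early. Whether the model gives special treatment to tags outside this list is not stated on this card.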

ACE-Step is a novel open-source foundation model for music generation that overcomes key limitations of existing approaches through a holistic architectural design. It integrates diffusion-based generation with Sana's Deep Compression AutoEncoder (DCAE) and a lightweight linear transformer, achieving state-of-the-art performance in generation speed, musical coherence, and controllability.
Key Features:
ACE-Step can be used for:
The model serves as a foundation for:
The model should not be used for:
see: https://github.com/ace-step/ACE-Step
| Device | 27 Steps | 60 Steps |
|---|---|---|
| NVIDIA A100 | 27.27x | 12.27x |
| RTX 4090 | 34.48x | 15.63x |
| RTX 3090 | 12.76x | 6.48x |
| M2 Max | 2.27x | 1.03x |
Values are RTF (Real-Time Factor); higher values indicate faster generation.
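To make the table concrete, the sketch below converts an RTF value into an estimated generation time. It assumes RTF means seconds of audio produced per second of wall-clock time, which is consistent with "higher values indicate faster generation" but is not spelled out on this card. The `RTF` dictionary simply restates the table, and `generation_seconds` is an illustrative helper, not part of ACE-Step.

```python
# RTF values restated from the table: device -> {diffusion steps: RTF}.
RTF = {
    "NVIDIA A100": {27: 27.27, 60: 12.27},
    "RTX 4090": {27: 34.48, 60: 15.63},
    "RTX 3090": {27: 12.76, 60: 6.48},
    "M2 Max": {27: 2.27, 60: 1.03},
}


def generation_seconds(audio_seconds, rtf):
    """Estimate wall-clock time, assuming RTF = audio duration / generation time."""
    if rtf <= 0:
        raise ValueError("RTF must be positive")
    return audio_seconds / rtf


# Estimated time to generate one minute of audio on each device.
for device, by_steps in RTF.items():
    for steps, rtf in by_steps.items():
        print(f"{device}, {steps} steps: {generation_seconds(60, rtf):.1f} s")
```

Under this reading, one minute of audio at 27 steps on an A100 takes roughly 60 / 27.27 ≈ 2.2 seconds, while an M2 Max at 60 steps runs at about real time.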
Users should:
Developed by: ACE Studio and StepFun
Model type: Diffusion-based music generation with transformer conditioning
License: Apache 2.0
Resources:
@misc{gong2025acestep,
title={ACE-Step: A Step Towards Music Generation Foundation Model},
author={Junmin Gong and Wenxiao Zhao and Sen Wang and Shengyuan Xu and Jing Guo},
howpublished={\url{https://github.com/ace-step/ACE-Step}},
year={2025},
note={GitHub repository}
}
This project is co-led by ACE Studio and StepFun.