ACE Step v1 Chinese Rap LoRA

Name: ACE Step v1 Chinese Rap LoRA
Rating: 1.9 (34 reviews)

by ACE-Step

Open source · 307 downloads · 34 likes

Audio · API & Local

About

ACE Step v1 Chinese Rap LoRA is a specialized model designed for generating Mandarin rap vocals. Trained on carefully curated and cleaned datasets, it captures the stylistic nuances of Chinese hip-hop and electronic music. The model excels in precise Mandarin phonetics, faithfully reproducing rap vocal techniques such as *mumble rap* or *trap flow*, while offering a wide range of expressive outputs, from melodic flows to stylized effects. It enables users to create original tracks, enhance existing productions with underground or experimental influences, or blend Chinese rap with other musical genres for richer, more detailed results.

What sets it apart is its hybrid approach, combining a strong musical foundation (via ACE-Step) with precise vocal controls. This allows users to fine-tune parameters like timbre, clarity, or delivery techniques to tailor the output to their creative vision. Though tailored for Chinese rap, the model also demonstrates ACE-Step’s universal potential as a music generation tool, capable of transcending linguistic and cultural barriers to inspire new forms of artistic expression.

Documentation

🎤 Chinese Rap LoRA for ACE-Step (Rap Machine)

This is a hybrid rap voice model. We meticulously curated Chinese rap/hip-hop datasets for training, with rigorous data cleaning and recaptioning. The results demonstrate:

Improved Chinese pronunciation accuracy
Enhanced stylistic adherence to hip-hop and electronic genres
Greater diversity in hip-hop vocal expressions

For audio examples, see: https://ace-step.github.io/#RapMachine

Usage Guide

Generate higher-quality Chinese songs
Create superior hip-hop tracks
Blend with other genres to:
- Produce music with better vocal quality and detail
- Add experimental flavors (e.g., underground, street culture)
Fine-tune using these parameters:

Vocal Controls
vocal_timbre

Describes inherent vocal qualities.
Examples: Bright, dark, warm, cold, breathy, nasal, gritty, smooth, husky, metallic, whispery, resonant, airy, smoky, sultry, light, clear, high-pitched, raspy, powerful, ethereal, flute-like, hollow, velvety, shrill, hoarse, mellow, thin, thick, reedy, silvery, twangy.

techniques (List)

Rap styles: mumble rap, chopper rap, melodic rap, lyrical rap, trap flow, double-time rap
Vocal FX: auto-tune, reverb, delay, distortion
Delivery: whispered, shouted, spoken word, narration, singing
Other: ad-libs, call-and-response, harmonized
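
As a minimal sketch of how the vocal controls above might be combined, the helper below validates a timbre and a list of techniques against the vocabulary listed in this section and joins them with genre tags into one comma-separated string. The function name `build_rap_prompt` and the comma-separated tag format are assumptions made for illustration; this page does not specify how the LoRA expects the descriptors to be phrased, and the helper does not call ACE-Step itself.

```python
from typing import Iterable

# Vocabulary copied from the "Vocal Controls" lists above.
VOCAL_TIMBRES = frozenset({
    "bright", "dark", "warm", "cold", "breathy", "nasal", "gritty", "smooth",
    "husky", "metallic", "whispery", "resonant", "airy", "smoky", "sultry",
    "light", "clear", "high-pitched", "raspy", "powerful", "ethereal",
    "flute-like", "hollow", "velvety", "shrill", "hoarse", "mellow", "thin",
    "thick", "reedy", "silvery", "twangy",
})
TECHNIQUES = frozenset({
    "mumble rap", "chopper rap", "melodic rap", "lyrical rap", "trap flow",
    "double-time rap", "auto-tune", "reverb", "delay", "distortion",
    "whispered", "shouted", "spoken word", "narration", "singing",
    "ad-libs", "call-and-response", "harmonized",
})


def _normalize(tag: str) -> str:
    """Lower-case a tag and trim surrounding whitespace."""
    return tag.strip().lower()


def build_rap_prompt(
    genres: Iterable[str], vocal_timbre: str, techniques: Iterable[str]
) -> str:
    """Hypothetical helper: join genres, one timbre, and techniques into a tag string."""
    timbre = _normalize(vocal_timbre)
    if timbre not in VOCAL_TIMBRES:
        raise ValueError(f"unknown vocal_timbre: {vocal_timbre!r}")

    chosen = [_normalize(t) for t in techniques]
    unknown = [t for t in chosen if t not in TECHNIQUES]
    if unknown:
        raise ValueError(f"unknown techniques: {unknown}")

    tags = [_normalize(g) for g in genres] + [timbre] + chosen
    # dict.fromkeys drops duplicate tags while preserving their order.
    return ", ".join(dict.fromkeys(tags))


if __name__ == "__main__":
    print(build_rap_prompt(["hip-hop", "electronic"], "Gritty", ["trap flow", "auto-tune"]))
    # prints: hip-hop, electronic, gritty, trap flow, auto-tune
```

Validating against a fixed vocabulary is a design choice of this sketch, not a requirement stated by the model authors; descriptors outside the lists may also work.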

Community Note

While a Chinese rap LoRA might seem niche for non-Chinese communities, we consistently demonstrate through such projects that ACE-Step, as a music generation foundation model, holds boundless potential. It doesn't just improve pronunciation in one language; it also spawns new styles.

The universal human appreciation of music is a precious asset. Like abstract LEGO blocks, these elements will eventually combine in more organic ways. May our open-source contributions propel the evolution of musical history forward.

ACE-Step: A Step Towards Music Generation Foundation Model

ACE-Step Framework

Model Description

ACE-Step is a novel open-source foundation model for music generation that overcomes key limitations of existing approaches through a holistic architectural design. It integrates diffusion-based generation with Sana's Deep Compression AutoEncoder (DCAE) and a lightweight linear transformer, achieving state-of-the-art performance in generation speed, musical coherence, and controllability.

Key Features:

15× faster than LLM-based baselines (20s for 4-minute music on A100)
Superior musical coherence across melody, harmony, and rhythm
Full-song generation, duration control, and natural-language prompting

Uses

Direct Use

ACE-Step can be used for:

Generating original music from text descriptions
Music remixing and style transfer
Editing song lyrics

Downstream Use

The model serves as a foundation for:

Voice cloning applications
Specialized music generation (rap, jazz, etc.)
Music production tools
Creative AI assistants

Out-of-Scope Use

The model should not be used for:

Generating copyrighted content without permission
Creating harmful or offensive content
Misrepresenting AI-generated music as human-created

How to Get Started

See: https://github.com/ace-step/ACE-Step

Hardware Performance

Device	27 Steps	60 Steps
NVIDIA A100	27.27x	12.27x
RTX 4090	34.48x	15.63x
RTX 3090	12.76x	6.48x
M2 Max	2.27x	1.03x

Values are real-time factors (RTF) at 27 and 60 diffusion steps; higher values indicate faster generation.
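
To make the table easier to read, the sketch below converts an RTF value into an estimated wall-clock generation time. It assumes RTF here means audio duration divided by generation time, which matches "higher is faster" and the "20s for 4-minute music on A100" figure quoted under Key Features, but the definition is not spelled out on this page. The function name `generation_seconds` is illustrative.

```python
# RTF values copied from the Hardware Performance table: device -> {steps: RTF}.
RTF = {
    "NVIDIA A100": {27: 27.27, 60: 12.27},
    "RTX 4090": {27: 34.48, 60: 15.63},
    "RTX 3090": {27: 12.76, 60: 6.48},
    "M2 Max": {27: 2.27, 60: 1.03},
}


def generation_seconds(device: str, steps: int, audio_seconds: float) -> float:
    """Estimate generation time, assuming RTF = audio duration / generation time."""
    return audio_seconds / RTF[device][steps]


if __name__ == "__main__":
    four_minutes = 4 * 60  # seconds of audio
    for device in RTF:
        for steps in (27, 60):
            seconds = generation_seconds(device, steps, four_minutes)
            print(f"{device:12s} {steps:2d} steps: {seconds:6.1f} s")
```

Under that assumption, a 4-minute track at 60 steps on an A100 takes about 240 / 12.27 ≈ 19.6 seconds, while on an M2 Max the same setting runs at roughly real time.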

Limitations

Performance varies by language (top 10 languages perform best)
Longer generations (>5 minutes) may lose structural coherence
Rare instruments may not render perfectly
Output Inconsistency: Highly sensitive to random seeds and input duration, leading to varied "gacha-style" results.
Style-specific Weaknesses: Underperforms on certain genres (e.g., Chinese rap/zh_rap); limited style adherence and a ceiling on musicality
Continuity Artifacts: Unnatural transitions in repainting/extend operations
Vocal Quality: Coarse vocal synthesis lacking nuance
Control Granularity: Needs finer-grained musical parameter control

Ethical Considerations

Users should:

Verify originality of generated works
Disclose AI involvement
Respect cultural elements and copyrights
Avoid harmful content generation

Model Details

Developed by: ACE Studio and StepFun
Model type: Diffusion-based music generation with transformer conditioning
License: Apache 2.0

Citation

BibTeX

@misc{gong2025acestep,
  title={ACE-Step: A Step Towards Music Generation Foundation Model},
  author={Junmin Gong and Wenxiao Zhao and Sen Wang and Shengyuan Xu and Jing Guo},
  howpublished={\url{https://github.com/ace-step/ACE-Step}},
  year={2025},
  note={GitHub repository}
}

Acknowledgements

This project is co-led by ACE Studio and StepFun.

Capabilities & Tags

diffusers · music · text2music · text-to-audio · en · zh · de · fr · es · it

Links & Resources