by ACE-Step
Open source · 2k downloads · 46 likes
The ACE-Step Captioner is an AI model specialized in generating detailed and structured descriptions of musical content. It excels in analyzing styles, instruments, structures, and sound characteristics, delivering greater precision than solutions like Gemini Pro 2.5. With its rich vocabulary and ability to identify over 1,000 instruments and descriptive terms, it produces professional annotations tailored to diverse needs. This model is particularly useful for training music AI systems, creating metadata for audio databases, and music education. Its holistic approach makes it a versatile tool for documenting, analyzing, and categorizing music with remarkable nuance.
ACE-Step Captioner is the annotation model used by ACE-Step v1.5 for training data labeling. It is a professional-grade music captioning model that generates detailed, structured descriptions of audio content.
🏆 Accuracy surpasses Gemini Pro 2.5 in music description tasks
The usage is the same as Qwen2.5 Omni-7B.
Use the following prompt to caption audio:
*Task* Describe this audio in detail
<audio>
The model generates natural language descriptions covering multiple aspects of the music.
A melancholic indie folk track featuring fingerpicked acoustic guitar
as the primary instrument. The song opens with a sparse, contemplative
intro before the vocals enter with a breathy, intimate delivery.
The arrangement gradually builds through the verse, adding subtle
string pads and a gentle kick drum. The chorus lifts with layered
harmonies and a warmer, fuller texture. The bridge introduces a
key change and emotional climax before returning to the stripped-down
acoustic arrangement for the outro.
| Category | Styles |
|---|---|
| Electronic | Ambient, Techno, House, Drum & Bass, Synthwave, IDM, Downtempo |
| Rock | Alternative, Indie, Post-Rock, Progressive, Psychedelic, Grunge |
| Pop | Synth-pop, Electropop, Dream Pop, Art Pop, Indie Pop |
| Classical | Orchestral, Chamber, Minimalist, Neo-Classical, Cinematic |
| World | Latin, African, Middle Eastern, Asian Traditional, Celtic |
| Jazz | Fusion, Smooth, Bebop, Modal, Free Jazz |
| Hip-Hop | Trap, Boom Bap, Lo-fi, Instrumental, Cloud Rap |
| Category | Examples |
|---|---|
| Strings | Acoustic Guitar, Electric Guitar, Violin, Cello, Bass, Harp, Mandolin |
| Keys | Piano, Synthesizer, Organ, Rhodes, Wurlitzer, Mellotron |
| Percussion | Drums, Electronic Drums, Congas, Bongos, Timpani, Vibraphone |
| Wind | Saxophone, Trumpet, Flute, Clarinet, Oboe, French Horn |
| Electronic | Synth Bass, Pad, Lead, Arpeggiator, Sampler, 808, 303 |
| Dimension | Descriptors |
|---|---|
| Texture | Warm, Bright, Dark, Crisp, Muddy, Clean, Distorted, Saturated |
| Space | Reverberant, Dry, Spacious, Intimate, Cavernous, Tight |
| Dynamics | Punchy, Soft, Aggressive, Gentle, Compressed, Dynamic |
| Character | Ethereal, Gritty, Smooth, Raw, Polished, Organic, Synthetic |