MusicGen Small — ONNX (KV-Cache)

ONNX export of `facebook/musicgen-small` with a KV-cache decoder pair for efficient on-device autoregressive generation.

Model Details

Property	Value
Base Model	`facebook/musicgen-small`
Precision	FP32
Audio	Mono (1 channel)
Codebooks	4
Hidden Size	1024
Sample Rate	32 kHz
Max Length	1500 steps (~30s)
Total Size	~3.6 GB
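
The step count, duration, and sample rate in the table can be related with a little arithmetic. The sketch below assumes a token frame rate of 50 steps per second; that value is inferred from "1500 steps (~30s)" and is not stated in the table. The function names are illustrative only.

```python
SAMPLE_RATE_HZ = 32_000   # from the table: 32 kHz mono output
MAX_STEPS = 1500          # from the table: maximum decoder steps
FRAME_RATE_HZ = 50        # assumption: inferred from 1500 steps ~= 30 s


def steps_to_seconds(steps: int, frame_rate_hz: int = FRAME_RATE_HZ) -> float:
    """Duration of audio covered by a given number of decoder steps."""
    return steps / frame_rate_hz


def steps_to_samples(
    steps: int,
    sample_rate_hz: int = SAMPLE_RATE_HZ,
    frame_rate_hz: int = FRAME_RATE_HZ,
) -> int:
    """Number of mono PCM samples expected for a given number of steps."""
    return steps * sample_rate_hz // frame_rate_hz


if __name__ == "__main__":
    print(steps_to_seconds(MAX_STEPS), "seconds")
    print(steps_to_samples(MAX_STEPS), "samples")
```

Under these assumptions, a full-length generation corresponds to 30 seconds of audio, which is useful when sizing output buffers on device.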

Files

File	Description	Size
`decoder_model.onnx`	Step-0 decoder (no KV-cache)	1.6 GB
`decoder_with_past_model.onnx`	Steps 1+ decoder (with KV-cache)	1.4 GB
`text_encoder.onnx`	T5 text encoder	419 MB
`encodec_decode.onnx`	EnCodec audio decoder	113 MB
`tokenizer.json`	T5 tokenizer vocabulary	2.4 MB
`config.json`	Model architecture config	<1 KB
`generation_config.json`	Generation parameters	<1 KB
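
Because the export is split across several files, it can help to verify that a download is complete before creating any inference sessions. The following is a minimal sketch using only the file names from the table above; the `missing_files` helper and the idea of a single local model directory are assumptions for illustration.

```python
from pathlib import Path
from typing import List

# File names taken from the table above.
REQUIRED_FILES = (
    "decoder_model.onnx",
    "decoder_with_past_model.onnx",
    "text_encoder.onnx",
    "encodec_decode.onnx",
    "tokenizer.json",
    "config.json",
    "generation_config.json",
)


def missing_files(model_dir: str) -> List[str]:
    """Return the required file names that are not present in model_dir."""
    root = Path(model_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_files(".")
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All model files are present.")
```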

Usage

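These models are designed for the DJNed Android app using ONNX Runtime. With the KV-cache decoder pair, every step after the first runs only the newest token through the decoder and reuses the cached attention keys and values, so the number of token positions processed per step is O(1) instead of O(n). Attention still reads the cache, which grows with the sequence length n.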
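
To make the saving concrete, the sketch below counts how many token positions pass through the decoder over a whole generation, with and without a cache. It is a simplified cost model, not a benchmark: it assumes that an uncached decoder re-runs every position generated so far at each step, and it ignores the cost of attention reading the cache.

```python
def positions_processed(steps: int, use_cache: bool) -> int:
    """Total token positions run through the decoder over `steps` steps.

    With a KV-cache, each step processes one new position.
    Without one, step t re-processes all t positions seen so far.
    """
    if steps < 0:
        raise ValueError("steps must be non-negative")
    if use_cache:
        return steps
    return sum(range(1, steps + 1))  # 1 + 2 + ... + steps


if __name__ == "__main__":
    n = 1500  # maximum length from the Model Details table
    print("with cache:   ", positions_processed(n, use_cache=True))
    print("without cache:", positions_processed(n, use_cache=False))
```

For the maximum length of 1500 steps, this model gives 1,500 positions with the cache against 1,125,750 without it.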

Pipeline

Text encoding: `text_encoder.onnx` encodes the text prompt
Step 0: `decoder_model.onnx` generates the first token and the initial KV-cache
Steps 1+: `decoder_with_past_model.onnx` generates subsequent tokens using the KV-cache
Audio decode: `encodec_decode.onnx` converts codebook tokens to an audio waveform
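
The control flow of the four stages above can be sketched as a plain loop. This is a minimal outline under stated assumptions, not the app's implementation: each stage is passed in as a hypothetical callable that would wrap the corresponding ONNX Runtime session, because the tensor names and shapes of the exported graphs are not documented here. Sampling strategy and any codebook post-processing are left to those callables.

```python
from typing import Any, Callable, List, Tuple

Tokens = List[int]  # assumed layout: one token per codebook for a single step


def generate(
    prompt: str,
    encode_text: Callable[[str], Any],                                # text_encoder.onnx
    decode_first: Callable[[Any], Tuple[Tokens, Any]],                # decoder_model.onnx
    decode_next: Callable[[Any, Tokens, Any], Tuple[Tokens, Any]],    # decoder_with_past_model.onnx
    decode_audio: Callable[[List[Tokens]], List[float]],              # encodec_decode.onnx
    max_steps: int = 1500,
) -> List[float]:
    """Run the four pipeline stages in order and return the waveform."""
    if max_steps < 1:
        raise ValueError("max_steps must be at least 1")

    # Text encoding: done once per prompt.
    text_state = encode_text(prompt)

    # Step 0: no cache exists yet, so the first decoder creates it.
    tokens, cache = decode_first(text_state)
    frames = [tokens]

    # Steps 1+: feed only the latest tokens plus the cache, get an updated cache.
    for _ in range(1, max_steps):
        tokens, cache = decode_next(text_state, tokens, cache)
        frames.append(tokens)

    # Audio decode: convert all collected codebook tokens to a waveform.
    return decode_audio(frames)
```

Keeping the stages as injected callables means the loop can be exercised with stub functions before the real sessions are wired in.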

License

This model is derived from Meta's MusicGen under the CC-BY-NC-4.0 license.
