by nateraw
MusicGen SongStarter v0.2 is an AI model specialized in generating melodies and musical ideas for music producers. It produces stereo audio at 32 kHz from text prompts describing styles, instruments, or moods, providing creative starting points for composing music. Compared to its predecessor, it was trained on a dataset roughly three times larger and of higher quality, with a transformer model twice as large, yielding richer and more nuanced results. It is intended as a practical source of inspiration for musicians who want to quickly explore musical concepts or overcome creative blocks, rather than as a tool for final production. Its training data consists of manually selected and refined samples, giving its output a stylistic consistency tailored to creators' needs.
musicgen-songstarter-v0.2 is a musicgen-stereo-melody-large model fine-tuned on a dataset of melody loops from my Splice sample library. It's intended to be used to generate song ideas that are useful for music producers. It generates stereo audio at 32 kHz.
👀 Update: I wrote a blogpost detailing how and why I trained this model, including training details, the dataset, Weights and Biases logs, etc.
Compared to musicgen-songstarter-v0.1, this new version:
- is trained on a larger, higher-quality dataset
- upgrades the transformer LM: medium ➡️ large
Install audiocraft:
```sh
pip install -U git+https://github.com/facebookresearch/audiocraft#egg=audiocraft
```
Then, you should be able to load this model just like any other musicgen checkpoint here on the Hub:
```python
import torchaudio
from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write

model = MusicGen.get_pretrained('nateraw/musicgen-songstarter-v0.2')
model.set_generation_params(duration=8)  # generate 8 seconds.
wav = model.generate_unconditional(4)    # generates 4 unconditional audio samples
descriptions = ['acoustic, guitar, melody, trap, d minor, 90 bpm'] * 3
wav = model.generate(descriptions)       # generates 3 samples.

melody, sr = torchaudio.load('./assets/bach.mp3')
# generates using the melody from the given audio and the provided descriptions.
wav = model.generate_with_chroma(descriptions, melody[None].expand(3, -1, -1), sr)

for idx, one_wav in enumerate(wav):
    # Will save under {idx}.wav, with loudness normalization at -14 dB LUFS.
    audio_write(f'{idx}', one_wav.cpu(), model.sample_rate, strategy="loudness", loudness_compressor=True)
```
Use the following prompt format:

```
{tag_1}, {tag_2}, ..., {tag_n}, {key}, {bpm} bpm
```

For example:

```
hip hop, soul, piano, chords, jazz, neo jazz, G# minor, 140 bpm
```
For some example tags, see the prompt format section of musicgen-songstarter-v0.1's readme. The tags there come from the smaller v0.1 dataset, but they should give you an idea of what the model saw during training.
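If you're generating many prompts, the format is easy to assemble programmatically. A minimal sketch (the `format_prompt` helper is my own, not part of the model's API):

```python
def format_prompt(tags, key, bpm):
    """Build a prompt string in the model's expected format:
    '{tag_1}, ..., {tag_n}, {key}, {bpm} bpm'."""
    return ", ".join(list(tags) + [key, f"{bpm} bpm"])

prompt = format_prompt(
    ["hip hop", "soul", "piano", "chords", "jazz", "neo jazz"],
    key="G# minor",
    bpm=140,
)
print(prompt)
# hip hop, soul, piano, chords, jazz, neo jazz, G# minor, 140 bpm
```

The resulting strings can be passed directly as the `descriptions` list in the generation code above.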
Example text prompts (the audio prompt and output players from the original page are not reproduced here):

- trap, synthesizer, songstarters, dark, G# minor, 140 bpm
- acoustic, guitar, melody, trap, D minor, 90 bpm
For more detail, check out the blogpost.
This work would not have been possible without:
Thank you ❤️