mustango

by declare-lab

Open source · 3k downloads · 41 likes

Rating: 2.0 (41 reviews) · Audio · API & Local

About

Mustango is an AI model specialized in controllable text-to-music generation. It combines latent diffusion models with language-model architectures to produce music that matches detailed text descriptions. The model stands out for its ability to take specific musical features into account, which gives considerable flexibility when creating compositions. It is particularly useful for musicians, content creators, and researchers who want to explore new forms of AI-assisted musical expression. Mustango turns text ideas into coherent, customizable pieces of music.

Documentation

Mustango: Toward Controllable Text-to-Music Generation

Meet Mustango, an exciting addition to the vibrant landscape of Multimodal Large Language Models designed for controlled music generation. Mustango leverages a Latent Diffusion Model (LDM), Flan-T5, and musical features to do the magic!

🔥 Live demo available on Replicate and HuggingFace.

Quickstart Guide

Generate music from a text prompt:

Python
import IPython
import soundfile as sf
from mustango import Mustango

model = Mustango("declare-lab/mustango")

prompt = "This is a new age piece. There is a flute playing the main melody with a lot of staccato notes. The rhythmic background consists of a medium tempo electronic drum beat with percussive elements all over the spectrum. There is a playful atmosphere to the piece. This piece can be used in the soundtrack of a children's TV show or an advertisement jingle."

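# Generate the waveform for the prompt.
music = model.generate(prompt)
# Save at 16 kHz; a short fixed filename avoids filesystem name-length limits that the full prompt would exceed.
sf.write("mustango_output.wav", music, samplerate=16000)
# Play the result inline (in a notebook).
IPython.display.Audio(data=music, rate=16000)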
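
When generating clips for several prompts, it can help to derive a short, filesystem-safe filename from each prompt, since a full prompt can exceed the 255-byte filename limit common on Linux and macOS filesystems. The helper below is a minimal sketch; `prompt_to_filename` and its `max_len` default are illustrative names and values, not part of the Mustango package.

```python
import re


def prompt_to_filename(prompt: str, max_len: int = 60, ext: str = ".wav") -> str:
    """Build a short, filesystem-safe filename from a text prompt (hypothetical helper)."""
    # Lowercase, then replace every run of non-alphanumeric characters with "_".
    slug = re.sub(r"[^a-z0-9]+", "_", prompt.lower()).strip("_")
    # Truncate so the name stays well under common filename-length limits.
    slug = slug[:max_len].rstrip("_")
    # Fall back to a generic name if the prompt had no usable characters.
    return (slug or "clip") + ext


print(prompt_to_filename("This is a new age piece. There is a flute playing the main melody."))
```

The returned name can then be passed to `sf.write` in place of a fixed filename.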

Installation

Bash
git clone https://github.com/AMAAI-Lab/mustango
cd mustango
pip install -r requirements.txt
cd diffusers
pip install -e .

Datasets

The MusicBench dataset contains 52k music fragments, each with a rich, music-specific text caption.

Subjective Evaluation by Expert Listeners

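| Model | Dataset | Pre-trained | Overall Match ↑ | Chord Match ↑ | Tempo Match ↑ | Audio Quality ↑ | Musicality ↑ | Rhythmic Presence and Stability ↑ | Harmony and Consonance ↑ |
|---|---|---|---|---|---|---|---|---|---|
| Tango | MusicCaps | ✓ | 4.35 | 2.75 | 3.88 | 3.35 | 2.83 | 3.95 | 3.84 |
| Tango | MusicBench | ✓ | 4.91 | 3.61 | 3.86 | 3.88 | 3.54 | 4.01 | 4.34 |
| Mustango | MusicBench | ✓ | 5.49 | 5.76 | 4.98 | 4.30 | 4.28 | 4.65 | 5.18 |
| Mustango | MusicBench | ✗ | 5.75 | 6.06 | 5.11 | 4.80 | 4.80 | 4.75 | 5.59 |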
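
To read the table more easily, the per-metric gap between two rows can be computed directly. The sketch below copies the scores from the table above and compares pre-trained Mustango with Tango, both trained on MusicBench. The names `METRICS`, `SCORES`, and `metric_gaps` are illustrative only.

```python
METRICS = [
    "Overall Match", "Chord Match", "Tempo Match", "Audio Quality",
    "Musicality", "Rhythmic Presence and Stability", "Harmony and Consonance",
]

# Scores copied from the table above; keys are (model, dataset, pre-trained).
SCORES = {
    ("Tango", "MusicCaps", True): [4.35, 2.75, 3.88, 3.35, 2.83, 3.95, 3.84],
    ("Tango", "MusicBench", True): [4.91, 3.61, 3.86, 3.88, 3.54, 4.01, 4.34],
    ("Mustango", "MusicBench", True): [5.49, 5.76, 4.98, 4.30, 4.28, 4.65, 5.18],
    ("Mustango", "MusicBench", False): [5.75, 6.06, 5.11, 4.80, 4.80, 4.75, 5.59],
}


def metric_gaps(row_a, row_b):
    """Return {metric: score(row_a) - score(row_b)}, rounded to two decimals."""
    return {
        metric: round(a - b, 2)
        for metric, a, b in zip(METRICS, SCORES[row_a], SCORES[row_b])
    }


gaps = metric_gaps(("Mustango", "MusicBench", True), ("Tango", "MusicBench", True))
largest = max(gaps, key=gaps.get)

for metric, gap in gaps.items():
    print(f"{metric:<34}{gap:+.2f}")
print("Largest gap:", largest)
```

In this comparison every gap is positive, and the largest one is on Chord Match.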

Training

We use the accelerate package from Hugging Face for multi-GPU training. Run accelerate config from a terminal and set up your run configuration by answering the questions asked.

You can now train Mustango on the MusicBench dataset using:

Bash
accelerate launch train.py \
--text_encoder_name="google/flan-t5-large" \
--scheduler_name="stabilityai/stable-diffusion-2-1" \
--unet_model_config="configs/diffusion_model_config_munet.json" \
--model_type Mustango --freeze_text_encoder --uncondition_all --uncondition_single \
--drop_sentences --random_pick_text_column --snr_gamma 5

The --model_type flag selects whether Mustango or Tango is trained with the same code. Note that you must also change --unet_model_config to the matching config: diffusion_model_config_munet for Mustango, diffusion_model_config for Tango.
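
Because the two flags must change together, a small wrapper can keep them in sync. This is a sketch, not part of the repository: `UNET_CONFIGS` and `training_args` are hypothetical names, and the Tango path assumes its config sits in the same `configs/` directory with a `.json` extension, like the Mustango one in the command above.

```python
import shlex
from typing import List

# --model_type -> --unet_model_config. The Mustango path comes from the command
# above; the Tango path is an assumption (same directory and extension).
UNET_CONFIGS = {
    "Mustango": "configs/diffusion_model_config_munet.json",
    "Tango": "configs/diffusion_model_config.json",  # assumed path
}


def training_args(model_type: str) -> List[str]:
    """Return the launch command with a matching model type and U-Net config."""
    if model_type not in UNET_CONFIGS:
        raise ValueError(f"model_type must be one of {sorted(UNET_CONFIGS)}")
    return [
        "accelerate", "launch", "train.py",
        f"--unet_model_config={UNET_CONFIGS[model_type]}",
        "--model_type", model_type,
    ]


# Print the command line; the remaining flags from the command above still
# need to be appended before running it.
print(shlex.join(training_args("Tango")))
```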

The arguments --uncondition_all, --uncondition_single, and --drop_sentences control the dropout functions described in Section 5.2 of our paper. The --random_pick_text_column argument picks randomly between two input text prompts. In the case of MusicBench, we pick between ChatGPT-rephrased captions and the original enhanced MusicCaps prompts, as depicted in Figure 1 of our paper.
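
To make these text-side augmentations concrete, here is a minimal sketch of random caption-column picking and sentence dropout. It is one plausible reading of the flags, not the repository's implementation. The function names, the column names (`main_caption`, `alt_caption`), and the 0.5 drop probability are all assumptions chosen for illustration; see Section 5.2 of the paper for the actual scheme.

```python
import random


def pick_text_column(row, columns, rng):
    """Pick the caption from one of several text columns, uniformly at random."""
    return row[rng.choice(columns)]


def drop_sentences(caption, drop_prob, rng):
    """Drop each sentence independently with probability drop_prob, keeping at least one."""
    # Naive split on "."; this would also split decimals such as "120.5".
    sentences = [s.strip() for s in caption.split(".") if s.strip()]
    if not sentences:
        return caption
    kept = [s for s in sentences if rng.random() >= drop_prob]
    if not kept:
        # Never return an empty caption: keep one sentence at random.
        kept = [rng.choice(sentences)]
    return ". ".join(kept) + "."


rng = random.Random(0)  # seeded so the example is reproducible
row = {
    "main_caption": "A calm piano piece. The tempo is slow. The key is C major.",  # hypothetical column
    "alt_caption": "Slow, calm piano music in C major.",  # hypothetical column
}
caption = pick_text_column(row, ["main_caption", "alt_caption"], rng)
print(drop_sentences(caption, drop_prob=0.5, rng=rng))
```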

Recommended training time from scratch on MusicBench is at least 40 epochs.

Model Zoo

We have released the following models:

Mustango Pretrained: https://huggingface.co/declare-lab/mustango-pretrained

Mustango: https://huggingface.co/declare-lab/mustango

Citation

Please consider citing the following article if you found our work useful:

BibTeX
@misc{melechovsky2023mustango,
      title={Mustango: Toward Controllable Text-to-Music Generation}, 
      author={Jan Melechovsky and Zixun Guo and Deepanway Ghosal and Navonil Majumder and Dorien Herremans and Soujanya Poria},
      year={2023},
      eprint={2311.08355},
      archivePrefix={arXiv},
}

Specifications
Category: Audio
Access: API & Local
License: Open Source
Pricing: Open Source
Rating: 2.0