
riffusion model v1

by riffusion

Open source · 2k downloads · 648 likes

3.5 (648 reviews) · Audio · API & Local
About

The Riffusion v1 model is an innovative real-time music generation tool that turns text descriptions into visual spectrograms and then into audio clips. It is built on a fine-tuned version of Stable Diffusion, specialized in interpreting musical prompts to create soundscapes or melodies suited to specific moods or styles. Ideal for artists, content creators, and music enthusiasts, it makes it possible to explore sonic ideas quickly without any technical background in composition. What sets it apart is its ability to produce coherent, aesthetically pleasing results from simple text instructions, while leaving room for creative experimentation. Accessible through a web application and dedicated tools, it opens up possibilities for music education, audio production, or simply the pleasure of creating.

Documentation

Riffusion

Riffusion is an app for real-time music generation with stable diffusion.

Read about it at https://www.riffusion.com/about and try it at https://www.riffusion.com/.

  • Code: https://github.com/riffusion/riffusion
  • Web app: https://github.com/hmartiro/riffusion-app
  • Model checkpoint: https://huggingface.co/riffusion/riffusion-model-v1
  • Discord: https://discord.gg/yu6SRwvX4v

This repository contains the model files, including:

  • a diffusers formatted library
  • a compiled checkpoint file
  • a traced unet for improved inference speed
  • a seed image library for use with riffusion-app

Riffusion v1 Model

Riffusion is a latent text-to-image diffusion model capable of generating spectrogram images given any text input. These spectrograms can be converted into audio clips.
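The spectrogram-to-audio step works by discarding the phase (a spectrogram image only carries magnitudes) and re-estimating it. Below is a minimal sketch of that reconstruction using the classic Griffin-Lim algorithm with `scipy.signal`; the STFT parameters here are illustrative stand-ins, not Riffusion's actual converter settings, and a pure sine tone stands in for audio decoded from a generated image.

```python
import numpy as np
from scipy.signal import stft, istft

fs = 16000       # sample rate (illustrative)
nperseg = 400    # STFT window size (illustrative)

# A 440 Hz sine stands in for audio represented by a spectrogram image.
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 440.0 * t)

# Keep only the magnitude, discarding phase -- this is the information
# a spectrogram image actually carries.
_, _, Z = stft(x, fs=fs, nperseg=nperseg)
mag = np.abs(Z)

# Griffin-Lim: alternate between enforcing the known magnitude and
# taking the phase of a consistent STFT, converging on a plausible signal.
rng = np.random.default_rng(0)
phase = np.exp(2j * np.pi * rng.random(mag.shape))
for _ in range(32):
    _, y = istft(mag * phase, fs=fs, nperseg=nperseg)
    _, _, Z2 = stft(y, fs=fs, nperseg=nperseg)
    phase = np.exp(1j * np.angle(Z2))

_, audio = istft(mag * phase, fs=fs, nperseg=nperseg)
print(audio.shape)  # (16000,)
```

The reconstructed waveform recovers the 440 Hz tone because Griffin-Lim preserves the spectral magnitudes; only the (perceptually less critical) phase is approximated.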

The model was created by Seth Forsgren and Hayk Martiros as a hobby project.

You can use the Riffusion model directly, or try the Riffusion web app.

The Riffusion model was created by fine-tuning the Stable-Diffusion-v1-5 checkpoint. Read about Stable Diffusion in 🤗's Stable Diffusion blog.

Model Details

  • Developed by: Seth Forsgren, Hayk Martiros
  • Model type: Diffusion-based text-to-image generation model
  • Language(s): English
  • License: The CreativeML OpenRAIL M license is an Open RAIL M license, adapted from the work that BigScience and the RAIL Initiative are jointly carrying out in the area of responsible AI licensing. See also the article about the BLOOM Open RAIL license on which our license is based.
  • Model Description: This is a model that can be used to generate and modify images based on text prompts. It is a Latent Diffusion Model that uses a fixed, pretrained text encoder (CLIP ViT-L/14) as suggested in the Imagen paper.

Direct Use

The model is intended for research purposes only. Possible research areas and tasks include

  • Generation of artworks, audio, and use in creative processes.
  • Applications in educational or creative tools.
  • Research on generative models.

Datasets

The original Stable Diffusion v1.5 was trained on the LAION-5B dataset using the CLIP text encoder, which provided an amazing starting point with an in-depth understanding of language, including musical concepts. The team at LAION also compiled a fantastic audio dataset from many general, speech, and music sources that we recommend at LAION-AI/audio-dataset.

Fine Tuning

Check out the diffusers training examples from Hugging Face. Fine tuning requires a dataset of spectrogram images of short audio clips, with associated text describing them. Note that the CLIP encoder is able to understand and connect many words even if they never appear in the dataset. It is also possible to use a dreambooth method to get custom styles.
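The fine-tuning dataset described above (spectrogram images paired with text) can be laid out in the `imagefolder` convention that the diffusers text-to-image training examples consume: a directory of images plus a `metadata.jsonl` with `file_name` and `text` fields. The file names and captions below are hypothetical placeholders.

```python
import json
from pathlib import Path

# Hypothetical dataset layout: each spectrogram PNG of a short audio
# clip is paired with a text description via metadata.jsonl
# (Hugging Face imagefolder convention).
root = Path("riffusion_finetune")
root.mkdir(exist_ok=True)

examples = [
    {"file_name": "clip_0001.png", "text": "lo-fi hip hop beat, vinyl crackle"},
    {"file_name": "clip_0002.png", "text": "acoustic folk guitar, fingerpicking"},
]

with open(root / "metadata.jsonl", "w") as f:
    for row in examples:
        f.write(json.dumps(row) + "\n")

print((root / "metadata.jsonl").read_text())
```

Because the CLIP encoder generalizes beyond the training captions, descriptions can stay short and stylistic rather than exhaustive.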

Citation

If you build on this work, please cite it as follows:

@article{Forsgren_Martiros_2022,
  author = {Forsgren, Seth* and Martiros, Hayk*},
  title = {{Riffusion - Stable diffusion for real-time music generation}},
  url = {https://riffusion.com/about},
  year = {2022}
}
Links & Resources

Specifications

  • Category: Audio
  • Access: API & Local
  • License: Open Source
  • Pricing: Open Source
  • Rating: 3.5

Try riffusion model v1

Access the model directly