by AEmotionStudio
Stable Audio Open is an audio generation model that converts text descriptions into stereo sound effects and ambient textures, up to 47 seconds at 44.1 kHz. It excels at realistic sounds such as footsteps, impacts, and ambiences (rain, wind), as well as complex soundscapes and atmospheric musical textures like pads and drones. It does not generate full songs with vocals, high-fidelity musical instrument performances, or speech; its focus is creative and immersive sound design. Released under a community license, it suits artists, developers, and content creators looking to enrich their projects with distinctive, varied sounds, and its simplified integration through tools like Mæstræa makes it practical for immediate use.
Text-to-Audio SFX & Ambient Textures — Up to 47s Stereo @ 44.1kHz
Original Model by Stability AI · Stability AI Community License
This is an ungated mirror of the Stable Audio Open 1.0 model weights for use with Mæstræa AI Workstation. Only safetensors-format weights are included (legacy `.ckpt` files have been stripped). All credit goes to the original authors.
| Path | Description | Size |
|---|---|---|
| `model.safetensors` | Main model checkpoint | ~3 GB |
| `transformer/diffusion_pytorch_model.safetensors` | DiT transformer | ~1.5 GB |
| `text_encoder/model.safetensors` | T5 text encoder | ~1.2 GB |
| `vae/diffusion_pytorch_model.safetensors` | VAE decoder | ~150 MB |
| `projection_model/diffusion_pytorch_model.safetensors` | Projection model | ~50 MB |
| `tokenizer/` | T5 tokenizer files | < 10 MB |
| `model_config.json` | Model architecture config | < 1 KB |
| `model_index.json` | Diffusers pipeline index | < 1 KB |
| `scheduler/` | Scheduler config | < 1 KB |
Stable Audio Open generates stereo audio at 44.1 kHz from text prompts, up to 47 seconds per generation. It excels at sound effects, ambiences, and atmospheric musical textures rather than full songs or speech.
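A quick back-of-envelope check on what a full-length generation amounts to in raw audio terms (pure arithmetic, no model required):

```python
SAMPLE_RATE = 44_100        # Hz
MAX_SECONDS = 47
CHANNELS = 2                # stereo
BYTES_PER_SAMPLE = 4        # float32, as the pipeline outputs

samples_per_channel = SAMPLE_RATE * MAX_SECONDS
raw_bytes = samples_per_channel * CHANNELS * BYTES_PER_SAMPLE

print(samples_per_channel)  # 2072700 samples per channel
print(raw_bytes)            # 16581600 bytes (~15.8 MiB of raw float32 audio)
```

So a maximum-length clip is roughly 2 million samples per channel; a 16-bit WAV export halves the raw size.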
These models are automatically downloaded by the Mæstræa AI Workstation backend.
```python
from diffusers import StableAudioPipeline
import torch

pipe = StableAudioPipeline.from_pretrained(
    "AEmotionStudio/stable-audio-open-models",
    torch_dtype=torch.float16,
).to("cuda")

audio = pipe(
    prompt="Thunderstorm with heavy rain and distant rolling thunder",
    negative_prompt="low quality, distorted",
    audio_end_in_s=10.0,         # clip length in seconds (max 47)
    num_inference_steps=100,
).audios[0]                      # tensor of shape (channels, samples)
```
```python
from stable_audio_tools import get_pretrained_model

model, model_config = get_pretrained_model("AEmotionStudio/stable-audio-open-models")
```
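With `stable_audio_tools`, generation is conditioned on a prompt plus timing information passed as a list of dicts. A sketch of that conditioning structure (field names follow the stable-audio-tools README; verify them against the version you install):

```python
# Conditioning entry for stable_audio_tools' diffusion sampling.
# Field names are per the stable-audio-tools README, not this repo.
conditioning = [{
    "prompt": "Thunderstorm with heavy rain and distant rolling thunder",
    "seconds_start": 0,     # where the clip starts within the window
    "seconds_total": 30,    # requested duration; the model tops out at 47 s
}]
```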
Stability AI Community License — see LICENSE.md for full terms.
Key points: