by AEmotionStudio
Stable Audio Open is an audio generation model that converts text descriptions into stereo sound effects and ambient textures, up to 47 seconds at 44.1 kHz. It excels at realistic sounds such as footsteps, impacts, and ambiences (rain, wind), as well as complex soundscapes and atmospheric musical textures like pads and drones. It does not generate full songs with vocals, high-fidelity musical instrument performances, or speech; its focus is creative and immersive sound design. Released under a community license, it suits artists, developers, and content creators looking to enrich their projects with distinctive, varied sounds, and its simplified integration through tools like Mæstræa makes it practical for immediate use.
Text-to-Audio SFX & Ambient Textures — Up to 47s Stereo @ 44.1kHz
Original Model by Stability AI · Stability AI Community License
This is an ungated mirror of the Stable Audio Open 1.0 model weights for use with Mæstræa AI Workstation. Only safetensors-format weights are included (legacy `.ckpt` files have been stripped). All credit goes to the original authors.
| Path | Description | Size |
|---|---|---|
| `model.safetensors` | Main model checkpoint | ~3 GB |
| `transformer/diffusion_pytorch_model.safetensors` | DiT transformer | ~1.5 GB |
| `text_encoder/model.safetensors` | T5 text encoder | ~1.2 GB |
| `vae/diffusion_pytorch_model.safetensors` | VAE decoder | ~150 MB |
| `projection_model/diffusion_pytorch_model.safetensors` | Projection model | ~50 MB |
| `tokenizer/` | T5 tokenizer files | < 10 MB |
| `model_config.json` | Model architecture config | < 1 KB |
| `model_index.json` | Diffusers pipeline index | < 1 KB |
| `scheduler/` | Scheduler config | < 1 KB |
Stable Audio Open generates stereo audio at 44.1 kHz from text prompts, up to 47 seconds per generation. It excels at sound effects, ambiences, and atmospheric musical textures rather than full songs or speech.
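A quick back-of-envelope check on what a full-length generation amounts to in raw audio terms (pure arithmetic, no model required):

```python
SAMPLE_RATE = 44_100        # Hz
MAX_SECONDS = 47
CHANNELS = 2                # stereo
BYTES_PER_SAMPLE = 4        # float32, as the pipeline outputs

samples_per_channel = SAMPLE_RATE * MAX_SECONDS
raw_bytes = samples_per_channel * CHANNELS * BYTES_PER_SAMPLE

print(samples_per_channel)  # 2072700 samples per channel
print(raw_bytes)            # 16581600 bytes (~15.8 MiB of raw float32 audio)
```

So a maximum-length clip is roughly 2 million samples per channel; a 16-bit WAV export halves the raw size.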
These models are automatically downloaded by the Mæstræa AI Workstation backend.
```python
from diffusers import StableAudioPipeline
import torch

pipe = StableAudioPipeline.from_pretrained(
    "AEmotionStudio/stable-audio-open-models",
    torch_dtype=torch.float16,
).to("cuda")

audio = pipe(
    prompt="Thunderstorm with heavy rain and distant rolling thunder",
    negative_prompt="low quality, distorted",
    audio_end_in_s=10.0,         # clip length in seconds (max 47)
    num_inference_steps=100,
).audios[0]                      # tensor of shape (channels, samples)
```
```python
from stable_audio_tools import get_pretrained_model

model, model_config = get_pretrained_model("AEmotionStudio/stable-audio-open-models")
```
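With `stable_audio_tools`, generation is conditioned on a prompt plus timing information passed as a list of dicts. A sketch of that conditioning structure (field names follow the stable-audio-tools README; verify them against the version you install):

```python
# Conditioning entry for stable_audio_tools' diffusion sampling.
# Field names are per the stable-audio-tools README, not this repo.
conditioning = [{
    "prompt": "Thunderstorm with heavy rain and distant rolling thunder",
    "seconds_start": 0,     # where the clip starts within the window
    "seconds_total": 30,    # requested duration; the model tops out at 47 s
}]
```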
Stability AI Community License — see LICENSE.md for full terms.
Key points: