
mzr chapter audio dataset force aligned speecht5

by sil-ai

Open source · 1k downloads · 0 likes

About

This model is a fine-tuned version of SpeechT5 (microsoft/speecht5_tts) for text-to-speech. Its name indicates training on force-aligned, chapter-length audio, which suggests long-form narration as the target: audiobooks, voice-over, accessibility tools for visually impaired users, and automated audio production from text. The model card reports a final evaluation loss of 0.0448 but does not document the training dataset, intended uses, or limitations.

Documentation

mzr-chapter-audio-dataset-force-aligned-speecht5

This model is a fine-tuned version of microsoft/speecht5_tts. The auto-generated card records the training dataset as None, so the dataset is not specified. It achieves the following results on the evaluation set:

  • Loss: 0.0448

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 3407
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 32
  • optimizer: AdamW (OptimizerNames.ADAMW_TORCH_FUSED) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 4000
  • training_steps: 40000
  • mixed_precision_training: Native AMP
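
The hyperparameters above imply an effective batch size and a learning-rate curve that can be checked by hand. The following is a minimal sketch, not the training code itself: it assumes a single device (the card does not state the device count) and assumes the cosine scheduler uses half a cosine cycle after linear warmup, which is the default behaviour in Transformers. The function and constant names are illustrative.

```python
import math

# Values copied from the hyperparameter list.
LEARNING_RATE = 1e-4       # learning_rate
WARMUP_STEPS = 4000        # lr_scheduler_warmup_steps
TRAINING_STEPS = 40000     # training_steps
TRAIN_BATCH_SIZE = 8       # train_batch_size (per device)
GRAD_ACCUM_STEPS = 4       # gradient_accumulation_steps
NUM_DEVICES = 1            # ASSUMPTION: not stated in the card


def effective_batch_size(per_device, accum_steps, num_devices=1):
    """Examples consumed per optimizer step."""
    return per_device * accum_steps * num_devices


def cosine_lr(step, base_lr=LEARNING_RATE, warmup=WARMUP_STEPS,
              total=TRAINING_STEPS, num_cycles=0.5):
    """Linear warmup, then cosine decay to zero.

    ASSUMPTION: num_cycles=0.5 (half a cosine wave), the Transformers default.
    """
    if step < warmup:
        return base_lr * step / max(1, warmup)
    progress = (step - warmup) / max(1, total - warmup)
    scale = 0.5 * (1.0 + math.cos(math.pi * 2.0 * num_cycles * progress))
    return base_lr * max(0.0, scale)


if __name__ == "__main__":
    print("effective batch size:",
          effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS, NUM_DEVICES))
    for step in (0, 2000, 4000, 22000, 40000):
        print(step, cosine_lr(step))
```

Under these assumptions the effective batch size matches the reported total_train_batch_size of 32, the learning rate peaks at 0.0001 at step 4000, and it decays to zero at step 40000.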

Training results

Training Loss   Epoch      Step    Validation Loss
0.0626          12.5016    1000    0.0466
0.0517          25.0       2000    0.0436
0.0524          37.5016    3000    0.0428
0.0501          50.0       4000    0.0423
0.0464          62.5016    5000    0.0408
0.0422          75.0       6000    0.0421
0.0479          87.5016    7000    0.0416
0.0434          100.0      8000    0.0425
0.0421          112.5016   9000    0.0416
0.0408          125.0      10000   0.0424
0.0376          137.5016   11000   0.0438
0.0371          150.0      12000   0.0419
0.0377          162.5016   13000   0.0429
0.0377          175.0      14000   0.0422
0.0371          187.5016   15000   0.0427
0.0362          200.0      16000   0.0437
0.036           212.5016   17000   0.0438
0.0349          225.0      18000   0.0435
0.0356          237.5016   19000   0.0438
0.034           250.0      20000   0.0434
0.033           262.5016   21000   0.0437
0.0335          275.0      22000   0.0443
0.0329          287.5016   23000   0.0445
0.0332          300.0      24000   0.0448
0.0324          312.5016   25000   0.0449
0.0329          325.0      26000   0.0442
0.0317          337.5016   27000   0.0445
0.0311          350.0      28000   0.0443
0.0304          362.5016   29000   0.0448
0.0313          375.0      30000   0.0443
0.0308          387.5016   31000   0.0450
0.0312          400.0      32000   0.0447
0.0307          412.5016   33000   0.0448
0.0312          425.0      34000   0.0448
0.0304          437.5016   35000   0.0446
0.0313          450.0      36000   0.0448
0.0298          462.5016   37000   0.0446
0.0307          475.0      38000   0.0447
0.0302          487.5016   39000   0.0449
0.0303          500.0      40000   0.0448
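
The training results can be read more easily with a little arithmetic. The sketch below uses the validation losses copied from the table to locate the evaluation step with the lowest loss, and uses the step 2000 row (epoch 25.0) to estimate the number of optimizer steps per epoch. The dataset-size figure is only an estimate: it assumes a single device, full batches, and the reported total batch size of 32. All names are illustrative.

```python
# Validation loss at each evaluation, every 1000 steps, copied from the table.
VAL_LOSS = [
    0.0466, 0.0436, 0.0428, 0.0423, 0.0408, 0.0421, 0.0416, 0.0425, 0.0416, 0.0424,
    0.0438, 0.0419, 0.0429, 0.0422, 0.0427, 0.0437, 0.0438, 0.0435, 0.0438, 0.0434,
    0.0437, 0.0443, 0.0445, 0.0448, 0.0449, 0.0442, 0.0445, 0.0443, 0.0448, 0.0443,
    0.0450, 0.0447, 0.0448, 0.0448, 0.0446, 0.0448, 0.0446, 0.0447, 0.0449, 0.0448,
]
EVAL_EVERY = 1000          # steps between evaluations
TOTAL_BATCH_SIZE = 32      # total_train_batch_size from the hyperparameters


def best_checkpoint(val_loss, eval_every):
    """Return (step, loss) of the evaluation with the lowest validation loss."""
    idx = min(range(len(val_loss)), key=val_loss.__getitem__)
    return (idx + 1) * eval_every, val_loss[idx]


best_step, best_loss = best_checkpoint(VAL_LOSS, EVAL_EVERY)
final_loss = VAL_LOSS[-1]

# Step 2000 corresponds to epoch 25.0 in the table.
steps_per_epoch = 2000 / 25.0
# ASSUMPTION: single device and full batches, so this is a rough estimate.
approx_examples = steps_per_epoch * TOTAL_BATCH_SIZE

if __name__ == "__main__":
    print("best step:", best_step, "validation loss:", best_loss)
    print("final validation loss:", final_loss)
    print("steps per epoch:", steps_per_epoch)
    print("approximate training examples:", approx_examples)
```

The lowest validation loss in the table is 0.0408 at step 5000, while the reported final value at step 40000 is 0.0448. Training loss keeps falling over the same range, a pattern that usually indicates overfitting, so an earlier checkpoint may generalize better if one was saved.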

Framework versions

  • Transformers 4.57.1
  • PyTorch 2.8.0+cu128
  • Datasets 4.2.0
  • Tokenizers 0.22.1

Capabilities & Tags

transformers · safetensors · speecht5 · text-to-audio · generated_from_trainer · endpoints_compatible

Specifications

  • Category: Audio
  • Access: API & Local
  • License: Open Source
  • Pricing: Open Source