senga nt asr inferred force aligned speecht5 MAT ACT

by sil-ai

Open source · 249 downloads · 0 likes

About

This model is a fine-tuned version of SpeechT5. Although its name mentions ASR and forced alignment, the model card lists microsoft/speecht5_tts as the base checkpoint and the model is tagged text-to-audio, so it appears to be a text-to-speech model that generates speech from input text rather than a speech recognizer. The "asr inferred force aligned" part of the name may describe how the training data was prepared, but the card does not document this. The card also leaves the model description, intended uses, limitations, and training data unspecified; the only reported metric is an evaluation loss of 0.1760.

Documentation

senga-nt-asr-inferred-force-aligned-speecht5-MAT-ACT

This model is a fine-tuned version of microsoft/speecht5_tts on an unspecified dataset (the card lists the dataset as None). It achieves the following results on the evaluation set:

  • Loss: 0.1760
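
Since the base checkpoint is microsoft/speecht5_tts, the model can presumably be loaded with the SpeechT5 text-to-speech classes in Transformers. The following is a minimal sketch under several assumptions: the repository id is guessed from the author and model name shown on this page, the repository is assumed to ship processor files (if it does not, loading the processor from microsoft/speecht5_tts is a possible fallback), microsoft/speecht5_hifigan is used as the vocoder, and the all-zeros speaker embedding is only a placeholder because the card does not say which speaker embeddings were used in training.

```python
# Assumed repository id, built from the author and model name on this page.
MODEL_ID = "sil-ai/senga-nt-asr-inferred-force-aligned-speecht5-MAT-ACT"
# Standard SpeechT5 vocoder; the card does not name one, so this is an assumption.
VOCODER_ID = "microsoft/speecht5_hifigan"


def synthesize(text, speaker_embedding=None):
    """Return a 1-D waveform tensor (16 kHz) for `text`."""
    # Heavy imports live inside the function so the module loads without them.
    import torch
    from transformers import (
        SpeechT5ForTextToSpeech,
        SpeechT5HifiGan,
        SpeechT5Processor,
    )

    processor = SpeechT5Processor.from_pretrained(MODEL_ID)
    model = SpeechT5ForTextToSpeech.from_pretrained(MODEL_ID)
    vocoder = SpeechT5HifiGan.from_pretrained(VOCODER_ID)

    if speaker_embedding is None:
        # Placeholder only: SpeechT5 expects a (1, 512) speaker x-vector.
        speaker_embedding = torch.zeros((1, 512))

    inputs = processor(text=text, return_tensors="pt")
    with torch.no_grad():
        return model.generate_speech(
            inputs["input_ids"], speaker_embedding, vocoder=vocoder
        )
```

Calling `synthesize("...")` downloads the weights on first use. A real x-vector for the target speaker will likely give better audio than the zero placeholder.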

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 3407
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 32
  • optimizer: AdamW (OptimizerNames.ADAMW_TORCH_FUSED) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 200
  • num_epochs: 600.0
  • mixed_precision_training: Native AMP
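
As a rough illustration of how these settings fit together, the sketch below collects them as keyword arguments using the parameter names of the Hugging Face `TrainingArguments` class, checks the effective batch size, and evaluates a linear-warmup-plus-cosine learning-rate curve in plain Python. The `output_dir` value, the choice of `fp16` for "Native AMP", the single-device setup, and the 19,800-step total (600 epochs at about 33 steps per epoch, inferred from the training log, where step 1000 falls at epoch 30.3030) are assumptions, not details stated in the card.

```python
import math

# Hyperparameters from the card, keyed by TrainingArguments parameter names.
training_kwargs = {
    "output_dir": "speecht5-finetune",  # hypothetical path, not from the card
    "learning_rate": 1e-4,
    "per_device_train_batch_size": 8,
    "per_device_eval_batch_size": 8,
    "seed": 3407,
    "gradient_accumulation_steps": 4,
    "optim": "adamw_torch_fused",
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "cosine",
    "warmup_steps": 200,
    "num_train_epochs": 600,
    "fp16": True,  # assumption: the card only says "Native AMP"
}

NUM_DEVICES = 1  # assumption: single-device training

# 8 samples per step x 4 accumulated steps x 1 device.
effective_batch_size = (
    training_kwargs["per_device_train_batch_size"]
    * training_kwargs["gradient_accumulation_steps"]
    * NUM_DEVICES
)


def cosine_lr(step, total_steps, base_lr=1e-4, warmup_steps=200):
    """Linear warmup to base_lr, then half-cosine decay to zero."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


TOTAL_STEPS = 19_800  # assumption: 600 epochs x ~33 steps per epoch

print("effective batch size:", effective_batch_size)
for step in (0, 100, 200, 10_000, TOTAL_STEPS):
    print(step, f"{cosine_lr(step, TOTAL_STEPS):.2e}")
```

The dictionary could be passed as `TrainingArguments(**training_kwargs)` when Transformers is installed; the schedule function is a stand-alone approximation for inspection, not the library's own scheduler object.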

Training results

Training Loss   Epoch      Step    Validation Loss
0.1869          30.3030    1000    0.1695
0.1612          60.6061    2000    0.1583
0.1399          90.9091    3000    0.1664
0.1301          121.2121   4000    0.1640
0.1208          151.5152   5000    0.1699
0.1161          181.8182   6000    0.1746
0.108           212.1212   7000    0.1673
0.0945          242.4242   8000    0.1804
0.1044          272.7273   9000    0.1787
0.0929          303.0303   10000   0.1756
0.0845          333.3333   11000   0.1701
0.0894          363.6364   12000   0.1739
0.0813          393.9394   13000   0.1667
0.0818          424.2424   14000   0.1740
0.0769          454.5455   15000   0.1719
0.0788          484.8485   16000   0.1780
0.0759          515.1515   17000   0.1745
0.0933          545.4545   18000   0.1754
0.0764          575.7576   19000   0.1760
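
To make the trend in this log easier to read, here is a small sketch that loads the logged values into Python, finds the row with the lowest validation loss, and estimates the number of optimizer steps per epoch. The numbers are copied from the training results; the variable names are illustrative only.

```python
# Each row: (training_loss, epoch, step, validation_loss), copied from the log.
LOG = [
    (0.1869, 30.3030, 1000, 0.1695),
    (0.1612, 60.6061, 2000, 0.1583),
    (0.1399, 90.9091, 3000, 0.1664),
    (0.1301, 121.2121, 4000, 0.1640),
    (0.1208, 151.5152, 5000, 0.1699),
    (0.1161, 181.8182, 6000, 0.1746),
    (0.108, 212.1212, 7000, 0.1673),
    (0.0945, 242.4242, 8000, 0.1804),
    (0.1044, 272.7273, 9000, 0.1787),
    (0.0929, 303.0303, 10000, 0.1756),
    (0.0845, 333.3333, 11000, 0.1701),
    (0.0894, 363.6364, 12000, 0.1739),
    (0.0813, 393.9394, 13000, 0.1667),
    (0.0818, 424.2424, 14000, 0.1740),
    (0.0769, 454.5455, 15000, 0.1719),
    (0.0788, 484.8485, 16000, 0.1780),
    (0.0759, 515.1515, 17000, 0.1745),
    (0.0933, 545.4545, 18000, 0.1754),
    (0.0764, 575.7576, 19000, 0.1760),
]

# Row with the lowest validation loss.
best = min(LOG, key=lambda row: row[3])
# Last logged row.
final = LOG[-1]
# Steps per epoch, estimated from the first row (step / epoch).
steps_per_epoch = round(LOG[0][2] / LOG[0][1])

print(f"best validation loss {best[3]:.4f} at step {best[2]}")
print(f"final validation loss {final[3]:.4f} at step {final[2]}")
print(f"training loss: {LOG[0][0]:.4f} -> {final[0]:.4f}")
print(f"approx. steps per epoch: {steps_per_epoch}")
```

On these numbers the lowest validation loss (0.1583) is logged at step 2000, while training loss keeps falling to 0.0764 by step 19000. That pattern is consistent with overfitting on a small dataset, though the card does not discuss it. The reported evaluation loss of 0.1760 matches the last logged row.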

Framework versions

  • Transformers 4.57.1
  • PyTorch 2.8.0+cu128
  • Datasets 4.2.0
  • Tokenizers 0.22.1

Capabilities & Tags

transformers, safetensors, speecht5, text-to-audio, generated_from_trainer, endpoints_compatible

Specifications

  • Category: Audio
  • Access: API & Local
  • License: Open Source
  • Pricing: Open Source
  • Rating: 0.0
