

fastspeech2 conformer with hifigan

by espnet

Open source · 735 downloads · 1 like

0.4 (1 review) · Audio · API & Local
About

This model combines FastSpeech2Conformer, a non-autoregressive text-to-speech system, with the HiFi-GAN vocoder to transform text into high-quality speech. It uses the Conformer architecture, which augments FastSpeech2's efficient non-autoregressive design with both convolutional and self-attention layers, to rapidly generate mel spectrograms. HiFi-GAN then converts these spectrograms into natural, fluid audio waveforms. Ideal for applications requiring fast and realistic speech synthesis, such as voice assistants, audiobooks, or assistive communication tools for the visually impaired, it stands out for its speed and sound quality compared to autoregressive models.

Documentation

FastSpeech2ConformerWithHifiGan

This model combines FastSpeech2Conformer and FastSpeech2ConformerHifiGan into one model for a simpler and more convenient usage.

FastSpeech2Conformer is a non-autoregressive text-to-speech (TTS) model that combines the strengths of FastSpeech2 and the conformer architecture to generate high-quality speech from text quickly and efficiently, and the HiFi-GAN vocoder is used to turn generated mel-spectrograms into speech waveforms.

🤗 Transformers Usage

You can run FastSpeech2Conformer locally with the 🤗 Transformers library.

  1. First install the 🤗 Transformers library and g2p-en:
Shell
pip install --upgrade pip
pip install --upgrade transformers g2p-en
  2. Run inference via the Transformers modelling code with the model and HiFi-GAN combined:
Python

from transformers import FastSpeech2ConformerTokenizer, FastSpeech2ConformerWithHifiGan
import soundfile as sf

# Convert the input text to phoneme token ids (g2p-en is used under the hood)
tokenizer = FastSpeech2ConformerTokenizer.from_pretrained("espnet/fastspeech2_conformer")
inputs = tokenizer("Hello, my dog is cute.", return_tensors="pt")
input_ids = inputs["input_ids"]

# The combined model runs FastSpeech2Conformer and the HiFi-GAN vocoder
# in a single forward pass, returning the waveform directly
model = FastSpeech2ConformerWithHifiGan.from_pretrained("espnet/fastspeech2_conformer_with_hifigan")
output_dict = model(input_ids, return_dict=True)
waveform = output_dict["waveform"]

# Save as a 22.05 kHz mono WAV file
sf.write("speech.wav", waveform.squeeze().detach().numpy(), samplerate=22050)
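Since the output is a mono waveform sampled at 22050 Hz (the rate used in `sf.write` above), its playback length follows directly from the sample count. A minimal helper to sanity-check generated audio; the function name is ours for illustration, not part of the API:

```python
import numpy as np

SAMPLE_RATE = 22050  # output rate used when saving the waveform above

def audio_duration_seconds(samples: np.ndarray, sample_rate: int = SAMPLE_RATE) -> float:
    """Length of a mono waveform in seconds."""
    return samples.shape[-1] / sample_rate

# A synthetic one-second buffer for illustration
print(audio_duration_seconds(np.zeros(22050, dtype=np.float32)))  # → 1.0
```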
Capabilities & Tags

transformers · pytorch · fastspeech2_conformer_with_hifigan · text-to-audio · en · endpoints_compatible
Specifications

Category: Audio
Access: API & Local
License: Open Source
Pricing: Open Source
Rating: 0.4
