by schalor
This model is a fine-tuned version of *microsoft/speecht5_tts* for text-to-speech synthesis. It converts text into natural, expressive speech, and the fine-tuning on specialized data is intended to improve sound quality. Its primary use cases are creating voiceovers, assisting visually impaired users, and generating automated audio content. The stated goal is more natural intonation across different contexts while keeping the robustness of the base model.
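As a rough illustration of the text-to-speech flow described above, the sketch below uses the SpeechT5 classes from the `transformers` library. It is a minimal sketch under stated assumptions, not the author's documented usage: the fine-tuned checkpoint's repository id is not given here, so `model_id` defaults to the base model and would need to be replaced, and the sketch assumes the fine-tuned repository also ships its processor files. The all-zeros speaker embedding is a placeholder only; a real 512-dimensional x-vector is needed for usable voice quality.

```python
SAMPLE_RATE = 16_000  # SpeechT5 produces 16 kHz audio


def synthesize(text, model_id="microsoft/speecht5_tts", speaker_embedding=None):
    """Return a 1-D tensor of audio samples for `text`.

    `model_id` is an assumption: substitute the fine-tuned checkpoint's id.
    """
    # Heavy imports stay inside the function so importing this module is cheap.
    import torch
    from transformers import (
        SpeechT5ForTextToSpeech,
        SpeechT5HifiGan,
        SpeechT5Processor,
    )

    processor = SpeechT5Processor.from_pretrained(model_id)
    model = SpeechT5ForTextToSpeech.from_pretrained(model_id)
    # The vocoder turns the predicted spectrogram into a waveform.
    vocoder = SpeechT5HifiGan.from_pretrained("microsoft/speecht5_hifigan")

    if speaker_embedding is None:
        # Placeholder only: replace with a real x-vector of shape (1, 512).
        speaker_embedding = torch.zeros((1, 512))

    inputs = processor(text=text, return_tensors="pt")
    with torch.no_grad():
        speech = model.generate_speech(
            inputs["input_ids"], speaker_embedding, vocoder=vocoder
        )
    return speech
```

The returned tensor can be written to disk with any audio library that accepts a NumPy array and a sample rate, passing `SAMPLE_RATE` as the rate.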
This model is a fine-tuned version of microsoft/speecht5_tts; the training dataset is not specified (the model card records it as `None`). On the evaluation set, the final logged validation loss is 0.5239 (step 2200, epoch 25).
More information needed
The training hyperparameters are not listed. Training and validation loss at each evaluation step:
| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| No log | 1.1364 | 100 | 0.7136 |
| 0.826 | 2.2727 | 200 | 0.5898 |
| 0.826 | 3.4091 | 300 | 0.5733 |
| 0.6254 | 4.5455 | 400 | 0.5604 |
| 0.6254 | 5.6818 | 500 | 0.5542 |
| 0.603 | 6.8182 | 600 | 0.5490 |
| 0.603 | 7.9545 | 700 | 0.5450 |
| 0.5924 | 9.0909 | 800 | 0.5432 |
| 0.5924 | 10.2273 | 900 | 0.5403 |
| 0.5841 | 11.3636 | 1000 | 0.5378 |
| 0.5841 | 12.5 | 1100 | 0.5336 |
| 0.578 | 13.6364 | 1200 | 0.5357 |
| 0.578 | 14.7727 | 1300 | 0.5321 |
| 0.5724 | 15.9091 | 1400 | 0.5293 |
| 0.5724 | 17.0455 | 1500 | 0.5287 |
| 0.5704 | 18.1818 | 1600 | 0.5272 |
| 0.5704 | 19.3182 | 1700 | 0.5281 |
| 0.5653 | 20.4545 | 1800 | 0.5239 |
| 0.5653 | 21.5909 | 1900 | 0.5276 |
| 0.5623 | 22.7273 | 2000 | 0.5260 |
| 0.5623 | 23.8636 | 2100 | 0.5233 |
| 0.5628 | 25.0 | 2200 | 0.5239 |
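To make the table easier to read at a glance, the short script below picks out the evaluation with the lowest validation loss, estimates the number of optimizer steps per epoch, and measures how much the loss still moved over the last few evaluations. The values are copied from the table; the helper names are illustrative and not part of any training library.

```python
# (step, epoch, validation_loss) rows copied from the table above.
LOG = [
    (100, 1.1364, 0.7136), (200, 2.2727, 0.5898), (300, 3.4091, 0.5733),
    (400, 4.5455, 0.5604), (500, 5.6818, 0.5542), (600, 6.8182, 0.5490),
    (700, 7.9545, 0.5450), (800, 9.0909, 0.5432), (900, 10.2273, 0.5403),
    (1000, 11.3636, 0.5378), (1100, 12.5, 0.5336), (1200, 13.6364, 0.5357),
    (1300, 14.7727, 0.5321), (1400, 15.9091, 0.5293), (1500, 17.0455, 0.5287),
    (1600, 18.1818, 0.5272), (1700, 19.3182, 0.5281), (1800, 20.4545, 0.5239),
    (1900, 21.5909, 0.5276), (2000, 22.7273, 0.5260), (2100, 23.8636, 0.5233),
    (2200, 25.0, 0.5239),
]


def best_checkpoint(log):
    """Return the (step, epoch, loss) row with the lowest validation loss."""
    return min(log, key=lambda row: row[2])


def steps_per_epoch(log):
    """Estimate optimizer steps per epoch from the final logged row."""
    step, epoch, _ = log[-1]
    return round(step / epoch)


def relative_improvement(log, window):
    """Fractional drop in validation loss across the last `window` evaluations."""
    start = log[-window - 1][2]
    end = log[-1][2]
    return (start - end) / start


best_step, best_epoch, best_loss = best_checkpoint(LOG)
print(f"best validation loss {best_loss} at step {best_step} (epoch {best_epoch})")
print(f"steps per epoch: {steps_per_epoch(LOG)}")
print(f"relative improvement over last 5 evals: {relative_improvement(LOG, 5):.4f}")
```

On these numbers the lowest validation loss (0.5233) falls at step 2100 rather than at the final step, and the loss changes by less than 1% over the last five evaluations, which suggests the run had largely plateaued by epoch 25.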