urdu-speecht5-finetuned

This model is a fine-tuned version of microsoft/speecht5_tts on an unknown dataset. It achieves the following results on the evaluation set:

Loss: 0.8700

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 1e-05
train_batch_size: 6
eval_batch_size: 2
seed: 42
distributed_type: multi-GPU
num_devices: 2
gradient_accumulation_steps: 4
total_train_batch_size: 48
total_eval_batch_size: 4
optimizer: AdamW, fused PyTorch implementation (OptimizerNames.ADAMW_TORCH_FUSED), with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 500
num_epochs: 70
mixed_precision_training: Native AMP
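
The total batch sizes follow from the per-device values, and the learning rate follows a warmup-then-linear-decay shape. The sketch below reproduces that arithmetic in plain Python. The function name `linear_warmup_lr` is hypothetical, and `total_steps=23030` is an assumption (70 epochs at roughly 329 optimizer steps per epoch, inferred from the Step and Epoch columns of the results table); it is not a value reported by the training run.

```python
# Values copied from the hyperparameter list above.
train_batch_size = 6            # per device
eval_batch_size = 2             # per device
num_devices = 2
gradient_accumulation_steps = 4

# Effective batch sizes across devices (and accumulation, for training).
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
total_eval_batch_size = eval_batch_size * num_devices


def linear_warmup_lr(step, peak_lr=1e-5, warmup_steps=500, total_steps=23030):
    """Illustrative linear schedule: ramp up to peak_lr, then decay to zero.

    total_steps is an assumed value, not one stated in the model card.
    """
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / (total_steps - warmup_steps)


print(total_train_batch_size, total_eval_batch_size)
print(linear_warmup_lr(250), linear_warmup_lr(500), linear_warmup_lr(23030))
```

Under these assumptions the learning rate reaches its peak of 1e-05 at step 500 and decays linearly afterwards.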

Training results

Training Loss	Epoch	Step	Validation Loss
4.6416	1.5198	500	1.0342
4.2408	3.0395	1000	0.9665
4.0776	4.5593	1500	0.9377
4.0257	6.0790	2000	0.9284
3.9388	7.5988	2500	0.9060
3.8638	9.1185	3000	0.9002
3.8240	10.6383	3500	0.8884
3.7701	12.1581	4000	0.8894
3.7587	13.6778	4500	0.8772
3.7120	15.1976	5000	0.8787
3.6871	16.7173	5500	0.8724
3.6936	18.2371	6000	0.8732
3.6681	19.7568	6500	0.8782
3.6397	21.2766	7000	0.8798
3.6289	22.7964	7500	0.8654
3.6120	24.3161	8000	0.8669
3.6059	25.8359	8500	0.8608
3.5933	27.3556	9000	0.8610
3.5507	28.8754	9500	0.8674
3.5522	30.3951	10000	0.8633
3.5674	31.9149	10500	0.8654
3.5469	33.4347	11000	0.8605
3.5538	34.9544	11500	0.8577
3.5262	36.4742	12000	0.8677
3.5307	37.9939	12500	0.8621
3.5248	39.5137	13000	0.8601
3.5209	41.0334	13500	0.8564
3.5113	42.5532	14000	0.8597
3.5083	44.0729	14500	0.8650
3.5342	45.5927	15000	0.8595
3.4962	47.1125	15500	0.8660
3.4923	48.6322	16000	0.8640
3.4882	50.1520	16500	0.8669
3.4894	51.6717	17000	0.8677
3.4748	53.1915	17500	0.8645
3.4710	54.7112	18000	0.8662
3.4755	56.2310	18500	0.8673
3.4795	57.7508	19000	0.8628
3.4528	59.2705	19500	0.8697
3.4802	60.7903	20000	0.8746
3.4582	62.3100	20500	0.8695
3.4559	63.8298	21000	0.8697
3.4333	65.3495	21500	0.8690
3.4699	66.8693	22000	0.8696
3.4595	68.3891	22500	0.8700
3.4625	69.9088	23000	0.8700
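
The reported evaluation loss (0.8700) is the value at the final step, which is not the lowest value in the table. The sketch below, using only the Validation Loss column copied from the table, locates the lowest validation loss and estimates the number of optimizer steps per epoch from the first row. The variable names are illustrative, and whether the checkpoint at the best step was saved is not stated in this card.

```python
# Validation losses copied from the table above, one per evaluation.
# Evaluations were logged every 500 steps, starting at step 500.
val_losses = [
    1.0342, 0.9665, 0.9377, 0.9284, 0.9060, 0.9002, 0.8884, 0.8894,
    0.8772, 0.8787, 0.8724, 0.8732, 0.8782, 0.8798, 0.8654, 0.8669,
    0.8608, 0.8610, 0.8674, 0.8633, 0.8654, 0.8605, 0.8577, 0.8677,
    0.8621, 0.8601, 0.8564, 0.8597, 0.8650, 0.8595, 0.8660, 0.8640,
    0.8669, 0.8677, 0.8645, 0.8662, 0.8673, 0.8628, 0.8697, 0.8746,
    0.8695, 0.8697, 0.8690, 0.8696, 0.8700, 0.8700,
]
eval_interval = 500
steps = [eval_interval * (i + 1) for i in range(len(val_losses))]

# Lowest validation loss and the step at which it was logged.
best_loss, best_step = min(zip(val_losses, steps))

# Final logged evaluation, which is the figure reported at the top of the card.
final_loss, final_step = val_losses[-1], steps[-1]

# First table row: step 500 corresponds to epoch 1.5198.
steps_per_epoch = round(500 / 1.5198)

print(f"best:  {best_loss:.4f} at step {best_step}")
print(f"final: {final_loss:.4f} at step {final_step}")
print(f"approx. optimizer steps per epoch: {steps_per_epoch}")
```

Validation loss flattens after roughly step 8500 and drifts slightly upward in the last third of training while training loss keeps falling, so an earlier checkpoint may be worth comparing against the final one.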

Framework versions

Transformers 5.0.0
PyTorch 2.10.0+cu128
Datasets 4.8.3
Tokenizers 0.22.2
