mzr-chapter-audio-dataset-force-aligned-speecht5

This model is a fine-tuned version of microsoft/speecht5_tts. The training dataset is not specified (the auto-generated card records it as None). It achieves the following results on the evaluation set:

Loss: 0.0448

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0001
train_batch_size: 8
eval_batch_size: 8
seed: 3407
gradient_accumulation_steps: 4
total_train_batch_size: 32
optimizer: AdamW, fused PyTorch implementation (OptimizerNames.ADAMW_TORCH_FUSED), with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_steps: 4000
training_steps: 40000
mixed_precision_training: Native AMP
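
The hyperparameters above imply a learning-rate curve and a device count that the card does not state directly. The following is a minimal sketch, assuming the run used the usual cosine schedule with linear warmup (ramp from 0 to the peak rate over the warmup steps, then a half-cosine decay to 0 at the final step). The function name lr_at and the constant names are illustrative, not taken from the training script.

```python
import math

# Values copied from the hyperparameter list above.
LEARNING_RATE = 1e-4
WARMUP_STEPS = 4000
TRAINING_STEPS = 40000
TRAIN_BATCH_SIZE = 8
GRAD_ACCUM_STEPS = 4
TOTAL_TRAIN_BATCH_SIZE = 32


def lr_at(step: int) -> float:
    """Learning rate at an optimizer step (assumed schedule shape, see lead-in)."""
    if step < WARMUP_STEPS:
        # Linear warmup from 0 up to the peak learning rate.
        return LEARNING_RATE * step / max(1, WARMUP_STEPS)
    # Fraction of the post-warmup phase completed, in [0, 1].
    progress = (step - WARMUP_STEPS) / max(1, TRAINING_STEPS - WARMUP_STEPS)
    # Half-cosine decay from the peak rate down to 0.
    return LEARNING_RATE * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


# total_train_batch_size = per-device batch * accumulation steps * devices,
# so the reported values are consistent with a single device.
num_devices = TOTAL_TRAIN_BATCH_SIZE // (TRAIN_BATCH_SIZE * GRAD_ACCUM_STEPS)

if __name__ == "__main__":
    print("devices implied by batch sizes:", num_devices)
    for step in (0, 2000, 4000, 22000, 40000):
        print(step, lr_at(step))
```

Under these assumptions the rate peaks at step 4000, which is 10% of the run, and falls to half the peak at step 22000, the midpoint of the decay phase.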

Training results

Training Loss	Epoch	Step	Validation Loss
0.0626	12.5016	1000	0.0466
0.0517	25.0	2000	0.0436
0.0524	37.5016	3000	0.0428
0.0501	50.0	4000	0.0423
0.0464	62.5016	5000	0.0408
0.0422	75.0	6000	0.0421
0.0479	87.5016	7000	0.0416
0.0434	100.0	8000	0.0425
0.0421	112.5016	9000	0.0416
0.0408	125.0	10000	0.0424
0.0376	137.5016	11000	0.0438
0.0371	150.0	12000	0.0419
0.0377	162.5016	13000	0.0429
0.0377	175.0	14000	0.0422
0.0371	187.5016	15000	0.0427
0.0362	200.0	16000	0.0437
0.036	212.5016	17000	0.0438
0.0349	225.0	18000	0.0435
0.0356	237.5016	19000	0.0438
0.034	250.0	20000	0.0434
0.033	262.5016	21000	0.0437
0.0335	275.0	22000	0.0443
0.0329	287.5016	23000	0.0445
0.0332	300.0	24000	0.0448
0.0324	312.5016	25000	0.0449
0.0329	325.0	26000	0.0442
0.0317	337.5016	27000	0.0445
0.0311	350.0	28000	0.0443
0.0304	362.5016	29000	0.0448
0.0313	375.0	30000	0.0443
0.0308	387.5016	31000	0.0450
0.0312	400.0	32000	0.0447
0.0307	412.5016	33000	0.0448
0.0312	425.0	34000	0.0448
0.0304	437.5016	35000	0.0446
0.0313	450.0	36000	0.0448
0.0298	462.5016	37000	0.0446
0.0307	475.0	38000	0.0447
0.0302	487.5016	39000	0.0449
0.0303	500.0	40000	0.0448
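
The reported evaluation loss (0.0448) is the value at the final step, not the lowest value in the table. A short sketch that locates the best logged checkpoint is shown below. The validation losses are copied from the table above in step order, and the variable names are illustrative.

```python
# Validation loss at steps 1000, 2000, ..., 40000, copied from the table above.
VAL_LOSSES = [
    0.0466, 0.0436, 0.0428, 0.0423, 0.0408, 0.0421, 0.0416, 0.0425,
    0.0416, 0.0424, 0.0438, 0.0419, 0.0429, 0.0422, 0.0427, 0.0437,
    0.0438, 0.0435, 0.0438, 0.0434, 0.0437, 0.0443, 0.0445, 0.0448,
    0.0449, 0.0442, 0.0445, 0.0443, 0.0448, 0.0443, 0.0450, 0.0447,
    0.0448, 0.0448, 0.0446, 0.0448, 0.0446, 0.0447, 0.0449, 0.0448,
]
EVAL_INTERVAL = 1000  # evaluation was logged every 1000 steps

# Index of the smallest validation loss (first occurrence on ties).
best_index = min(range(len(VAL_LOSSES)), key=VAL_LOSSES.__getitem__)
best_step = (best_index + 1) * EVAL_INTERVAL
best_loss = VAL_LOSSES[best_index]
final_loss = VAL_LOSSES[-1]

if __name__ == "__main__":
    print("best step:", best_step, "validation loss:", best_loss)
    print("final validation loss:", final_loss)
```

The lowest validation loss in the table is 0.0408 at step 5000. After that point the training loss keeps falling (from 0.0464 to 0.0303) while the validation loss drifts up to 0.0448, a pattern that usually indicates overfitting. If intermediate checkpoints were saved, an earlier one may generalize better than the final weights, although the card does not say which checkpoint was published.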

Framework versions

Transformers 4.57.1
Pytorch 2.8.0+cu128
Datasets 4.2.0
Tokenizers 0.22.1
