by kwindla
468 days ago
It takes about 45 minutes to do the current training run on an L4 GPU with these settings:

# Training parameters
"learning_rate": 5e-5,
"num_epochs": 10,
"train_batch_size": 12,
"eval_batch_size": 32,
"warmup_ratio": 0.2,
"weight_decay": 0.05,
# Evaluation parameters
"eval_steps": 50,
"save_steps": 50,
"logging_steps": 5,
# Model architecture parameters
"num_frozen_layers": 20

I haven't seen a run do all 10 epochs recently. There's usually an early stop after about 4 epochs.

The current data set size is ~8,000 samples.
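
For a rough sense of what those settings imply in optimizer steps, here is a back-of-the-envelope sketch. It assumes all ~8,000 samples are used for training on a single GPU with no gradient accumulation, and that warmup is computed as a fraction of the total scheduled steps; none of that is stated above, and the function name `schedule_summary` is made up for illustration.

```python
import math

# Settings copied from the comment above.
CONFIG = {
    "num_epochs": 10,
    "train_batch_size": 12,
    "warmup_ratio": 0.2,
    "eval_steps": 50,
}

# Assumption: the whole ~8,000-sample data set is the training split.
NUM_TRAIN_SAMPLES = 8000


def schedule_summary(config, num_samples):
    """Estimate step counts implied by the config (hypothetical helper)."""
    # The last partial batch still counts as a step.
    steps_per_epoch = math.ceil(num_samples / config["train_batch_size"])
    total_steps = steps_per_epoch * config["num_epochs"]
    # Assumption: warmup covers warmup_ratio of all scheduled steps.
    warmup_steps = round(total_steps * config["warmup_ratio"])
    # Number of evaluations that fit inside one epoch.
    evals_per_epoch = steps_per_epoch // config["eval_steps"]
    return {
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        "warmup_steps": warmup_steps,
        "evals_per_epoch": evals_per_epoch,
    }


if __name__ == "__main__":
    for name, value in schedule_summary(CONFIG, NUM_TRAIN_SAMPLES).items():
        print(f"{name}: {value}")
```

Under those assumptions, warmup would span roughly the first two epochs, so a run that stops after about 4 epochs would spend about half its steps warming up. If the real train split is smaller than 8,000, the numbers shrink proportionally.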