Hacker News
by foundzen 477 days ago
I got most of my answers from the README, which is well written. I read most of it. Can you share what kind of resources, and how much of them, were required to fine-tune Wav2Vec2-BERT?
1 comment

It takes about 45 minutes to do the current training run on an L4 GPU with these settings:

    # Training parameters
    "learning_rate": 5e-5,
    "num_epochs": 10,
    "train_batch_size": 12,
    "eval_batch_size": 32,
    "warmup_ratio": 0.2,
    "weight_decay": 0.05,

    # Evaluation parameters
    "eval_steps": 50,
    "save_steps": 50,
    "logging_steps": 5,

    # Model architecture parameters
    "num_frozen_layers": 20

I haven't seen a run complete all 10 epochs recently. Early stopping usually kicks in after about 4 epochs.
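
For anyone wondering why a 10-epoch run would end around epoch 4, here is a minimal sketch of patience-based early stopping. It is not taken from the project: the `patience` and `min_delta` values and the loss numbers are made up for illustration, and the comment above doesn't say which metric or criterion the real run uses.

```python
def early_stop_index(eval_losses, patience=3, min_delta=0.0):
    """Return the index of the evaluation at which training would stop,
    or None if the patience budget is never exhausted.

    `patience` and `min_delta` are hypothetical values for illustration.
    """
    best = float("inf")
    bad_evals = 0  # consecutive evaluations without improvement
    for i, loss in enumerate(eval_losses):
        if loss < best - min_delta:
            # New best eval loss: remember it and reset the counter.
            best = loss
            bad_evals = 0
        else:
            bad_evals += 1
            if bad_evals >= patience:
                return i
    return None


# Made-up eval losses: improvement stalls after the third evaluation.
losses = [0.90, 0.70, 0.60, 0.61, 0.62, 0.63, 0.50]
stop_at = early_stop_index(losses, patience=3)
```

With these invented numbers the run stops before it ever sees the final, better loss, which is the usual trade-off of a small patience value.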

The current dataset has ~8,000 samples.
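
To put those numbers in terms of optimizer steps, here is a rough back-of-the-envelope calculation. It assumes things the comment doesn't state: that all ~8,000 samples are in the training split, that training runs on a single GPU with no gradient accumulation, and that the warmup length is derived from the full 10-epoch schedule.

```python
import math

# Values from the settings above.
train_batch_size = 12
num_epochs = 10
warmup_ratio = 0.2
eval_steps = 50

# Assumption: every one of the ~8,000 samples is used for training.
num_samples = 8000

steps_per_epoch = math.ceil(num_samples / train_batch_size)
total_steps = steps_per_epoch * num_epochs
# Assumption: warmup is a fraction of the full planned schedule.
warmup_steps = round(total_steps * warmup_ratio)

# Where a run that stops after about 4 epochs would end up.
steps_at_epoch_4 = steps_per_epoch * 4
evals_by_epoch_4 = steps_at_epoch_4 // eval_steps

print(steps_per_epoch, total_steps, warmup_steps)
print(steps_at_epoch_4, evals_by_epoch_4)
```

Under those assumptions, an epoch is 667 steps and warmup lasts 1,334 steps, so a run that stops near epoch 4 (about 2,668 steps) would spend roughly half its time in warmup. If part of the data is held out for evaluation, the real figures will be somewhat lower.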