|
|
|
|
|
by regularfry
968 days ago
|
|
It's not the inference, it's the training. They say in the paper: "We train with a batch size of 256 for a total of 80,000 optimisation steps, which amounts to eight epochs of training." That's a fair chunk of time. Mind you, `small.en` has smaller decoder layers than `medium.en`... |
|