by easygenes 642 days ago
It is also a training issue. The model has to be trained to reinforce longer outputs, which has a quadratic train-time cost and requires suitable long-context response training data.

They definitely have to be trained to reinforce longer outputs, but I do not believe this adequately explains the low-ish generation limits.

We are starting to see models with longer and longer generation limits (gpt-4o-mini having 16k, the o1 models going up to 64k), as well as longer and longer context limits (often 128k, with Google offering a million).

I find it very unlikely they are actually training with inputs or outputs near these maximums.

If you want to convince yourself, do the attention calculation math for these sequence lengths.
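
A rough back-of-the-envelope sketch of that math, assuming naive attention that materializes the full score matrix in fp16 for a single layer at batch size 1. The head count of 32 is a made-up illustrative number, not taken from any particular model. Fused kernels such as FlashAttention avoid storing this matrix, but the compute still grows quadratically with sequence length.

```python
# Memory for the attention score matrix of ONE layer, batch size 1.
# Assumptions (illustrative only): 32 heads, fp16 (2 bytes per element),
# and the full seq_len x seq_len matrix is materialized per head.

def attention_scores_gib(seq_len, num_heads=32, bytes_per_element=2):
    """Return the size in GiB of a naive per-layer attention score matrix."""
    return seq_len * seq_len * num_heads * bytes_per_element / 2**30


for n in (4_096, 16_384, 65_536, 131_072, 1_000_000):
    print(f"{n:>9} tokens: {attention_scores_gib(n):>12,.1f} GiB per layer")
```

Under these assumptions 4096 tokens already costs 1 GiB per layer, and every 4x increase in sequence length multiplies that by 16.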

You can also see how OpenAI restricts the sequence length for fine-tuning to 64k - almost certainly bound by available GPU memory.

I suspect the 4096 limits have been set as a "reasonable" limit for a myriad of reasons.