
They definitely have to be trained to reinforce longer outputs, but I do not believe this adequately explains the low-ish generation limits.

We are starting to see models with longer and longer generation limits (gpt-4o-mini at 16k tokens, the o1 models going up to 64k), as well as longer and longer context limits (often 128k, with Google offering a million).

I find it very unlikely they are actually training with inputs or outputs near these maximums.

If you want to convince yourself, work out the attention math for these sequence lengths.
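A rough sketch of that math, assuming naive attention that materializes the full score matrix, fp16 scores, batch size 1, and a single layer. The head count (32) and model width (d_model = 4096) are hypothetical values picked for illustration, not the dimensions of any particular model.

```python
# Back-of-the-envelope cost of standard (non-fused) self-attention.
# Assumptions (hypothetical, for illustration only): fp16 scores (2 bytes),
# 32 heads, d_model = 4096, batch size 1, one layer, forward pass only.

def attention_score_bytes(seq_len: int, n_heads: int = 32, bytes_per_elem: int = 2) -> int:
    """Bytes needed to hold the full seq_len x seq_len score matrix for every head."""
    return seq_len * seq_len * n_heads * bytes_per_elem


def attention_flops(seq_len: int, d_model: int = 4096) -> int:
    """Approximate FLOPs for QK^T plus (scores @ V) in one layer.

    Each of the two matmuls costs about 2 * seq_len^2 * d_model FLOPs
    summed over all heads (since n_heads * d_head == d_model).
    """
    return 2 * (2 * seq_len * seq_len * d_model)


if __name__ == "__main__":
    for n in (4_096, 16_384, 65_536, 131_072, 1_000_000):
        gib = attention_score_bytes(n) / 2**30
        tflops = attention_flops(n) / 1e12
        print(f"{n:>9,} tokens: {gib:>12,.1f} GiB of scores/layer, {tflops:>10,.1f} TFLOPs/layer")
```

Under these assumptions a 4,096-token sequence needs 1 GiB of scores per layer, while 65,536 tokens (16x longer) needs 256 GiB, because the cost grows with the square of the sequence length. Fused kernels in the style of FlashAttention avoid storing the full matrix, so memory grows roughly linearly, but the compute is still quadratic, and training adds the backward pass on top of this.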

You can also see how OpenAI restricts the sequence length for fine-tuning to 64k, almost certainly bound by available GPU memory.

I suspect the 4096-token limits have been set as a "reasonable" default for a myriad of reasons.