Hacker News new | ask | show | jobs
by zwaps 929 days ago
Excellent article. This suggests the Gpt scalings are like Rope scalings and one should not go beyond 2x original context length.

If Claude2 has an internal Rag, then this means also that the 200k context length only holds for queries that allow for an out of the box

Thanks for the insights!