|
|
|
|
|
by zwaps
929 days ago
|
|
Excellent article. This suggests the Gpt scalings are like Rope scalings and one should not go beyond 2x original context length. If Claude2 has an internal Rag, then this means also that the 200k context length only holds for queries that allow for an out of the box Thanks for the insights! |
|