by SOLAR_FIELDS
264 days ago
Context is often not the only issue. The real issue is attention: context length is one factor in how well the LLM keeps the broad scope of a task in view, but anecdotally it's easy to watch the model forget things or go off the rails when only a fraction of the context window is in use. Often it's effective to just set a rule like "don't ever go above 20% of the max".

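For what it's worth, here's a rough sketch of how a "stay under 20% of the max" rule could be enforced on the client side. Everything in it is an assumption for illustration: the names `within_budget` and `trim_to_budget` are hypothetical, the 20% figure is just the rule of thumb above, and the default token counter is a crude whitespace split, not a real tokenizer.

```python
from typing import Callable, List


def within_budget(used_tokens: int, max_context: int, fraction: float = 0.20) -> bool:
    """Return True while usage is at or below the chosen fraction of the window."""
    return used_tokens <= int(max_context * fraction)


def trim_to_budget(
    messages: List[str],
    max_context: int,
    fraction: float = 0.20,
    # Assumption: whitespace word count as a stand-in for a real tokenizer.
    count_tokens: Callable[[str], int] = lambda s: len(s.split()),
) -> List[str]:
    """Keep the most recent messages whose combined size fits the budget."""
    budget = int(max_context * fraction)
    kept: List[str] = []
    total = 0
    # Walk from newest to oldest and stop at the first message that would not fit.
    for msg in reversed(messages):
        cost = count_tokens(msg)
        if total + cost > budget:
            break
        kept.append(msg)
        total += cost
    kept.reverse()  # restore chronological order
    return kept
```

Dropping the oldest messages is only one possible policy; summarizing them instead would be another. You'd also want to swap in the tokenizer for whatever model you're actually using.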
RoPE is great and all, but doesn't magically give 100% performance over the lengthened context; that takes more work.