by b33j0r
1136 days ago
Yep. To be clear, that’s the exact approach I’ve been pursuing. But then I see model context lengths getting longer and longer purely within the transformer architecture, thanks to the training engineering going on. To me that’s a fundamentally different approach to AI research at this moment, and it keeps paying off in surprising ways.
Do you have any references for this? Seems really interesting if that can be a long-term approach.