Hacker News new | ask | show | jobs
by wenhan_zhou 2 days ago
Yep. Or even better, compact after a random number of turns. The model must then learn to preserve useful context at arbitrary context lengths.