Hacker News
by seangrogg 789 days ago
Valid! I think the disparity is that the article appears to be written for a fairly technical crowd, but the expectations appear to come from how these particular models are marketed. Most people who fine-tune LLMs or know about LongRoPE for extending context windows are probably consumers of research/white papers rather than marketing material.

Having read some of your other comments, it appears that part of the issue is that you were marketed a 1 million token context window and research has shown that's not quite the case. That said, the article doesn't do a good job of painting that picture - it is alluded to with "all fail at this task despite having big context windows", but I think it's worth being crystal clear here that the marketing says 1M tokens, and that this claim is disingenuous both in your experience and according to research findings.