by kmacdough 408 days ago
Keep in mind this release was never intended to prove superiority. Rather, it shows an alternative structure with some promising performance characteristics. More work needs to be done to show real application, but this is very valuable learning.

That's part of the reason to compare against older, smaller models: they're at a more comparable stage of development.

1 comments

I agree. As I was trying to imply, I think if you integrated this structure into OpenAI’s or Claude’s stack, you’d get a vastly cheaper model that’s significantly faster with similar task performance (modulo the structural task performance parts that are hard to port to this new architecture). The point about quality was also intended to temper some of the excitement about the scores published on the page.