Hacker News
by vlovich123 408 days ago
I just tried giving it a coding snippet that has a bug. ChatGPT & Claude found the bug instantly. Mercury fails to find it even after several reprompts (it's hallucinating). On the upside, it is significantly faster. That's promising, since the edge for ChatGPT and Claude is in the prolonged time and energy they've spent building training infrastructure, tooling, datasets, etc. to pump out models with high task performance.

Keep in mind this release was never intended to prove superiority. Rather, it shows an alternative structure with some promising performance characteristics. More work needs to be done to show real application, but this is very valuable learning.

That's part of the reason to compare against older, smaller models since they're at a more comparable stage of development.

I agree. As I was trying to imply, I think if you integrated this structure into OpenAI’s or Claude’s stack, you’d get a vastly cheaper model that’s significantly faster with similar task performance (modulo the structural task performance parts that are hard to port to this new architecture). The point about quality was also intended to temper some of the excitement about the scores published on the page.