Hacker News
by edg5000 5 days ago
Wow, looks like you've found a massive flaw indeed.

I was skeptical about the results because, in my experience, both the recent GPT and Opus models are strong. Everything else is B or C tier. This is just artisanal vibe testing, though. It's very hard to evaluate them properly.