HN Mirror

Hacker News


	by edg5000 5 days ago
	Wow, looks like you've found a massive flaw indeed. I was skeptical about the results because, in my experience, both the recent GPT and Opus models are strong. Everything else is B or C tier. This is just artisanal vibe testing, though. It's very hard to eval them properly.