| HN Mirror



	by refulgentis 558 days ago
	Fascinating, thanks for calling that out: I found 1.0 promising in practice, but with hallucination problems. Then I saw it had gotten 57% of open-book true/false questions wrong, and I wrote it off completely - no reason to switch to it for speed and cost if it's just a random generator. That's a great outcome.