| HN Mirror

	by courseofaction 1081 days ago
	This is the kind of info I've been looking for - I ran some informal experiments that asked ChatGPT to mark essays against various criteria, then analyzed how consistent the marking was. This was several months ago, and GPT-4 performed quite well, but the data wasn't kept (it was just an ad-hoc application test written in Jupyter notebooks). I'm certain it's now doing significantly worse on the same tests, but alas, I have lost the historical data to prove it.