by courseofaction
1081 days ago
This is the kind of info I've been looking for. I ran some informal experiments that asked ChatGPT to mark essays against various criteria and then analyzed how consistent the marking was. That was several months ago, and GPT-4 performed quite well, but the data wasn't kept (it was just an ad-hoc application test written in Jupyter notebooks). I'm certain it's now doing significantly worse on the same tests, but alas, I have lost the historical data to prove it.