by Robotenomics
883 days ago
Very, very impressive. I ran a couple of tests. On the complex questions it scored 80%, although I'd say the grading was harsh, since the answer could arguably be considered correct. That said, I found the generated questions rather simple, not complex. In the second test it was 100% incorrect on the complex questions! However, when I asked GPT-4 the same questions directly, it answered 100% correctly. Could that be due to my custom settings in GPT-4? I'll run it with university students. Fascinating work.
I agree that the current grading is a bit harsh; the rubric we're using in this demo is fairly rudimentary. What we've found more helpful is a range of grades along the lines of: correct / correct but unhelpful / correct but incomplete / incorrect. This depends somewhat on the individual use case, though.
Let me know which of the generated questions you think could be more complex! We're always working on improving our ability to explore the knowledge space for challenging questions.