by matt_lee 926 days ago

Thank you for the kind words!

One of my regrets about the demo is that we paid a lot of attention to showing off our ability to generate high-quality Q/A pairs, but not nearly as much to showing what a thoughtful, thorough grading rubric can do.

It's entirely possible to do high-quality grading given a rubric that sets clear expectations! The best implementations we've seen use categories like "correct", "correct but incomplete", "correct but unhelpful", and "incorrect" to better label the kind of situation you describe. We've found that a good rubric with well-defined categories lets us grade with much more nuance, but unfortunately we didn't focus on that side of things in the demo.
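For anyone wondering what "a rubric with categories" might look like in practice, here is a minimal sketch, assuming Python 3.9+. Only the four category names come from the comment above. The rubric descriptions and the helper names (`render_rubric`, `parse_grade`, `summarize`) are hypothetical illustrations, not how our product actually does it. The idea is that each category spells out an expectation the grader (human or model) can check against, and the resulting labels can be tallied so "correct but incomplete" answers don't get lumped in with outright wrong ones.

```python
from collections import Counter
from enum import Enum


class Grade(Enum):
    """The four grading categories mentioned in the comment."""
    CORRECT = "correct"
    CORRECT_BUT_INCOMPLETE = "correct but incomplete"
    CORRECT_BUT_UNHELPFUL = "correct but unhelpful"
    INCORRECT = "incorrect"


# Hypothetical rubric wording: each category states what the grader should expect.
RUBRIC = {
    Grade.CORRECT: "Matches the reference answer and covers every key point.",
    Grade.CORRECT_BUT_INCOMPLETE: "Nothing stated is wrong, but key points are missing.",
    Grade.CORRECT_BUT_UNHELPFUL: "Factually right, but does not address what was asked.",
    Grade.INCORRECT: "Contradicts the reference answer.",
}


def render_rubric(rubric: dict[Grade, str]) -> str:
    """Format the rubric as one line per category, ready to hand to a grader."""
    return "\n".join(f"- {grade.value}: {desc}" for grade, desc in rubric.items())


def parse_grade(label: str) -> Grade:
    """Map a grader's free-text label to a category; raises ValueError if unknown."""
    return Grade(label.strip().lower())


def summarize(labels: list[str]) -> dict[str, int]:
    """Count answers per category, reporting zero for categories nobody hit."""
    counts = Counter(parse_grade(label) for label in labels)
    return {grade.value: counts.get(grade, 0) for grade in Grade}


if __name__ == "__main__":
    print(render_rubric(RUBRIC))
    print(summarize(["correct", "Incorrect", "correct but incomplete", "correct"]))
```

Using an enum keeps the label set closed, so a grader that invents a new category fails loudly in `parse_grade` instead of silently skewing the counts.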

I'm not familiar with wikicrow, will check it out!