by robinwassen 2478 days ago
I am working on reducing the time teachers spend on exams and assessments. I have access to a cleaned and manually scored dataset of 550k essays that is growing exponentially. I looked at creating a model based on this dataset to automatically score essays using NLP features such as grammar, structure, spelling, word complexity, sentiment, relative text length, etc.
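
To make the kind of features concrete, here is a minimal sketch of extracting a few surface features from one essay. It is an illustration only, not the actual model: the function name `essay_features`, the `median_length_words` parameter, and the choice of features are all assumptions, and grammar, spelling, and sentiment are left out because they need external tools.

```python
import re


def essay_features(text, median_length_words):
    """Return a few surface features for one essay (hypothetical feature set)."""
    # Words are runs of letters and apostrophes; sentences end at . ! or ?
    words = re.findall(r"[A-Za-z']+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_words = len(words)
    return {
        # Length relative to a reference length, e.g. the median for the assignment
        "relative_length": n_words / median_length_words,
        # Crude proxy for word complexity
        "avg_word_length": sum(len(w) for w in words) / n_words if n_words else 0.0,
        # Crude proxy for structure
        "avg_sentence_length": n_words / len(sentences) if sentences else 0.0,
        # Vocabulary variety: distinct words divided by total words
        "type_token_ratio": len({w.lower() for w in words}) / n_words if n_words else 0.0,
    }
```

Features like these are easy to compute, which is part of why the problems described in the article are easy to see coming: none of them measures whether the essay says anything correct.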

The problem that I encountered was actually how to apply it in a useful way, since the problems mentioned in the article are quite obvious when you design the model.

Options that I saw:

1. Use it for autonomous grading with optional review by the teacher. See the linked article for the problems with this.

2. Use it as a sanity check on the teacher's manual scoring, but it would not reduce the workload and would probably just undermine the teacher.

Do you have any suggestions for how such a model could be applied in a practical and ethical way?

Had some thoughts on how to measure actual knowledge about a subject, but that would require a massive knowledge graph which would introduce a huge amount of complexity just to see if it would be a feasible approach.

2 comments

Here are some thoughts:

1. Instead of grading, maybe you can use it for training or tutoring. If a student is learning to write essays, I'm assuming it's hard for them to get any feedback.

2. But then there's probably not enough money to be earned there.

One trick might be to write an independent AI to summarize the essay back and see how closely it matches the essay title. This might weed out gibberish essays with sound English sentences.
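
A rough sketch of the comparison step, assuming the summary has already been produced by some separate summarizer (not shown here). Bag-of-words cosine similarity is only a crude stand-in for a real semantic comparison, and the names `title_match` and `looks_off_topic`, the stopword list, and the 0.2 threshold are all made up for illustration.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list; a real system would use a fuller one
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "on", "is", "for"}


def bag_of_words(text):
    """Lowercase the text and count its non-stopword tokens."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)


def cosine_similarity(a, b):
    """Cosine similarity between two token-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def title_match(title, summary):
    """Score how closely a machine-written summary matches the essay title."""
    return cosine_similarity(bag_of_words(title), bag_of_words(summary))


def looks_off_topic(title, summary, threshold=0.2):
    """Flag essays whose summary shares little vocabulary with the title."""
    return title_match(title, summary) < threshold
```

Exact word overlap misses paraphrases ("French" vs. "France"), so in practice you would probably want embeddings instead, and a flagged essay should go to the teacher rather than get an automatic low score.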

Current Transformer models are looking pretty good at complex end-to-end tasks (at least, better than the shallow regression with hand-picked features that ETS probably uses). In a few years, complete end-to-end evaluation may no longer be impossible, especially with so much data.