Hacker News
by fpgaminer 3692 days ago
One of the projects I'd love to develop is an automated peer editor for student essays. My wife is an English teacher and a large percentage of her time is taken up by grading papers. A large percentage of that time is then spent marking up grammar and spelling. What I envision is a website that handles that grammar/spelling bit. More importantly, I'd like it to be a tool that the students use freely prior to submitting their essays to the teacher. I want them to have immediate feedback on how to improve the grammar in their essays, so they can iterate and learn. By the time the essays reach the teacher, the teacher should only have to grade for content, composition, style, plagiarism, citations, etc. Hopefully this also helps to reduce the amount of grammar that needs to be taught in class, freeing time for more meaningful discussions.

The problem is that while I have knowledge and experience in the computer vision side of machine learning, I lack experience in NLP. And to the best of my knowledge, NLP as a field has not come as far as vision, so such an automated editor would make too many mistakes. To be student-facing it would need to be really accurate. On top of that, it wouldn't be dealing with well-formed input. The input by definition is adversarial. So unlike SyntaxNet, which is built to deal with comprehensible sentences, this tool would need to deal with incomprehensible sentences. According to the link, SyntaxNet only gets 90% accuracy on random sentences from the web.
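
A rough back-of-the-envelope sketch of why 90% may not be enough for a student-facing tool. It assumes, purely for illustration, that the 90% figure is a per-word accuracy and that errors on different words are independent; neither assumption comes from the linked article, and the function name is made up for this example.

```python
def prob_sentence_fully_correct(per_word_accuracy: float, num_words: int) -> float:
    """Chance that every word in a sentence is analysed correctly.

    Assumes independent per-word errors, which is an illustrative
    simplification rather than a property of any real parser.
    """
    return per_word_accuracy ** num_words


if __name__ == "__main__":
    # Show how quickly the chance of a flawless analysis drops with length.
    for n in (5, 10, 20, 30):
        p = prob_sentence_fully_correct(0.90, n)
        print(f"{n:2d} words: {p:.1%} chance of a fully correct analysis")
```

Under those assumptions a 20-word sentence would be analysed without any error only about 12% of the time, which is the kind of error rate a student would notice quickly.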

That said, I might give SyntaxNet a try. The idea would be to use SyntaxNet to extract meaning from a broken sentence, and then work backwards from the meaning to identify how the sentence can be modified to better match that meaning.

Thank you Google for contributing this tool to the community at large.

6 comments

I think this is still risky if used in a context where the student might think that the computer is somehow always right. Great English writers often deliberately use sentence fragments or puns, or use a word with a nonstandard part-of-speech interpretation (especially using a noun as a verb). They may also sometimes use a sentence that's difficult for readers to parse and then explain the ambiguity after the fact.

If a teacher gave students a grammar-checking tool to check their writing, they might assume that the tool knew better than they did, which is only sometimes true.

Those great writers that break all the rules still know them.

"Learn the rules like a pro, so you can break them like an artist."

-Picasso

This is often repeated, but there's no evidence it's true. Many great writers had no formal training.

"And that quote is almost certainly made up."

- Willem Shakespere

Knowing the rules is not the same thing as having formal training.

With then-innovative art like Cubism, which in some sense broke earlier rules, the point is that those artists, like Picasso, were able to do representative art in correct proportions -- they were just going beyond that.

This is a nontrivial issue, because there are always students who think they can skip learning boring, mundane, old-fashioned art and go straight to something like cubism, but in 99% of cases that doesn't work well at all compared with learning "the rules" first.

With writing, many great writers have broken "the rules" with punctuation, spelling, grammar, etc. But the important thing is that they do so on purpose.

Whereas if one doesn't know the rules in the first place, one doesn't have the choice of whether to follow them or break them.

Such a person will always break those rules they don't know (subconscious knowledge counts btw) -- but not for aesthetic reasons, only out of ignorance.

Perhaps my favourite example of knowing when and how to break the rules is Franz Schubert's "Erlkönig" [1], because it is so stark.

If you listen to it without paying attention to the text (based on a poem of the same name by Goethe; both the German text and an English translation are found at [1]), parts of it sound like horrible wailing and poor harmonies, and it's easy to write it off as not sounding very nice.

Here's [2] a much clearer rendition (two singers, with much stronger delineation of the three different characters) than the one linked from Britannica.

If you do pay attention to the text, it is very clear that the unpleasant parts are very deliberate:

The singer(s) switch between the roles of a father, his sick, dying child, and the Erl-king who appears in the child's hallucinations while the father is riding to bring the child to a doctor.

The big difference between the unpleasant-sounding parts of this song and the work of a bad composer is the clear intent and delineation: Schubert made things sound bad intentionally, at exactly the points where he wanted to illustrate pain and fear, rather than because he didn't know how to make things sound pleasant when he wanted to.

The song clearly proves this by setting the child's wailing and the father's fearful attempts to soothe him against much more pleasant segments where the Erl-king speaks and tries to seduce the child into coming with him.

You only get that clear separation if you know how to evoke each effect precisely. Arguably a particularly bad composer wouldn't even know how to make things sound bad the "right way" - there's a big difference between random bad sounds and sounds that evoke a child in pain.

[1] http://www.britannica.com/topic/Erlkonig

[2] https://soundcloud.com/sean_contretenor_lee/erlkonig

Your logic is flawed. "Artist A did X before doing Y, therefore X is necessary to do Y". It doesn't compute.

I've heard great painters say that the only thing that matters is that you paint. Plenty of wonderful painters did not study their predecessors in depth.

I'm assuming that your 99% number is fabricated? Incidentally 99% of statistics are made up.

Previous discussion on “Know the rules well, so you can break them effectively” quote: https://news.ycombinator.com/item?id=7754905
I am also afraid of the misuse of such software. It is also possible that the teacher does not know that much and will look at the software as correct.

And come to think of it, isn't there a saying: Did stupidity require smart computers or did smart computers allow for stupidity?

Intelligent software (or software that pretends to be intelligent) might allow any unqualified bloke to end up in a position where they can teach.

Sorry for not being clear. This is intended for use in teaching English. What you describe falls under the purview of Creative Writing. Specifically it'd target English in grade school up until the early college classes (the ones most people skip out of based on tests). After that, yes you'd be right and this tool wouldn't be appropriate.
Not sure how they work exactly, but have you looked at http://noredink.com (and as another commenter mentioned, http://grammarly.com/)? I'd be interested in your thoughts.
I've bounced my idea off my wife before and asked whether such a thing existed. I do recall her mentioning things like noredink and commenting that they weren't a match for one reason or another (the specific reasons escape me at the moment).

Visiting noredink.com now ... I can't actually figure out what it is the site offers from a cursory glance, so I can't even begin to figure out whether it matches my idea and what issues it may have had that excluded it from her classroom.

I'll bounce grammarly off her later though, in case that's new.

"A large percentage of that time is then spent marking up grammar and spelling."

As an aside, I don't think this is the optimal way to teach people how to write. What were the ideas in those papers? How were they organized? Do the student's arguments make sense? I think that's what most students spend most of their time thinking about when writing an essay, and it can be a bit demoralizing to see the teacher care just as much about whether the grammar was right. Most students can fix grammar mistakes relatively easily once they notice them anyway.

I don't think I remember actually being taught how to write in primary school. I think they should reteach grammar in high school from the beginning. Most people's brains just can't pick up a systematic treatment of some of the finer points when younger. I went and did all the grammar quizzes over at http://grammar.ccc.commnet.edu/grammar/quiz_list.htm a couple of years ago. It improved my writing a lot, and only took a weekend.
I honestly didn't learn how to write until I got to graduate school. My very patient adviser had to beat it into my skull. Writing is a very practical art...you need to practice at it a lot vs. studying and memorizing rules, and my pre-grad school education didn't really force that.
Can you tell us what other resources you've used to learn, which practices were helpful in studying grammar, and which were not?
> Most students can fix grammar mistakes relatively easily once they notice them anyway.

You would be surprised! Especially with people whose first languages aren't English. Having something to provide feedback on grammar problems early on would be really useful. Of course, once they get grammar down, the next step is rhythm and flow, as well as reducing redundancy, and the biggest problem, as you say, is always the story, but you have to peel through lots of grammar problems before you get to that point. I edit a lot of research papers for my Chinese peers (most have PhDs, I work in a China-based research lab, so that isn't weird), so I'm pretty clear on the problems.

Doesn't Grammarly[0] already do this? It analyzes the input for common grammar mistakes and proposes ways to fix them. As a student, I occasionally use Grammarly to proofread a paper for me, and it has worked pretty well so far.

[0]: http://grammarly.com

SyntaxNet is, by definition, for syntactic analysis - it would likely not help you much with semantics, to extract meaning. It could maybe help you automatically determine if a sentence is grammatically correct, though.
Syntactic analysis is generally a precursor to semantic analysis.

EDIT: but sure, this is only the first step and semantic parsing is far from solved
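
To make the "syntax first" point concrete, here is a minimal sketch of how a grammar check could sit on top of a dependency parse. The `Token` structure, the `agreement_errors` function, and the hand-written parse are all hypothetical; this is not SyntaxNet's actual output format or API. The tags follow the Penn Treebank convention (NN/NNS for singular/plural nouns, VBZ/VBP for third-person-singular and other present-tense verbs), and the check only covers noun subjects.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Token:
    text: str
    tag: str             # Penn Treebank POS tag, e.g. NN, NNS, VBZ, VBP
    head: Optional[int]  # index of the head token, None for the root
    dep: str             # dependency label, e.g. "nsubj"


def agreement_errors(sentence: List[Token]) -> List[str]:
    """Flag noun subjects whose number disagrees with a present-tense verb."""
    errors = []
    for tok in sentence:
        if tok.dep != "nsubj" or tok.head is None:
            continue
        verb = sentence[tok.head]
        # Singular noun with a non-third-person-singular verb form.
        singular_clash = tok.tag == "NN" and verb.tag == "VBP"
        # Plural noun with a third-person-singular verb form.
        plural_clash = tok.tag == "NNS" and verb.tag == "VBZ"
        if singular_clash or plural_clash:
            errors.append(f"'{tok.text}' does not agree with '{verb.text}'")
    return errors


# Hand-written parse of "The students writes essays", standing in for
# whatever a real parser would produce.
parsed = [
    Token("The", "DT", 1, "det"),
    Token("students", "NNS", 2, "nsubj"),
    Token("writes", "VBZ", None, "root"),
    Token("essays", "NNS", 2, "dobj"),
]

print(agreement_errors(parsed))
```

A check like this is only as good as the parse it is given, so if the parser mislabels the subject of a badly formed sentence, the rule will either miss the error or flag the wrong words.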

Such a checker could be a boon for students as well as instructors, but take note of this near the end of the article,

> This suggests that we are approaching human performance—but only on well-formed text.

It may fall down on exactly the bad writing you want to process. GIGO?