by jonstokes 3762 days ago
I'm a writer and editor, and I dislike the idea of this tool quite a bit.

1. Writing isn't coding. In coding, you can do various types of "cargo cult programming" and "copypasta" and what-have-you -- in other words, as long as the code runs you don't necessarily have to know why or how a programming idiom or convention works, or how/why expressing it one way in code is better than expressing it another way in code. This is definitionally untrue of writing. If you don't know the why/how of something, then it's better for you to botch it and let the reader attempt to parse it so at least they know what they're dealing with and how to interpret it ("oh, this guy's a non-native speaker, so I'll adjust my reception accordingly" or "ah, this person is kind of clueless about the whole sexist language thing, which is good info for me.").

2. 90% of writing style advice falls into one of two categories: a) hotly debated, and b) totally wrong. Most of it is in the latter category, and this includes Strunk & White (just use google for numerous takedowns of that text). I looked through the PR queue and saw that it consists of eager coders finding style advice from various sources and trying to work that into the tool. That is terrible, terrible, terrible... This will guarantee that the tool will represent a collection of awful writing advice gleaned from dubious sources and wielded with unforgiving ignorance.

This tool may be a terrible idea, but the idea of automated prose linting is not terrible. Most beginner to intermediate writers have tics, and as an editor I often have a couple of writer-specific find/replace things I do when I get a new piece from a particular writer (e.g. "this person uses 'however' when she means 'but', and this person overuses these four business jargon terms," etc.). If editors were able to easily compose and execute writer-specific linters from within something like WordPress, that would probably be pretty great.

But this particular command line tool is destined to be either totally unused or massively abused.

I'm sorry, I hate to be mean... or, actually, there is a small part of me that enjoys playing Mr. Party Pooper when I see a mob of enthusiastic programmers trying to tie down some great cultural Gulliver with a thousand tiny little automated, black-and-white rules.

7 comments

Thanks for the feedback. These are issues we've thought about, and we came to different conclusions:

re 2, you'll see at http://proselint.com/approach/ that one of the guiding principles of Proselint is that we defer to experts. In practice, that's meant almost all the advice comes from Bryan Garner's usage guide, Garner's Modern American Usage. He is a careful compiler of advice and you'll find that he is almost never "totally wrong", and when his advice is debated, he knows it, notes it, and provides a thoughtful discussion.

re 1, we think of Proselint as eventually being useful as a training tool, a way to learn the conventions. Note that natural languages are large, with so many low-frequency terms that nobody can learn the whole language. Why err if an automated tool can help? Consider for example demonyms, what you call people from a certain place. How many people know, for example, that people from Manchester are Mancunians, not Manchesterians? Rather than call someone by the wrong name, with Proselint the voice of an expert gently corrects you, and you learn a cool new word.
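For illustration only, here is a minimal sketch of what a demonym check of this kind could look like. It is not Proselint's actual implementation or API; the `DEMONYMS` table and the `check_demonyms` function are hypothetical names, and the only entry is the Manchester pair mentioned above.

```python
import re

# Hypothetical lookup table: nonstandard demonym -> preferred form.
# Only the Manchester pair comes from the discussion; a real table
# would be compiled from a usage guide.
DEMONYMS = {
    "Manchesterian": "Mancunian",
}

def check_demonyms(text, table=DEMONYMS):
    """Return (offset, found, suggestion) for each nonstandard demonym."""
    findings = []
    for wrong, right in table.items():
        # \b restricts the match to whole words; an optional plural "s"
        # is captured so the suggestion can carry it over.
        pattern = re.compile(r"\b" + re.escape(wrong) + r"(s?)\b")
        for m in pattern.finditer(text):
            findings.append((m.start(), m.group(0), right + m.group(1)))
    return sorted(findings)

print(check_demonyms("Two Manchesterians met a Mancunian."))
```

The check only reports a suggestion and leaves the text alone, so the writer can still ignore it.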

We aren't a mob of programmers, we are three people who love language, respect it, and think we're 2% of the way to making a great tool, one that The New Yorker could run over its stories to flag issues that its own editors would flag anyways. (In fact, we've done this, running Proselint over a corpus of highly vetted text, and have found numerous issues.)

Calling someone from Manchester a "Manchesterian" instead of "Mancunian" is not wrong, or even necessarily bad. Rather, it communicates something to the reader. Depending on the context, it could mean this person doesn't know that the correct term is "Mancunian", and did not look it up or even know that it should be looked up, all of which gives me useful info and context about the writer and their education level and the amount of effort they put into the piece and the amount of editing it underwent and so on. At the very least I can surmise that the writer is not a Mancunian. Or, it could mean that the writer is attempting to be clever.

Widespread use of proselint to correct this type of thing wouldn't improve writing. Rather, it would just add another interpretive option to the above range of scenarios, i.e. "ah, I can tell that this writer did or did not run that proselint tool before submission, because their text is or is not littered with boilerplate proselintisms."

The way to improve genuinely bad writing is not with rules and tools -- it's with lots of reading, a little mentorship, and lots and lots and lots of practice.

> Calling someone from Manchester a "Manchesterian" instead of "Mancunian" is not wrong, or even necessarily bad. Rather, it communicates something to the reader. Depending on the context, it could mean this person doesn't know that the correct term is "Mancunian", and did not look it up or even know that it should be looked up, all of which gives me useful info and context about the writer and their education level and the amount of effort they put into the piece and the amount of editing it underwent and so on. At the very least I can surmise that the writer is not a Mancunian. Or, it could mean that the writer is attempting to be clever.

If the only goal of writing were to allow accurate assessment of the writer, then I would agree. But there are other reasons for writing -- informing, persuading, clarifying, &c. -- where writing clear, consistent, and idiomatic prose can help. Yours is a condemnation of all attempts to improve writing beyond the first-draft capabilities of the author.

> The way to improve genuinely bad writing is not with rules and tools -- it's with lots of reading, a little mentorship, and lots and lots and lots of practice.

Agreed, Proselint is not the right tool to improve genuinely bad writing. Reading great authors and sweating through drafts is what we'd recommend to get better at the craft, too.

> all of which gives me useful info and context about the writer and their education level and the amount of effort they put into the piece and the amount of editing it underwent and so on.

From a reader-centric point of view, I can understand lamenting the loss of this information channel. From the author's stance, I can imagine wanting to tighten up alternate channels of information and present a clearer message. The author always has this ability, through natural circumstance, effort or research, so this tool would do nothing but make it easier. As a reader, it may change the assessment to whether they ran a proselint-like tool or not, but in the end those are just assumptions. The writer could be making specific choices to disregard the linting tool on purpose. In the end, reading is still an interpretive experience; this just allows authors more options.

> The way to improve genuinely bad writing is not with rules and tools -- it's with lots of reading, a little mentorship, and lots and lots and lots of practice.

Generally good advice for anything, but I think it's worth noting that different people learn in different ways, and providing more methods for learning is generally an improvement, and opens the field to more people. Tools that look to circumvent historical methods for achieving skill often face an uphill battle from those who used those historical methods. It's easy to see why, as such a tool looks like it has devalued much of the hard work they put into their skills. This may be true to an extent, but the gains often far outweigh this, as making a skill accessible to more people has wide-ranging benefits for society in general.

In more concrete terms, I see no reason why a tool like this can't be a multiplier for mentorship and practice. At the very least it enables exposure to ideas that might not have been encountered before.

Felt like someone should say this in this thread, but calling someone a "Manchesterian" offers no insight into anyone's education level, and I honestly don't even think it's something that we should be focusing corrections on. If anything, it would probably be nice if everyone started using "Manchesterian" instead of "Mancunian" because that seems a hell of a lot more clear to me ;)

To the library authors, Proselint looks very cool!

Do you have any linguists consulting / on staff?

Bryan Garner might be a careful compiler but doesn't seem to be a linguist and seems to be a traditionalist who makes simple errors.

e.g. http://itre.cis.upenn.edu/~myl/languagelog/archives/001869.h...

"His chapter is unfortunately full of repetitions of stupidities of the past tradition in English grammar — more of them than you could shake a stick at."

http://languagelog.ldc.upenn.edu/nll/?p=5630

"So why did Bryan Garner, a highly intelligent and insightful person, make this elementary error?"

http://www.arrantpedantry.com/2007/01/02/editing-chicago/

"A good editor should know that Bryan Garner’s take on the subject is misleading and incorrect. It’s become apparent to me that many of the self-appointed guardians of the language don’t even know what it is they’re guarding."

etc. etc.

You're implying that there is some kind of well-accepted notion of Bryan Garner being a poor guide to usage, but you link to some articles that are just nitpicking small terminology differences.

The second link in particular is tendentious. It claims Garner gives "a savage indictment of the behavior and character of those who use Stage 1 words [new usages]" in his book MAU.

But if you follow to the linked page from MAU, you read that Garner is, in an appendix, giving a series of wry analogies for the process of acceptance of new terms -- not a savage indictment at all. In other words, Garner is not himself saying all new usages have "a grade of F", etc., he's saying that is how some new usages will be perceived, in a very gross and qualitative sense, by a strict static conception of the language.

Since Garner comes right out and explicitly says all of the above, the link you cite comes off as picking a fight. There's nothing there.

Having read MAU (back in its first edition), I have to say that Garner strikes me as a very good guide to usage. I still enjoy perusing the book.

Taken as a whole, do you really have significant issues with MAU as a usage guide?

> some kind of well-accepted notion of Bryan Garner being a poor guide to usage

Wasn't my intention - merely pointing out that he's not a linguist, and that his making simple errors should give anyone using him as an "authority" considerable pause.

> do you really have significant issues with MAU as a usage guide?

I am neither an American nor a linguist - which makes me doubly unqualified to comment. That I leave to experts.

^^ This right here, is exactly what I'm talking about.

Again, the idea of prose linting is not terrible, and in fact I do a hacked-up version of it with a set of standard "find/replace" operations for specific writers who have specific issues. But a giant, general-purpose ball of rules of dubious provenance applied to a generic abstraction called "prose" is what I take issue with.
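The writer-specific find/replace approach described above could be sketched roughly as follows. This is a guess at the shape of such a tool, not anything the commenter actually uses: the writer names, the patterns, and the `lint_for_writer` function are all hypothetical.

```python
import re

# Hypothetical per-writer rule sets: (regex pattern, note to the editor).
# The writers and the rules are made up for illustration.
WRITER_RULES = {
    "writer_a": [
        (r"\bhowever\b", "often means 'but'; check each use"),
    ],
    "writer_b": [
        (r"\bsynergy\b", "business jargon"),
        (r"\bleverage\b", "business jargon"),
    ],
}

def lint_for_writer(writer, text):
    """Return (line_number, matched_text, note) for one writer's known tics."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        # Writers with no rule set simply produce no findings.
        for pattern, note in WRITER_RULES.get(writer, []):
            for m in re.finditer(pattern, line, flags=re.IGNORECASE):
                findings.append((lineno, m.group(0), note))
    return findings

draft = "However, it works.\nWe leverage it."
print(lint_for_writer("writer_a", draft))
```

Because the rules are keyed by writer, nothing is applied to "prose" in general; each rule set exists only because an editor noticed that habit in that person's copy.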

Garner's focus is on usage, not grammar, so for a usage linter, this doesn't seem like a big problem.

Is there an accessible, comprehensive, easy-to-read guide like Garner's Modern American Usage that's considered more accurate? There don't seem to be many options.

(I have a copy of GMAU and enjoy it, but mostly for discussion of usage, not the details of grammar)

> I dislike the idea of this tool quite a bit.

> This tool may be a terrible idea, but the idea of automated prose linting is not terrible.

So which is it? The idea of the tool is prose linting, and you've now stated both that you dislike it quite a bit, and that it's not terrible.

Part of what I think you may be missing is that it doesn't need to be an all-inclusive set of generally terrible, conflicting suggestions. With code style checkers we've already mostly solved this problem, by both storing metadata regarding the source of the rules, and allowing this metadata to be referenced when making custom rulesets. Perl::Critic[1] is a good example of this. It allows you to use the default ruleset and select a severity of criticism, or it allows an organisation (or individual) to create their own custom ruleset to enforce how they want their code to look.

Keeping this in mind, what if the default ruleset was curated to have select rules from multiple sources, but allowed you to easily take a source and use its rules? For example, if I want to write using Strunk & White today, that might be as easy as a command line flag, or downloading a specifically compiled ruleset. If I want to use something else, the same. If I want to make my own custom ruleset based on rules from multiple rulesets and a few of my own thrown in, that should be possible too.

1: https://en.wikipedia.org/wiki/Perl::Critic
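As a rough sketch of the idea above (rules that carry source and severity metadata, and rulesets built by filtering on that metadata), something like the following would do. It mirrors neither Perl::Critic's nor Proselint's actual interface; the `Rule` class, the `guide_a`/`guide_b` source names, the severity scale, and the sample rules are all placeholders.

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    name: str
    pattern: str
    message: str
    source: str     # where the advice comes from
    severity: int   # assumed scale: 1 = mild ... 5 = severe

# Hypothetical rules; names, sources, and severities are placeholders.
ALL_RULES = [
    Rule("very", r"\bvery\b", "consider a stronger word", "guide_a", 1),
    Rule("utilize", r"\butilize\b", "prefer 'use'", "guide_b", 2),
    Rule("irregardless", r"\birregardless\b", "use 'regardless'", "guide_b", 4),
]

def select_rules(rules, sources=None, min_severity=1):
    """Build a ruleset filtered by source and by severity threshold."""
    return [r for r in rules
            if (sources is None or r.source in sources)
            and r.severity >= min_severity]

def lint(text, rules):
    """Return (rule_name, source, message) for each rule that matches."""
    return [(r.name, r.source, r.message)
            for r in rules
            if re.search(r.pattern, text, flags=re.IGNORECASE)]

only_b = select_rules(ALL_RULES, sources={"guide_b"})
print(lint("We utilize very big words.", only_b))
```

A custom ruleset is then just another list of `Rule` objects, which can mix entries from several sources with a few of your own.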

It may not be universally applicable, and it may not be helpful to you in your work, but there is a spectrum of writing output requirements, and the tool (if well done) could be helpful in many situations.

1) Editing fiction by Terry Pratchett -- not too useful. 2) Editing a newspaper article -- maybe it would catch a few typo-level issues that crept in under deadline pressure, but a professional writer wouldn't lean on it. 3) A non-native speaker of English running meeting minutes through it before blasting out the e-mail -- that has a lot of utility. (Actually, the "I went to engineering school because I dislike writing." native speaker of English would benefit from linting that e-mail, too.)

>I'm a writer and editor, and I dislike the idea of this tool quite a bit.

You dislike this tool the same way welders dislike computer welding or the same way truck drivers will dislike automated driving.

Everyone wants to believe their job is so complex that a computer will never be able to perform the same task adequately. Is critiquing a sentence really as complex as driving a car in heavy traffic? Or playing chess? Or finding faces in photographs? Or winning on Jeopardy?

I don't believe that at all. That is total nonsense, in fact. My criticism is of this linting tool, not of artificial intelligence. I take it for granted that anything I can do, an AI will eventually be able to do much better. A linting tool is not an AI.

According to some definitions, AI is just "the theory and development of computer systems able to perform tasks that normally require human intelligence, such as visual perception, speech recognition, decision-making, and translation between languages." So by some definitions, a linter is absolutely an AI.

Semantics aside, it's not important to slm_HN's point. We can call it an AI, an algorithm, or just a computer, and in any case it's still possible for it to find errors beyond spelling ones.

"... there is a small part of me that enjoys playing Mr. Party Pooper when I see a mob of enthusiastic programmers trying to tie down some great cultural Gulliver with a thousand tiny little automated, black-and-white rules."

I'd reexamine that part, if I were you. I suspect it may be bigger than you think it is, especially since you've already pigeonholed the creators.

I'm a foreigner, who speaks English as a 3rd language, and I like the idea of this tool quite a bit.

You sound like my wife complaining about GPS devices because sometimes they err or take us to dangerous places. It is just a tool; you can just ignore its recommendations.

On the other hand, it could be used quite effectively as a "sanity check" occasionally. Just because it flags certain things doesn't mean you have to take its advice.

How would you suggest budding writers improve their skills then? This tool seems useful for that purpose to me.