by morgante 3560 days ago
I'm looking for formal rules. I'm not saying AAVE doesn't exist, I'm saying its rules (such as they are) are not formalized. Tons of rule books exist for Standard English and part of learning it is to memorize the correct rules. I don't believe any formal equivalent exists for AAVE.

"Formal rules", in the context you've chosen to speak in, are defined by this upstream comment:

> NLP is not very good with standard English yet and usually doesn't generalize from topic to topic. Dialects and other languages - especially those without formal rules - will come when we can deal with standard English.

The rules you're talking about, that get printed in books and studied, are not linguistic rules. Crucially, this means they are not widely observed in printed standard English, which in turn means they can't be relevant to training a language model to understand printed standard English.

The "formality" you seem to want to talk about has no place in this discussion. It is not relevant to any language. gordonguthrie is correct to point out that the assumption lqdc13 is trying to make is false. You are wrong to contradict him using a meaning of "formal rules" that you brought to the conversation yourself. It had a meaning -- a completely unrelated meaning -- before you showed up.

> Crucially, this means they are not widely observed in printed standard English, which in turn means they can't be relevant to training a language model to understand printed standard English.

I agree that they're not widely observed in written English, but they are consistently observed in the WSJ, which was the origin of this entire debate.

As lqdc13 pointed out, NLP still isn't consistently good at understanding even standard English. One could reasonably posit that this is due to the inherent ambiguity and inconsistency of most writing, and that by focusing on a narrower, standardized document corpus (the WSJ) you could get better initial results. What, exactly, is controversial about that? Do you really think that the language of the WSJ is no more consistent and formalized than the language of Twitter users?

What, then, are your parameters for formal? It has to be in a book, codified and signed off on? If so, aren't those kinds of rules antithetical to AAVE on the face of it?

> It has to be in a book, codified and signed off on?

The point is that for rules to be formal, they must be formalized and codified somehow. This could be online or in a book, but there must be some clear delineation between when the rules are followed and when they are broken. Otherwise, what's the meaning of "formal" rules?

> If so, aren't those kinds of rules antithetical to AAVE on the face of it?

Yes. My argument is that AAVE, almost by definition, is an informal dialect without formalized rules.

This is getting rather off topic, but I think it relates to the original point of why NLP might start with Standard English even if you are not biased. A large corpus of Standard English text (such as from the WSJ) will generally be very internally consistent precisely because it follows a set of formal rules codified into a style guide. As there is no such equivalent for AAVE, even gathering a large and internally consistent corpus of AAVE text seems prohibitively difficult. That being said, I do hope researchers are working on gathering text from Twitter to build up new training sets.

> The point is that for rules to be formal, they must be formalized and codified somehow

So every form of English is an "informal dialect" then? Because this ain't French with the Académie publishing strict rules for use of the language. Do you also say that the languages of remote Amazon tribes aren't "real languages" because they don't have a formal government body publishing written rules?

Or do you just want to bash on AAVE and are grasping at straws for reasons why?

> So every form of English is an "informal dialect" then?

Yes, the majority of spoken English does not follow the rules of Standard English. Pretending that such rules don't exist is willful ignorance, though: the WSJ obviously writes a more formalized version of English than teenagers do in text messages.

> Do you also say that the languages of remote Amazon tribes aren't "real languages" because they don't have a formal government body publishing written rules?

Nowhere did I say that AAVE is "not a real language" because it's less formalized than Standard English. Prior to spelling reforms, English itself was extremely inconsistent and informal, but I certainly don't pretend that it wasn't a language.

> Or do you just want to bash on AAVE and are grasping at straws for reasons why?

I'm not trying to bash AAVE. In fact, I'd even posit that AAVE isn't more codified perhaps because of racial bias, which treated it simply as "incorrect" English instead of a separate dialect worthy of formalization. Pretending that all languages are equally formalized is simply willful ignorance, though.