by anonymousguy 3604 days ago
Small children have to learn language from nothing. They just figure it out through exposure and practice. Even pets learn some language. This is the model to emulate.

Ultimately language use requires a few skills:

* a good parser
* motor cognition/coordination
* a good memory
* semantics/context
* vocabulary
* situational awareness

The first two in the list are what small children struggle with the most. Fortunately, we can eliminate motor coordination as a need for AI. Although producing extremely powerful parsers demands specialized expertise, this part of the problem is straightforward. I write multi-language/multi-dialect parsers as an open source hobby.

I discount vocabulary and situational awareness because most children don't figure these out until they enter high school, long after they have learned the basics of speech. That pattern of human behavior indicates that, while it might be hard to teach these skills to a computer, you can put them off a long way down the road, until after basic speech is achieved.

If somebody paid me the money to do this research my personal plan of attack would be:

1. Focus on the parser first. Start with a text parser and do audio-to-text later. Don't worry about defining anything at this stage. When humans first learn to talk and listen, they focus on the words and absolutely not on what those words mean.

The parser should not be parsing words; parsing words from text is easy. The parser should be parsing sentences into grammars, which is harder but still generally straightforward, albeit with many edge cases.
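To make "parsing sentences into grammars" concrete, here is a minimal sketch in Python. The toy grammar (S -> NP VP, NP -> Pron | Det? Adj* N, VP -> V NP?), the tiny `LEXICON` and the `parse_sentence` function are illustrative assumptions, not the author's parser; a real one would need a far larger grammar and would have to handle the edge cases mentioned above.

```python
# Toy grammar (an assumption for illustration):
#   S -> NP VP ;  NP -> Pron | Det? Adj* N ;  VP -> V NP?
LEXICON = {
    "the": "Det", "a": "Det",
    "big": "Adj", "small": "Adj",
    "dog": "N", "ball": "N", "child": "N",
    "sees": "V", "chases": "V", "sleeps": "V",
    "it": "Pron", "she": "Pron",
}


class ParseError(Exception):
    pass


def parse_sentence(text):
    """Recursive-descent parse into a nested (label, ...) tuple tree."""
    tokens = text.lower().rstrip(".").split()
    pos = 0

    def peek():
        # Part-of-speech tag of the next token, or None (end / unknown word).
        return LEXICON.get(tokens[pos]) if pos < len(tokens) else None

    def take(tag):
        nonlocal pos
        if peek() != tag:
            raise ParseError(f"expected {tag} at token {pos}")
        word = tokens[pos]
        pos += 1
        return (tag, word)

    def noun_phrase():
        if peek() == "Pron":
            return ("NP", take("Pron"))
        parts = []
        if peek() == "Det":
            parts.append(take("Det"))
        while peek() == "Adj":
            parts.append(take("Adj"))
        parts.append(take("N"))
        return ("NP", *parts)

    def verb_phrase():
        parts = [take("V")]
        if peek() in ("Det", "Adj", "N", "Pron"):  # optional object
            parts.append(noun_phrase())
        return ("VP", *parts)

    tree = ("S", noun_phrase(), verb_phrase())
    if pos != len(tokens):
        raise ParseError(f"unexpected token {tokens[pos]!r}")
    return tree


print(parse_sentence("The big dog chases a small ball."))
```

Word order matters here: "dog the chases" raises `ParseError` even though every word is known, which is the difference between parsing words and parsing sentences.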

2. Vocabulary. Attempt to define the words comprising the parsed grammar. Keep it simple. Don't worry about precision at first. Humans don't start with precision, and humans get speech wrong all the time. This is especially true for pronouns. Just provide a definition.
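A sketch of this step, assuming a parse tree shaped as nested `(label, ...)` tuples with words at the leaves. `VOCABULARY`, `leaves`, `define` and `define_words` are hypothetical names; each word gets one rough definition (its first listed sense), and unknown words get a placeholder rather than an error, in the spirit of "just provide a definition".

```python
# Hypothetical starter vocabulary: word -> list of rough senses.
# The first sense is used as a naive default; precision comes later.
VOCABULARY = {
    "high": ["at a great height", "intoxicated", "greater than usual",
             "secondary (as in high school)"],
    "building": ["a structure with walls and a roof"],
    "it": ["the most recently mentioned thing"],
}


def leaves(tree):
    """Yield the words at the leaves of a nested (label, child, ...) tuple tree."""
    _label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        yield children[0]  # a (tag, word) leaf
        return
    for child in children:
        yield from leaves(child)


def define(word):
    """Return a rough definition; never fail, just guess."""
    senses = VOCABULARY.get(word.lower())
    return senses[0] if senses else f"<unknown: {word.lower()}>"


def define_words(tree):
    """Map every word in the parse tree to a rough definition."""
    return {word: define(word) for word in leaves(tree)}


tree = ("S", ("NP", ("Det", "the"), ("N", "building")), ("VP", ("V", "is"), ("Adj", "high")))
print(define_words(tree))
```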

3. Put the vocabulary together with the parsed grammar. It doesn't even have to make sense. It just has to assign meanings to the words, and to the words together, in a way that informs an opinion or decision by the computer. Consider this sentence as an example: "I work for a company high up in the building with a new hire that just got high and gets paid higher than my high school sweetheart." Every "high" in it (and "higher") means something different.
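One crude way to let the surrounding words choose among the senses of "high" in that sentence is to score each sense by how many of its cue words appear near the target word. The `SENSES` table, its cue words, the two-word window and `tag_senses` are assumptions picked to fit this one sentence; this is a simple keyword-overlap heuristic, not a general word-sense disambiguation method.

```python
# Hypothetical sense inventory: each sense lists cue words that suggest it.
SENSES = {
    "high": [
        ("at a great height", {"up", "building", "floor"}),
        ("intoxicated", {"got", "getting"}),
        ("secondary school", {"school"}),
    ],
    "higher": [("greater in amount", {"paid", "price", "salary"})],
}


def tag_senses(sentence, window=2):
    """Pick a sense for each ambiguous word by cue-word overlap with nearby words."""
    words = [w.strip(".,").lower() for w in sentence.split()]
    tagged = []
    for i, word in enumerate(words):
        if word not in SENSES:
            continue
        # Up to `window` words on each side form the context.
        context = set(words[max(0, i - window):i]) | set(words[i + 1:i + 1 + window])
        best_sense, best_score = None, -1
        for sense, cues in SENSES[word]:
            score = len(cues & context)
            if score > best_score:  # strict ">" keeps the first sense on ties
                best_sense, best_score = sense, score
        tagged.append((i, word, best_sense))
    return tagged


sentence = ("I work for a company high up in the building with a new hire that "
            "just got high and gets paid higher than my high school sweetheart.")
for position, word, sense in tag_senses(sentence):
    print(position, word, sense)
```

When no cue matches, the first listed sense wins, which doubles as the naive default from step 2.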

4. If the sentence is part of a paragraph or a response in a conversation, you can now focus on precision, because you have additional references to draw upon. You are going to redefine some terms, particularly pronouns. Using the added sentences, decide whether new definitions apply more directly than the original definitions. This is how humans do it. These repeated processing steps mean wasted CPU cycles, and it's tiring for humans too.
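This step can be sketched as re-running pronoun resolution over the whole paragraph each time a sentence is added. The feature sets (`ANIMATE`, `INANIMATE`, `PRONOUNS`) and the "most recent compatible noun" rule in the hypothetical `resolve_pronouns` are simplifying assumptions; real coreference resolution is much harder.

```python
# Hypothetical feature sets; a real system would need far richer knowledge.
ANIMATE = {"hire", "manager", "child", "dog"}
INANIMATE = {"printer", "building", "ball", "report"}
PRONOUNS = {"he": ANIMATE, "she": ANIMATE, "it": INANIMATE}


def resolve_pronouns(sentences):
    """Bind each pronoun to the most recently mentioned compatible noun.

    Returns (sentence_index, pronoun, referent) triples; referent is None
    when nothing compatible has been mentioned yet.
    """
    mentioned = []  # nouns in order of mention, across all sentences so far
    bindings = []
    for s_idx, sentence in enumerate(sentences):
        for word in (w.strip(".,?!").lower() for w in sentence.split()):
            if word in PRONOUNS:
                referent = next(
                    (noun for noun in reversed(mentioned) if noun in PRONOUNS[word]), None
                )
                bindings.append((s_idx, word, referent))
            elif word in ANIMATE or word in INANIMATE:
                mentioned.append(word)
    return bindings


# Alone, "It broke." has no referent; the earlier sentence supplies one.
print(resolve_pronouns(["It broke."]))
print(resolve_pronouns(["The printer jammed.", "It broke."]))
```

Re-scanning every earlier sentence whenever a new one arrives is the kind of repeated processing the step describes.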

5. Formulate a response. This could be a resolution to close the conversation, or it could be a question asking for additional information or clarity. Humans do this too.
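A minimal sketch of this step, assuming the earlier steps hand over `(sentence_index, pronoun, referent)` triples and a list of still-undefined words: anything unresolved becomes a clarifying question, otherwise the conversation is closed. `formulate_response` and its reply strings are hypothetical.

```python
def formulate_response(bindings, unknown_words):
    """Return a clarifying question if anything is unresolved, else a closing reply.

    bindings: (sentence_index, pronoun, referent) triples; referent None = unresolved.
    unknown_words: words that still have no definition.
    """
    unresolved = [pronoun for _, pronoun, referent in bindings if referent is None]
    if unresolved:
        return f'What does "{unresolved[0]}" refer to?'
    if unknown_words:
        return f'What does "{unknown_words[0]}" mean?'
    return "Got it."  # nothing left open: close the conversation


print(formulate_response([(0, "it", None)], []))
print(formulate_response([(1, "it", "printer")], []))
```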

6. Only once the final resolution is reached, determine what you have learned. Use this knowledge to decide how to modify parsing rules and amend vocabulary definitions. The logic involved here is a matter of heuristics.
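This step could be approximated by counting which context words co-occurred with the sense that the conversation finally confirmed, then reusing those counts next time. `cue_counts`, `learn`, `guess_sense` and the stopword list are illustrative assumptions, and this sketch only amends vocabulary, not parsing rules.

```python
from collections import Counter, defaultdict

# cue_counts[word][sense] counts context words seen with a confirmed sense.
cue_counts = defaultdict(lambda: defaultdict(Counter))
STOPWORDS = {"a", "an", "the", "and", "i", "my", "in", "than", "that"}


def learn(word, confirmed_sense, context_words):
    """Call only after the conversation's final resolution confirms a sense."""
    for w in context_words:
        if w not in STOPWORDS and w != word:
            cue_counts[word][confirmed_sense][w] += 1


def guess_sense(word, context_words):
    """Return the learned sense whose cues best match the context, or None."""
    senses = cue_counts.get(word)  # .get avoids creating an empty entry
    if not senses:
        return None
    # Counter returns 0 for unseen words; ties go to the earliest-learned sense.
    return max(senses, key=lambda s: sum(senses[s][w] for w in context_words))


learn("high", "intoxicated", ["just", "got", "and"])
learn("high", "at a great height", ["up", "in", "building"])
print(guess_sense("high", ["got", "again"]))
```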

The only way all this works is to start small, like a toddler, and expand until the responses become more precise, faster, and more fluid. At least, this is how I would do it.

It depends a bit on what you are trying to achieve, but I think hooking neural-type networks together to simulate human mental faculties might be a better way forward. For instance, much of human thinking seems to work around visualizing things in 3D space, so you can say to someone "imagine a dog on a skateboard on top of a hill, and you give it a push; what happens?" Once you've got that kind of thing working, with spatial awareness, cause and effect and so on, using neural-type processing, I think language understanding would come fairly naturally.

You have some good points, but this naive approach of handcoding "cognitive modules" was tried many times in the 20th century, and it didn't work at all.

But look at what DeepMind does: it takes these ideas (and also ideas from systems neuroscience), implements them as differentiable modules, and trains them on data in an end-to-end fashion. This works really well.
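To illustrate "differentiable modules trained end-to-end" in the smallest possible form, here is a sketch with two scalar linear modules chained together and trained by per-example gradient descent on a single squared-error loss, so the error gradient flows back through both. The `Linear` class, the module names `perception` and `reasoning`, and the target function y = 6x + 1 are made up for the example; this is not DeepMind's code, and real systems use autodiff libraries and far larger networks.

```python
class Linear:
    """A tiny differentiable module: out = w * x + b, with scalar parameters."""

    def __init__(self, w=0.5, b=0.0):
        self.w, self.b = w, b

    def forward(self, x):
        self.x = x  # cache the input for the backward pass
        return self.w * x + self.b

    def backward(self, grad_out, lr):
        grad_in = grad_out * self.w  # d(out)/d(x), taken before w changes
        self.w -= lr * grad_out * self.x
        self.b -= lr * grad_out
        return grad_in


def train_end_to_end(data, epochs=400, lr=0.01):
    # Two hypothetical "faculties" chained together; neither is trained alone.
    perception, reasoning = Linear(), Linear()
    for _ in range(epochs):
        for x, y in data:
            pred = reasoning.forward(perception.forward(x))
            grad = 2.0 * (pred - y)  # derivative of squared error w.r.t. pred
            grad = reasoning.backward(grad, lr)
            perception.backward(grad, lr)  # the error signal reaches the first module too
    return lambda x: reasoning.forward(perception.forward(x))


data = [(x, 6 * x + 1) for x in (-1.0, -0.5, 0.0, 0.5, 1.0)]
model = train_end_to_end(data)
print(model(2.0))
```

Neither module is given its own target; only the final output is compared with the data, which is the point of training end-to-end.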

Learning is very important, much more important than architecture. If you have a model that can learn, you can add more structure later; again, this is what modern deep learning is all about.