

by microtonal 3691 days ago

Curious - The parsing work I've done with programming languages was never done via machine learning,

Artificial languages (such as programming languages) are usually designed to be unambiguous. In other words, there is a 1:1 mapping from a sentence or fragment to its abstract representation.

Natural language is ambiguous, so there is usually a 1:N mapping from a sentence to abstract representations. So, at some point you need to decide which of the N readings is the most likely one.
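To give a feel for how large N can get, here is a minimal Python sketch that enumerates every unlabeled binary bracketing of a token sequence. This is a deliberate simplification and not how any particular parser works: a real grammar rules out many of these structures, and real readings also differ in labels and attachments. The function name `bracketings` is made up for the illustration.

```python
from functools import lru_cache


def bracketings(tokens):
    """Return every binary bracketing of `tokens` as nested tuples.

    Hypothetical helper for illustration only: it ignores grammar and
    labels, so it over-generates compared to a real parser.
    """
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def build(i, j):
        # A span of one token has exactly one "tree": the token itself.
        if j - i == 1:
            return (tokens[i],)
        trees = []
        # Try every split point and combine all left and right subtrees.
        for k in range(i + 1, j):
            for left in build(i, k):
                for right in build(k, j):
                    trees.append((left, right))
        return tuple(trees)

    return list(build(0, len(tokens)))


if __name__ == "__main__":
    sentence = "I saw the man with the telescope".split()
    for n in range(2, len(sentence) + 1):
        print(n, "tokens:", len(bracketings(sentence[:n])), "bracketings")
```

The counts follow the Catalan numbers, so the number of candidate structures grows very quickly with sentence length. That is why something has to rank or prune the readings.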

Older rule-based approaches typically constructed all readings of a sentence and used a model to estimate which reading is the most plausible. In newer deterministic, linear-time (transition-based) parsers, such ambiguities (if any) are resolved immediately during each parsing step.
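As a rough illustration of the transition-based side, here is a toy greedy shift-reduce loop in Python. It is a sketch under stated assumptions, not a real parser: `greedy_shift_reduce` and the `should_reduce` callback are hypothetical names, the callback stands in for a trained classifier, and the output is an unlabeled binary tree rather than a dependency or phrase-structure analysis.

```python
from collections import deque


def greedy_shift_reduce(tokens, should_reduce):
    """Build one binary tree left to right, committing to every decision.

    `should_reduce(left, right, lookahead)` is a hypothetical stand-in for
    a classifier. It only sees the top two stack items and the next token,
    so each ambiguity is resolved locally and never revisited.
    """
    if not tokens:
        raise ValueError("need at least one token")
    stack, buffer = [], deque(tokens)
    steps = 0
    while buffer or len(stack) > 1:
        lookahead = buffer[0] if buffer else None
        can_reduce = len(stack) >= 2
        # Reduce is forced once the buffer is empty; otherwise ask the model.
        if can_reduce and (not buffer or should_reduce(stack[-2], stack[-1], lookahead)):
            right = stack.pop()
            left = stack.pop()
            stack.append((left, right))
        else:
            stack.append(buffer.popleft())
        steps += 1
    return stack[0], steps


if __name__ == "__main__":
    words = "I saw the man".split()
    # Two trivial policies: always reduce when possible, or never until forced.
    print(greedy_shift_reduce(words, lambda left, right, nxt: True))
    print(greedy_shift_reduce(words, lambda left, right, nxt: False))
```

For n tokens the loop always takes n shifts and n - 1 reduces, which is where the linear running time comes from. The price is that the policy only produces a single reading and cannot go back on an early mistake.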

In the end it's a trade-off: access to global information during disambiguation comes at the cost of higher computational complexity. So, naturally, the rule-based systems have been applying tricks to aggressively prune the search space, while transition-based parsers are gaining more and more tricks to incorporate more global information.