by mdip
3694 days ago
This looks fantastic. I've been fascinated with parsers ever since I got into programming in my teens (almost always centered around programming language parsing).
Curious - the parsing work I've done with programming languages was never done via machine learning, just the usual strict classification rules (which are used to parse ... code written to a strict specification). I'm guessing source code could be fed to an engine like this as training data, but I'm not sure what the value would be. Does anyone more experienced/smarter than me have any insights on something like that?
As a side point: Parsey McParseface - well done. They managed to lob a gag over at NERC (Boaty McBoatface) and let them know that the world won't end because a product has a goofy name. Every time Google does things like this they send a subconscious reminder that they're a company that's 'still just a bunch of people like our users'. They've always been good at marketing in a way that keeps that "touchy-feely" sense about them, and they've taken a free opportunity to get attention for this product beyond just the small circle of programmers.
As NERC found out, a lot of people paid attention when the winning name was Boaty McBoatface (among other, more obnoxious/less tasteful choices). A story about a new ship isn't normally going to hit the front page of any general news site, and I always felt that NERC missed a prime opportunity to continue with that publicity and attention. It became a topic talked about by friends of mine who would otherwise never have paid attention to anything science related. It would have been comical, should Boaty's mission turn up a major discovery, to hear 'serious newscasters' say the name of the ship in reference to the breakthrough.
And it would have been refreshing to see that organization stick to the original name with a "Well, we tried, you spoke, it was a mistake to trust the pranksters on the web but we're not going to invoke the 'we get the final say' clause because that wasn't the spirit of the campaign. Our bad."
|
Artificial languages (such as programming languages) are usually designed to be unambiguous. In other words, there is a 1:1 mapping from a sentence or fragment to its abstract representation.
Natural language is ambiguous, so there is usually a 1:N mapping from a sentence to abstract representations. So, at some point you need to decide which of the N readings is the most likely one.
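To make the 1:N point concrete, here is a rough sketch that enumerates every reading of a sentence under a tiny hand-written grammar. The grammar, the lexicon, and the function name `readings` are all made up for illustration; real grammars are far larger and produce far more readings.

```python
from functools import lru_cache

# Toy binary grammar (an assumption for illustration): (left, right) -> parents.
BINARY = {
    ("NP", "VP"): ["S"],
    ("V", "NP"): ["VP"],
    ("VP", "PP"): ["VP"],   # PP modifies the verb phrase
    ("NP", "PP"): ["NP"],   # PP modifies the noun phrase
    ("Det", "N"): ["NP"],
    ("P", "NP"): ["PP"],
}
# Toy lexicon (also an assumption): word -> possible categories.
LEXICON = {
    "I": ["NP"], "saw": ["V"], "the": ["Det"], "a": ["Det"],
    "man": ["N"], "telescope": ["N"], "with": ["P"],
}

def readings(words, goal="S"):
    """Return every tree with root `goal` that covers all of `words`."""
    words = tuple(words)

    @lru_cache(maxsize=None)
    def trees(i, j):
        # All (label, tree) pairs spanning words[i:j].
        if j - i == 1:
            return tuple((tag, (tag, words[i])) for tag in LEXICON.get(words[i], []))
        found = []
        for k in range(i + 1, j):                      # every split point
            for left_label, left in trees(i, k):
                for right_label, right in trees(k, j):
                    for parent in BINARY.get((left_label, right_label), []):
                        found.append((parent, (parent, left, right)))
        return tuple(found)

    return [tree for label, tree in trees(0, len(words)) if label == goal]

ambiguous = readings("I saw the man with a telescope".split())
unambiguous = readings("I saw the man".split())
print(len(ambiguous), len(unambiguous))
```

With this grammar the first sentence gets two trees (the telescope belongs either to the seeing or to the man) and the second gets one. Something still has to pick between the two, which is the disambiguation step described above.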
Older rule-based approaches typically constructed all readings of a sentence and used a model to estimate which reading is the most plausible. In newer deterministic, linear-time (transition-based) parsers, such ambiguities (if any) are resolved immediately during each parsing step.
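For contrast, a minimal sketch of the transition-based idea, using arc-standard style SHIFT / LEFT / RIGHT actions. Everything here is an assumption for illustration: the hand-written `choose_action` rules stand in for the trained classifier a real parser would use, the flag `prefer_noun_attachment` stands in for what that classifier would learn, and the preposition is treated as the head of its phrase (one convention among several).

```python
WORDS = "I saw the man with a telescope".split()
TAGS = ["PRON", "VERB", "DET", "NOUN", "ADP", "DET", "NOUN"]

# Tag pairs (second-from-top, top) where the top word heads the one below it.
LEFT_PAIRS = {("PRON", "VERB"), ("DET", "NOUN")}

def choose_action(stack, buffer, tags, heads, prefer_noun_attachment):
    """Toy stand-in for a learned classifier: commits to one action per step."""
    if len(stack) < 2:
        return "SHIFT"
    s1, s0 = stack[-2], stack[-1]
    pair = (tags[s1], tags[s0])
    if pair in LEFT_PAIRS:
        return "LEFT"
    if pair in {("VERB", "NOUN"), ("ADP", "NOUN")}:
        # The ambiguity is resolved right here: wait so an upcoming PP can
        # attach to the noun, or attach the noun now so the PP goes higher.
        if buffer and tags[buffer[0]] == "ADP" and prefer_noun_attachment:
            return "SHIFT"
        return "RIGHT"
    if pair in {("VERB", "ADP"), ("NOUN", "ADP")} and s0 in heads.values():
        return "RIGHT"  # the preposition already has its object
    return "SHIFT"

def parse(tags, prefer_noun_attachment):
    """Greedy parse; returns ({dependent: head}, number of transitions)."""
    stack, buffer, heads, steps = [], list(range(len(tags))), {}, 0
    while buffer or len(stack) > 1:
        action = choose_action(stack, buffer, tags, heads, prefer_noun_attachment)
        if action == "SHIFT":
            if not buffer:
                raise ValueError("toy policy got stuck")
            stack.append(buffer.pop(0))
        elif action == "LEFT":
            dep = stack.pop(-2)      # second-from-top becomes a dependent of top
            heads[dep] = stack[-1]
        else:
            dep = stack.pop()        # top becomes a dependent of second-from-top
            heads[dep] = stack[-1]
        steps += 1
    return heads, steps

for flag in (True, False):
    heads, steps = parse(TAGS, flag)
    print(flag, WORDS[heads[4]], steps)  # head of "with", transition count
```

Each run produces exactly one analysis in 2n - 1 transitions for n words, and the attachment of "with" is fixed at the moment the parser sees "saw man" on the stack with "with" next in the buffer. The parser never builds the other reading at all.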
In the end it's a trade-off between having access to global information during disambiguation and paying for it with higher computational complexity. So, naturally, the rule-based systems have been applying tricks to aggressively prune the search space, while transition-based parsers are gaining more and more tricks to incorporate more global information.