|
|
|
|
|
by erezsh
2381 days ago
|
|
That sounds really cool, and I hope it will find success. The field of parsing truly needs some refreshment, especially in usability. I do have a few questions: 1. You claim the algorithm used guarantees O(n^3) for all CFGs, but is different from Earley and CYK. Are there more details for this algorithm, perhaps with a rigorous mathematical proof for this claim? 2. How is Glush's performance in relation to Earley, in terms of constant time and memory requirements? (even for low complexity, a high constant can be problematic) 3. How do you deal with ambiguity? The blog mentions priority, but is there a way to get something similar to a SPPF? |
|
> 1. You claim the algorithm used guarantees O(n^3) for all CFGs, but is different from Earley and CYK. Are there more details for this algorithm, perhaps with a rigorous mathematical proof for this claim?
Not quite yet. There is a lot more work needed and writing this article was one step in right direction. I was hoping I was able to get even closer to
> 2. How is Glush's performance in relation to Earley, in terms of constant time and memory requirements? (even for low complexity, a high constant can be problematic)
This is yet to be determined. A few comments:
- I'm not 100% concerned about performance because as mentioned in the article, I don't plan to use this in a place where performance is critical.
- I'm not too worried about the performance since I'm not doing too complicated data structures (it's all hash maps), and most intermediate data structures are small in size and are probably best implemented linearly (e.g. plain array) for practical grammars with few ambiguities.
- It's based on NFA-style "advance one state at a time" which actually is not good for performance. This means that parsing a keyword such as "function" actually passed through 8 different states instead of just looking ahead and jumping straight ahead. I think taking advantage of these look ahead will probably help a lot
> 3. How do you deal with ambiguity? The blog mentions priority, but is there a way to get something similar to a SPPF?
A single output from the parser is conceptually a flat list of _marks_ (think like a reverse RPN):
The total output from the parser is a "tree of marks".