Hacker News new | ask | show | jobs
by judofyr 2379 days ago
Author of the article here. Thanks for your comments!

> 1. You claim the algorithm used guarantees O(n^3) for all CFGs, but is different from Earley and CYK. Are there more details for this algorithm, perhaps with a rigorous mathematical proof for this claim?

Not quite yet. There is a lot more work needed and writing this article was one step in right direction. I was hoping I was able to get even closer to

> 2. How is Glush's performance in relation to Earley, in terms of constant time and memory requirements? (even for low complexity, a high constant can be problematic)

This is yet to be determined. A few comments:

- I'm not 100% concerned about performance because as mentioned in the article, I don't plan to use this in a place where performance is critical.

- I'm not too worried about the performance since I'm not doing too complicated data structures (it's all hash maps), and most intermediate data structures are small in size and are probably best implemented linearly (e.g. plain array) for practical grammars with few ambiguities.

- It's based on NFA-style "advance one state at a time" which actually is not good for performance. This means that parsing a keyword such as "function" actually passed through 8 different states instead of just looking ahead and jumping straight ahead. I think taking advantage of these look ahead will probably help a lot

> 3. How do you deal with ambiguity? The blog mentions priority, but is there a way to get something similar to a SPPF?

A single output from the parser is conceptually a flat list of _marks_ (think like a reverse RPN):

    // this input:
    1 + 2 * 3 + 4
    // with this precedence:
    (1 + (2 * 3)) + 4
    // could generate this flat list of marks:
    add, add, 1, mul, 2, 3, 4
The total output from the parser is a "tree of marks".