| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by Darmani 1367 days ago

Okay, we seem to be whittling down on the points where we disagree.

Initial disagreement:

* langcc/XLR is a major breakthrough in the ability to generate parsers for industrial languages

---- Positions:

----------- Joe: Yes

----------- Me: Unclear. Several other works in the past 20 years have similar claims. Benchmark selected by langcc is of unclear significance. langcc paper is lacking indicia of reliability to investigate its linear time claim.

Current crux:

* Linear time guarantee is a bright-line barrier between parser generators suited for tool or compiler development on industrial languages and parser generators not so suited.

------ Positions:

---------- Joe: Yes

---------- Me: No

Is this a fair summary of the discussion so far?

I can say that this is a crux for me because, if you convince me that the linear-time guarantee is this important and that langcc has it, I will be much more likely to crow about / evangelize langcc in the future.

> My comments were referring to the compilation of tree automata to CFGs, which seems to be necessary in order to use them with standard LR parsing techniques. The alternative seems to be to use the automata to filter sets of possible parses after-the-fact, which does not by itself lead to an efficient parsing algorithm.

Cool! So, the distinguishment is more about its use of CFGs as a compilation target than about the main meat of the work. I find this a much clearer description of how you intend to compare Michael Adams's paper than the one I quoted.

-----------------------------------------------------

For the rest of your comment, a few claims:

* "Golang is a representative example"

* "In order to be competitive with hand-written parser [...] a generated parser should at least be guaranteed linear-time, and should be comparable in terms of concrete efficiency benchmarks. "

* "the community had yet to propose an alternative that is general and efficient enough (esp., linear-time) to replace hand-written recursive-descent parsers for industrial languages such as Golang"

This is an interesting set of claims because all the arguments I've heard in the past in favor of handwritten parsers were about error messages, not perf. I know langcc has an idea for that as well, but you seem to be focusing on performance as the real breakthrough.

So, question:

Would you accept a generated parser for Ruby, C++, Haskell, Delphi, or Verilog that is empirically within 3x the parsing speed of the handwritten alternative as sufficient evidence that some existing work is already competitive on this front?

If not, why?

1 comments

jzimmerman64 1367 days ago

> I can say that this is a crux for me because, if you convince me that the linear-time guarantee is this important and that langcc has it

The fact that langcc has the linear-time guarantee follows from the fact that it generates shift/reduce parsers that take steps according to an LR DFA. If you are not already convinced that this property is important, I am not sure that I can convince you, except to ask whether you would be comfortable running, e.g., a worst-case-O(n^3) parser for a real industrial language in production. (Or, for that matter, running a parser such as GLR which may fail with an ambiguity at parse time. langcc parsers, of course, will report potential ambiguity as an LR conflict at _parser generation time_; once langcc produces the LR DFA, its behavior at parse time is unambiguous.)

There are also many other important theoretical innovations in the langcc paper, e.g., the LR NFA optimization procedures, the conflict tracing algorithm, and XLR.

> This is an interesting set of claims because all the arguments I've heard in the past in favor of handwritten parsers were about error messages, not perf. I know langcc has an idea for that as well, but you seem to be focusing on performance as the real breakthrough.

Yes, langcc also features a novel LR conflict tracing algorithm, which makes conflicts very easy to debug at parser generation time. I am not sure if I would call performance the "breakthrough"; more so that linear-time performance is a _prerequisite_, and the theoretical breakthroughs of the langcc paper are what allow us to satisfy that prerequisite.

> Would you accept a generated parser for Ruby, C++, Haskell, Delphi, or Verilog that is empirically within 3x the parsing speed of the handwritten alternative as sufficient evidence that some existing work is already competitive on this front?

I have not studied Ruby, Delphi, or Verilog extensively, so could not comment on those languages at the moment. I know that Golang 1.17.8 is sufficiently difficult to parse as to be a good benchmark. I am not sure whether Haskell is sufficiently difficult to parse, but would at least be _interested_ in results for it. C++ parsing is undecidable, so that would seem to rule it out as a candidate for a fully automatically generated parser.

Darmani 1366 days ago

> If you are not already convinced that this property is important, I am not sure that I can convince you, except to ask whether you would be comfortable running, e.g., a worst-case-O(n^3) parser for a real industrial language in production. (Or, for that matter, running a parser such as GLR which may fail with an ambiguity at parse time.

Looks like it stops here. Yes, I am comfortable running such algorithms in production. Github code-nav uses GLR parsing (via tree-sitter) on a large fraction of the code in existence.