Hacker News new | ask | show | jobs
by mcguire 3180 days ago
Oh, dear lord, the algorithm descriptions and pseudocode in research papers...

A while back, I played around with Earley parsing. The Earley parsing algorithm, the basic one, is pretty simple. In fact, it's almost recursive descent, if you didn't use recursion, but managed the current state of the parse manually.

Unfortunately, the original algorithm was described as dynamic programming, which was the hot new thing at the time. Then, there's a bug in it, it's hard to recover a parse tree from it, and it can be extended to be efficient on left-recursive (?) grammars.

Translating the algorithm out of the dynamic programming world isn't too hard. There's a reasonable description of fixing the bug. But recovering a parse tree (technically, a "Shared Packed Parse Forest", since Earley is context-free and to avoid exponential blow-up when producing multiple parse trees, the trees need to share structure) killed me. That paper's pseudocode was the worse spaghetti I've seen in a long time.

1 comments

I fully agree, having been looking into the same papers. It also doesn't help that each author start out with their own version of the basic implementation which often won't match the implementation the other authors use.

I think I finally managed to get the SPPF to get the right structure now, but it is way harder to penetrate compared to the complexity of the underlying idea.

Now I only have to figure out how to actually iterate over the different parse trees..