Hacker News new | ask | show | jobs
by thurston 5504 days ago
To have both generalized parsing and context dependent parsing you need to somehow revert changes to the global state when you backtrack. You can do this by restoring it to old versions, which requires copying the state and keeping the history. The Colm approach is to keep only one global state, but store instructions for programmatically reverting the state while you backtrack.
2 comments

The point of functional data structures is that if you modify, you don't copy the whole structure, you only copy the path to the things that changed.

A simple example would be parsing a sequence of characters and storing that in a linked list. When you get an additional character you don't destructively modify the list, instead you create a new list node that points to the existing list.

    def process(char, state):
      return Node(char, state)
The process function takes a character and a state, and returns a new state. The old state is still usable, as nothing was mutated. On the other hand no large data structure had to be copied.

Your method is probably more efficient. Because you don't need access to all historical versions simultaneously, you only have to keep a stack of undo operations. So you only have O(1) slowdown per operation. With functional data structures you have in the worst case a O(log n) slowdown per operation.

For example if you implement a bit vector, then update and lookup can both be O(log n), e.g. a red-black tree. If you want update to be O(1) then lookup becomes O(n), e.g. a linked list. If you want lookup to be O(1) then update becomes O(n), e.g. copying the entire vector on update.

(please correct me if I'm wrong and if you can do better than O(n) for those cases)

Yes I understand these techniques, in fact they are heavily in use in Colm, just not for maintaining the global state that is used for the parsing feedback loop.

I use the term "copy" in the general sense to make discussion easier. Conceptually, persistent data structures are a copy, even if they are optimized heavily to incur very little cost.

Then you misunderstood my point, backtracking and keeping history with persistent data structures is very cheap.