|
|
|
|
|
by lefttoreader
1040 days ago
|
|
Sure! So it’s hopefully clear that the notion of constrained grammar is not novel (see every comment on here of people name-dropping their implementation from two months ago). The novelty here is “instead of checking whether every token is allowed” to create a finite state machine that defines which tokens are allowable at each generation step. This lets them not check every token at every step. The trick of creating an FSM to efficiently check next-token grammar is what allowed FlashText to run circles around standard regex stuff. Even FlashText guy acknowledged the shoulders he stood on, etc. Let’s be super clear here, none of these standards apply when you’re building good ole libraries. But putting out a paper really elevates what you’re on the hook for. Most folks that write papers are dying to acknowledge the shoulders they stand on - it’s part of the toxic humility we all engage in. Again - shill OSS all day - I’ll upvote it. |
|
I mean going from standard regex to NFA to DFA is already more sophisticated than that one, it's _quite_ oldschool and gives you linear time matching: https://en.wikipedia.org/wiki/Thompson%27s_construction https://en.wikipedia.org/wiki/Powerset_construction
And what I mean to say by this as they could have easily have had this idea and never had discovered the whitepaper you referenced.