Hacker News new | ask | show | jobs
by tgv 574 days ago
Regular has a meaning, and it isn't this. It's had this meaning since the 1950s. OTOH, these expressions do not generate a regular language. That's a good reason to me.

They call it "semantic regular expression" because it apparently already is a lost cause. "Regular expressions with TMs embedded" doesn't quite have the same ring to it. Nobody would see it as a regexp.

2 comments

> OTOH, these expressions do not generate a regular language.

Okay sure you're technically correct here, but only because these expressions generate a subset of a regular language. The LLM can only be invoked on a substring that can expressed as a regular expression, and then it's only used to remove strings from the language. Their results are based heavily on how regular expressions work. A "semantic context-free grammar" would have different type of characteristics and behavior.

Maybe throwing in the word "extended" or "augmented" would be a bit more clear, but as I reader I definitely would expect "regular expression" to be part of the name.

Removing strings from the language is what makes it non-regular. E.g., a regular language cannot contain a^n b^n (that is: the string is only accepted when it has an identical amount of a's and b's), but it sure as hell can contain a^m b^n. Removing the strings where m != n is what makes a language context-free.
To be fair, I could see this basically allowing a form of "back reference" lookup that lets you offload the back reference to other parts. For example, `/(.); \1 = (.)/.exec("foo; foo = anything")`, but instead of doing a back reference, you could have an oracle lookup.

I haven't looked at the examples in this paper, yet. But I'm having fun imagining ways this could be used.