Hacker News
by judofyr 574 days ago
> OTOH, these expressions do not generate a regular language.

Okay, sure, you're technically correct here, but only because these expressions generate a subset of a regular language. The LLM can only be invoked on a substring that can be expressed as a regular expression, and then it's only used to remove strings from the language. Their results are based heavily on how regular expressions work. A "semantic context-free grammar" would have different characteristics and behavior.

Maybe throwing in the word "extended" or "augmented" would be a bit clearer, but as a reader I definitely would expect "regular expression" to be part of the name.
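
To make the "match with a regex, then only remove strings" idea concrete, here is a minimal sketch. It is not the paper's implementation: `semantic_match` and `is_multiple_of_four` are hypothetical names, and the plain Python predicate is just a stand-in for whatever the LLM would decide.

```python
import re
from typing import Callable


def semantic_match(pattern: str, keep: Callable[[str], bool], text: str) -> bool:
    """Accept `text` only if the regex matches it fully AND `keep` does not reject it.

    `keep` is a hypothetical stand-in for the LLM check. It can only filter
    strings out, so the accepted set is always a subset of the regex's language.
    """
    m = re.fullmatch(pattern, text)
    if m is None:
        return False  # outside the regular language: the filter is never consulted
    return keep(m.group(0))


def is_multiple_of_four(s: str) -> bool:
    # Toy filter standing in for a "semantic" judgement.
    return int(s) % 4 == 0


print(semantic_match(r"\d{4}", is_multiple_of_four, "2024"))
print(semantic_match(r"\d{4}", is_multiple_of_four, "2023"))
print(semantic_match(r"\d{4}", is_multiple_of_four, "abcd"))
```

Every accepted string still matches `\d{4}`, which is the sense in which the result stays inside a regular language even if the filtered set itself is not regular.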

1 comments

Removing strings from the language is exactly what can make it non-regular. E.g., the language a^n b^n (that is: a string is only accepted when it is a run of a's followed by the same number of b's) is not regular, but a^m b^n sure as hell is. Removing the strings where m != n is what turns that regular language into one that is context-free but no longer regular.
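
The a^m b^n versus a^n b^n point can be sketched in a few lines. The function names here are made up for illustration. The first check is a plain regular expression; the second starts from the same regex and then removes strings by comparing counts, which is the step no finite automaton can do for unbounded n.

```python
import re

# a^m b^n for any m, n >= 0 is just the regular expression a*b*.
A_STAR_B_STAR = re.compile(r"a*b*")


def in_am_bn(s: str) -> bool:
    """Membership in the regular language a^m b^n."""
    return A_STAR_B_STAR.fullmatch(s) is not None


def in_an_bn(s: str) -> bool:
    """Membership in a^n b^n: start from a^m b^n, then drop strings with m != n.

    The count comparison needs unbounded memory, so the filtered language
    is no longer regular (it is context-free).
    """
    return in_am_bn(s) and s.count("a") == s.count("b")


for s in ["", "aabb", "aaab", "abab", "ba"]:
    print(repr(s), in_am_bn(s), in_an_bn(s))
```

Both functions only ever accept strings of the form a*b*, so the second language is a subset of a regular one without being regular itself.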