| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by wyager 3598 days ago
	I mean, the whole point of regexes (not to be confused with PCREs) is that any given regex is isomorphic to some canonical finite state machine. It is, specifically speaking, a tiny description of an FSM over the alphabet of ASCII characters (or whatever charset you're using). Interestingly, regexes/FSMs are (IIRC) the most powerful class of machines for which equivalence is decidable. So if you give me any two regexes, I can tell you if they match on all the same strings, but this is not true for any more powerful grammar.

3 comments

halter73 3598 days ago

Isn't it possible to decide the equivalence of deterministic pushdown automata? Wouldn't DPDAs be considered more powerful than FSMs due to the addition of a stack?

Wikipedia [1] shows there's a paper called "The equivalence problem for deterministic pushdown automata is decidable" that won the Gödel Prize is 2002. I haven't read the paper nor do I currently have access to it though.

[1] https://en.wikipedia.org/wiki/Deterministic_pushdown_automat...

link

kefka 3598 days ago

Here you go!

http://link.springer.com.sci-hub.cc/chapter/10.1007/3-540-63...

It was initially published in 1997.

link

wyager 3598 days ago

Huh! Well there you go.

link

adrianratnapala 3598 days ago

Presumably that means "this is not true for any more powerful class of grammars".

Any particular pair grammars from that class. I mean if you showed me a yacc grammar for Pascal and another one for Java, I am sure I could prove they were ineqivelent.

More tricky, if you gave someone clever a hand written Pascal parser and that same yacc grammar she could probably prove equivilency with some effort. But there is no guarantee that this is always possible.

link

wyager 3598 days ago

Yes, that is what I meant.

> I mean if you showed me a yacc grammar for Pascal and another one for Java,

This is why I said "any", and not "all". In most practical cases, you can tell if two grammars are different, but it's easy to construct cases where you can't.

link

hvidgaard 3598 days ago

While it is true for Regular Languages (FSM) strict RegExs, it's hardly true for any contemporary implementation of RegExs. They have back references and other constructs that make them more powerful than a Regular Language, and thereby make it impossible to decide equivalence in the general case.

link

wyager 3598 days ago

Hence "not to be confused with PCREs", which most pseudo-regex implementations are based off of.

link

Hydraulix989 3598 days ago

I'll ask the "dumb question" then: What is a PCRE? How is it different than a "true" regex?

link

robewald 3598 days ago

PCREs are Perl Compatible Regular Expressions, they have backreferences and other constructs. Almost all languages that have Regular Expressions have similar constructs. "Computer Science" Regular Expressions are more limited and implement a "regular language" as defined in https://en.wikipedia.org/wiki/Chomsky_hierarchy. This gives some guarantees, which go out the window with PCREs.

link

hvidgaard 3598 days ago

Did you edit the comment? I cannot recall reading that at all, but I may have been in a hurry.

link

wyager 3597 days ago

Nope, didn't change it!

link