HN Mirror


by wodenokoto 1870 days ago

The article goes as far as to say that a parser is not the right tool.

> Not only can the task be solved with a regular expression - regular expressions are basically the only practical way to solve the problem. Which is why none of the clever answers actually suggest another way to solve the problem.

So no, the author is not missing the point at all.

2 comments

goto11 1869 days ago

The point is that a parser could very well use regexes under the hood to perform the tokenization, because they are the right tool for that job. A language without regex support might use something like lex to compile a lexer. Of course you can write a character-by-character lexer by hand, but this is just equivalent to what a regex would generate.
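To make the lexing idea concrete, here is a minimal sketch of a regex-driven tokenizer in Python. The token names (`OPEN`, `SELF`, `CLOSE`, `TEXT`) and the tiny attribute-free tag grammar are assumptions for illustration only, not a real HTML lexer; a parser would consume the resulting token stream.

```python
import re

# Hypothetical token set for a tiny, attribute-free markup subset.
TOKEN_RE = re.compile(
    r"""
    (?P<OPEN>  <[A-Za-z][A-Za-z0-9]*\s*>)    # opening tag, e.g. <p>
  | (?P<SELF>  <[A-Za-z][A-Za-z0-9]*\s*/>)   # self-closing tag, e.g. <br/>
  | (?P<CLOSE> </[A-Za-z][A-Za-z0-9]*\s*>)   # closing tag, e.g. </p>
  | (?P<TEXT>  [^<]+)                        # text up to the next '<'
    """,
    re.VERBOSE,
)


def tokenize(source):
    """Yield (kind, text) pairs; raise ValueError on input no token matches."""
    pos = 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise ValueError(f"unexpected input at offset {pos}")
        # lastgroup is the name of the alternative that matched.
        yield match.lastgroup, match.group()
        pos = match.end()


tokens = list(tokenize("<p>hi<br/></p>"))
```

Anything outside this subset (attributes, comments, a stray `<`) raises `ValueError` rather than being silently skipped, which is the kind of behaviour a hand-written lexer would also have to decide on.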

So saying "this is not possible, use a parser instead" is completely misunderstanding the relationship between lexing and parsing. I wonder how these people think a parser works?


IshKebab 1869 days ago

I mean that bit is clearly wrong. An XML/HTML parser is a perfectly practical way to solve the problem.
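As a rough illustration of the library route, here is a sketch using Python's standard `html.parser`. The thread does not spell out the exact task, so this assumes it is something like collecting opening tags while skipping self-closing ones; the `OpenTagCollector` class name and the sample markup are made up for the example.

```python
from html.parser import HTMLParser


class OpenTagCollector(HTMLParser):
    """Collect names of opening tags, skipping self-closing ones like <br/>."""

    def __init__(self):
        super().__init__()
        self.open_tags = []

    def handle_starttag(self, tag, attrs):
        self.open_tags.append(tag)

    def handle_startendtag(self, tag, attrs):
        # Called for XHTML-style <tag/>; deliberately ignored in this sketch.
        pass


collector = OpenTagCollector()
collector.feed('<p class="x">hi<br/><a href="#">more</a></p>')
collector.close()
print(collector.open_tags)
```

Attributes and quoting are handled by the library here, which is the part a quick regex tends to get wrong.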

However I completely agree that they didn't miss the point. A regex to do this might be fine for hacky things that you don't need to be robust (e.g. for searching for stuff, measuring stats, one-off scripts etc.).


goto11 1869 days ago

Regular expressions can be as robust as you need them to be, just like any other kind of code. They are a DSL to create lexers, and they are exactly as robust (or hacky) as if you wrote the same lexer by hand.


IshKebab 1869 days ago

C code can be as robust as you need it to be. So why bother with formal verification, safe C coding standards, Rust, etc?

The answer is that it can be robust, but the effort required to do that is so large that in practice it usually isn't.


goto11 1869 days ago

Are you arguing that the effort required to make a regex robust and correct is larger than the effort required to make some hand-rolled character-by-character based lexer robust and correct?

Because that sounds counter-intuitive to me. A regex is a higher level DSL for lexing.
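A small side-by-side sketch of what is being compared, in Python: the same token (an ASCII tag name, an assumption chosen for illustration) matched once by a regex and once by a hand-rolled character loop. Both function names are hypothetical.

```python
import re

NAME_RE = re.compile(r"[A-Za-z][A-Za-z0-9]*")


def match_name_regex(text, pos=0):
    """Return the end offset of a tag name starting at pos, or None."""
    match = NAME_RE.match(text, pos)
    return match.end() if match else None


def match_name_manual(text, pos=0):
    """Same contract as match_name_regex, written character by character."""
    # First character must be an ASCII letter.
    if pos >= len(text) or not (text[pos].isascii() and text[pos].isalpha()):
        return None
    end = pos + 1
    # Following characters may be ASCII letters or digits.
    while end < len(text) and text[end].isascii() and text[end].isalnum():
        end += 1
    return end


samples = ["abc1>", "1abc", "", "h1 class", "é"]
agree = all(match_name_regex(s) == match_name_manual(s) for s in samples)
```

The manual version needs explicit bounds checks and an explicit ASCII guard to behave like the one-line pattern; those are the places where a hand-written matcher can drift from its intended grammar.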


IshKebab 1868 days ago

That's exactly what I'm arguing. Especially because it's very unlikely that you'd write an XML/HTML parser yourself instead of using somebody else's well-tested library.


goto11 1868 days ago

OK, but these are two separate questions.

Of course you should use an existing library if it solves the exact problem you have. Don't waste time reinventing the wheel unless you are doing it for educational purposes. Whether such a library uses regexes under the hood or not is irrelevant as long as it works and is well tested.

But I would certainly like to hear an argument for why you think a regex is less robust than a similar manual character-by-character matcher.
