by marktangotango 4141 days ago
Generally, I find that if one's regexes are so complex that one needs visualizers or other aids in writing them, one doesn't have a regex problem, but a parsing problem. The method of parsing by recursive descent can often lead to much more understandable (if more verbose) "pattern matching".
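To make the recursive descent point concrete, here is a minimal sketch in Python. The grammar (integer arithmetic with nested parentheses) and the names `tokenize`, `Parser`, and `evaluate` are illustrative assumptions, not anything from a real project. Nested parentheses are a standard case a single classic regex cannot match, while each grammar rule maps onto one short method:

```python
import re

# A small regex still suits the flat, lexical part: numbers and operators.
TOKEN = re.compile(r"\s*(\d+|[-+*/()])")

def tokenize(text):
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at {pos}: {text[pos]!r}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        tok = self.peek()
        self.pos += 1
        return tok

    # expr := term (('+' | '-') term)*
    def expr(self):
        value = self.term()
        while self.peek() in ("+", "-"):
            if self.take() == "+":
                value += self.term()
            else:
                value -= self.term()
        return value

    # term := factor (('*' | '/') factor)*
    def term(self):
        value = self.factor()
        while self.peek() in ("*", "/"):
            if self.take() == "*":
                value *= self.factor()
            else:
                value /= self.factor()
        return value

    # factor := NUMBER | '(' expr ')'
    def factor(self):
        tok = self.take()
        if tok == "(":
            value = self.expr()  # the recursion that handles nesting
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return value
        if tok is not None and tok.isdigit():
            return int(tok)
        raise SyntaxError(f"unexpected token {tok!r}")

def evaluate(text):
    parser = Parser(tokenize(text))
    value = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"trailing input: {parser.peek()!r}")
    return value
```

It is more verbose than a regex, but an exception to a rule becomes one extra branch in one method rather than another alternation buried in a long pattern.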
2 comments

The worst regexes I've had to write involved parsing the various IMDB data files, which seem to have been formatted specifically to make them as difficult to parse as possible. I hear MediaWiki syntax is similarly arcane and evil, but I've never tried to parse it (though last night I started writing some tools to deal with Wikipedia dumps, so I might end up in that corner). I'd really like to see different approaches to parsing really ugly formats that feature an exception to almost every single pattern you think you've found. I honestly think the regex is easiest...
Recursive descent is imperative, while regex is declarative.

Regex may be ugly, but you lose something important when you move from declarative to imperative.
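A rough illustration of that trade-off, assuming a made-up task (matching a `YYYY-MM-DD` string) and hypothetical function names. The regex states what a match looks like; the hand-written version spells out how to find it, step by step:

```python
import re

# Declarative: the pattern itself is the whole specification.
DATE_RE = re.compile(r"([0-9]{4})-([0-9]{2})-([0-9]{2})")

def match_declarative(text):
    m = DATE_RE.fullmatch(text)
    return tuple(int(g) for g in m.groups()) if m else None

# Imperative: the same language, expressed as explicit cursor movement.
def match_imperative(text):
    pos = 0

    def digits(count):
        nonlocal pos
        chunk = text[pos:pos + count]
        if len(chunk) != count or any(c not in "0123456789" for c in chunk):
            return None
        pos += count
        return int(chunk)

    def literal(ch):
        nonlocal pos
        if text[pos:pos + 1] != ch:
            return False
        pos += 1
        return True

    year = digits(4)
    if year is None or not literal("-"):
        return None
    month = digits(2)
    if month is None or not literal("-"):
        return None
    day = digits(2)
    if day is None or pos != len(text):
        return None
    return (year, month, day)
```

Both functions accept the same strings, but only the first can be read at a glance as a description of the format. The second is easier to step through in a debugger and to extend with special cases.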

"Recursive descent" has that name precisely because it is not the only parsing alternative, hence we cannot simply call it "parsing".