by jstarfish 3024 days ago
They're definitely useful, and I can cobble them together to get lots of otherwise tedious and complex parsing tasks done, but when I come back to them a week later I have no idea what the hell the pile of wingding vomit I wrote was supposed to do.

I find myself writing simpler ones and tying them together with app code just for sanity's sake.
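A minimal sketch of that approach, assuming the task is validating an IPv4 address (the class and method names are hypothetical). A deliberately dumb regex picks out the four numbers, and plain code does the 0-255 range check, which is painful to express inside the pattern itself:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Ipv4Check {
    // Simple pattern: four groups of 1-3 digits separated by dots.
    // It deliberately does NOT try to encode the 0-255 range.
    static final Pattern DOTTED_QUAD =
            Pattern.compile("(\\d{1,3})\\.(\\d{1,3})\\.(\\d{1,3})\\.(\\d{1,3})");

    static boolean isValidIpv4(String s) {
        Matcher m = DOTTED_QUAD.matcher(s);
        if (!m.matches()) {              // matches() requires the whole string to match
            return false;
        }
        // The range check lives in ordinary application code.
        for (int i = 1; i <= 4; i++) {
            if (Integer.parseInt(m.group(i)) > 255) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isValidIpv4("192.168.0.1"));  // true
        System.out.println(isValidIpv4("256.1.1.1"));    // false
        System.out.println(isValidIpv4("1.2.3"));        // false
    }
}
```

Each piece stays readable a week later: the regex only describes the shape, and the rule about valid values is a normal `if`.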


Some regex implementations allow for comments in the string; if yours does not, you can probably make it work with concatenation, like:

  String pattern = "^https?://"   // match the protocol (http or https) at the beginning
                 + "([a-zA-Z]+)"  // match the machine name
                 + ...
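
If the snippet above is Java, `java.util.regex` happens to be one of the implementations that does support comments in the pattern, via the `Pattern.COMMENTS` flag (or the inline `(?x)`). A minimal sketch; the class name and the test URL are just for illustration:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CommentedRegex {
    // With Pattern.COMMENTS, whitespace in the pattern is ignored and '#' starts
    // a comment that runs to the end of the line. Each piece ends with "\n" so
    // its comment doesn't swallow the next piece.
    static final Pattern URL_START = Pattern.compile(
            "^https?://       # protocol (http or https) at the beginning\n"
          + "([a-zA-Z]+)      # machine name (letters only)\n",
            Pattern.COMMENTS);

    public static void main(String[] args) {
        Matcher m = URL_START.matcher("https://example.com/");
        if (m.find()) {
            System.out.println(m.group(1)); // prints "example"
        }
    }
}
```

The comments then travel with the pattern itself rather than with the Java source around it.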
Honestly, I keep using regular expressions because, once you get used to the regex operators, I haven't seen anything more readable, and that includes the comment-expanded format above. I guess the closest contender would be the alternative s-expression (parse-tree) syntax in CL-PPCRE. For instance:

  CL-USER> (cl-ppcre:parse-string "\\b\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\b")
  (:SEQUENCE :WORD-BOUNDARY (:GREEDY-REPETITION 1 3 :DIGIT-CLASS) #\.
   (:GREEDY-REPETITION 1 3 :DIGIT-CLASS) #\.
   (:GREEDY-REPETITION 1 3 :DIGIT-CLASS) #\.
   (:GREEDY-REPETITION 1 3 :DIGIT-CLASS) :WORD-BOUNDARY)
But then, any such form can become a mouthful:

  CL-USER> (cl-ppcre:parse-string "((\\b[0-9]+)?\\.)?\\b[0-9]+([eE][-+]?[0-9]+)?\\b")
  (:SEQUENCE
   (:GREEDY-REPETITION 0 1
    (:REGISTER
     (:SEQUENCE
      (:GREEDY-REPETITION 0 1
       (:REGISTER
        (:SEQUENCE :WORD-BOUNDARY
         (:GREEDY-REPETITION 1 NIL (:CHAR-CLASS (:RANGE #\0 #\9))))))
      #\.)))
   :WORD-BOUNDARY
   (:GREEDY-REPETITION 1 NIL (:CHAR-CLASS (:RANGE #\0 #\9)))
   (:GREEDY-REPETITION 0 1
    (:REGISTER
     (:SEQUENCE (:CHAR-CLASS #\e #\E)
      (:GREEDY-REPETITION 0 1 (:CHAR-CLASS #\- #\+))
      (:GREEDY-REPETITION 1 NIL (:CHAR-CLASS (:RANGE #\0 #\9))))))
   :WORD-BOUNDARY)
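One middle ground between the terse string and the full parse tree is to build the pattern from named pieces in ordinary code. A minimal sketch in Java, where the constant names (`DIGITS`, `OPT_INT_AND_POINT`, `EXPONENT`) are my own hypothetical choices; it reassembles exactly the number regex above:

```java
import java.util.regex.Pattern;

class NumberRegex {
    // Hypothetical named building blocks (not from any library).
    static final String DIGITS            = "[0-9]+";
    static final String OPT_INT_AND_POINT = "((\\b" + DIGITS + ")?\\.)?";  // optional "123." or "."
    static final String EXPONENT          = "([eE][-+]?" + DIGITS + ")?";  // optional "e-10"

    // Assembles ((\b[0-9]+)?\.)?\b[0-9]+([eE][-+]?[0-9]+)?\b
    static final String NUMBER = OPT_INT_AND_POINT + "\\b" + DIGITS + EXPONENT + "\\b";
    static final Pattern NUMBER_PATTERN = Pattern.compile(NUMBER);

    public static void main(String[] args) {
        System.out.println(NUMBER);
        for (String s : new String[] {"42", "3.14", ".5", "6.02e23", "abc"}) {
            System.out.println(s + " -> " + NUMBER_PATTERN.matcher(s).matches());
        }
    }
}
```

The names carry the meaning that the parse tree spells out structurally, while the compiled pattern stays identical.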
If you're using PCRE, you should also make use of named groups. They make the expression easier to understand because you can reuse parts of it (a little like functions), and your code can then refer to the matched groups by name instead of by position. This decouples the code that uses the matches from the regex itself, which makes it more robust.

http://www.rexegg.com/regex-capture.html#namedgroups
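
For comparison, Java's `java.util.regex` uses the same `(?<name>...)` syntax, and `Matcher.group(String)` reads a group back by name. As far as I know it has no equivalent of PCRE's `(?&name)` subroutine calls, so only the "refer to it by name" half carries over. A minimal sketch with a made-up date format:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NamedGroups {
    // (?<name>...) declares a named group; Matcher.group("name") reads it back.
    static final Pattern DATE =
            Pattern.compile("(?<year>\\d{4})-(?<month>\\d{2})-(?<day>\\d{2})");

    public static void main(String[] args) {
        Matcher m = DATE.matcher("released 2016-03-14");
        if (m.find()) {
            // Adding or reordering groups in the pattern won't break these lookups.
            System.out.println(m.group("year"));   // 2016
            System.out.println(m.group("month"));  // 03
            System.out.println(m.group("day"));    // 14
        }
    }
}
```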

That would be nice to have when working in JavaScript; it would make code a lot more readable and easier to update.

Alas, we don't get such luxuries as named groups or static typing... Woe is me.

I wonder if Perl 6 regexes and grammars might show a way forward for more readable pattern matching.
An example would be helpful, so here is one: https://github.com/moritz/json/blob/master/lib/JSON/Tiny/Gra...
> I find myself writing simpler ones and tying them together with app code just for sanity's sake.

Or just use PEGs, parser combinators, or other more readable parsing abstractions.
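
To give a rough idea of what the parser-combinator style looks like, here is a toy sketch in Java. Everything in it (`Parser`, `Result`, `digits`, `lit`, `sepBy1`, `parseIpv4`) is hypothetical and hand-rolled for illustration, not any library's API; real code would more likely use an existing PEG or combinator library. It parses the same dotted-quad shape as the `\b\d{1,3}\.\d{1,3}...` example, but the range rule is expressed as part of the grammar:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;
import java.util.function.Predicate;

class Combinators {
    // Value produced by a successful parse, plus the position right after it.
    static final class Result<T> {
        final T value;
        final int next;
        Result(T value, int next) { this.value = value; this.next = next; }
    }

    // A parser either fails (Optional.empty()) or consumes input starting at pos.
    interface Parser<T> {
        Optional<Result<T>> parse(String in, int pos);

        // Transform the parsed value, e.g. String -> Integer.
        default <U> Parser<U> map(Function<T, U> f) {
            return (in, pos) -> parse(in, pos)
                    .map(r -> new Result<U>(f.apply(r.value), r.next));
        }

        // Reject a parsed value that fails a semantic check.
        default Parser<T> filter(Predicate<T> ok) {
            return (in, pos) -> parse(in, pos).filter(r -> ok.test(r.value));
        }
    }

    // Between min and max ASCII digits, roughly like \d{min,max}.
    static Parser<String> digits(int min, int max) {
        return (in, pos) -> {
            int i = pos;
            while (i < in.length() && i - pos < max
                    && in.charAt(i) >= '0' && in.charAt(i) <= '9') {
                i++;
            }
            return i - pos >= min
                    ? Optional.of(new Result<String>(in.substring(pos, i), i))
                    : Optional.empty();
        };
    }

    // Exactly the literal string s.
    static Parser<String> lit(String s) {
        return (in, pos) -> in.startsWith(s, pos)
                ? Optional.of(new Result<String>(s, pos + s.length()))
                : Optional.empty();
    }

    // item (sep item)* : one or more items separated by sep.
    static <T> Parser<List<T>> sepBy1(Parser<T> item, Parser<String> sep) {
        return (in, pos) -> {
            Optional<Result<T>> r = item.parse(in, pos);
            if (!r.isPresent()) return Optional.empty();
            List<T> items = new ArrayList<>();
            int cur = pos;
            while (r.isPresent()) {
                items.add(r.get().value);
                cur = r.get().next;
                Optional<Result<String>> s = sep.parse(in, cur);
                if (!s.isPresent()) break;
                r = item.parse(in, s.get().next);  // on failure, stop before the separator
            }
            return Optional.of(new Result<List<T>>(items, cur));
        };
    }

    // The "grammar": an octet is 1-3 digits with value <= 255; an address is octets joined by dots.
    static final Parser<Integer> OCTET =
            digits(1, 3).map(Integer::parseInt).filter(n -> n <= 255);
    static final Parser<List<Integer>> ADDRESS = sepBy1(OCTET, lit("."));

    // Succeeds only if the whole input is exactly four octets.
    static Optional<List<Integer>> parseIpv4(String s) {
        return ADDRESS.parse(s, 0)
                .filter(r -> r.next == s.length() && r.value.size() == 4)
                .map(r -> r.value);
    }

    public static void main(String[] args) {
        System.out.println(parseIpv4("192.168.0.1"));  // Optional[[192, 168, 0, 1]]
        System.out.println(parseIpv4("256.1.1.1"));    // Optional.empty
    }
}
```

The combinators are more code than a one-line regex, but the grammar at the bottom reads close to its English description and returns structured values instead of capture-group positions.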