Hacker News
by TeMPOraL 3019 days ago
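Some regex implementations allow comments inside the pattern itself (a free-spacing or "verbose" mode, such as Perl's /x flag or Java's Pattern.COMMENTS). If yours does not, you can probably get the same effect with string concatenation, like: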

  String pattern = "^https?://"    // match the protocol (http or https) at the beginning
                 + "([a-zA-Z]+)"   // match the machine name
                 + ...
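A minimal, runnable Java sketch of both approaches: concatenating commented string literals, and Java's free-spacing mode (Pattern.COMMENTS), in which whitespace in the pattern is ignored and # starts a comment that runs to the end of the line. The class name is hypothetical, and the machine-name pattern is deliberately simplistic (letters only).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CommentedRegex {
    // Approach 1: concatenate string literals, with a Java comment per piece.
    static final String CONCATENATED =
          "^https?://"      // protocol: "http://" or "https://"
        + "([a-zA-Z]+)";    // machine name (letters only, for illustration)

    // Approach 2: Pattern.COMMENTS ignores whitespace in the pattern and
    // treats '#' as the start of a comment that runs to the end of the line.
    static final Pattern FREE_SPACING = Pattern.compile(
          "^https?://     # protocol: http:// or https://\n"
        + "([a-zA-Z]+)    # machine name\n",
        Pattern.COMMENTS);

    public static void main(String[] args) {
        Matcher m1 = Pattern.compile(CONCATENATED).matcher("https://example.com");
        if (m1.find()) {
            System.out.println(m1.group(1)); // prints "example"
        }
        Matcher m2 = FREE_SPACING.matcher("http://localhost");
        if (m2.find()) {
            System.out.println(m2.group(1)); // prints "localhost"
        }
    }
}
```

The free-spacing version keeps the comments inside the pattern string, so they survive if the pattern is moved to a config file; the concatenation version works with any regex engine.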
Honestly, I stick with regular expressions because, once you get used to the regex operators, I haven't seen anything more readable, even compared with a comment-expanded format like this. I guess the closest contender would be the alternative parse-tree (S-expression) syntax in CL-PPCRE. For instance:

  CL-USER> (cl-ppcre:parse-string "\\b\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\b")
  (:SEQUENCE :WORD-BOUNDARY (:GREEDY-REPETITION 1 3 :DIGIT-CLASS) #\.
   (:GREEDY-REPETITION 1 3 :DIGIT-CLASS) #\.
   (:GREEDY-REPETITION 1 3 :DIGIT-CLASS) #\.
   (:GREEDY-REPETITION 1 3 :DIGIT-CLASS) :WORD-BOUNDARY)
But then, any such form can become a mouthful:

  CL-USER> (cl-ppcre:parse-string "((\\b[0-9]+)?\\.)?\\b[0-9]+([eE][-+]?[0-9]+)?\\b")
  (:SEQUENCE
   (:GREEDY-REPETITION 0 1
    (:REGISTER
     (:SEQUENCE
      (:GREEDY-REPETITION 0 1
       (:REGISTER
        (:SEQUENCE :WORD-BOUNDARY
         (:GREEDY-REPETITION 1 NIL (:CHAR-CLASS (:RANGE #\0 #\9))))))
      #\.)))
   :WORD-BOUNDARY
   (:GREEDY-REPETITION 1 NIL (:CHAR-CLASS (:RANGE #\0 #\9)))
   (:GREEDY-REPETITION 0 1
    (:REGISTER
     (:SEQUENCE (:CHAR-CLASS #\e #\E)
      (:GREEDY-REPETITION 0 1 (:CHAR-CLASS #\- #\+))
      (:GREEDY-REPETITION 1 NIL (:CHAR-CLASS (:RANGE #\0 #\9))))))
   :WORD-BOUNDARY)
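A possible middle ground between the one-line regex and the full parse tree, sketched in Java: name each sub-pattern as a string constant and concatenate them. The constant and class names are made up for illustration, and the groups are non-capturing (unlike the original pattern), but the pieces assemble to the same structure as the number regex parsed above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ComposedNumberRegex {
    // Hypothetical building blocks; each name documents one piece of the pattern.
    static final String DIGITS = "[0-9]+";
    // Optional integer part followed by a dot, e.g. "3." or just "."
    static final String OPTIONAL_INT_AND_DOT = "(?:(?:\\b" + DIGITS + ")?\\.)?";
    // Digits after the dot, or a whole integer when there is no dot
    static final String MAIN_DIGITS = "\\b" + DIGITS;
    // Optional exponent, e.g. "e10" or "E-3"
    static final String EXPONENT = "(?:[eE][-+]?" + DIGITS + ")?";

    // Assembles to: (?:(?:\b[0-9]+)?\.)?\b[0-9]+(?:[eE][-+]?[0-9]+)?\b
    static final Pattern NUMBER = Pattern.compile(
        OPTIONAL_INT_AND_DOT + MAIN_DIGITS + EXPONENT + "\\b");

    public static void main(String[] args) {
        Matcher m = NUMBER.matcher("x = 6.02e23;");
        while (m.find()) {
            System.out.println(m.group()); // prints "6.02e23"
        }
    }
}
```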

If you're using PCRE, you should also make use of named groups (named subpatterns). They make the expression easier to understand because you can reuse parts of it (a little like functions), and your code can then refer to the matched groups by name instead of by position. This decouples the code that uses the matches from the layout of the regexp, so it is more robust when the pattern changes.

http://www.rexegg.com/regex-capture.html#namedgroups
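For comparison, a minimal Java sketch: java.util.regex (since Java 7) supports named capturing groups with the (?<name>...) syntax, read back with Matcher.group(String). As far as I know it has no PCRE-style (?&name) subroutine calls, so reuse has to be done with string constants instead. The URL pattern and class name here are illustrative assumptions, not a complete URL grammar.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NamedGroups {
    // Named groups: protocol, host, and an optional port.
    // Java group names must start with a letter and contain only letters and digits.
    static final Pattern URL = Pattern.compile(
        "^(?<protocol>https?)://(?<host>[a-zA-Z0-9.-]+)(?::(?<port>\\d+))?");

    public static void main(String[] args) {
        Matcher m = URL.matcher("https://example.com:8080/index.html");
        if (m.find()) {
            // Look groups up by name, so adding a group earlier in the
            // pattern does not break these lines.
            System.out.println(m.group("protocol")); // prints "https"
            System.out.println(m.group("host"));     // prints "example.com"
            System.out.println(m.group("port"));     // prints "8080"
        }
    }
}
```

When the optional port is absent, m.group("port") returns null rather than throwing.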

That would be nice to have in JavaScript; it would make code a lot more readable and easier to update.

Alas, we don't get such luxuries as named groups or static typing... Woe is me.