
I think it's trying to be too magical. At this point it either seems to work, or something triggers its pattern matching the wrong way and it's really hard to figure out what or why. I think giving up a little of the simplicity in favor of more control is worthwhile. For example, if the parts of the example that are formatting were differentiated from the parts that are data to match, it wouldn't be much more complicated, but the intent would be much clearer.

Suppose the rules were: example content must be contained within braces, and any braces within the example content need to be escaped. Then the intent is clear, and your example becomes:

  {example1.org}, {path/index.html}
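To make that rule concrete, here is a rough sketch of how such a template could be split into formatting and example parts. This is not how the tool works; `split_template` is a hypothetical helper, and using a backslash as the escape character is my assumption.

```python
def split_template(template):
    # Hypothetical rule: example content sits inside braces, and a literal
    # brace (or backslash) is written with a leading backslash.
    # Returns a list of ("literal", text) and ("example", text) pairs.
    segments = []
    buf = []
    kind = "literal"
    i = 0
    while i < len(template):
        ch = template[i]
        # A backslash followed by a brace or backslash is taken literally.
        if ch == "\\" and i + 1 < len(template) and template[i + 1] in "{}\\":
            buf.append(template[i + 1])
            i += 2
            continue
        if ch == "{":
            if kind == "example":
                raise ValueError("unescaped opening brace inside example")
            if buf:
                segments.append((kind, "".join(buf)))
            buf = []
            kind = "example"
        elif ch == "}":
            if kind != "example":
                raise ValueError("closing brace without an opening brace")
            segments.append((kind, "".join(buf)))
            buf = []
            kind = "literal"
        else:
            buf.append(ch)
        i += 1
    if kind == "example":
        raise ValueError("unterminated example")
    if buf:
        segments.append((kind, "".join(buf)))
    return segments


print(split_template("{example1.org}, {path/index.html}"))
```

With the template above, the comma and space come out as the only formatting, and the two braced pieces are the data to generalize from. That only separates the two kinds of content; it does nothing about how each example gets matched.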

Even then, it would still probably just return "wwwexample.4, g/hijklmnop" for the last example, because it's ambiguous whether you want just the end of the URL or the whole thing. Allowing regex markup for more explicit matching would make it clearer still, but your example still causes problems until you go all the way to positive lookbehind assertions. At that point, if I need to learn all that, I might as well just use Perl:

  # perl -pe 's{.*https?://([^/]+)(/\S*).*}{$1, $2}' /tmp/foo
  example1.org, /path/index.html
  www.example2.org, /path/index.html
  www.example3.org, /
  www.example4.org, /a/b/c/d/e/f/g/hijklmnop
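
For anyone who would rather not reach for Perl, the same substitution can be sketched in Python. The regex is the one from the one-liner; `host_and_path` is just a name I made up, and the sample line is an assumption, since the contents of /tmp/foo aren't shown here.

```python
import re

# Same pattern as the Perl one-liner: drop everything around the URL and
# keep the host (group 1) and the path (group 2).
URL_RE = re.compile(r".*https?://([^/]+)(/\S*).*")


def host_and_path(line):
    # Lines with no URL pass through unchanged, as with perl -pe.
    return URL_RE.sub(r"\1, \2", line)


# Assumed input line; the real input file is not part of this thread.
print(host_and_path("see http://www.example3.org/ for details"))
```

Because the leading `.*` is greedy, a line with several URLs yields the last one, same as the Perl version.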