by chris-at 4139 days ago
Thanks, this is a lot better than writing this (even if the formatting worked here):

``` (?xi) \b ( # Capture 1: entire matched URL (?: [a-z][\w-]+: # URL protocol and colon (?: /{1,3} # 1-3 slashes | # or [a-z0-9%] # Single letter or digit or '%' # (Trying not to match e.g. "URI::Escape") ) | # or www\d{0,3}[.] # "www.", "www1.", "www2." … "www999." | # or [a-z0-9.\-]+[.][a-z]{2,4}/ # looks like domain name followed by a slash ) (?: # One or more: [^\s()<>]+ # Run of non-space, non-()<> | # or \(([^\s()<>]+|(\([^\s()<>]+\)))\) # balanced parens, up to 2 levels )+ (?: # End with: \(([^\s()<>]+|(\([^\s()<>]+\)))\) # balanced parens, up to 2 levels | # or [^\s`!()\[\]{};:'".,<>?«»“”‘’] # not a space or one of these punct chars ) ) ```

3 comments

Actually, most of the comments seem to imply that whoever wrote that doesn't fully understand regexp syntax -- or, worse, she expects that whoever reads it will not.

    /{1,3}                        # 1-3 slashes
    |                             #   or
    [a-z0-9%]                     # Single letter or digit or "%";
err... sorry?

https://www.debuggex.com/r/EpocMU_7Fq_B_p9z

edit:

wait, I thought about it for a second and I see what you meant. You're not saying it's wrong, you're saying it's obvious.

I wasn't sure it was obvious, because I couldn't tell whether {1,3} was supposed to be {1-3} (a mistake in the expression) or whether there was some kind of unexpected error in the [a-z0-9%] expression.

Because even in this simple example, there is room for error.
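
For anyone else unsure about the {1,3} vs {1-3} question, here is a quick check. It is only a sketch using Python's re module (my choice, not something from the blog post); other regex engines may handle the malformed {1-3} form differently.

```python
import re

# "/{1,3}" is a bounded quantifier: one to three consecutive slashes.
slashes = re.compile(r"/{1,3}")
assert slashes.fullmatch("/")
assert slashes.fullmatch("///")
assert slashes.fullmatch("////") is None

# "{1-3}" is not quantifier syntax. Python's re falls back to treating
# the braces and their contents as literal characters.
literal = re.compile(r"/{1-3}")
assert literal.fullmatch("/{1-3}")
assert literal.fullmatch("//") is None
```

So the comment "1-3 slashes" describes {1,3} correctly, and writing {1-3} would silently match something else entirely.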

Properly formatted (to be fair, this is from a blog post explaining how the regex works: http://daringfireball.net/2010/07/improved_regex_for_matchin...):

    (?xi)
    \b
    (                           # Capture 1: entire matched URL
      (?:
        [a-z][\w-]+:                # URL protocol and colon
        (?:
          /{1,3}                        # 1-3 slashes
          |                             #   or
          [a-z0-9%]                     # Single letter or digit or '%'
                                        # (Trying not to match e.g. "URI::Escape")
        )
        |                           #   or
        www\d{0,3}[.]               # "www.", "www1.", "www2." … "www999."
        |                           #   or
        [a-z0-9.\-]+[.][a-z]{2,4}/  # looks like domain name followed by a slash
      )
      (?:                           # One or more:
        [^\s()<>]+                      # Run of non-space, non-()<>
        |                               #   or
        \(([^\s()<>]+|(\([^\s()<>]+\)))*\)  # balanced parens, up to 2 levels
      )+
      (?:                           # End with:
        \(([^\s()<>]+|(\([^\s()<>]+\)))*\)  # balanced parens, up to 2 levels
        |                                   #   or
        [^\s`!()\[\]{};:'".,<>?«»“”‘’]        # not a space or one of these punct chars
      )
    )
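
If you want to try the formatted version, this is roughly how it could be used from Python. It is a sketch under a few assumptions: the leading (?xi) is replaced by the equivalent re.VERBOSE and re.IGNORECASE flags, and find_urls and the sample text are made-up names for illustration, not anything from the blog post.

```python
import re

# Same pattern as above, minus the leading (?xi): the two flags are passed
# to re.compile instead, since recent Python versions reject inline global
# flags that are not at the very start of the pattern.
URL_PATTERN = re.compile(r"""
    \b
    (                           # Capture 1: entire matched URL
      (?:
        [a-z][\w-]+:                # URL protocol and colon
        (?:
          /{1,3}                        # 1-3 slashes
          |                             #   or
          [a-z0-9%]                     # Single letter or digit or '%'
        )
        |                           #   or
        www\d{0,3}[.]               # "www.", "www1.", "www2." ... "www999."
        |                           #   or
        [a-z0-9.\-]+[.][a-z]{2,4}/  # looks like domain name followed by a slash
      )
      (?:                           # One or more:
        [^\s()<>]+                      # Run of non-space, non-()<>
        |                               #   or
        \(([^\s()<>]+|(\([^\s()<>]+\)))*\)  # balanced parens, up to 2 levels
      )+
      (?:                           # End with:
        \(([^\s()<>]+|(\([^\s()<>]+\)))*\)  # balanced parens, up to 2 levels
        |                                   #   or
        [^\s`!()\[\]{};:'".,<>?«»“”‘’]      # not a space or one of these punct chars
      )
    )
""", re.VERBOSE | re.IGNORECASE)


def find_urls(text):
    """Return every substring of text that the pattern treats as a URL."""
    return [m.group(1) for m in URL_PATTERN.finditer(text)]


# Hypothetical sample text, chosen to exercise the parens and punctuation rules.
sample = "See http://example.com/foo_(bar) and www.example.org, ok."
print(find_urls(sample))
```

The balanced-parens branch keeps the trailing "(bar)" as part of the first URL, while the final character class leaves the comma after "www.example.org" out of the match.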
Cf. the Perl 6 community module for parsing URIs, which features Perl 6's unique unification of regexes and grammars:

https://github.com/perl6-community-modules/uri/blob/master/l...

HN doesn't support Markdown. You'll need to prefix each line with >= 2 spaces for it to be treated as code.

https://news.ycombinator.com/formatdoc
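
A small helper for doing that before pasting, as a rough sketch. The name indent_for_hn is made up, and prefixing blank lines too is my own assumption about keeping a listing in one block, not something the format doc states.

```python
import textwrap


def indent_for_hn(code, width=2):
    """Prefix every line of code with `width` spaces (at least 2)."""
    if width < 2:
        raise ValueError("width must be at least 2")
    # predicate=lambda line: True also prefixes blank lines, which
    # textwrap.indent would otherwise leave untouched.
    return textwrap.indent(code, " " * width, predicate=lambda line: True)


print(indent_for_hn("(?xi)\n\\b\n("))
```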

That really is Hacker News' worst limitation. I understand if they want to limit what formatting is available, but the fact that basic listing is so clunky is annoying.