by jbnicolai 2898 days ago
Agreed. The problem is the risk of small mistakes that are relatively hard to spot and nearly impossible to debug properly.

> A{2,4} -> AA(A|AA)
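To make the slip concrete, here is a quick check. It is only a sketch, and Python's `re` is my choice of engine here, not something from the thread. It compares the quantified form against the quoted expansion on runs of one to five A's; the names `intended`, `expanded` and `fixed` are just illustrative.

```python
import re

intended = re.compile(r"A{2,4}")
expanded = re.compile(r"AA(A|AA)")    # the expansion quoted above
fixed = re.compile(r"AA(A|AA)?")      # one possible repair: make the tail optional

# Print, for each run length, whether each pattern matches the whole string.
for n in range(1, 6):
    text = "A" * n
    print(n,
          bool(intended.fullmatch(text)),
          bool(expanded.fullmatch(text)),
          bool(fixed.fullmatch(text)))
```

The two original patterns disagree only on "AA", which the expansion rejects because its group is not optional.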


Well spotted.

I would argue that the risk of mistakes should not limit the expressivity of a language, nor should it land the language on the pile of bad ideas. It is better for users of the language to be aware of potential pitfalls, and use the language appropriately.

See my reply to shafte for some elaboration.

And this is why we have the syntax sugar.
That introduces problems too. If you try to use sugar like '+' with an implementation that doesn't support it, you don't get any sort of error. Instead you get a different expression.
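A rough way to see this, sketched in Python. The helper `compile_without_plus` is hypothetical: it models an engine where '+' is an ordinary character (as it is in POSIX basic regular expressions) by escaping it before handing the pattern to Python's `re`.

```python
import re

def compile_without_plus(pattern):
    """Hypothetical model of an engine without '+': treat it as a literal.

    Naive on purpose: assumes the pattern contains no already-escaped '+'.
    """
    return re.compile(pattern.replace("+", r"\+"))

with_sugar = re.compile(r"a+")               # '+' means "one or more"
without_sugar = compile_without_plus(r"a+")  # '+' means a literal plus sign

# Same pattern text, no error from either engine, different language matched.
for text in ["aaa", "a+"]:
    print(text,
          bool(with_sugar.fullmatch(text)),
          bool(without_sugar.fullmatch(text)))
```

Neither compile step complains; the two results simply accept different strings.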

Unfortunately, there's an inherent tradeoff between encoding efficiency and error detection. Notice that with VerbalExpressions it would be trivial to return a useful error message if the 'at_least_one' pattern did not exist.
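As a sketch of the error-detection side, assume a tiny builder in the VerbalExpressions style. The `Builder` class and its `then` and `compile` methods are hypothetical and not the real library's API; the point is only that a missing operation is a missing name, which the host language reports by itself.

```python
import re

class Builder:
    """Hypothetical VerbalExpressions-style builder (not the real library)."""

    def __init__(self):
        self._parts = []

    def then(self, literal):
        # Literals are escaped, so '+' in the input can never turn into syntax.
        self._parts.append(re.escape(literal))
        return self

    def compile(self):
        return re.compile("".join(self._parts))

# This builder has no 'at_least_one', so the call fails loudly and by name.
try:
    Builder().then("a").at_least_one("b")
    error = None
except AttributeError as exc:
    error = str(exc)

print(error)
```

The price is the verbosity: each operation costs a method call instead of a single character.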

Perl 6 regexes attempt to improve upon this situation by making regexes more like a regular programming language. That is, they err on the side of error detection rather than encoding efficiency. (The redesign also adds features that would be difficult to add to the Perl 5/PCRE regex design.)

For a start, if an implementation didn't support `+`, any attempt to use it would generate a compiler error, because it is not alphanumeric. (Regexes are code in Perl 6.)

All non-alphanumeric characters are presumed to be metasyntactic, and so must be escaped in some way to match literally. Arguably the best way is to quote the character like a string literal. (This uses the same domain-specific sub-language that the main language uses for string literals.)

    / "+" + /   # at least one + character

It really is a significant redesign.

    /A{2,4}/    # Perl 5/PCRE
    /A ** 2..4/ # Perl 6

    /A (?:BA){1,3}/x
    /A [BA] ** 1..3/ # Perl 6: direct translation
    /A ** 2..4 % B/  # Perl 6: 2 to 4 A's separated by B

    /A (?:BA){1,3} B?/x
    /A ** 2..4 %% B/   # Perl 6: %% allows trailing separator

    /\" [^"]* \"/x     # Perl 5/PCRE
    /\" <-["]>* \"/    # Perl 6: direct translation
    /「"」 ~ 「"」 <-["]>*/ # Perl 6: between two ", match anything else
                       # (can be used to generate better error messages)

    ---

    # Perl 5
    my $foo = qr/foo/;
    'abfoo' =~ /ab $foo/x;

    # Perl 6
    my $foo = /foo/;
    'abfoo' ~~ /ab <$foo>/;
    # or
    my token foo {foo}     # treat it as a lexical subroutine
    'abfoo' ~~ /ab <&foo>/;

    ---

    # Perl 5
    my $foo = 'foo';
    'abfoo' =~ /ab \Q$foo\E/x; # treat as string, not regex
    # Perl 6
    my $foo = 'foo';
    'abfoo' ~~ /ab $foo/; # that is the default in Perl 6
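For readers without a Perl 6 compiler at hand, the separator examples above can be checked through their PCRE-style spellings. This sketch assumes Python's `re` with `re.VERBOSE` standing in for the `/x` flag; it exercises only the Perl 5/PCRE forms, which the comment above gives as the equivalents of `%` and `%%`, and the variable names are mine.

```python
import re

# /A (?:BA){1,3}/x    : 2 to 4 A's separated by B
separated = re.compile(r"A (?:BA){1,3}", re.VERBOSE)
# /A (?:BA){1,3} B?/x : same, but a trailing separator is allowed
trailing_ok = re.compile(r"A (?:BA){1,3} B?", re.VERBOSE)

# Print whether each pattern matches the whole string.
for text in ["A", "ABA", "ABABABA", "ABABABABA", "ABAB"]:
    print(text,
          bool(separated.fullmatch(text)),
          bool(trailing_ok.fullmatch(text)))
```

Only "ABAB" separates the two: it ends in a separator, so the first pattern rejects it and the second accepts it.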