Hacker News
by lifthrasiir 2108 days ago
string.gsub(s, pattern, repl [, n]) takes an optional final argument n, the maximum number of replacements, and returns two values, the second being the number of replacements made. Therefore, if the arguments contain no escape sequences, the outer string.gsub receives 0 as n and `<arguments>` doesn't get replaced.
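A minimal sketch of the failure mode. The template text, the `args` string, and the `%`-escaping step are assumptions standing in for the original code, which isn't shown here:

```lua
-- Hypothetical reconstruction: `template` and `args` are illustrative names.
local template = "f(<arguments>)"
local args = "x, y" -- contains no '%', so the escaping gsub makes 0 replacements

-- Escape '%' as '%%' so that args is safe to use as a gsub replacement string.
-- Buggy: args:gsub(...) returns ("x, y", 0) and both values reach the outer
-- call, so the outer gsub receives n = 0 and performs no replacement.
local buggy = template:gsub("<arguments>", args:gsub("%%", "%%%%"))

-- Fixed: the extra parentheses keep only the first return value.
local fixed = template:gsub("<arguments>", (args:gsub("%%", "%%%%")))

print(buggy) --> f(<arguments>)
print(fixed) --> f(x, y)
```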

I intentionally asked you to find this bug because you only recognize this particular class of bugs after being bitten by one, and you didn't seem to even know what might be problematic. In other words, by now you can't get away with saying "you should give all function calls a name" after the fact.

It's a frequent novice mistake to write `if (a = b)` in C/C++, but any good enough C/C++ programmer will point it out (and modern compilers will flag a warning). Eventually people get used to this problem and the cycle repeats, this time with fewer novices falling into the trap, so this class of bugs, while problematic, is not considered a huge deal. Given your reaction, though, I doubt this is the case for Lua, and that's yet, yet another reason to avoid Lua.


Edit: ah, I just realized what you're actually complaining about.

Being able to drop two arguments into a function call is great! It's one of the better things about Lua.

But when you don't want it, wrap the call:

    f(a(b,c)) -- if a returns two values, this calls f(r1, r2)
    -- vs
    f((a(b,c))) -- this will always call f(r1)
And now you know how to prevent this class of bug. Catching it in an intermediate variable is possible but by no means necessary.
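A runnable sketch of the difference, using a hypothetical two-valued `pair` function and a `count` helper built on the standard `select("#", ...)`:

```lua
-- Hypothetical helpers for illustration.
local function pair() return "r1", "r2" end
local function count(...) return select("#", ...) end

print(count(pair()))   --> 2   (both results are passed on)
print(count((pair()))) --> 1   (parentheses adjust the call to one value)

-- An intermediate variable has the same effect: only "r1" is kept.
local only_first = pair()
print(count(only_first)) --> 1
```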

Original reply follows:

Alright, yeah, that kinda sucks.

Here's another dumb one: table.insert. If you give it two arguments, it's table.insert(tab, val), and it inserts at #tab + 1.

If you give it three arguments, it's table.insert(tab, index, val), which is bad enough; but if you pass it (tab, val, nil), it interprets val as the index.

Which is surprising! You'd think f(a,b) and f(a,b,nil) would be the same thing, and usually they are. But not in this one case.
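A sketch of both forms with made-up tables, following the reference implementation's behavior (error messages differ between Lua versions, so the failing calls are wrapped in `pcall` and only the failure is checked):

```lua
-- Two arguments: the value is appended at #t + 1.
local t = {10, 20, 30}
table.insert(t, 40)          -- t[4] is now 40

-- Three arguments, even when the third is nil: the second argument is taken
-- as the position, and nil is inserted there, shifting the elements up.
local u = {10, 20, 30}
table.insert(u, 1, nil)      -- u[1] is nil, u[2] is 10, u[4] is 30

-- A non-numeric value in the position slot raises an error instead.
local ok = pcall(table.insert, {}, "x", nil)  -- false

-- The gsub trap again: the gsub call contributes two values, so table.insert
-- receives (words, "heLLo", 2) and treats "heLLo" as a position.
local words = {}
local ok2 = pcall(table.insert, words, ("hello"):gsub("l", "L"))  -- false
table.insert(words, (("hello"):gsub("l", "L")))  -- parenthesized: appends
print(words[1]) --> heLLo
```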

But to be fair, here's an example of Lua done well:

    local L = require "lpeg"
    
    local end_str_P = L.P "]" * L.P "="^0 * L.P "]"
    
    local function _disp(first, last)
       return last - first - 2
    end
    
    -- capture an array containing the number of equals signs in each matching
    -- long-string close, e.g. "aa]]bbb]=]ccc]==]" returns {0, 1, 2}
    
    local find_str = L.Ct(((-end_str_P * 1)^0
                          * (L.Cp() * end_str_P * L.Cp()) / _disp)^1)
I'll let you reason about what it does, and why it would be such a pain in the butt with regular expressions.

> But when you don't want it, wrap the call: [...] And now you know how to prevent this class of bug. Catching it in an intermediate variable is possible but by no means necessary.

Well, I knew that; in fact, what I linked in the past discussion was the exact code that handles this case in my type checker. It is seriously confusing that a seemingly redundant pair of parentheses changes what the code does, and you can't know for sure whether the parentheses are necessary without using a type checker.

Some other languages have multiple return values and can pass them as multiple arguments to an enclosing call, notably Go, but Go only does that when the inner call is the sole argument to the outer call. And Go does have a static type system. Tons of other languages, including JavaScript, have an explicit "spread" operator, which mostly retains the convenience and avoids such problems.
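To make the contrast concrete, a small sketch of Lua's rule: a call expands into all of its results only when it is the last expression in an argument list, and is truncated to its first result anywhere else. The `pair` and `pack` helpers are hypothetical (`pack` is hand-rolled so it also runs on Lua 5.1, which lacks `table.pack`):

```lua
-- Hypothetical helpers: pair returns two values; pack records every argument
-- it receives (including nils) together with the count in field n.
local function pair() return 1, 2 end
local function pack(...) return {n = select("#", ...), ...} end

-- Last position: the call expands to all of its results, even with other
-- arguments in front of it (Go rejects f(0, pair()) at compile time).
local a = pack(0, pair())   -- a.n == 3; a[1], a[2], a[3] == 0, 1, 2

-- Any other position: the call is truncated to its first result.
local b = pack(pair(), 0)   -- b.n == 2; b[1], b[2] == 1, 0
```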

> Which is surprising! you'd think f(a,b) and f(a,b,nil) would be the same thing, and usually, they are. But not in this one case.

Or any C function relying on lua_gettop. Seriously, trailing nils should have been eliminated from that stack.
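On the Lua side, the counterpart of `lua_gettop` is `select("#", ...)`, and it likewise counts a trailing nil. A minimal sketch with a hypothetical `describe` function that, like `table.insert`, dispatches on its argument count:

```lua
-- Hypothetical function that changes meaning with arity, as table.insert does.
local function describe(...)
  local n = select("#", ...)  -- counts trailing nils, like lua_gettop in C
  if n == 2 then
    return "append"
  elseif n == 3 then
    return "insert at position"
  end
  return "other"
end

print(describe("t", "v"))      --> append
print(describe("t", "v", nil)) --> insert at position
```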

> I'll let you reason about what it does, and why it would be such a pain in the butt with regular expressions.

I don't like quirky EDSL alternatives to regular expressions (or, for that matter, to parsing expression grammars) either. I prefer lpeg.re in that regard.

    local re = require "re"
    -- In an re grammar the first rule is the start rule, so long_str goes first
    local long_str = re.compile(
        [[
            long_str <- {| ((!end_str .)* ({} end_str {}) -> disp)+ |}
            end_str <- ']' '='* ']'
        ]],
        {
            disp = function (first, last)
                return last - first - 2
            end,
        })
Lpeg itself is a fine library, but its use of `1` to mean any character and `-1` to mean end of string is annoying (at the very least, `lpeg.P(-1)` should have been `lpeg.Eos` instead). I would still use lpeg proper for constructing patterns programmatically.

But you can't offer lpeg as the answer to string.gsub and the like. First, the standard library remains problematic and people will still get tripped up (if there's no room to implement `|`, the library could at least require `|` to be escaped, m'kay?). Second, it's not a default, or at least not something you can easily `require`. The Lua ecosystem is fragmented primarily because it lacks a universally usable package system (I stress that LuaRocks is not one), so an external library is not always an option.
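For context, a small sketch of how the standard library treats `|` today (the example strings are made up): Lua patterns have no alternation and a bare `|` is an ordinary character, while `%|` already works as an escaped literal `|`, which is presumably what makes reserving a bare `|` a plausible option.

```lua
-- '|' has no special meaning in Lua patterns, so this pattern matches the
-- literal text "cat|dog" rather than "cat" or "dog".
local s, e = ("cat|dog"):find("cat|dog")  -- s, e == 1, 7
local miss = ("cat"):find("cat|dog")      -- nil: there is no alternation

-- '%' followed by a non-alphanumeric character means "that character
-- literally", so "%|" already matches a '|'.
local pipe = ("a|b"):find("%|")           -- 2
```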

By the way, I don't know why you would pick that obscure example to brag about Lua. Your code is in fact slightly incorrect: `(L.Cp() ...) / _disp` looks like the capture applies only to that parenthesized group, but it doesn't, thanks to operator precedence. The main problem with regular expressions is the inability to refactor them, and you haven't correctly demonstrated that.

> Well, I knew that

So, for a deficiency in your code, you chose to blame the language.

That's a you problem.

> I don't like a quirky EDSL

A you problem

> I prefer lpeg.re

So Lua offers two superior ways to process strings over one? And you're still mad? Sounds like a

> But you can't give lpeg as an answer

Ah but I did. You just don't like it, which is, you guessed it, a you problem.

Ierusalimschy wrote lpeg specifically to address known deficiencies in the pattern library.

Sure, you can't always use it. Just like you might be stuck on an obsolete version of JS in an embedded system.

But this is neither Lua's problem, nor my problem. It might be your problem. I'm beginning to detect a pattern!

> Your code is in fact slightly incorrect

It is, in fact, not.

I recognize the trope here. Happily, I'm in a position to fire people who show it, and have.

In your case, I'll have to be content with never interacting with you again.

(I'm pretty sure that this won't get a reply, but I'll leave the final post for others.)

> So a deficiency in your code, you chose to blame the language.

The language is to blame if it encourages a deficiency. I linked to the type checker code to show that this kind of bug is virtually invisible, unlike the aforementioned `if (a = b)` case, even to someone who knew enough to write a type checker.

> So Lua offers two superior ways to process strings over one? and you're still mad?

Lua doesn't offer two superior ways; lpeg does. And what lpeg offers (first-class PEGs) is irrelevant to my problem anyway, which is that Lua the language lets you accidentally pass unintended arguments.

> Ierusalimschy wrote lpeg specifically to address known deficiencies in the pattern library.

And yet he left the faulty version in the standard library.

> Sure, you can't always use it. Just like you might be stuck on an obsolete version of JS in an embedded system.

You don't need an embedded system to show that you can't always use the JS package system: web browsers didn't support it until a few years ago. As a result webpack (among others) happened, and we are now comfortable compiling JS down to a bundle. Does Lua have any equivalent?

When I raise an issue with Lua the language, you point to Lua the ecosystem and sidestep the fact that Lua the ecosystem is itself fragmented thanks to Lua the implementation. My issues with Lua (the language, the implementation, and the ecosystem) are many, and pretty much every issue is connected to the others.

> It is, in fact, not.

Your code is different from the following (which I consider more readable):

    -- [end_str_P and _disp omitted]

    -- returns 0 for ]], 1 for ]=], 2 for ]==], and so on
    local end_str_len_P = (L.Cp() * end_str_P * L.Cp()) / _disp

    local find_str = L.Ct(((-end_str_P * 1)^0 * end_str_len_P)^1)
Your code unintentionally grouped `(-end_str_P * 1)^0 * (L.Cp() * end_str_P * L.Cp())` together, as `a * b / c` is `(a * b) / c` rather than `a * (b / c)`. It is not "incorrect", since `(-end_str_P * 1)^0` produces no captures of its own, but it has enough potential to become incorrect. That's what I want to avoid by not using a "quirky EDSL".
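Lua gives `*` and `/` the same precedence and groups them left to right, and LPeg's pattern operators are ordinary metamethods, so they follow the same rule. A stdlib-only sketch (the `node` helper is hypothetical and has nothing to do with LPeg) that records how Lua groups each expression:

```lua
-- Hypothetical node type: each metamethod returns a new node whose field s
-- spells out the grouping Lua chose.
local mt = {}
local function node(s) return setmetatable({s = s}, mt) end
mt.__mul = function(x, y) return node("(" .. x.s .. " * " .. y.s .. ")") end
mt.__div = function(x, y) return node("(" .. x.s .. " / " .. y.s .. ")") end

local a, b, c = node("a"), node("b"), node("c")

print((a * b / c).s)   --> ((a * b) / c)
print((a * (b / c)).s) --> (a * (b / c))
```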