Hacker News new | ask | show | jobs
by cf499 1815 days ago
> The author makes no comment on how hideously bad it is, which makes me suspect he didn’t notice, which… yeah, shows the problems of the whole thing.

The author's comments on that "reverse" function are equally bad - https://youtu.be/9Pw-Roo_duE?t=171

It's described as "efficient" but it calls `len` on an unchanging list in 3 places.

2 comments

That’s actually roughly len(arr) times, not three, since two of the calls are inside the loop.

But the far bigger red flag there is that that it doesn’t just use arr.reverse(), which does the same thing and is typically 8–10× as fast in some simple testing (assuming a list), or arr[::-1], which makes a shallow copy rather than modifying the object in-place.

This matches what I’ve been seeing in code examples: Copilot likes to implement things from scratch rather than using libraries or even standard library functionality.

It’s possible that the word “array” tripped it up here and that it would have done something saner had it been told “list”, but I doubt it. (Python’s built-in array module is very seldom used; if you talk of arrays, you’re probably dealing with something like numpy’s arrays instead. But it’s far more likely that the built-in list type was what was desired here.)

There’s also one other significant point of bad and dangerous style in the code generated: the reverse function mutates its argument and returns it. Outside of fluent APIs (an uncommon pattern in Python, and not in use here), this is generally considered a bad idea in most languages, Python certainly included. It should either mutate its argument and return None, or not mutate its argument and return a new list.

> That’s actually roughly len(arr) times, not three, since two of the calls are inside the loop.

You're right! I noticed it a few minutes ago and changed the wording accordingly. Thanks for pointing out how shockingly bad that algorithm actually is! :D

> (...) the reverse function mutates its argument and returns it.

Yeah, that mutate + return is confusing. It's also worth noting that, as a result of the mutation, the function doesn't work on immutable types like strings and tuples.

It's not shockingly bad to call len(arr) each time through the loop. It would be if we were talking about something like strlen() in C, but in Python it's just one more constant-time† operation each time through the loop, and not a very expensive one at that. Caching it in a local variable would still be better.

______

† Nothing is really constant-time in CPython, but it's pretty close.

Well, to be fair, the `len` operation on lists in Python is a constant time operation. What makes the example particularly bad is using a cast on the result of a floating-point division, rather than just using Python 3 integer division (i.e. the `//` operator). Copilot was clearly just spitting out Python 2 code here.
I think the difference is that redundant re-calculation is a code smell in any language, at all times, whereas the float/int division issue requires knowledge of Python 2/3 syntax quirks.
It's context-specific whether recalculation is a code smell. I've definitely gotten feedback from senior developers to just do len(x) repeatedly instead of "littering" (their words) the code with `xCount=len(x)` assignments (because they knew len was constant-time).
Sure, it makes sense if you find something like `len_x` or `x_count` significantly less readable than `len(x)` and if you don't have to worry about resources. Can't say I've been there though.

I did a quick test of this `reverse` function (which probably shouldn't exist in the first place) and, unsurprisingly, it became ~30% faster when `len(arr)` was only called once.

When I'm that resource constrained, I don't use Python. The changes necessary to make regular Python code performant defeat the goal of making it readable.

There's a reason NumPy's innards aren't Python code.

Both sides of this "readability vs. performance" debate can be argued ad absurdum but that's not my intention. All I know is that I try to conserve resources no matter what level of the stack I'm working on and those generated snippets certainly don't!