Hacker News
by voidUpdate 3 days ago
> "In Python, any method __eq__ is expected to return bool, and if it doesn't, then we need to explicitly tell type-checkers to ignore the type error. This function in Polars can also return different types depending on the inputs, thus requiring overloads."

Why would you ever want a == b to not return a bool??

EDIT: Yes, I understand that you can do element-wise equality checks on numpy arrays now

7 comments

There are examples like ORM query builders (something like `User.id == user_id` should not return a boolean, but rather some inspectable query part), multi-value comparisons (e.g. numpy arrays and views which could also be used as masks for indexing)

In general, when you get your hands on operator overloading you get a bunch of various quirky applications for each. Some dunder methods have strict runtime-level rules (e.g. __hash__ or __len__), some don't
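To make the difference concrete, here is a minimal sketch. The class name `Weird` is made up for this example. The runtime accepts whatever `__eq__` returns, but `len()` rejects a `__len__` that does not return an integer.

```python
class Weird:
    """Hypothetical class, used only to illustrate the point."""

    def __eq__(self, other):
        # No runtime check: __eq__ may return any object.
        return "beans"

    def __len__(self):
        # len() requires an integer, so this is rejected at call time.
        return "three"


eq_result = Weird() == 1  # the string "beans", not a bool

try:
    len(Weird())
    len_error = None
except TypeError as exc:
    len_error = exc  # len() refused the non-integer result
```

So the "must return bool" rule for `__eq__` lives in the type checker, not in the interpreter.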

That's one of the things I truly didn't get from my (very limited) experience with SQLAlchemy. Why not just have a method like orm.eq(User.id, user_id)? Much more readable.
Elementwise equality! Given two dataframe columns or ndarrays, users often expect `==` to produce a column or ndarray of bools (like `+`, `*`, `&`, and just about every other binary operator).
What I love about operator overloading is that now you can't use operators without looking at their definition, in which case... you could have just called numpy.equal(a, b) anyway.

Does a == b return true if all elements are the same? Does it return an array of booleans? It's anyone's guess!

What's fun is that it could return an array of false if all elements are different, and then that value is truthy.
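A small sketch of that trap, using a made-up `Vec` class whose `==` returns a plain Python list (an assumption for illustration, not how any particular library does it):

```python
class Vec:
    """Hypothetical container whose == compares element-wise."""

    def __init__(self, items):
        self.items = list(items)

    def __eq__(self, other):
        # Returns a plain list of bools rather than a single bool.
        return [x == y for x, y in zip(self.items, other.items)]


mask = Vec([1, 2, 3]) == Vec([4, 5, 6])  # every element differs

# A non-empty list is truthy regardless of its contents,
# so this branch runs even though no elements match.
if mask:
    looks_equal = True
else:
    looks_equal = False
```

NumPy guards against this particular case: calling `bool()` on an array with more than one element raises a `ValueError` instead of silently picking an answer.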
The function call approach can be a lot less readable.

Consider using Shamir secret sharing to share a secret, D, among several people with two people required to recover the secret. D is a positive integer, such as a randomly generated 128 bit AES key you are using to encrypt your launch codes or credit card database.

For anyone not familiar with Shamir secret sharing what you do is pick a prime number, p, that is larger than D and another random positive integer A, that is less than p. Then give each person a pair of numbers, (i, (Ai + D) % p), where each person gets a different i (which should be a positive integer less than p...it is OK to simply use 1, 2, 3, ...). Let's let Di = (Ai + D) % p.

(This is for the case where you want any two people to be able to launch your missiles or decrypt your database. If you wanted 3 required instead of giving out (i, (Ai + D) % p) you would give out (i, (Bi^2 + Ai + D) % p) where B is a randomly chosen positive integer less than p. For 4 required add on a Ci^3 term, and so on).
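A minimal sketch of the share generation described above, in Python. The function name `make_shares` and its parameter layout are assumptions made for this example. The coefficients A, B, C, ... are passed in as a list so the toy example below is reproducible; in real use they would be drawn at random, e.g. with `secrets.randbelow(p)`.

```python
def make_shares(secret, coeffs, p, n):
    """Return n shares (i, f(i) % p) for i = 1..n.

    f(x) = secret + coeffs[0]*x + coeffs[1]*x**2 + ...
    coeffs holds A, B, C, ...; len(coeffs) + 1 shares are needed to recover.
    """
    shares = []
    for i in range(1, n + 1):
        # Horner's rule over the non-constant terms, highest degree first.
        y = 0
        for c in reversed(coeffs):
            y = (y + c) * i % p
        shares.append((i, (y + secret) % p))
    return shares


# Toy numbers only: p = 101, D = 42, A = 17, two shares required.
toy_shares = make_shares(42, [17], 101, 3)
```

With those toy numbers each share is (i, (17*i + 42) % 101), matching the (i, (Ai + D) % p) form above.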

Given (i, Di) and (j, Dj) and p it is possible to recover A and D.
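For anyone who wants the missing step, the recovery formulas follow directly from the two share equations. The inverse of (j - i) exists mod p because p is prime and i and j are distinct positive integers less than p.

```latex
\begin{aligned}
D_i &\equiv A\,i + D \pmod{p}, \qquad D_j \equiv A\,j + D \pmod{p} \\
D_j - D_i &\equiv A\,(j - i) \pmod{p}
  &&\Rightarrow\quad A \equiv (D_j - D_i)\,(j - i)^{-1} \pmod{p} \\
j\,D_i - i\,D_j &\equiv D\,(j - i) \pmod{p}
  &&\Rightarrow\quad D \equiv (j\,D_i - i\,D_j)\,(j - i)^{-1} \pmod{p}
\end{aligned}
```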

Here's what that looks like in a language where the big int library uses an accumulator style, i.e., operations are of the form X = X op Y, where the ops are methods on the big int objects. Assume Bi and Bj are big int objects initialized from i and j, and Di and Dj are already big int objects, as is p. This particular example is using Perl. (This is very old code. Since 2002 you can add a "use bigint" pragma to Perl code and then it would look a lot more like the second Python example below).

  my $A = $Dj->copy()->bsub($Di);  # Dj-Di
  $Di->bmul($Bj);                  # j*Di
  $Dj->bmul($Bi);                  # i*Dj
  $Di->bsub($Dj);                  # j*Di-i*Dj
  $Bj->bsub($Bi);                  # j-i
  $Bj->bmodinv($p);                # (j-i)'
  $Di->bmul($Bj);                  # (j*Di-i*Dj)*(j-i)'
  $Di->bmod($p);                   # (j*Di-i*Dj)*(j-i)'  mod p
  $A->bmul($Bj);                   # (Dj-Di)*(j-i)'
  $A->bmod($p);                    # (Dj-Di)*(j-i)'  mod p
At this point, the recovered A is in $A and the recovered D is in $Di.

Here's what it looks like in a language with the ops as function calls taking the big int objects as arguments. This example is Python without using operator overloading.

  import operator as op
  def recover(i, j, Di, Dj, p):
    j_i_inv = pow(op.sub(j, i), -1, p)
    A = op.mod(op.mul(op.sub(Dj, Di), j_i_inv), p)
    D = op.mod(op.mul(op.sub(op.mul(j, Di), op.mul(i, Dj)), j_i_inv), p)
    return A, D
Probably more readable than accumulator style. Here it is in Python using its built-in operator overloading for big ints:

  def recover(i, j, Di, Dj, p):
    j_i_inv = pow(j-i, -1, p)
    A = ((Dj - Di) * j_i_inv) % p
    D = ((j*Di - i*Dj) * j_i_inv) % p
    return A, D
I'd sure rather come across that than either of the earlier examples.

OT: this reminds me of something I started to do once but never finished. For each language we used at work that had a big int library but did not support operator overloading (Java, for example), I was going to write a class that implemented a big int RPN calculator. Then recover would look something like this:

    calc = new BigRPNCalc();
    calc.do(j, i, "-", p, "modinv dup");
    calc.do(Dj, Di, "- *", p, "mod swap");
    calc.do(j, Di, "*", i, Dj, "* - *", p, "mod");
    D = calc.pop();
    A = calc.pop();
But I never ended up needing big ints in any of those languages so never really got past some initial design work.
One example is if a and b are arrays (e.g. numpy arrays) it’s not unreasonable for dunder eq to return an array of booleans.

Another example might be if you have a domain specific representation of equality (e.g. class Equality)

I can see the first one making sense, but why would you need a representation of equality other than "yes, these are equal" and "no, these are not equal"?
The first use case that comes to mind is if you want a DSL to build expressions that are evaluated later in some different context e.g. when using `polars`:

```python
df.filter(
    pl.col("foo") == pl.col("bar"),
)
```

SQLAlchemy does something equivalent too, and I'm sure there are many others.
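A rough sketch of how such a DSL can work. The names `Col`, `Eq` and `filter_rows` are made up for this example and are not the Polars or SQLAlchemy API. The point is only that `==` records the comparison, and something else evaluates it later against real data.

```python
class Eq:
    """Deferred comparison: records the operands instead of comparing them."""

    def __init__(self, left, right):
        self.left = left
        self.right = right

    def evaluate(self, row):
        # The right-hand side may be another column or a plain literal.
        right = self.right.resolve(row) if isinstance(self.right, Col) else self.right
        return self.left.resolve(row) == right


class Col:
    """Hypothetical column reference, loosely in the spirit of pl.col."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, other):  # type: ignore[override]
        # Build an expression object; nothing is compared yet.
        return Eq(self, other)

    def resolve(self, row):
        return row[self.name]


def filter_rows(rows, expr):
    """Evaluate the deferred expression once per row."""
    return [row for row in rows if expr.evaluate(row)]


rows = [{"foo": 1, "bar": 1}, {"foo": 2, "bar": 3}]
expr = Col("foo") == Col("bar")   # an Eq object, not a bool
matches = filter_rows(rows, expr)
```

The `# type: ignore[override]` comment is the "explicitly tell type-checkers to ignore the type error" step from the quoted article.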

Well personally I’m not a fan of turning everything into an object, but if you have properties or methods that exist upon the concept of Equality you might want to encode directly onto a class. Maybe in a domain where “Equality” is an important concept, like mathematics or even something like accounting.

Could enable a different interface into approximate equality for floating point numbers: Equality.approximate(iota: float) -> bool
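Something like this, as a sketch only. The `Equality` class and its `exact` method are hypothetical, and using an absolute tolerance for `iota` is my assumption about what was meant.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Equality:
    """Hypothetical object representing 'left compared with right'."""
    left: float
    right: float

    def exact(self) -> bool:
        return self.left == self.right

    def approximate(self, iota: float) -> bool:
        # Treat iota as an absolute tolerance (an assumption for this sketch).
        return math.isclose(self.left, self.right, rel_tol=0.0, abs_tol=iota)


check = Equality(0.1 + 0.2, 0.3)
```

Here `check.exact()` is False because of floating point rounding, while `check.approximate(1e-9)` is True.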

IIRC, SQLAlchemy overloads this to return an object that represents an equality check in SQL. Because it was returning an object, it was always evaluating to True, because of another of Python’s footguns: truthiness/falsiness. This was a decade ago, and these particular footguns were not even remotely the biggest culprits in our bug backlogs (another honorable mention includes accidentally calling a sync function in an async context, causing timeouts in unrelated endpoints and leading to cascading system failure).
It could return a vector or a deferred expression? In polars, for example, operations on `pl.col` return `Expr` objects that are used to build queries, not immediately evaluated:

    df.filter(pl.col("status") == "active")
In numpy, `x == y` returns a boolean array of the same shape as x and y, comparing them element-wise.
Primarily, because Python doesn't have quasi-quoting. You can't pass an expression without workarounds like this.
I thought JavaScript's equality quirks were seen as problematic, not as a feature missing from Python.
At least in JavaScript, it tells you if things are equal or not. In Python, apparently you could answer if A is equal to B with "beans" or 17 or ['a']
Never understood this complaint about operator overloading.

In any language, a function called `isEqual` could wipe your hard drive and replace your wallpaper with a photo of a penguin. Therefore, letting programmers pick the names of their functions is bad? No, obviously naming things for least surprise is the programmer's responsibility.

But when it's the symbols `==` instead of an ASCII name, it's a problem in language design?

(FWIW in JavaScript, being unable to override == is actually a problem when you want to use objects as Map keys)

Python never met a footgun it didn’t need to adopt. In this case, however, it’s not equality checks, but operator overloading. I was a Python developer for a decade before switching to Go and life on this side is so much better.
Operator overloading has never been an issue for me, but a trailing comma at the end of a line silently creating a tuple, or whitespace (including newlines) between string literals silently concatenating them, have cost me days of work over the years.

I understand why those exist, but they’re pure evil.
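For anyone who hasn't been bitten yet, a minimal illustration of both (the variable names are made up):

```python
# Trailing comma: this is the one-element tuple (30,), not the int 30.
timeout = 30,

# Missing comma: adjacent string literals are joined into one string,
# so this list has two elements, not three.
names = [
    "alice",
    "bob"
    "carol",
]
```

Neither line raises an error or a warning at runtime, which is what makes them expensive to track down.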