Hacker News new | ask | show | jobs
by porridgeraisin 458 days ago
> by this metric

False equivalence. You're saying that the statement "both for.. append and .map() executing the _same steps_ in the _same order_ are the same" is equivalent to saying that "two statements being composed of cmp,jmp, etc (in totally different ways) are the same" That is a dishonest argument.

> Distinct could sort and look for consecutive, it could use a hashmap

People love happy theories like this, but find me one major implementation that does this. For example, here is the distinct() implementation in the most abstraction-happy language that I know - C#

https://github.com/dotnet/runtime/blob/main/src%2Flibraries%...

It unconditionally uses a hashset regardless of input.

Edit: found an example which does different things depending on input

https://github.com/php/php-src/blob/master/ext/standard/arra...

This does the usual hashset based approach only if the array has strings. Otherwise, it gets the comparator and does a sort | uniq. So, you get a needless O(nlogn), without having a way to distinct() by say, an `id` field in your objects. Very ergonomic...

On the other hand...

  seen := map[string]bool{}
  uniq := []string{}
  for _, s := range my_strings {
      if !seen[s] {
          uniq = append(uniq, s)
          seen[s] = true;
      }
  }
Let us say you want to refactor and store a struct instead of just a string. The code would change to...

  seen_ids := map[string]bool{}
  uniq := []MyObject{}
  for _, o := range my_objects {
      if !seen_ids[o.id] {
          uniq = append(uniq, o)
          seen_ids[o.id] = true;
      }
  }
Visually, it is basically the same, with clear visual messaging of what has changed. And as a bonus, it isn't incurring a random performance degradation.

Edit 2: An SO question for how to do array_unique for objects in php. Some very ergonomic choices there... https://stackoverflow.com/questions/2426557/array-unique-for...

2 comments

I'm not sure reaching for PHP to dunk on functional languages is the win you think it is?

> Visually, it is basically the same, with clear visual messaging of what has changed.

In order to do this you had to make edits to all but a single line of logic. Literally only one line in your implementation didn't change.

Compare, with Ruby:

    # strings
    uniq = strings.uniq()

    # structs, if they are considered equal based on the
    # field in question (strings are just structs and they
    # implement equality, so of course this is the same)
    uniq = structs.uniq()

    # structs, if you want to unique by some mechanism
    # other than natural equality
    let uniq = structs.uniq(&:id)
With Rust:

    # strings
    let uniq = strings.unique();

    # structs, if they are considered equal based on the
    # field in question (strings are just structs and they
    # implement equality, so of course this is the same)
    let uniq = structs.unique();

    # structs, if you want to unique by some mechanism
    # other than natural equality
    let uniq = structs.unique_by(|s| s.id);
You cannot tell me with a straight face that your version is clearer. You'll note that both languages have essentially the exact same code, which is to say: nearly none at all.
> I'm not sure reaching for PHP

My initial comment was a response to your comment

> Distinct could sort and look for consecutive, it could use a hashmap. It can even probe the size of the array to pick the performance-optimal approach. I don't have to care.

which says that you just use an abstraction and it transparently "does the right thing". The C# example showed how abstractions actually don't do that, and instead simply provide a lowest common denominator implementation. The rust examples you used also uses a hashmap unconditionally. The PHP example was showing how when abstractions attempt to do that it ends up even worse - when you move from strings to structs, you get a slowdown.

In practice, different strategies are always implemented as different functions (see python `moreitertools` `unique_justseen` and `unique_everseen`), at which point its no longer an abstraction (which by definition serves multiple purposes) and it just becomes a matter of whether or not the set of disparate helper functions is written by you, the standard library, or a third party. In rust, you would do `vec.sort()`, `vec.dedup()` for one strategy, and call `v.into_iter().unique().collect()` for the other strategy. That is not an "abstraction" [which achieves what you claimed they do].

> It unconditionally uses a hashset regardless of input.

This follows what I like to call macro optimization. I don't care if a hash set is slightly slower for small inputs. Small inputs are negligibly fast anyway. The hash set solution just works for almost any dataset. Either it's too small to give a crap, midsized and hash set is best, or huge and hash set is still pretty much best.

That's why we default to the best time/space complexity. Once in a blue moon someone might have an obscure usecase where micro optimizing for small datasets makes sense somehow, for the vast majority of use cases this solution is best.