Hacker News new | ask | show | jobs
by emehrkay 1852 days ago
I cant wait for that slices package. If I have to write this one more time im going to have a breakdown

    func inSlice(needle someType, haystack []someType) bool{}
1 comments

I would use hashmap and get O(1) instead of O(N).
Linear search through a slice is almost always going to be faster than a map operation, because N is almost always small and slices have great cache locality.

If your slice has a million entries, it's definitely a different story.

> If your slice has a million entries, it's definitely a different story.

Searching through a map is much faster than you think. At 20 entries searching through a slice of strings is already slower than searching in a map. For ints this threshold is 100 entries. Feel free to test on your own machine, my benchmark code is here: https://gist.github.com/JelteF/1251f180966eb974d7732ea6c3d3b...

You can run it with:

go test -bench='.' -benchtime=1s slicemap_test.go

The benchmark code is based on the one from this blogpost: https://www.darkcoding.net/software/go-slice-search-vs-map-l...

After looking at that code though I didn't trust the results too much anymore, because of various issues with it. So I decided to fix those issues.

Using a map might not be faster as others have pointed out, but it's often more readable. The core library does this in several places. A map[someType]bool makes for nice, readable code:

    if haystack[needle] {
The lack of an ordered map keep me using slices where I'd otherwise reach for a map. This is one of the things I'm looking forward to once generics are in Go.
You know that many hash functions are more expensive than iterating 1k worth of items that are contiguous in an array to find a matching item.
You know, inserting N elements into a hashmap is O(N).
You're conflating Ns here. The N in O(N) is the number of elements already in the collection, not the number of items you are inserting. Since we're talking about the asymptotic case, it's always assumed that N >> the number of elements you're inserting. In general, inserting into a hashmap should be O(1) with a constant time hash. The choice of a hashmap over a list depends on whether the constant time hash is slower than searching through the entire list.
>The N in O(N) is the number of elements already in the collection, not the number of items you are inserting.

For a hashmap, the operation of inserting N items is O(N) and yes that N is the number of items you are inserting. So flat out wrong on that one.

The rest is kind of obvious.

"For a hashmap, the operation of inserting N items is O(N) and yes that N is the number of items you are inserting."

This is not a proper application of big O notation. Big O notation is asymptotic. It's a shorthand for considering the performance of the algorithm as the number of elements tends to infinity. If you insert a single item into a linked list we don't say that is O(1) because it's a single item; you still need to iterate over N items already in the list before you can insert it. The more items in the list, the more items you have to iterate over before the insertion occurs.

But if you don't believe me about the hash map, here's an algorithm for inserting in C++ using linked lists for collision resolution:

  template <class T>
  bool HashSet<T>::insert(T item) {
      // Hash the item
      unsigned long ix = this->hash(item);
      // Find the item in the appropriate bucket. If it's already there, return false.
      if (this->array[ix]->findItem(item) != -1) {
          return false;
      }
      // If the item is not in this bucket, insert it.
      this->array[ix]->insertAtHead(item);
      return true;
  }
The only linear-time operation in this algorithm is findItem() on the linked list, but for a properly sized hash table, the linked list in question here should have very few items in it compared to the rest of the table (n_linked_list << N_hash_table). insertAtHead() is O(1) always. hash is O(1) always. The one thing missing in this example is resizing the table if it gets too full, which is O(N) and generally requires re-hashing all the items. Ideally you could make the table large enough when you initialize it to avoid this altogether, and then you can maintain that O(1) insertion time. This is why O(1) is the best case, while O(N) is the worst case. But O(1) should also be the common case.

Therefore if I have 1 item in my hash table or 1 million, it's still going to take the same amount of time to insert the next item. That's what we mean when we say inserting in a hash map O(1); it will take the same amount of time to insert the first item as the millionth item. Even though, yes, you have to call insert a million times. That's obvious. But that's not what asymptotic complexity analysis and big O notation is for.

With something like a linked list, if you try to insert a million items, it will be fast for the first one and slow for the millionth one (if you are inserting without a tail pointer). The rate at which it gets slower is linearly proportional to the number of items in the list. That's what we mean when we say inserting into a linked list is O(N).

If you still think inserting into a hash table is O(N), then answer me this: what's even the point of a hash table?

My guy, we're talking about inserting N items to a data structure. Or executing an algorithm where the input is a set of N items.

The N here is the number of items you're inserting.

There's another number that we can consider a constant that's irrelevant to this analysis, M, the number of items in the data structure. M doesn't affect the complexity, as don't other numbers you can come up with, like D, the number of donuts you ate today. The complexity of our algorithm in this case is unaffected by changing these numbers.

An algorithm's complexity is talked to in terms of its input. Not in terms of if my input wasn't actually N sized but it was 1 and instead another hidden input which is the pre-existing data structure was sized N.

So, now, the algorithm of inserting a set of N items to a hash map is O(N), because it's broken down into N independent operations where each is constant time.

Same as, despite addition being constant time, the algorithm of adding the N items of a set is O(N).

And inserting N elements into slice would be?..
the slice already exists in this context, that cost is already incurred.
"this context" is using a map from the beginning and not having that slice.
“This context” is also not known except one specific way in which the data is accessed, so blithely saying “I’d use a map instead” and being sure that’s the right answer in all cases is pretty uninformed.

It helps to assume the person who chose a slice did it for some actual reason and not because they are an idiot.