Hacker News new | ask | show | jobs
by inertiatic 1857 days ago
My guy, we're talking about inserting N items to a data structure. Or executing an algorithm where the input is a set of N items.

The N here is the number of items you're inserting.

There's another number that we can consider a constant that's irrelevant to this analysis, M, the number of items in the data structure. M doesn't affect the complexity, as don't other numbers you can come up with, like D, the number of donuts you ate today. The complexity of our algorithm in this case is unaffected by changing these numbers.

An algorithm's complexity is talked to in terms of its input. Not in terms of if my input wasn't actually N sized but it was 1 and instead another hidden input which is the pre-existing data structure was sized N.

So, now, the algorithm of inserting a set of N items to a hash map is O(N), because it's broken down into N independent operations where each is constant time.

Same as, despite addition being constant time, the algorithm of adding the N items of a set is O(N).

1 comments

Okay, I see where you're confused. You're saying the following algorithm is O(N)

  for n in N
    hashset.insert(n)
Which... okay, but that's not exactly what big O notation is for, because it doesn't tell us anything about insert(). It's not a useful analysis, of this code, because it's the same for any set of N items. The algorithm for inserting into a list is the same

  for n in N
    list.insert(n)
Okay, so does this mean inserting into a hash set and a list have the same computational complexity? No, of course not. In your post you've already admitted my point: "it's broken down into N independent operations where each is constant time." Inserting into a hash set is constant time. Agreed.

> "The N here is the number of items you're inserting...There's another number that we can consider a constant that's irrelevant to this analysis, M, the number of items in the data structure. M doesn't affect the complexity"

Take this example. Let's say I have a linked list of M = 1 million items, and I want to insert N = 2 items into this list (without a tail pointer). How many iterations will I have to do: N, or NM? It's the latter of course.

But you've just told me the only thing I have to think about when analyzing an algorithm's complexity is the input set. The number of items in the list doesn't matter. Clearly that's not the case given this example.

Let's think about the hash set. I have M = 1 million items in the hash set and N = 2 items I want to insert. How many iterations will I have to do, 2 or 2 million? For a good hash set, just 2. Now, does this mean that inserting into a hash set is O(N) because N = 2? No, that's a misuse of asymptotic complexity analysis, because again, the N in this notation is thought to be tending to infinity. Since N << M we say the complexity of this algorithm is O(1). As the number of items in the data structure grows, the amount of time inserting doesn't change, even though you have to loop over N items to insert each one into the list.

> An algorithm's complexity is talked to in terms of its input. Not in terms of if my input wasn't actually N sized but it was 1 and instead another hidden input which is the pre-existing data structure was sized N.

Yes, but you can't stop your analysis of a data structure there, because the data structure has state. As you process the input set, the set inside the data structure grows, and depending on the data structure, this will change the overall complexity of the algorithm.

You can't say that we just consider the input set, because then there's no such thing as a constant time algorithm, since you always have to loop over N items at minimum. In a proper asymptotic complexity analysis of a data structure, it's absolutely essential to consider M, the number of elements in the data structure, what happens when you add an additional element, and how that changes over time. Otherwise, all algorithms are O(N) at minimum, and there's no such thing as a data structure that gives constant time insert.

As I asked you before, if inserting into a hash set is O(N), what's the point of a hash set? Because inserting into a list is O(N), so why would I ever use a hash set over a linked list? What is even the point of this data structure?