Hacker News new | ask | show | jobs
by faragon 3951 days ago
Be careful with theoretical asymptotic complexity (big O) related to execution time. E.g. if your algorithm time complexity is O(1), but internally calls a higher complexity function, e.g. malloc(), implemented with higher complexity, e.g. O(log n), your algorithm time complexity would be O(log n) and not O(1). It could be even worse: on average or typical constant time algorithm could be in reality an O(n) one: e.g. case of hash table reindexation (that's the reason of why many big data structures, including most SQL databases, requiring real time behavior, are implemented as trees, tree hierarchies/division/clustering, instead of big hash tables).
2 comments

I don't think that the `n` in the case of malloc would always be relevant to the semantics of the query. In that case, it would still be appropriate to refer to it as constant time.

For instance, you don't typically look at the size of the literals in the query when evaluating query complexity. If it's really unbounded, you probably shouldn't use a relational database.

Not necessarily, but could be the case. Typical malloc implementations have different management for at least small and big memory requests, using different pools, in order to reduce fragmentation. Also, because operations involving virtual address remap are expensive (realloc on a small block is faster with a full memcopy to adifferent location, rather than doing stuff involving the OS kernel doing virtual address remap).

The "problem" of malloc() function (or any other equivalent allocation stuff) is that internally manages free blocks (it is a middleman between the OS and user process -the purpose is to reduce OS calls-), if you have lots of them, dynamic memory could take time. For "malloc" I meant malloc/realloc/free, the whole kit. Those operations are not free (in most cases you're not going to have millions of allocations in one process, that was just an example of hidden things that could make your algorithm not behave like expected).

I you're right. In fact in the optimizer part I say (in a simple way) that big O (i.e. asymptotic complexity) is not the same as CPU cost but it's easier for me because the real cost of an operation depends on the CPU architecture.

Someone told me the same on the article comments and here is the answer I gave him:

You’re right and I agree with you. When I wrote this part, I REALLY hesitated to give the real asymptotic definition and what it means for the number of operations but I chose a simpler explanation since the aim of this post is not to become an expert but to have a good idea. I hope that this won’t mislead people but I thought the real definition was too hard for a “newcomer” and not important to understand a database. This is also why I added in this part “The time complexity doesn’t give the exact number of operations but a good idea.” and said at the end of the part “I didn’t give you the real definition of the big O notation but just the idea” with a link to the real definition.