by gravadlax 2427 days ago

Hi, I am one of the authors of this post.

As you mention, it is a trade-off. In this case we are trying to balance CPU and memory usage. First of all, if you just use the original vectors (or the quantized ones mentioned in the post), there may well be no server that can hold the index at all.

So let us take the example in the post and assume we have a server with enough memory. We put the index (1 TB) in RAM and assume that one request takes 1-2 ms. A single CPU core can then handle 500-1000 requests per second, so we are using a lot of RAM and very little CPU. Depending on your use case, this can very well be better than what is proposed in the post, but for some it might be preferable to use servers with more balanced resources.

And just to clarify, the memory is shared between CPUs. So you pay the memory price once and can then spread the request load across all the CPUs.
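
For what it's worth, the arithmetic above can be sketched in a few lines of Python. This is only a back-of-the-envelope model: the function names and the core counts are made up for illustration, and it assumes each request occupies one core for its whole duration and that throughput scales linearly with cores, which ignores things like memory-bandwidth contention.

```python
def requests_per_second_per_core(latency_ms: float) -> float:
    """Requests one core can serve per second if each request
    keeps that core busy for latency_ms milliseconds."""
    return 1000.0 / latency_ms


def total_requests_per_second(latency_ms: float, cores: int) -> float:
    """Aggregate throughput, assuming perfect linear scaling across
    cores that all read the same in-memory index (an idealization)."""
    return cores * requests_per_second_per_core(latency_ms)


INDEX_SIZE_TB = 1  # index size from the example in the post

# Core counts are hypothetical; the RAM cost stays the same for all of them.
for cores in (1, 8, 32):
    low = total_requests_per_second(2.0, cores)   # 2 ms per request
    high = total_requests_per_second(1.0, cores)  # 1 ms per request
    print(f"{cores:>2} cores, {INDEX_SIZE_TB} TB RAM: {low:.0f}-{high:.0f} requests/s")
```

The point is just that the memory figure does not change between rows, while the request rate grows with the number of cores you put to work.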