| HN Mirror



	by teraflop 160 days ago
	Don't get me wrong, it's fun to see performance optimizations like this. But I'd expect that a naive implementation of the same strategy would already take like 0.1% of the time needed to actually generate embeddings for your chunks. So practically, is it really worth the effort of writing a bunch of non-trivial SIMD code to reduce that overhead from 0.1% to 0.001%?

2 comments

imperio59 160 days ago

From the author:

> at some point we started benchmarking on wikipedia-scale datasets.
> that’s when things started feeling… slow.

So they're talking about this becoming an issue when chunking TBs of data (I assume), not your 1kb random string...


groby_b 160 days ago

But the bottleneck is generating embeddings either way.

memchunk has a throughput of 164 GB/s. A really fast embedder can deliver maybe 16k embeddings/sec, or ~1.6 MB/s (if you assume 100-char sentences).

That's about five orders of magnitude difference. Chunking is not the bottleneck.
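
The comparison is sensitive to units, so here is a minimal sketch that converts both figures quoted above to bytes per second. It assumes one byte per character and treats each 100-character sentence as one embedded chunk; the constant and function names are made up for illustration.

```python
import math

# Figures quoted in the comment above.
CHUNKER_BYTES_PER_SEC = 164e9   # memchunk: 164 GB/s
EMBEDDINGS_PER_SEC = 16_000     # "really fast embedder": 16k embeddings/sec
BYTES_PER_CHUNK = 100           # assumption: 100 one-byte characters per sentence


def embedder_bytes_per_sec(embeddings_per_sec, bytes_per_chunk):
    """Text throughput the embedder can absorb, in bytes per second."""
    return embeddings_per_sec * bytes_per_chunk


def orders_of_magnitude(fast, slow):
    """Base-10 logarithm of the ratio between two throughputs."""
    return math.log10(fast / slow)


embed_rate = embedder_bytes_per_sec(EMBEDDINGS_PER_SEC, BYTES_PER_CHUNK)
gap = orders_of_magnitude(CHUNKER_BYTES_PER_SEC, embed_rate)

print(f"embedder: {embed_rate / 1e6:.1f} MB/s")    # embedder: 1.6 MB/s
print(f"gap: {gap:.1f} orders of magnitude")       # gap: 5.0 orders of magnitude
```

With these inputs the embedder absorbs about 1.6 MB/s of text, so the chunker is roughly 100,000 times faster than the stage it feeds.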

It might be an architectural issue - you stuff chunks into an MQ, and you want to have full visibility into queue size ASAP - but otherwise it doesn't matter how much you chunk, your embedder will slow you down.

It's still a neat exercise on principle, though :)


viraptor 160 days ago

It doesn't matter if A takes much more time than B, if B is large enough. You're still saving resources and time by optimising B. Also, you seem to assume that every chunk will get embedded - they may be revisiting some pages where the chunks are already present in the database.


groby_b 159 days ago

Amdahl's law still holds, though. If A and B differ in execution times by orders of magnitude, optimising B yields minimal returns (assuming streaming, vs fully serial processing).
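
To put a number on that, here is a rough sketch of Amdahl's law in its usual serial form. The inputs are the illustrative figures from the top comment (chunking at about 0.1% of total runtime, cut to about 0.001%, i.e. a 100x speedup of that stage); the function name is made up for this example. In a streaming pipeline the slowest stage sets the throughput, so the gain would be smaller still.

```python
def amdahl_speedup(fraction, stage_speedup):
    """Overall speedup when `fraction` of total runtime gets `stage_speedup` times faster."""
    return 1.0 / ((1.0 - fraction) + fraction / stage_speedup)


# Assumed inputs, taken from the top comment rather than measured:
chunk_fraction = 0.001   # chunking is ~0.1% of end-to-end time
chunk_speedup = 100.0    # 0.1% -> 0.001% means the chunker got ~100x faster

overall = amdahl_speedup(chunk_fraction, chunk_speedup)
print(f"overall speedup: {overall:.5f}x")
```

Under those assumptions the end-to-end run gets roughly 0.1% faster, which is the "minimal returns" being described.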

And sure, you can reject chunks, but a) the rejection isn't free, and b) you're still bound by embedding speed.

As for resource savings... not in the Wikipedia data range. If you scale up massively and go to a PB of data, going from kiru to memchunk saves you ~25 CPU days. But you also suddenly need to move from bog-standard high-CPU machines to machines supporting 164GB/s memory throughput, likely bare metal with 8 memory channels. I'm too lazy to do the math, but it's going to be a mild difference at O($100).

Again, I'm not arguing this isn't a cool achievement. But it's very much engineering fun, not "crucial optimization".


topdog123 160 days ago

Agreed. For any code written, there is a sort of return on time expended. Optimisations are really only required when demanded.
