by dist1ll 811 days ago
Per-core store bandwidth is at least 14GB/s on Zen3, 35GB/s for non-temporal stores. Parsing JSON can be done at 2GB/s or more.

It's very healthy to take maximum bandwidth limits into consideration when reasoning about performance. For instance, for temporal stores, the bottlenecks you see are due to RAM latency and memory parallelism, because of the write-allocate. The load/store uarch can actually retire way more data from SIMD registers.

So there's already some headroom for CPU-bound tasks. For instance, 11MB/s is very slow for a baseline JIT compiler. But if your particular problem demands arbitrary random accesses that regularly exceed L3, maybe that speed is justified.
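
To put the headroom in numbers, here is a rough back-of-the-envelope sketch using only the figures quoted in this comment. It assumes decimal units (1GB = 1e9 bytes, 1MB = 1e6 bytes); the constant and function names are made up for illustration.

```python
# Figures quoted in the comment, converted to bytes per second.
# Assumption: decimal units (GB = 1e9 bytes, MB = 1e6 bytes).
STORE_BW = 14e9      # per-core temporal store bandwidth on Zen3 (lower bound)
NT_STORE_BW = 35e9   # per-core non-temporal store bandwidth
JSON_PARSE = 2e9     # JSON parsing throughput
OBSERVED = 11e6      # the 11MB/s figure under discussion


def headroom(limit, observed=OBSERVED):
    """Return how many times larger `limit` is than `observed`."""
    return limit / observed


if __name__ == "__main__":
    for name, limit in [
        ("temporal stores", STORE_BW),
        ("non-temporal stores", NT_STORE_BW),
        ("JSON parsing", JSON_PARSE),
    ]:
        print(f"{name}: ~{headroom(limit):.0f}x the observed rate")
```

Under those assumptions, 11MB/s sits roughly three orders of magnitude below the quoted store bandwidth and about two below the quoted JSON parsing rate. That gap is a ceiling, not a prediction of what any particular workload can reach.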

1 comments

What we do is CPU bound and we are not just parsing JSON here.

The largest work we do is building an inverted index. Oversimplified, it is equivalent to this:

  import json
  from collections import defaultdict
  inverted_index = defaultdict(list)
  for (doc_id, doc_json) in enumerate(doc_jsons):
    doc = json.loads(doc_json)
    for (field, field_text) in doc.items():
      # tokenize() stands in for whatever tokenizer you use
      for (position, token) in enumerate(tokenize(field_text)):
        inverted_index[token].append((doc_id, position))
  serialize_in_compressed_way_that_allows_lookup(inverted_index)

You can implement it in a couple of hours in the language of your choice to get a proper baseline.
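
As a starting point, here is a minimal runnable sketch of such a baseline in Python. It is not how tantivy does it. `tokenize` is a placeholder (lowercase plus whitespace split), non-string fields are skipped, and the compressed serialization step is left out entirely, so the number it reports only covers parsing, tokenizing and building the in-memory postings. `build_inverted_index` and `measure_throughput` are names invented for this sketch.

```python
import json
import time
from collections import defaultdict


def tokenize(text):
    # Assumption: lowercase + whitespace split; real tokenizers do much more.
    return text.lower().split()


def build_inverted_index(doc_jsons):
    """Map each token to a list of (doc_id, position) postings."""
    inverted_index = defaultdict(list)
    for doc_id, doc_json in enumerate(doc_jsons):
        doc = json.loads(doc_json)
        for field, field_text in doc.items():
            if not isinstance(field_text, str):
                continue  # this sketch only indexes string fields
            # Positions restart at 0 for every field, as in the loop above.
            for position, token in enumerate(tokenize(field_text)):
                inverted_index[token].append((doc_id, position))
    return inverted_index


def measure_throughput(doc_jsons):
    """Return (index, MB/s of input JSON consumed), serialization excluded."""
    total_bytes = sum(len(d.encode("utf-8")) for d in doc_jsons)
    start = time.perf_counter()
    index = build_inverted_index(doc_jsons)
    elapsed = max(time.perf_counter() - start, 1e-9)  # avoid division by zero
    return index, total_bytes / elapsed / 1e6


if __name__ == "__main__":
    docs = ['{"title": "hello world", "body": "hello again"}'] * 100_000
    index, mb_per_s = measure_throughput(docs)
    print(f"{len(index)} distinct tokens, {mb_per_s:.1f} MB/s")
```

The reported rate depends heavily on the machine, the documents and the tokenizer, and adding the serialization step will lower it further.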

I am sure we can still improve our indexing throughput... but I have never seen any search engine indexing as fast as tantivy.

If someone knows a project I should know of, I'd be genuinely keen on learning from it.

I'm curious, what is your frame of reference with regard to the maximum speed of building inverted indices? What is the maximum throughput you'd expect for this type of task, and what is your reasoning for it?