| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by sujayakar 854 days ago

this is a really cool idea.

one followup I was thinking of is whether this can generalize to queries other than key value point lookups. if I'm understanding correctly, the article is suggesting to take a key value store, and for every `(key, value)` in the system, split `value` into fragments that are stored on different shards with some `k` of `M` code. then at query time, we can split a query for `key` into `k` subqueries that we send to the relevant shards and reassemble the query results into `value`.

so, if we were to do the same business for an ordered map with range queries, we'd need to find a way to turn a query for `interval: [start, end]` into some number of subqueries that we could send to the different shards and reassemble into the final result. any ideas?

2 comments

ddorian43 854 days ago

There https://ydb.tech/ open source db that uses erasure coding for replication in single zone/region.

link

eivanov89 854 days ago

In YDB with block 4+2 erasure coding, you need half the disk space compared to mirror-3-dc schema. Meanwhile CPU usage is just a little bit higher, thus in high throughput tests mirror-3-dc wins. Indeed as mentioned in the post there might be a tail latency win in latency runs, but if your task is throughput with a reasonable latencies, replication might be a better choice.

link

pjdesno 854 days ago

If you only care about throughput, just fetch the data and read should be the same speed as triple replicated.

For writing, triple-rep has to write 2x as much data or more, so it's going to be slower unless your CPUs are horribly slow compared to your drives.

link

ddorian43 854 days ago

I expect it to save a lot of CPU by only needing 1/3x of compactions. You might want to do a benchmark on that ;). An example is quickwit (building inverted indexes is very expensive).

link

ddorian43 854 days ago

> so, if we were to do the same business for an ordered map with range queries, we'd need to find a way to turn a query for `interval: [start, end]` into some number of subqueries that we could send to the different shards and reassemble into the final result. any ideas?

Dbs that are backed by s3-like-storage, the storage does this for you, but for blocks of, say, 1MB, and not per-kv (high overhead).

Think you use rocksdb in your db, and erasure-code the sstables.

link