| HN Mirror


by CodesInChaos 586 days ago

1. Do you support compression for data stored in segments?

2. Does the choice of storage class only affect chunks or also segments?

To me the best solution seems to be storing writes on EBS (or even NVMe) initially, to minimize the time until writes can be acknowledged, and creating a chunk on S3 Standard every second or so. But I assume that would require significant engineering effort for applications that require data to be replicated to several AZs before it is acknowledged. Some applications, though, might be willing to risk losing 1s of writes on node failure in exchange for cheap and fast writes.

3. You could be clearer about what "latency" means. I see at least three different latencies that could be important to different applications:

a) time until a write is durably stored and acknowledged

b) time until a tailing reader sees a write

c) time to first byte after a read request for old data

4. How do you handle streams which are rarely written to? Will newly appended records to those streams remain in chunks indefinitely? Or do you create tiny segments? Or replace an existing segment with the concatenated data?
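To make the three latencies in question 3 concrete, here is a minimal Python sketch of how one might compute them from timestamps. The class, field, and function names are purely illustrative and are not part of any real client library.

```python
from dataclasses import dataclass


@dataclass
class WriteTimeline:
    """Timestamps (in seconds) for one record; all names are illustrative."""
    submitted: float        # client sends the append
    acknowledged: float     # client receives the durable ack
    seen_by_tailer: float   # a tailing reader receives the record

    def ack_latency(self) -> float:
        # (a) time until the write is durably stored and acknowledged
        return self.acknowledged - self.submitted

    def end_to_end_latency(self) -> float:
        # (b) time until a tailing reader sees the write
        return self.seen_by_tailer - self.submitted


def time_to_first_byte(requested: float, first_byte: float) -> float:
    # (c) time to first byte after a read request for old data
    return first_byte - requested
```

Measuring (a) and (b) separately matters most when recent writes are not served from the same place that acknowledges them; if they are, the two numbers should be close.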

2 comments

shikhar 586 days ago

(Founder) Thanks for the deep questions!

1) Storage is priced on uncompressed data. We don't currently compress segments.

2) It only affects chunk storage. We do have a 'Native' chunk store in mind; the sketch involves introducing NVMe disks (as a separate service the core depends on) so that we can offer end-to-end tail latencies under 5 milliseconds.

3) The append ack latency and the end-to-end latency with a tailing reader are largely equivalent for us, since the latest writes stay in memory for a brief period after acknowledgment. If you try the CLI ping command (see GIF on landing page) from the same cloud region as us (AWS us-east-1 only currently), you'll see end-to-end and append ack latency as basically the same. TTFB for older data is ~ TTFB to get a segment data range from object storage, so it can be a few hundred milliseconds.

4) We have a deadline to free chunks, so we PUT a tiny segment if we have to.
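For readers wondering what a deadline-driven flush like the one in answer 4 could look like, here is a rough Python sketch. It is not the actual implementation; `StreamBuffer`, `max_chunk_age`, and `min_segment_bytes` are hypothetical names, and the size threshold is an assumption added for contrast with the deadline.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class StreamBuffer:
    max_chunk_age: float            # hypothetical deadline, in seconds
    min_segment_bytes: int          # hypothetical preferred segment size
    records: List[bytes] = field(default_factory=list)
    oldest_at: Optional[float] = None

    def append(self, record: bytes, now: float) -> None:
        # Remember when the oldest unflushed record arrived.
        if self.oldest_at is None:
            self.oldest_at = now
        self.records.append(record)

    def maybe_flush(self, now: float) -> Optional[bytes]:
        """Return segment bytes to PUT, or None if nothing is due yet."""
        if not self.records:
            return None
        size = sum(len(r) for r in self.records)
        overdue = now - self.oldest_at >= self.max_chunk_age
        # Wait for more data unless the deadline has passed.
        if size < self.min_segment_bytes and not overdue:
            return None
        segment = b"".join(self.records)
        self.records.clear()
        self.oldest_at = None
        return segment
```

With a 60 second deadline, a stream that received a single 2-byte record would return nothing from `maybe_flush` until the record is 60 seconds old, and would then yield a 2-byte segment.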


jgraettinger1 586 days ago

> To me the best solution seems to be storing writes on EBS (or even NVMe) initially, to minimize the time until writes can be acknowledged, and creating a chunk on S3 Standard every second or so.

Yep, this is approximately Gazette's architecture (https://github.com/gazette/core). It buys the latency profile of flash storage, with the unbounded storage and durability of S3.

One addendum: there's no need to flush to S3 quite that frequently if readers instead tail ACK'd content from local disk. Another neat thing you can do is hand bulk historical readers pre-signed URLs to files in cloud storage, so those bytes don't need to proxy through brokers.
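A rough Python sketch of that read routing, under stated assumptions: this is not Gazette's API. `Fragment`, `plan_read`, and the `sign_url` callback are hypothetical names, and in practice `sign_url` might wrap something like a cloud SDK's pre-signed URL call.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Fragment:
    begin: int      # first byte offset covered (inclusive)
    end: int        # end of the covered range (exclusive)
    key: str        # object key in cloud storage


def plan_read(
    offset: int,
    fragments: List[Fragment],
    sign_url: Callable[[str], str],
) -> Tuple[str, str]:
    """Return ("redirect", url) for persisted offsets, else ("local", "")."""
    for frag in fragments:
        if frag.begin <= offset < frag.end:
            # Already in cloud storage: let the reader fetch it directly.
            return ("redirect", sign_url(frag.key))
    # Not yet persisted: serve ACK'd content from the broker's local disk.
    return ("local", "")
```

The broker only pays for bytes it has not yet flushed; everything older is fetched straight from object storage by the reader.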
