by amluto
680 days ago
> :wave: Hi, I'm the creator of Gazette.

Hi!

> if an application correctly writes bad data, then you'll have bad data in your journal. This is no different from any other file format under the sun.

In a journal that delimits itself, a bad write corrupts only that write (and anything depending on it); it doesn’t make the next message unreadable. I’m not sure how I feel about this.

I maintain a journal-ish thing for internal use, and it’s old and crufty and has all manner of design decisions that, in retrospect, are wrong. But it does strictly separate writes from different sources, and each message has a well-defined length.

Also, mine supports compressed files as its source of truth, which is critical for my use case. It looks like Gazette has a way to post-process data before it turns into a final fragment. Nifty. I wonder whether anyone has rigged it up to produce compressed Parquet files.
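
To make the "journal that delimits itself" point concrete, here is a minimal sketch of length-prefixed framing. It is not Gazette's format or my internal one: the 4-byte big-endian header and the names `append`, `read_frames`, and `decode_all` are made up for illustration. The reader finds message boundaries from the header alone, so a garbage payload fails to decode without hiding the message after it.

```python
import io
import json
import struct

# Hypothetical framing: a 4-byte big-endian length, then the payload bytes.
HEADER = struct.Struct(">I")


def append(journal, payload: bytes) -> None:
    """Append one frame: length header followed by the raw payload."""
    journal.write(HEADER.pack(len(payload)))
    journal.write(payload)


def read_frames(journal):
    """Yield each payload, using only the length header to find boundaries."""
    while True:
        header = journal.read(HEADER.size)
        if not header:
            return  # clean end of journal
        if len(header) < HEADER.size:
            raise EOFError("truncated header")
        (length,) = HEADER.unpack(header)
        payload = journal.read(length)
        if len(payload) < length:
            raise EOFError("truncated payload")
        yield payload


def decode_all(journal):
    """Decode every frame as JSON; an undecodable payload becomes None."""
    results = []
    for payload in read_frames(journal):
        try:
            results.append(json.loads(payload))
        except ValueError:  # covers invalid JSON and invalid UTF-8
            results.append(None)
    return results


journal = io.BytesIO()
append(journal, json.dumps({"seq": 1}).encode())
append(journal, b'{"seq": 2,\n\n oops')  # the application "correctly" wrote bad data
append(journal, json.dumps({"seq": 3}).encode())

journal.seek(0)
decoded = decode_all(journal)
print(decoded)
```

The second message is lost, but the third is still readable. That is the property I care about, and it is what a newline-delimited format would not give you here, since the bad payload contains newlines of its own.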
But more to the point, journals are meant for things that are written _and read_ sequentially. Parquet wasn't really designed for sequential reads, so it's unclear to me whether there would be much benefit. IMHO it's better to use journals for sequential data (think change events) and other systems (e.g. an RDBMS or Parquet + pick-your-compute-flavor) for querying it. I don't think there's yet a storage format that works equally well for both.