| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by trebecks 809 days ago

if i'm reading the op right, they kind of use ebs as a buffer for fresh data until it ages out to s3. they use a "local" disk to hold the stuff used by the queries that people actually make and the queries run quick. they let the old stuff rot in s3 where its almost never used. that sounds like a good idea to save money plus the stuff that's done often is fast.

the ebs slas look reasonable to a non expert like me and you can take snapshots. it sound like you need to be careful when snapshotting to avoid inconsistencies if stuff is only partially flushed to disk. so you'd need to pause io while it snapshots if those inconsistencies matter. that sounds bad and would encourage you to take less frequent snapshots...? you also pay for the snapshot storage but i guess you wouldn't need to keep many. i like that aws defines "SnapshotAPIUnits" to describe how you get charged for the api calls.

with aurora, it looks like you can synchronously replicate to a secondary (or multiple secondaries) across azs in a single region. it sounds nice to have a sync copy of stuff that people are using. op says the'yre ok with a few seconds of data loss so i'm wondering how painful losing a volume right before taking a snapshot would be.

i wonder if anything off the shelf does something similar. it sounds like people are suggesting clickhouse. i saw buffer table in their docs and it sounds similar https://clickhouse.com/docs/en/engines/table-engines/special.... it looks like it has stuff to use s3 as cold storage too. i even see geo types and functions in the docs. i've never used clickhouse so i don't know if i'm understanding what i read, but it sounds like you could do something similar to whats described in the post with clickhouse if the existing geo types + functions work and you are too lazy to roll something yourself.