You put in place a loss-mitigation strategy, and that strategy will vary by application. In my case, I have a similar setup where we write 25-30k records to SQLite daily. We start each day fresh with a new SQLite db file (named yyyy-mm-dd.db) and back it up to AWS S3 daily under the scheme /app_name/data/year/month/file. That works out to roughly 9-11 million records a year, or 365 mini SQLite dbs of 25-30k records each. Portability is another awesome trait of SQLite. Then, at the end of each week (i.e., every 7 days), we use AWS Glue (PySpark specifically) to process that week's database files into a Parquet file (snappy compression), which is then imported into ClickHouse for analytics and reporting.
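For anyone wanting to copy the file-per-day layout, here's a minimal sketch using only the Python standard library. The helper names are hypothetical, and zero-padded months plus a key without the leading slash are my assumptions, not necessarily what the setup above uses. `snapshot` uses sqlite3's online backup API to take a consistent copy before uploading, and `week_keys` lists the seven daily files a weekly job would pick up.

```python
import datetime as dt
import sqlite3
from pathlib import Path


def daily_db_name(day: dt.date) -> str:
    # One fresh SQLite file per day, named yyyy-mm-dd.db
    return f"{day:%Y-%m-%d}.db"


def s3_key(app_name: str, day: dt.date) -> str:
    # Follows the app_name/data/year/month/file layout.
    # Assumptions: zero-padded month, no leading slash on the key.
    return f"{app_name}/data/{day:%Y}/{day:%m}/{daily_db_name(day)}"


def week_keys(app_name: str, last_day: dt.date) -> list[str]:
    # Keys for the 7 daily files ending on last_day (inclusive), oldest first:
    # the batch a weekly processing job would read.
    return [s3_key(app_name, last_day - dt.timedelta(days=n)) for n in range(6, -1, -1)]


def snapshot(src_path: Path, dest_path: Path) -> None:
    # sqlite3's online backup API writes a consistent copy of the database
    # to dest_path, even while the source is in use by other connections.
    src = sqlite3.connect(src_path)
    dst = sqlite3.connect(dest_path)
    try:
        src.backup(dst)
    finally:
        dst.close()
        src.close()
```

Uploading the snapshot could then be a single boto3 call, e.g. `boto3.client("s3").upload_file(str(dest_path), "my-bucket", s3_key("app_name", day))`, where the bucket name is a placeholder.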
At any given point in time, we retain 7 years' worth of files in S3. That's approx. 2,555 files (7 × 365) for under $10/month. Anything older is archived into AWS Glacier, while the data remains accessible in ClickHouse. As of right now, we have 12 years' worth of data. Hope it helps!
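The comment doesn't say how the move to Glacier is triggered; one common way (an assumption here) is an S3 lifecycle rule. This sketch only builds the configuration dict; the function name and rule ID are hypothetical, and treating a year as 365 days ignores leap days.

```python
def archive_after_years_rule(app_name: str, retain_years: int = 7) -> dict:
    # Approximate a year as 365 days (leap days ignored; an assumption).
    days = retain_years * 365
    return {
        "Rules": [
            {
                "ID": f"{app_name}-to-glacier-after-{retain_years}y",
                # Only apply to the daily SQLite files under app_name/data/
                "Filter": {"Prefix": f"{app_name}/data/"},
                "Status": "Enabled",
                # Transition objects older than `days` to the Glacier storage class
                "Transitions": [{"Days": days, "StorageClass": "GLACIER"}],
            }
        ]
    }
```

It could be applied with `boto3.client("s3").put_bucket_lifecycle_configuration(Bucket="my-bucket", LifecycleConfiguration=archive_after_years_rule("app_name"))`, again with a placeholder bucket name. Doing it as a lifecycle rule means S3 handles the archiving, so there's no cron job to maintain.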
Does that mean it's okay for your application to lose transactions (those that occurred between the last backup and the failure), or do you have other mitigations?
I'm the author of Litestream, which is an open-source tool for streaming replication for SQLite. That could be a good option if you need to limit your window for data loss. We have a pretty active Slack if you need help getting up and running. https://litestream.io/
I’m nowhere near the banking industry, but from HN alone I’ve been led to believe that roughly daily bulk file transfers are also the norm in a variety of situations (i.e., the same approach as this SQLite backup strategy).
Isn't that how all backups work? If you need to prevent data loss, backups probably aren't your tool of choice. And if you're paranoid about data loss, then any replication lag is also unacceptable.
* I'm worried about my server blowing up: Transactions have to be committed to more than one DB on separate physical hosts before returning.
* I'm worried about my datacenter blowing up: Transactions have to be committed to more than one DB in more than one DC before returning.
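To make "committed to more than one DB before returning" concrete, here's a toy sketch where local SQLite connections stand in for databases on separate hosts (an assumption purely for illustration; `replicated_insert` and `NotEnoughReplicas` are hypothetical names). It only returns once at least `min_acks` replicas have committed. It does not undo replicas that already committed when the quorum is missed; real systems handle that with two-phase commit or a consensus protocol such as Raft.

```python
import sqlite3
from typing import Sequence


class NotEnoughReplicas(Exception):
    pass


def replicated_insert(
    replicas: Sequence[sqlite3.Connection], sql: str, params: tuple, min_acks: int
) -> int:
    acks = 0
    for conn in replicas:
        try:
            # `with conn` commits on success and rolls back on error
            with conn:
                conn.execute(sql, params)
            acks += 1
        except sqlite3.Error:
            # This replica failed; keep trying the others
            continue
    if acks < min_acks:
        # Note: replicas that already committed are NOT rolled back here
        raise NotEnoughReplicas(f"only {acks} replicas committed, needed {min_acks}")
    return acks
```

A real deployment would contact the hosts over the network, typically in parallel, rather than looping over local connections.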