Hacker News
by MattJ100 1062 days ago
What kind of tooling do you/people use for that? Or just custom scripts?
3 comments

Look up OLTP vs OLAP data stores to get an idea. There are a lot of common patterns for implementing this. Usually you run a regularly scheduled job (e.g. daily) that dumps the data for some time period. Late-arriving data needs some thought, and it's a classic data engineering (DE) interview question, but for the most part big nightly dumps of the last day’s data/transactions/snapshots to date-partitioned columnar stores, scheduled by an orchestration engine like Airflow, are sufficient for 99% of use cases.
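To make the nightly-dump pattern concrete, here is a minimal sketch, not a definitive implementation. Everything named in it is an assumption for illustration: the `orders` table, its columns, the `dt=YYYY-MM-DD` directory layout, and the `dump_day` / `nightly_run` helpers are all hypothetical. It uses SQLite and CSV from the standard library so it runs anywhere; a real setup would more likely write Parquet (a columnar format) and let an orchestrator such as Airflow call the job once per day. Late-arriving rows are handled the simple way: each run rewrites the last few daily partitions in full.

```python
import csv
import sqlite3
from datetime import date, timedelta
from pathlib import Path


def dump_day(conn: sqlite3.Connection, day: date, out_root: Path) -> Path:
    """Write all rows of the (hypothetical) orders table for one day to its own partition."""
    start = day.isoformat()
    end = (day + timedelta(days=1)).isoformat()
    # event_time is assumed to be an ISO-8601 string, so string comparison
    # gives the half-open range [start, end).
    rows = conn.execute(
        "SELECT id, amount, event_time FROM orders "
        "WHERE event_time >= ? AND event_time < ? ORDER BY id",
        (start, end),
    ).fetchall()

    part_dir = out_root / f"dt={start}"
    part_dir.mkdir(parents=True, exist_ok=True)
    out_path = part_dir / "orders.csv"
    # Overwrite the whole partition so re-running a day is idempotent.
    with out_path.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "amount", "event_time"])
        writer.writerows(rows)
    return out_path


def nightly_run(conn: sqlite3.Connection, run_date: date, out_root: Path,
                lookback_days: int = 3) -> list:
    """Dump the day before run_date, plus earlier days to pick up late rows."""
    paths = []
    for offset in range(1, lookback_days + 1):
        paths.append(dump_day(conn, run_date - timedelta(days=offset), out_root))
    return paths
```

Calling `nightly_run(conn, date.today(), Path("warehouse"))` from a scheduler would refresh the three most recent daily partitions. The lookback window trades some repeated work for not having to track which rows arrived late.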
Tangent: I hate OLTP and OLAP as acronyms. They're only one letter/word apart and completely obscure the relevant meaning; lots of semantic noise. Just say transactional vs analytical processing. (They're still good search terms, because lots of existing literature/resources use them.)
(not the person you're replying to)

I can't recommend any specific tools without knowing a lot about the environment, but if you're looking for terms to google: ELT (Extract, Load, Transform) and CDC (Change Data Capture) will give you a sense of the landscape.
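For a rough feel of what CDC means, here is a toy sketch under loud assumptions: the `(op, key, values)` event shape and the `apply_changes` helper are made up for illustration and are not any real tool's format. Actual CDC tooling typically reads the database's own replication or write-ahead log rather than a Python list. The idea is only that a downstream copy is kept current by replaying a stream of inserts, updates and deletes instead of re-dumping whole tables.

```python
from typing import Any, Dict, Iterable, Tuple

# Hypothetical change event: (operation, primary key, column values).
Change = Tuple[str, int, Dict[str, Any]]


def apply_changes(replica: Dict[int, Dict[str, Any]],
                  changes: Iterable[Change]) -> Dict[int, Dict[str, Any]]:
    """Replay change events, in order, against an in-memory copy of a table."""
    for op, key, values in changes:
        if op == "insert":
            replica[key] = dict(values)
        elif op == "update":
            # Merge only the changed columns into the existing row.
            replica.setdefault(key, {}).update(values)
        elif op == "delete":
            replica.pop(key, None)
        else:
            raise ValueError(f"unknown operation: {op}")
    return replica
```

Replaying an insert for key 1, an update that changes one of its columns, then an insert and a delete for key 2 leaves the replica holding only the updated row for key 1.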

edit: the sibling comment that mentions Airflow is a good example of an ELT workflow.

Don't MariaDB, Postgres, etc. make replication pretty easy?