by bob1029 1738 days ago
Performance is just fine for us. It's all about scoping for the right things. You are never looking for the whole schema all at once. No one ever said it had to be running on disk either.

> You are never looking for the whole schema all at once.

Well, we have to. Our system exports data to other systems, which means generating a file with all the data (think of an order or invoice, but with 100+ fields). For many of our customers this job is time-sensitive.

Besides that, we have overview grids (think "all active orders") for our customers, and in many of these a customer will want 20+ fields visible. With your method that can mean 50-100 joins, and this is an interactive case, so again time-sensitive.

That's why I was curious how it scaled.

> No one ever said it had to be running on disk either.

Well, our data absolutely has to be persisted to disk on commit, and a stale cache is not tolerable. But apart from that, sure.

Exporting data to other systems is a massively different idea from using the data in a business-transactional sense.

If you wanted to bulk export/stream transactions from my system, you would simply take the compressed event log batches and replicate them wherever they need to go. The working set is kept in memory and can be reconstructed by simply replaying the events.
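
To make the replay idea concrete, here is a minimal sketch in Python. The event shape (`entity_id`, `field`, `value`), the gzip-plus-JSON batch encoding, and the function names are all assumptions made for illustration; nothing here describes the actual system.

```python
import gzip
import json


def compress_batch(events):
    """Serialize a list of event dicts into one gzip-compressed batch (assumed encoding)."""
    return gzip.compress(json.dumps(events).encode("utf-8"))


def replay(batches):
    """Rebuild the in-memory working set by replaying every batch in log order."""
    working_set = {}
    for batch in batches:
        for event in json.loads(gzip.decompress(batch).decode("utf-8")):
            entity = working_set.setdefault(event["entity_id"], {})
            if event["value"] is None:
                # Assumed convention: a None value retracts the field.
                entity.pop(event["field"], None)
            else:
                entity[event["field"]] = event["value"]
    return working_set


# Hypothetical event log made of two compressed batches.
log = [
    compress_batch([
        {"entity_id": "order-1", "field": "status", "value": "open"},
        {"entity_id": "order-1", "field": "total", "value": 120},
    ]),
    compress_batch([
        {"entity_id": "order-1", "field": "status", "value": "shipped"},
    ]),
]

# "Replicating" the log is just copying the opaque batches somewhere else.
replica_log = list(log)
state = replay(replica_log)
```

In this sketch the batches are never decoded while being copied, and any replica that replays the same batches in the same order ends up with the same working set.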

If I were going to solve your problem, I would probably maintain two materialized working sets in memory, one of them potentially on another server and slightly behind real time. The cool thing here is you can replay the events into any arbitrary schema, as long as they are granular enough (6NF+).
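
A rough sketch of what "replay into any arbitrary schema" could look like, in Python. The event tuples, attribute names, and both projection functions are hypothetical; the point is only that one stream of granular facts can feed a wide row per order (for exports and grids) and a separate index (for filtering) without any joins at read time.

```python
from collections import defaultdict

# Hypothetical granular events: one (entity, attribute, value) fact per event.
EVENTS = [
    ("order-1", "customer", "ACME"),
    ("order-1", "status", "open"),
    ("order-2", "customer", "Globex"),
    ("order-2", "status", "open"),
    ("order-1", "status", "shipped"),
]


def project_wide_rows(events):
    """Schema 1: one denormalized row per entity, latest value per attribute."""
    rows = defaultdict(dict)
    for entity_id, attribute, value in events:
        rows[entity_id][attribute] = value
    return dict(rows)


def project_status_index(events):
    """Schema 2: current status -> set of entity ids, for grid filtering."""
    current = {}
    for entity_id, attribute, value in events:
        if attribute == "status":
            current[entity_id] = value
    index = defaultdict(set)
    for entity_id, status in current.items():
        index[status].add(entity_id)
    return dict(index)


wide = project_wide_rows(EVENTS)
by_status = project_status_index(EVENTS)

# An "all active orders" grid becomes an index lookup plus dict reads.
active_grid = [wide[order_id] for order_id in sorted(by_status.get("open", ()))]
```

Both projections read the same events, so a second working set on another server would simply be the same replay applied to a slightly older prefix of the log.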