Hacker News new | ask | show | jobs
by aravindet 1888 days ago
I am considering Materialize for a project, and I'm concerned about the limitation that materialized views should fit into the memory of a single host.

My use case is a materialized OLAP "fact" tables, created by joining a 5-10 separate tables. While the biggest source tables have under a million rows, the joined fact table would have perhaps 100 million rows which I fear might be too big.

One workaround I considered is creating a non-materialized view, TAIL-ing it, and persisting the results to another database.

Could you comment on this problem and workaround?

1 comments

At the moment, we are focused on scaling up with a single instance. While there are a couple of strategies for scaling out, none of our current methods support scaling a single view beyond a single instance. This isn't really a technical limitation but rather more of a testing and supportability limitation. Timely and Differential (the incremental computation frameworks used by Materialize) suport scale-out and have been extensively tested in scale-out scenarios.

Do you have a limit in mind on how large you're willing to go? A brief bit of napkin math shows that, at 1KB per record, your largest view should fit within 100GB of memory. This is easily supported by Materialize and I can certainly understand if this exceeds your appetite!

If you're interested in reducing the size of the in-memory dataset, does your fact table have a temporal dimension to it? By default, Materialize stores data for all time (like a database) and you can write views in such a way that it will only materialize recent data (like a stream processor). Our co-founder Frank wrote a blog post[1] detailing how to do this.

I'm honestly not sure how TAIL-ing a non-materialized view would perform. Sounds like something that would be fun to test!

Happy to chat about this further in our community Slack channel if you have more questions.

[1] https://materialize.com/temporal-filters/