by wrmsr 2314 days ago
In the real world it is inevitably a lot more than just those 3 tables - add 'groups', different types of groups, different privacy settings, different per-user feed preferences, experiments, and any number of other things which _can_ be expressed as pure, normalized, joined tables in a matview but make naive approaches a lot less likely to actually work in prod.

In my experience the most successful approach is a midpoint: you materialize/denormalize enough to feed your app endpoints and search engines, but retain flexibility in searching those fat but instantly available docs. Relatedly, you don't always need to preemptively materialize absolutely everything in any particular view (see https://engineering.fb.com/data-infrastructure/dragon-a-dist... ). Without being able to transparently operate on arbitrarily partially populated matviews, you are locked into a self-defeating all-or-nothing system whose rigidity is likely to do more harm than good culturally. Imagine, for example, if there were no 'caches', just a binary choice between precomputing everything ahead of time and recomputing everything every time. Neither extreme is sufficient for all cases, and real applications are composed of many different points on that spectrum.
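To make the "partially populated matview" idea concrete, here is a rough Python sketch of a view that materializes rows lazily, one key at a time, and can drop individual rows when their inputs change. Everything in it (`PartialMatView`, `build_feed`, the toy `follows` and `posts` data) is hypothetical and made up for illustration; it is not how Dragon or any particular database does it.

```python
from typing import Callable, Dict, Generic, Hashable, List, Set, Tuple, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V")


class PartialMatView(Generic[K, V]):
    """Hypothetical view that materializes rows on demand, per key."""

    def __init__(self, compute: Callable[[K], V]) -> None:
        self._compute = compute        # the "view definition" for a single key
        self._rows: Dict[K, V] = {}    # only the rows materialized so far
        self.recomputes = 0            # how many times we fell back to computing

    def get(self, key: K) -> V:
        # Serve the stored row if present, otherwise compute and keep it.
        if key not in self._rows:
            self._rows[key] = self._compute(key)
            self.recomputes += 1
        return self._rows[key]

    def invalidate(self, key: K) -> None:
        # Drop one row; the rest of the view stays materialized.
        self._rows.pop(key, None)

    def materialized_keys(self) -> Set[K]:
        return set(self._rows)


# Toy inputs (assumed for the example): who follows whom, and (author, ts, text) posts.
follows: Dict[str, Set[str]] = {"alice": {"bob", "carol"}, "dave": {"bob"}}
posts: List[Tuple[str, int, str]] = [("bob", 1, "hello"), ("carol", 2, "hi"), ("bob", 3, "again")]


def build_feed(user: str) -> List[str]:
    # Newest-first texts from everyone the user follows.
    followed = follows.get(user, set())
    newest_first = sorted(posts, key=lambda p: p[1], reverse=True)
    return [text for author, _ts, text in newest_first if author in followed]


feeds = PartialMatView(build_feed)
```

Only the feeds someone actually asks for get stored, so `feeds.get("alice")` materializes Alice's row and leaves Dave's uncomputed until it is needed. A real system would also have to work out which keys to invalidate when an input changes, which is the hard part this sketch skips.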

1 comment

That's fair. It will be interesting to see what people do along those lines: creating various materialized views, joining them at query time, chaining materialized views, and, I think most important to your point, creating new kinds of sinks for the updates.

Right now Kafka is the only sink ( https://materialize.io/docs/sql/create-sink/ ), but because it integrates with Confluent's schema registry, I'm guessing it should work well with many of Confluent's Kafka Connect sinks ( https://docs.confluent.io/current/connect/managing/connector... ). To your point, I think Elasticsearch would be an especially useful sink connector.
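As a purely illustrative sketch of what a "new kind of sink" might look like, here is a small Python class that applies a stream of view updates to an in-memory document store standing in for a search index. The update shape `(doc_id, document, diff)`, with `+1` for an insert and `-1` for a retraction, is an assumption made for this example and not Materialize's actual sink format; `DictSink` is a hypothetical name.

```python
from typing import Dict, Iterable, Tuple

# Assumed update shape: (doc id, document, diff), where +1 = insert and -1 = retract.
Update = Tuple[str, dict, int]


class DictSink:
    """Hypothetical sink that keeps a dict of documents in sync with view updates."""

    def __init__(self) -> None:
        self.docs: Dict[str, dict] = {}

    def apply(self, updates: Iterable[Update]) -> None:
        batch = list(updates)
        # Apply retractions first, so a change expressed as "retract old, insert new"
        # in one batch ends with the new document regardless of ordering.
        for doc_id, doc, diff in batch:
            if diff < 0 and self.docs.get(doc_id) == doc:
                del self.docs[doc_id]
        for doc_id, doc, diff in batch:
            if diff > 0:
                self.docs[doc_id] = doc


sink = DictSink()
sink.apply([("post-1", {"text": "hello"}, 1)])
# An edit arrives as an insert of the new version plus a retraction of the old one.
sink.apply([("post-1", {"text": "hello, world"}, 1), ("post-1", {"text": "hello"}, -1)])
```

A sink for a real search engine would replace the dict writes with index and delete calls to that engine, and would need to handle retries and ordering across batches, none of which is covered here.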

I haven't used any of these things together, so right now I'm totally speculating on the potential.

What I'm mostly envisioning is that there are a lot of smaller-scale applications where the complexity of adding an activity feed just isn't worth it. But if you could implement a feature like that trivially, it could be game-changing.