by siganakis 1787 days ago
In my experience, the core driver behind the data mesh architecture is organisational, not technological. Organisations are demanding more from their data, whether for rapid product development or self-service analytics. Often this involves a larger number of sources (e.g. external sources), rather than just larger volumes of the same thing.

If marketing, finance and sales are dependent on a centralised data team for every new thing, the data team quickly becomes the bottleneck, stifling innovation and frustrating teams. Incorporating the principles of a Data Mesh enables those teams to manage their own data, according to well-defined governance standards that enable interoperability.

The reality is that different teams are already managing their own data (via Excel spreadsheets, web apps, etc). If we can apply a bit more rigor to how these datasets are managed (e.g. so they can be shared, integrated, secured, etc), then the whole organisation benefits.

3 comments

I think I’m experiencing this where I work. The Data Lake is quickly gaining traction and feature requests pour in: please incorporate FHIR genomics resources, please make a UI for this image type, please make import filters to extract metadata from these files… this team seems swamped now. The solution would be to give more power to the requesters? Allow them to access underlying technologies, implement their own data models? Seems logical. Am I understanding this correctly?
Yes, you are understanding it correctly. The idea is that you give the "requesters" access to the data, then enable them to do their thing with it (with training / support / shadowing) and publish their results as "data-products" so that others can leverage it too in their own "data products".

The "data mesh" is essentially the collection of these independent "data-products".
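
To make that concrete, here is a minimal Python sketch of the idea, not a real tool or anyone's actual implementation. Everything in it is an assumption made for illustration: the `DataProduct` and `Mesh` names, the product names, and the two governance checks (declared schema, known upstream products). It only shows how independently owned products could be published against a shared standard and then reused by other teams.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataProduct:
    """A dataset published by one team (hypothetical structure)."""
    name: str
    owner: str                      # team responsible for the product
    schema: dict[str, type]         # column name -> expected Python type
    rows: tuple[dict, ...] = ()
    upstream: tuple[str, ...] = ()  # names of products this one was built from


class Mesh:
    """The 'mesh' here is simply the collection of published products."""

    def __init__(self) -> None:
        self._products: dict[str, DataProduct] = {}

    def publish(self, product: DataProduct) -> None:
        # Governance check 1: every upstream product must already be published.
        missing = [u for u in product.upstream if u not in self._products]
        if missing:
            raise ValueError(f"unknown upstream products: {missing}")
        # Governance check 2: every row must match the declared schema.
        for row in product.rows:
            if set(row) != set(product.schema):
                raise ValueError(f"row columns {sorted(row)} do not match schema")
            for column, expected in product.schema.items():
                if not isinstance(row[column], expected):
                    raise ValueError(f"column {column!r} is not {expected.__name__}")
        self._products[product.name] = product

    def get(self, name: str) -> DataProduct:
        return self._products[name]


# Hypothetical usage: sales publishes orders, finance builds on top of them.
mesh = Mesh()
mesh.publish(DataProduct(
    name="sales.orders",
    owner="sales",
    schema={"order_id": int, "amount": float},
    rows=({"order_id": 1, "amount": 9.5},),
))
mesh.publish(DataProduct(
    name="finance.revenue",
    owner="finance",
    schema={"total": float},
    rows=({"total": 9.5},),
    upstream=("sales.orders",),
))
```

The checks live in `publish` so that each team keeps ownership of its data while the shared rules are applied in one place. A real setup would need far more (access control, versioning, lineage), which this sketch leaves out.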

We already see management problems with self-service analytics like PowerBI, Tableau & Looker. It's too easy for people to create dashboards / reports that are subtly wrong and which cause confusion. There is a balance between empowering people to build data products and centralised control. Too much empowerment of people who don't understand the right way to do something leads to a horrible mess of contradictory data. Not enough, and people can't effectively do their job. Governance and process are the key to finding the balance and enforcing it.

The issue with the data-mesh is that there isn't really any great tooling to support the management or development of data products, or a data-mesh generally. I am sure this will change over time as vendors start building hype around it.

A bit self-serving, but I would recommend reading about Airbnb's Minerva (which I created). We leverage this data mesh concept to allow teams to define data independently, and then Minerva handles blending the data from different teams together with guaranteed consistency.

You can read more here: https://medium.com/airbnb-engineering/airbnb-metric-computat...

I help run the data mesh community and yup, 100%. There's a reason data mesh is catching on as fast as it is: if done right, it really feels like it can solve a lot of the agility/scalability problems people feel re data/analytics now. It is NOT a silver bullet but it can potentially really help companies towards that (obnoxiously named and overused) goal of being data driven.
Agree. We see this a lot at the clients we work with. While I agree with data mesh on a philosophical/principles level, at the implementation level, it creates a division between data “haves” (those who have the engineering know-how to write parallel processing jobs) and data “have nots”.

End result - implementation of data mesh might deepen the divide between data "haves" and the data "have nots".

A better way could be to implement Trino, Starburst, or Tetmon EdgeSet (of which I am a co-founder), to realise the vision of data mesh.