Hacker News
by camgunz 1008 days ago
This is just an interface, and you have the same problems with versioning and compatibility as you do with any interface. There's no difference here between the schema/semantics of a table and the types/semantics of an API.

IME, data pipelines implement versioning with namespaces/schemas/versioned tables. Clients are then free to use whatever version they like, and you apply the same support/maintenance policy you would for any software package or API.
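A minimal sketch of the versioned-namespace approach, assuming SQLite (which has no schemas, so a `v1_`/`v2_` name prefix stands in for a namespace). The table and view names (`orders_raw`, `v1_orders`, `v2_orders`) and the `read_orders` helper are hypothetical. Both versions are served from one underlying table, so a client pinned to v1 keeps working after v2 ships.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical underlying table; amounts are stored in integer cents.
conn.execute(
    "CREATE TABLE orders_raw (id INTEGER PRIMARY KEY, customer TEXT, amount_cents INTEGER)"
)
conn.executemany(
    "INSERT INTO orders_raw (id, customer, amount_cents) VALUES (?, ?, ?)",
    [(1, "alice", 1250), (2, "bob", 300)],
)

# v1 contract: amount exposed in dollars under the column name "amount".
conn.execute("""
    CREATE VIEW v1_orders AS
    SELECT id, customer, amount_cents / 100.0 AS amount FROM orders_raw
""")

# v2 contract: a breaking change, amount exposed in cents under a new column name.
conn.execute("""
    CREATE VIEW v2_orders AS
    SELECT id, customer, amount_cents FROM orders_raw
""")

def read_orders(version: int):
    """Each client picks the versioned view it was built against."""
    view = {1: "v1_orders", 2: "v2_orders"}[version]
    return conn.execute(f"SELECT * FROM {view} ORDER BY id").fetchall()

print(read_orders(1))  # [(1, 'alice', 12.5), (2, 'bob', 3.0)]
print(read_orders(2))  # [(1, 'alice', 1250), (2, 'bob', 300)]
```

Retiring v1 then becomes a support-policy decision (drop the view after a deprecation window), much like sunsetting an old API version.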

> There's no difference here between the schema/semantics of a table and the types/semantics of an API.

There is a big difference. The types of an API can be changed independently of your schema.

You're looking at the wrong layer. If we were to go to the layer you're talking about, we'd have internal and external tables, where we could change the structure of the internal tables and then rebuild/rematerialize the external tables/views from the internal ones.
If the external tables are views that can combine select columns from multiple tables with computed fields, maybe. In theory it's good; in practice I've never seen it done well.
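A rough sketch of the internal/external split discussed above, again assuming SQLite with hypothetical table and view names. The external view `customer_orders` combines columns from two internal tables plus a computed `total`. When the internal `customers` table is restructured (a single `name` column split into `first_name`/`last_name`), only the view is rebuilt, and readers see the same columns and rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Internal tables (free to change shape) plus the external view (the contract).
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER,
                         qty INTEGER, unit_price REAL);
    INSERT INTO customers VALUES (1, 'Ada Lovelace');
    INSERT INTO orders VALUES (10, 1, 3, 2.5);

    CREATE VIEW customer_orders AS
    SELECT o.id AS order_id, c.name AS customer_name, o.qty * o.unit_price AS total
    FROM orders o JOIN customers c ON c.id = o.customer_id;
""")

def snapshot():
    """Return the view's column names and rows, i.e. what external readers see."""
    cur = conn.execute("SELECT * FROM customer_orders")
    return [d[0] for d in cur.description], cur.fetchall()

before = snapshot()

# Restructure the internal table, then rebuild the external view on top of it.
# Assumes every name has exactly one space (a simplification for the sketch).
conn.executescript("""
    DROP VIEW customer_orders;
    CREATE TABLE customers_new (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
    INSERT INTO customers_new
    SELECT id,
           substr(name, 1, instr(name, ' ') - 1),
           substr(name, instr(name, ' ') + 1)
    FROM customers;
    DROP TABLE customers;
    ALTER TABLE customers_new RENAME TO customers;

    CREATE VIEW customer_orders AS
    SELECT o.id AS order_id,
           c.first_name || ' ' || c.last_name AS customer_name,
           o.qty * o.unit_price AS total
    FROM orders o JOIN customers c ON c.id = o.customer_id;
""")

after = snapshot()
print(after)            # (['order_id', 'customer_name', 'total'], [(10, 'Ada Lovelace', 7.5)])
print(before == after)  # True
```

Nothing here enforces the contract: if someone forgets to recreate the view, readers simply break, which is part of why the tooling matters.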
I do think tools to manage this stuff... basically don't exist, so I'm sympathetic to the argument that while data and software stacks are mostly equivalent, software stacks are way more on the rails than data stacks are. Which is to say, I have seen this stuff work well with experienced data engineers, but I think you need more experience to get the same success on the data side than you do on the software side.
Yeah, I could see that. It’s not common and the tooling is primitive. Same thing I would say about event sourcing. Great in theory, but it’s more likely to get your average team into trouble.
That’s the critical point - in theory this idea is fine.

In reality other ways of solving the same problem have a decade of industry knowledge, frameworks and tooling behind them.

Is the marginal gain from this approach being a slightly better conceptual match for a given problem than the "normal way" worth throwing all of that away and starting over?

Definitely not, in my opinion. You'd need to spend so much effort building tooling and relearning lessons before you reach the point where that marginal gain appears.